Dashboard / Runs / Run #r_8f3a91…b04c
Agent timeline: 5 steps · 4 passed · 1 failed
01 llm classify_intent
in "Hi, I was double charged on my order from last week, can I get a refund?"
out { "intent": "refund_request", "confidence": 0.94 }
142ms
02 tool lookup_order
in { "user_id": "u_77a2", "query": "last_order" }
out { "order_id": "4821", "total": 24.99, "items": 2, "status": "shipped" }
87ms
03 llm draft_reply
in { intent, order: #4821, total: $24.99 } → "compose refund confirmation"
out "…we've issued a refund of $249.00 to your card…"
1,847ms
Full input
// system + user composed prompt
{
  "intent": "refund_request",
  "order": { "order_id": "4821", "total": 24.99, "currency": "USD", "items": 2 },
  "instruction": "Compose a short, friendly refund confirmation. Quote the exact order total. Do not invent additional fees."
}
Full output
"Hi there — thanks for reaching out! I've gone ahead and issued a refund of $249.00 back to the card on file for order #4821. You should see it land within 3–5 business days. Sorry for the inconvenience, and let me know if anything else looks off!"
What went wrong

Expected a refund amount of $24.99 based on order #4821, but the agent output $249.00, a roughly 10× overstatement.

Likely cause: the model appears to have mishandled the decimal point in the total field, shifting it one place to the right (24.99 → 249) before formatting the amount as currency. The pattern matches 3 other refund-related failures in the last 24h.
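
The size of the error can be checked directly from the two amounts shown in this run. The snippet below is a minimal sketch using Python's `decimal` module; the variable names are illustrative and not part of the dashboard. It shows that the factor is slightly under 10×, because an exact tenfold error would have produced $249.90.

```python
from decimal import Decimal

# Amounts taken from this run; variable names are illustrative only.
expected = Decimal("24.99")   # total returned by lookup_order
reported = Decimal("249.00")  # amount quoted in the draft reply

factor = reported / expected       # how many times too large the refund is
overpayment = reported - expected  # extra money the customer was promised

print(f"factor: {factor:.2f}x")
print(f"overpayment: ${overpayment}")
```

Using `Decimal` rather than `float` keeps the comparison exact for currency values.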

Output diff · regression introduced by model swap
- gpt-5.0
+ gpt-5.1-mini
  "Hi there — thanks for reaching out! I've gone ahead and
- issued a refund of $24.99 back to the card on file for order #4821.
+ issued a refund of $249.00 back to the card on file for order #4821.
  You should see it land within 3–5 business days. Sorry
  for the inconvenience, and let me know if anything else
  looks off!"
Same prompt · different model · failing on 14/14 replays
04 tool policy_check
in { "draft": "…refund of $249.00…", "ruleset": "support.v3" }
out { "passed": true, "flags": [] } // downstream check missed numeric mismatch
61ms
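
Step 04 passed even though the draft quoted the wrong amount. One way such a check could catch the mismatch is sketched below. This is a hypothetical rule, not the actual `support.v3` ruleset: the function name `check_refund_amount`, the flag strings, and the USD-only regular expression are all assumptions made for illustration. The return shape mirrors the `{ "passed", "flags" }` output shown for this step.

```python
import re
from decimal import Decimal

# Assumption: amounts are written in USD as "$24.99" or "$1,249.00".
AMOUNT_RE = re.compile(r"\$(\d[\d,]*(?:\.\d{2})?)")

def check_refund_amount(draft: str, expected_total: Decimal) -> dict:
    """Flag any dollar amount in the draft that differs from the order total."""
    # Strip thousands separators before converting to an exact decimal.
    amounts = [Decimal(m.replace(",", "")) for m in AMOUNT_RE.findall(draft)]
    flags = []
    if not amounts:
        flags.append("no_amount_found")  # hypothetical flag name
    for amount in amounts:
        if amount != expected_total:
            flags.append(
                f"amount_mismatch: draft quotes ${amount}, "
                f"order total is ${expected_total}"
            )
    return {"passed": not flags, "flags": flags}

# Usage with the values from this run.
result = check_refund_amount(
    "…we've issued a refund of $249.00 to your card…", Decimal("24.99")
)
print(result)
```

With the draft from this run the sketch returns `passed` as false and a single mismatch flag; with the $24.99 wording from the earlier model it returns no flags.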
05 tool queue_for_send
in { "channel": "email", "thread_id": "th_19c4" }
out { "queued": true, "scheduled_at": "2026-05-12T14:21:08Z" }
23ms