Insurance Intelligence · May 31, 2026

The 150× call: what one stuck LLM cost us, and why most brokers wouldn't notice

Vaibhav Chopra, Contributing Writer

Key Insights

A single degenerate LLM call on a fleet motor policy cost Vaatun 150× a healthy call: the model latched onto two fields and kept repeating them for ~64,000 tokens, until the provider truncated the output.

Structured-output degeneracy hits roughly 3% of calls on this subtask. At an illustrative ₹30 per runaway, that is a five- to six-figure annual rupee burn at broker scale; at insurer scale the same bug becomes a seven-figure problem.

Vaatun caught it via routine cost-per-trace dashboards, not a 3 AM page — signal to fix took under an hour.

The permanent fix is three reinforcing layers: schema upper-bounds on list fields, mandatory per-call output-token budgets enforced by middleware, and per-subtask trace observability.

If your AI insurance platform can't tell you its degeneracy rate, 99th-percentile output tokens, or provider fallback chain, you have 150× calls happening right now and don't know it.

A healthy AI extraction of an Indian insurance policy costs one unit. Our most expensive single call in the last 30 days cost 150 of those units — same model, same kind of document.

The cause wasn't a billing error. The cause was a stuck model.

What happened

When Vaatun extracts data from a policy PDF, ten specialised AI subtasks run in parallel — one identifies the insurer, one finds the client, one decodes the coverage structure and instalment schedule. Strict JSON schemas, focused calls.

One of them, the subtask that breaks a policy into its commercial sections, components and instalments, went degenerate on a fleet motor policy. The model latched onto two fields, CORE_COVERAGES and STAMP_DUTY, and emitted them in a loop it never exited on its own:

{ "name": "CORE_COVERAGES", "sumInsured": 200000000 },
{ "name": "STAMP_DUTY", "sumInsured": 0 },
...repeated 1,055 times, ~64,000 tokens, until the provider truncated it.

A known failure mode of language models on structured output. Rare — roughly 3% of calls on this one subtask — but when it happens, you pay for every token. A single degenerate call costs 150× a healthy one.
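A loop like this is easy to spot once the records are parsed. The following is a minimal sketch, not Vaatun's actual detector: the function names and the 0.5 threshold are assumptions made for illustration, and it presumes the output has already been parsed into a list of records (a truncated response would need repair or partial parsing first).

```python
import json
from collections import Counter


def repetition_ratio(records: list) -> float:
    """Return the share of records that exactly duplicate an earlier record."""
    if not records:
        return 0.0
    # Serialise with sorted keys so key order cannot hide a duplicate.
    counts = Counter(json.dumps(r, sort_keys=True) for r in records)
    duplicates = len(records) - len(counts)
    return duplicates / len(records)


def looks_degenerate(records: list, threshold: float = 0.5) -> bool:
    """Flag an output when more than `threshold` of its records are repeats.

    The 0.5 default is a hypothetical value, not a figure from the article.
    """
    return repetition_ratio(records) > threshold


# Illustrative data: the looping pair from the incident, repeated 1,055 times.
pair = [
    {"name": "CORE_COVERAGES", "sumInsured": 200000000},
    {"name": "STAMP_DUTY", "sumInsured": 0},
]
looped = pair * 1055
healthy = list(pair)
```

Here `looks_degenerate(looped)` is true and `looks_degenerate(healthy)` is false. A real threshold would need tuning per subtask, since some policies legitimately contain near-identical line items.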

The arithmetic of one mistake at scale

Don't fixate on what a single call costs us. Look at the slope.

Assume — purely illustratively — that one degenerate runaway costs ₹30. Now apply that to the 3% degeneracy rate at different organisation sizes:

Operation	Volume / month	Runaways / month	Annual burn at ₹30 each
Mid broker	~6K entries	180	₹65K
Large national broker	~50K entries	1,500	₹5.4L
Direct insurer	~500K entries	15,000	₹54L

The absolute rupee figure is a placeholder. The slope isn't. In this illustration, a five- or six-figure problem at broker scale becomes a seven-figure problem at insurer scale, from the same underlying bug.
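The table's arithmetic can be reproduced in a few lines. This sketch uses only the article's own inputs (a 3% runaway rate and the illustrative ₹30 per runaway); the function and variable names are made up for the example.

```python
RUNAWAY_RATE = 0.03     # roughly 3% of calls on this subtask (from the article)
COST_PER_RUNAWAY = 30   # rupees, the article's illustrative placeholder
MONTHS_PER_YEAR = 12


def annual_burn(entries_per_month: int) -> tuple:
    """Return (runaways per month, annual cost in rupees) for a given volume."""
    runaways = round(entries_per_month * RUNAWAY_RATE)
    return runaways, runaways * COST_PER_RUNAWAY * MONTHS_PER_YEAR


# Monthly volumes taken from the table above.
volumes = {
    "Mid broker": 6_000,
    "Large national broker": 50_000,
    "Direct insurer": 500_000,
}
burn_table = {label: annual_burn(volume) for label, volume in volumes.items()}
```

The result is 180 runaways and ₹64,800 a year for the mid broker, 1,500 and ₹5,40,000 for the large national broker, and 15,000 and ₹54,00,000 for the direct insurer. Cost scales linearly with volume, so changing the placeholder changes every row by the same factor.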

And this is one bug, in one of ten subtasks, in one of dozens of AI workflows a modern insurance operation runs. The same failure class can hit claims triage, fraud screening, customer drafts, quotations — anywhere a language model writes text.

Most teams running production AI don't audit token distributions per subtask. They look at the monthly invoice. By then the money is gone.

What Vaatun caught, and how

We didn't get paged at 3 AM. We noticed because cost-per-trace is one of the routine dashboards we look at, and a number that's supposed to be flat developed a fat tail. From signal to fix took under an hour.

The fix is now permanent infrastructure. Three layers of it.

Schema guardrails. Every list field in every extraction schema now declares an upper bound. The model can't emit 1,055 of the same record because the schema rejects the 31st.
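In JSON Schema terms, an upper bound on a list is the `maxItems` keyword. The sketch below is an assumption-laden illustration rather than Vaatun's schema: the bound of 30 is inferred from "rejects the 31st", and `COMPONENTS_SCHEMA` and `check_list_bound` are hypothetical names. It checks the bound after generation; whether a provider also enforces it during decoding depends on the provider.

```python
# Hypothetical schema fragment for the list of policy components.
COMPONENTS_SCHEMA = {
    "type": "array",
    "maxItems": 30,  # assumed bound, inferred from "rejects the 31st"
    "items": {"type": "object", "required": ["name", "sumInsured"]},
}


def check_list_bound(value: list, schema: dict) -> None:
    """Raise ValueError if the schema has no maxItems or the list exceeds it."""
    limit = schema.get("maxItems")
    if limit is None:
        # Mirrors the rule that every list field must declare an upper bound.
        raise ValueError("list schema must declare maxItems")
    if len(value) > limit:
        raise ValueError(
            f"list has {len(value)} items, schema allows at most {limit}"
        )


record = {"name": "CORE_COVERAGES", "sumInsured": 200000000}
check_list_bound([record] * 30, COMPONENTS_SCHEMA)  # passes silently
```

Passing 31 records to `check_list_bound` raises `ValueError`, as does a schema with no `maxItems` at all.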

Per-call output budgets. A middleware sits in front of every language-model call in the platform. No maxOutputTokens? The call doesn't go out. Every call site decides its worst-case cost in code review. A runaway past that budget is mechanically impossible.
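One way to picture such a middleware is as a wrapper that refuses to forward a call without a budget. This is a minimal sketch under stated assumptions: `require_output_budget`, `MissingOutputBudget` and `fake_provider` are hypothetical names, and the snake_case `max_output_tokens` parameter stands in for the `maxOutputTokens` setting mentioned above.

```python
from typing import Any, Callable, Optional


class MissingOutputBudget(ValueError):
    """Raised when a model call is attempted without an output-token budget."""


def require_output_budget(call_model: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a model-calling function so it cannot run without a budget."""

    def guarded(*, prompt: str, max_output_tokens: Optional[int] = None, **kwargs: Any) -> Any:
        if max_output_tokens is None or max_output_tokens <= 0:
            raise MissingOutputBudget(
                "refusing to call the model without a positive max_output_tokens"
            )
        return call_model(prompt=prompt, max_output_tokens=max_output_tokens, **kwargs)

    return guarded


def fake_provider(*, prompt: str, max_output_tokens: int, **kwargs: Any) -> dict:
    """Stand-in for a real provider client; it only echoes its arguments."""
    return {"prompt": prompt, "max_output_tokens": max_output_tokens}


guarded_call = require_output_budget(fake_provider)
response = guarded_call(prompt="extract components", max_output_tokens=2000)
```

Calling `guarded_call(prompt="extract components")` with no budget raises `MissingOutputBudget` before any request leaves the process. That is the property that matters here, because the worst case is then fixed at the call site rather than by the provider's limit.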

Trace observability. Every AI call costed, tagged by subtask, indexed by trace. We can answer "most expensive call last month and what document caused it" in 30 seconds.
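As a rough sketch of what answering that question takes, assume each call is logged as one record carrying its trace, subtask, source document and cost. The record shape, field names and sample data below are all hypothetical; costs are in the article's "units", where a healthy call costs one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    trace_id: str
    subtask: str
    document_id: str
    output_tokens: int
    cost: float  # in "units", where a healthy call costs 1.0


def most_expensive(records: list) -> CallRecord:
    """Return the single costliest call."""
    return max(records, key=lambda r: r.cost)


def cost_by_subtask(records: list) -> dict:
    """Sum cost per subtask so a fat tail shows up against its owner."""
    totals: dict = {}
    for r in records:
        totals[r.subtask] = totals.get(r.subtask, 0.0) + r.cost
    return totals


# Hypothetical log: two healthy calls and one runaway.
log = [
    CallRecord("t1", "insurer", "policy-001", 400, 1.0),
    CallRecord("t1", "coverage_structure", "policy-001", 420, 1.0),
    CallRecord("t2", "coverage_structure", "fleet-motor-001", 64000, 150.0),
]
worst = most_expensive(log)
```

Here `worst.document_id` is `"fleet-motor-001"`, and `cost_by_subtask(log)` attributes 151 of the 152 units to the coverage-structure subtask.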

Three other layers — provider cascading, capability tiers, per-subtask Pareto evals — make the rest of the system safe to evolve. Switching a subtask to a cheaper model is a config change that runs evals on hundreds of real fixture documents before shipping.
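Provider cascading is read here as a fallback chain: try providers in order and move on when one is rate-limited. The sketch below is illustrative only. `RateLimited` stands in for an HTTP 429 response, and the provider functions are fakes rather than real client calls.

```python
from typing import Callable, List, Tuple


class RateLimited(Exception):
    """Stands in for a provider returning HTTP 429 (Too Many Requests)."""


Provider = Tuple[str, Callable[[str, int], str]]


def call_with_fallback(
    providers: List[Provider], prompt: str, max_output_tokens: int
) -> Tuple[str, str]:
    """Try each provider in order; return (provider name, response text)."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt, max_output_tokens)
        except RateLimited as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("every provider was rate-limited: " + "; ".join(failures))


def primary(prompt: str, max_output_tokens: int) -> str:
    raise RateLimited("429")  # fake: always rate-limited


def secondary(prompt: str, max_output_tokens: int) -> str:
    return "{}"  # fake: returns an empty JSON object


chain = [("primary", primary), ("secondary", secondary)]
used, text = call_with_fallback(chain, "extract components", 2000)
```

In this run `used` is `"secondary"`. The output budget travels with the call down the chain, so a fallback provider cannot reintroduce an unbounded response.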

What you're actually buying

When a broker buys an AI-powered platform, they think they're buying "the AI." They're actually buying someone whose full-time job is keeping the AI honest.

Models drift. Providers change pricing. New failure modes appear. Schemas get harder to satisfy as the policy industry adds new products. A competently-built platform absorbs all of that quietly. An incompetently-built one bleeds money quietly.

This 150× call is one example, in one subtask, in one workflow. Behind it: the bulk policy importer absorbing inconsistent broker export formats, the consolidated report streaming 72,000 policies into XLSX without OOMing, the bulk-import validating live against the schema, the claims escalation engine, the multi-tenant access model enforcing isolation on every query.

You don't see any of it on a typical Tuesday. That's the point.

The takeaway

If you're running AI in insurance production today and you can't tell us:

The cost ratio of your most expensive single call to your median call over the last 30 days
The 99th-percentile output tokens per subtask
The degeneracy rate on each structured-output endpoint
The fallback chain when the primary provider returns 429

…you have 150× calls happening right now and you don't know it.

We do. That's what running a serious platform looks like in 2026.
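The first three numbers in that checklist fall out of per-call logs with very little code. This is a sketch on synthetic data, not a production metric pipeline. The function names are invented for the example, the percentile uses the nearest-rank method, and "hit the output budget" is used as a proxy for degeneracy, which is an assumption.

```python
import math
from statistics import median


def max_to_median_ratio(costs: list) -> float:
    """Cost of the most expensive call divided by the median call cost."""
    return max(costs) / median(costs)


def percentile(values: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest value covering pct% of the data."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def degeneracy_rate(output_tokens: list, budget: int) -> float:
    """Share of calls whose output reached the token budget (a proxy only)."""
    hits = sum(1 for t in output_tokens if t >= budget)
    return hits / len(output_tokens)


# Synthetic month for one subtask: 97 healthy calls and 3 runaways.
costs = [1.0] * 97 + [150.0] * 3
tokens = [400] * 97 + [64000] * 3

ratio = max_to_median_ratio(costs)    # 150.0
p99 = percentile(tokens, 99)          # 64000
rate = degeneracy_rate(tokens, 64000) # 0.03
```

On this synthetic data the median looks perfectly healthy while the 99th percentile sits at the truncation limit, which is why an average or an invoice total hides the problem.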

Vaatun runs policies, claims, finance and AI-assisted data entry for Indian insurance brokers and insurers — built by a team that spends as much time on the parts you don't see as on the parts you do. vaibhav@vaatun.com
