Structured Output from LLMs — Parsing, Validation, and Recovery
You ask the LLM for JSON. You get this back:
Sure! Here's the product data you requested:
```json
{
"name": "Wireless Headphones",
"price": 79.99,
"features": ["noise cancelling", "bluetooth 5.3",],
"inStock": true,
}
```
Hope that helps! Let me know if you need anything else.
Trailing commas. Markdown fences. A cheerful explanation nobody asked for. Your `JSON.parse()` throws, your pipeline crashes, your user sees a 500. This happens 10-20% of the time depending on the model and the prompt.
I've spent the last year building LLM-powered features (including a [multi-model website summarizer](/blog/building-website-summarizer-fastapi-nextjs-vercel-ai-sdk)) and I've landed on a layered approach that handles every failure mode I've encountered.
## The Naive Approach
Everyone writes this first:
```typescript
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Return JSON with name and age for a random person." }],
});
const data = JSON.parse(response.choices[0].message.content!);
```
This works in demos. It breaks in production because models regularly produce:

- markdown fences wrapping the JSON
- trailing commas (`{"a": 1,}`)
- explanation text before or after the JSON
- single quotes instead of double quotes
- inline comments
- truncated responses that cut off mid-object
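To make those failure modes concrete, here's a quick sanity check. The sample strings are made up, but each one is a pattern that shows up in real responses:

```typescript
// Each of these realistic LLM outputs throws on raw JSON.parse:
const badOutputs = [
  '{"a": 1,}',               // trailing comma
  '```json\n{"a": 1}\n```',  // markdown fence
  'Sure! {"a": 1}',          // explanation text
  "{'a': 1}",                // single quotes
  '{"a": 1',                 // truncated mid-object
];

const failures = badOutputs.filter((s) => {
  try { JSON.parse(s); return false; } catch { return true; }
});

console.log(`${failures.length}/${badOutputs.length} inputs failed to parse`);
// → "5/5 inputs failed to parse"
```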
## Level 1: Robust Parsing
Build a utility that handles common quirks before reaching for heavier tools:
```typescript
function extractJSON<T = unknown>(raw: string): T {
  try { return JSON.parse(raw); } catch { /* continue */ }

  // Extract from markdown code fences
  const fenceMatch = raw.match(/```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/);
  if (fenceMatch) {
    try { return JSON.parse(fenceMatch[1]); } catch { raw = fenceMatch[1]; }
  }

  // Find first { or [ and match its closing bracket.
  // Note: this naive depth counter ignores brackets inside string values;
  // good enough for typical model output, not a full JSON scanner.
  const startObj = raw.indexOf("{");
  const startArr = raw.indexOf("[");
  const start = startObj === -1 ? startArr : startArr === -1 ? startObj : Math.min(startObj, startArr);
  if (start === -1) throw new Error("No JSON structure found in response");

  let depth = 0, end = -1;
  for (let i = start; i < raw.length; i++) {
    if (raw[i] === "{" || raw[i] === "[") depth++;
    if (raw[i] === "}" || raw[i] === "]") depth--;
    if (depth === 0) { end = i; break; }
  }
  if (end === -1) throw new Error("Unterminated JSON — likely truncated response");

  let jsonStr = raw.slice(start, end + 1);
  jsonStr = jsonStr.replace(/,\s*([}\]])/g, "$1"); // trailing commas
  jsonStr = jsonStr.replace(/\/\/.*$/gm, "");      // line comments
  return JSON.parse(jsonStr);
}
```

This handles ~90% of failures. Zero dependencies, fast. But it gives you a parsed object with no shape guarantees — the model might return `{ "nombre": "Juan" }` and your code destructures `undefined`.
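And that failure is silent, not loud. A two-line illustration of why a type cast is not a guarantee:

```typescript
// JSON.parse succeeds, but the shape is wrong: the `as` cast is a
// compile-time fiction with no runtime check behind it.
const parsed = JSON.parse('{"nombre": "Juan"}') as { name: string };
console.log(parsed.name); // undefined — no error, just a bug downstream
```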
## Level 2: Schema Validation with Zod
Define what you want, validate what you get:
```typescript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const ProductSchema = z.object({
  name: z.string().min(1),
  price: z.number().positive(),
  currency: z.enum(["USD", "EUR", "GBP"]),
  features: z.array(z.string()).min(1),
  inStock: z.boolean(),
});

type Product = z.infer<typeof ProductSchema>;
```

Now tie together prompting, parsing, validation, and retry into one function:
```typescript
async function generateStructuredData<T extends z.ZodType>(
  client: OpenAI,
  schema: T,
  prompt: string,
  maxRetries = 2
): Promise<z.infer<T>> {
  const schemaDesc = JSON.stringify(zodToJsonSchema(schema), null, 2);
  const systemPrompt = `Return ONLY valid JSON matching this schema — no explanation, no markdown:\n\n${schemaDesc}`;
  let lastError: Error | null = null;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const messages: OpenAI.ChatCompletionMessageParam[] = [
      { role: "system", content: systemPrompt },
      { role: "user", content: prompt },
    ];
    if (lastError && attempt > 0) {
      messages.push({
        role: "user",
        content: `Your previous response was invalid: ${lastError.message}. Try again.`,
      });
    }

    const response = await client.chat.completions.create({ model: "gpt-4o", messages, temperature: 0 });
    try {
      return schema.parse(extractJSON(response.choices[0].message.content!));
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));
    }
  }
  throw new Error(`Structured output failed after ${maxRetries + 1} attempts: ${lastError?.message}`);
}
```

The key insight: include the schema in the prompt, then validate against that same schema. If validation fails, feed the Zod error back to the model. In my experience, models self-correct on the first retry about 90% of the time — "price: Expected number, received string" is all they need.
This gets you to ~98% reliability. But we can do better.
## Level 3: Provider-Native Structured Output
Every major provider now constrains token generation at decode time — the model literally cannot produce invalid JSON.
**OpenAI — JSON Schema mode:**
```typescript
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: productDescription }],
  response_format: {
    type: "json_schema",
    json_schema: { name: "product", strict: true, schema: zodToJsonSchema(ProductSchema) },
  },
});

const product = ProductSchema.parse(JSON.parse(response.choices[0].message.content!));
```

**Anthropic — tool use as structured output:**
```typescript
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  tool_choice: { type: "tool", name: "extract_product" },
  tools: [{
    name: "extract_product",
    description: "Extract structured product data",
    input_schema: zodToJsonSchema(ProductSchema),
  }],
  messages: [{ role: "user", content: productDescription }],
});

const product = ProductSchema.parse(response.content.find((b) => b.type === "tool_use")!.input);
```

**Vercel AI SDK — the cleanest option:**
```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";

const { object: product } = await generateObject({
  model: openai("gpt-4o"),
  schema: ProductSchema,
  prompt: `Extract product data from: ${productDescription}`,
});
```

The SDK picks the best strategy per provider (JSON Schema for OpenAI, tool use for Anthropic) and handles parsing internally. I covered this in my summarizer post.
| Approach | Success Rate | Latency Overhead |
|---|---|---|
| Raw `JSON.parse` | ~80-85% | None |
| `extractJSON` + Zod | ~95-98% | Negligible |
| OpenAI JSON Schema mode | ~99.9% | ~100ms first call |
| Anthropic tool use | ~99.5% | Minimal |
| Vercel AI SDK `generateObject` | ~99.5-99.9% | Minimal |
## Recovery Strategies
Even native structured output fails — network timeouts, rate limits, model refusals. Build defense in depth.
**Retry with error feedback** (already shown above in `generateStructuredData`). Feed the exact Zod error back to the model.
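For the transient failures (timeouts, rate limits), a generic exponential-backoff wrapper belongs underneath any model-level retry. This is a sketch; `withBackoff` and its delay values are illustrative defaults, not from any particular library:

```typescript
// Sketch: retry a flaky async call with exponential backoff and jitter.
// Attempt counts and delays are illustrative, not tuned recommendations.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        // 500ms, 1s, 2s, ... each randomized ±25% to avoid thundering herds
        const delay = baseDelayMs * 2 ** attempt * (0.75 + Math.random() * 0.5);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Wrap the whole structured call, e.g. `withBackoff(() => generateStructuredData(client, schema, prompt))`, so network-level and validation-level retries compose.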
**Partial recovery** — get something rather than throw:

```typescript
function parseWithDefaults<T extends z.ZodType>(raw: unknown, schema: T, defaults: z.infer<T>): z.infer<T> {
  const result = schema.safeParse(raw);
  if (result.success) return result.data;
  // Merge whatever valid fields came back over the defaults; if the merge
  // still fails validation, fall back to the defaults entirely rather than throw.
  if (typeof raw === "object" && raw !== null) {
    const merged = schema.safeParse({ ...defaults, ...raw });
    if (merged.success) return merged.data;
  }
  return defaults;
}
```

**Fallback chains** — try native JSON mode, fall back to tool use, fall back to raw parse:
```typescript
async function withFallback<T extends z.ZodType>(prompt: string, schema: T): Promise<z.infer<T>> {
  try {
    return (await generateObject({ model: openai("gpt-4o"), schema, prompt })).object;
  } catch { console.warn("Native mode failed, trying tool use"); }
  try {
    const resp = await anthropic.messages.create({
      model: "claude-sonnet-4-20250514", max_tokens: 1024,
      tool_choice: { type: "tool", name: "extract" },
      tools: [{ name: "extract", description: "Extract data", input_schema: zodToJsonSchema(schema) }],
      messages: [{ role: "user", content: prompt }],
    });
    return schema.parse(resp.content.find((b) => b.type === "tool_use")!.input);
  } catch { console.warn("Tool use failed, trying raw parse"); }
  return generateStructuredData(new OpenAI(), schema, prompt);
}
```

Three providers, three failure modes. The chance of all three failing on the same request is effectively zero.
## Real-World Example: Product Data Pipeline
Putting it all together — extracting product data from unstructured text:
```typescript
const ProductListingSchema = z.object({
  products: z.array(z.object({
    name: z.string().describe("Product name"),
    price: z.number().positive().describe("Price in USD"),
    description: z.string().max(200).describe("Brief description"),
    specs: z.record(z.string()).describe("Key-value specifications"),
    availability: z.enum(["in_stock", "out_of_stock", "pre_order"]),
  })),
  source: z.string().describe("Source URL"),
  extractedAt: z.string().datetime(),
});

async function extractProducts(rawText: string, sourceUrl: string) {
  const { object } = await generateObject({
    model: openai("gpt-4o"),
    schema: ProductListingSchema,
    prompt: `Extract all products from this text.\nSource: ${sourceUrl}\nTimestamp: ${new Date().toISOString()}\n\n${rawText}`,
    temperature: 0,
  });
  return object;
}
```

The `.describe()` calls on each field end up in the JSON Schema sent to the model — inline prompt engineering for every field.
## Testing Structured Output
You can't call the LLM on every test run. Mock the responses, test the parsing and validation layers:
```typescript
import { describe, it, expect } from "vitest";
import fc from "fast-check";

describe("extractJSON", () => {
  it("handles markdown-wrapped JSON", () => {
    const data = '{"name": "test"}';
    expect(extractJSON(`Here you go:\n\`\`\`json\n${data}\n\`\`\`\nDone!`)).toEqual({ name: "test" });
  });

  it("handles trailing commas", () => {
    expect(extractJSON('{"name": "test", "items": [1, 2,],}')).toEqual({ name: "test", items: [1, 2] });
  });
});

describe("ProductListingSchema", () => {
  it("rejects negative prices", () => {
    const bad = {
      products: [{ name: "X", price: -10, description: "y", specs: {}, availability: "in_stock" }],
      source: "test",
      extractedAt: "2026-04-06T00:00:00Z",
    };
    expect(() => ProductListingSchema.parse(bad)).toThrow();
  });
});

describe("extractJSON round-trip", () => {
  it("parses any valid JSON wrapped in fences", () => {
    fc.assert(fc.property(fc.object({ maxDepth: 2, maxKeys: 5 }), (obj) => {
      const wrapped = `\`\`\`json\n${JSON.stringify(obj)}\n\`\`\``;
      expect(extractJSON(wrapped)).toEqual(obj);
    }));
  });
});
```

Property-based testing with fast-check catches edge cases you'd never think to write — deeply nested objects, special characters, empty arrays.
## The Takeaway
Don't trust LLM output. Validate it. The stack I use in production:

- **Provider-native structured output** (or `generateObject`) as the primary path — eliminates ~99% of parsing issues at the source
- **Zod validation always** — even native JSON mode can produce semantically wrong data
- **`extractJSON` as a fallback** — for models that don't support constrained decoding
- **Retry with error feedback** — models self-correct when you tell them what went wrong
- **Test the schema, not the model** — mock responses, validate schemas, use property-based tests
Structured output isn't something you bolt on at the end. It's the foundation of every reliable LLM pipeline. Get it right early.