
Building a Website Summarizer with FastAPI, Next.js, and the Vercel AI SDK

8 min read
ai · fastapi · nextjs · vercel-ai-sdk · python · typescript · llm

I recently built a website summarizer — paste a URL, pick a model, get a streamed summary. The interesting part isn't the summarization itself. It's the architecture: a FastAPI backend that handles scraping and LLM calls, a Next.js frontend, and the Vercel AI SDK to seamlessly switch between models (GPT-4o, Claude, Gemini) without rewriting any streaming logic.

Here's how I put it together.

Architecture Overview

Browser → Next.js Route Handler → FastAPI → Scrape URL → LLM (selectable) → Streamed response → UI

The separation is intentional. FastAPI handles the heavy lifting — scraping, cleaning HTML, and calling LLM APIs. Next.js owns the UI and acts as a proxy to the backend, which keeps API keys server-side and lets us use the Vercel AI SDK's React hooks for streaming.

The FastAPI Backend

Project Setup

mkdir summarizer-api && cd summarizer-api
python -m venv .venv && source .venv/bin/activate
pip install fastapi uvicorn httpx beautifulsoup4 openai anthropic google-genai

Scraping and Cleaning

The first job is turning a URL into clean text. Raw HTML is full of noise — nav bars, footers, scripts — that wastes tokens and confuses the model.

# scraper.py
import httpx
from bs4 import BeautifulSoup
 
async def scrape_url(url: str) -> str:
    async with httpx.AsyncClient(follow_redirects=True, timeout=15) as client:
        resp = await client.get(url, headers={
            "User-Agent": "Mozilla/5.0 (compatible; SummarizerBot/1.0)"
        })
        resp.raise_for_status()
 
    soup = BeautifulSoup(resp.text, "html.parser")
 
    # Remove non-content elements
    for tag in soup(["script", "style", "nav", "footer", "header", "aside", "form"]):
        tag.decompose()
 
    text = soup.get_text(separator="\n", strip=True)
 
    # Collapse excessive whitespace
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    clean = "\n".join(lines)
 
    # Truncate to roughly 3,000 tokens' worth of text (~4 chars/token)
    return clean[:12000]
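The 12,000-character cap is a stand-in for a token budget: a common rule of thumb for English text is about 4 characters per token, so this keeps the payload around 3,000 tokens. A minimal sketch of that heuristic (for exact counts you'd use tiktoken or the provider's own tokenizer):

```python
def truncate_to_token_budget(text: str, max_tokens: int = 3000) -> str:
    """Character-based truncation using the rough ~4 chars/token
    heuristic for English text. Swap in a real tokenizer (e.g.
    tiktoken) when exact counts matter."""
    chars_per_token = 4  # heuristic, not an exact tokenizer
    return text[: max_tokens * chars_per_token]
```

This is deliberately crude — it can split a multi-byte word or a sentence mid-stream — but it's cheap and good enough for keeping scraped pages under the model's context window.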

Multi-Model Summarization

This is where it gets interesting. We define a unified interface that routes to different providers based on the model the user selects.

# summarizer.py
from openai import AsyncOpenAI
from anthropic import AsyncAnthropic
from google import genai
 
openai_client = AsyncOpenAI()
anthropic_client = AsyncAnthropic()
google_client = genai.Client()
 
SYSTEM_PROMPT = """You are a website content summarizer. Given the text content
of a web page, produce a clear, well-structured summary that captures the key
points. Use markdown formatting. Be concise but thorough."""
 
MODELS = {
    "gpt-4o": "openai",
    "gpt-4o-mini": "openai",
    "claude-sonnet-4-20250514": "anthropic",
    "claude-haiku-4-5-20251001": "anthropic",
    "gemini-2.0-flash": "google",
}
 
 
async def summarize_openai(text: str, model: str):
    stream = await openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Summarize this webpage:\n\n{text}"},
        ],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
 
 
async def summarize_anthropic(text: str, model: str):
    async with anthropic_client.messages.stream(
        model=model,
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[
            {"role": "user", "content": f"Summarize this webpage:\n\n{text}"},
        ],
    ) as stream:
        async for text_chunk in stream.text_stream:
            yield text_chunk
 
 
async def summarize_google(text: str, model: str):
    # Use the async client (google_client.aio) so streaming doesn't block the event loop
    response = await google_client.aio.models.generate_content_stream(
        model=model,
        contents=f"{SYSTEM_PROMPT}\n\nSummarize this webpage:\n\n{text}",
    )
    async for chunk in response:
        if chunk.text:
            yield chunk.text
 
 
async def summarize(text: str, model: str):
    provider = MODELS.get(model)
    if provider == "openai":
        async for chunk in summarize_openai(text, model):
            yield chunk
    elif provider == "anthropic":
        async for chunk in summarize_anthropic(text, model):
            yield chunk
    elif provider == "google":
        async for chunk in summarize_google(text, model):
            yield chunk
    else:
        raise ValueError(f"Unknown model: {model}")
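The if/elif chain works fine at three providers, but as the list grows a registry dict keeps dispatch declarative: the provider name maps straight to its generator function. A minimal sketch of that alternative, with stub generators standing in for the real summarize_openai / summarize_anthropic calls:

```python
import asyncio
from typing import AsyncIterator, Callable

# Stubs standing in for the real provider generator functions
async def stub_openai(text: str, model: str) -> AsyncIterator[str]:
    yield f"[{model}] summary"

async def stub_anthropic(text: str, model: str) -> AsyncIterator[str]:
    yield f"[{model}] summary"

PROVIDERS: dict[str, Callable[..., AsyncIterator[str]]] = {
    "openai": stub_openai,
    "anthropic": stub_anthropic,
}

MODELS = {"gpt-4o-mini": "openai", "claude-sonnet-4-20250514": "anthropic"}

async def summarize(text: str, model: str) -> AsyncIterator[str]:
    provider_fn = PROVIDERS.get(MODELS.get(model, ""))
    if provider_fn is None:
        raise ValueError(f"Unknown model: {model}")
    async for chunk in provider_fn(text, model):
        yield chunk
```

With this shape, adding a provider is one entry in each dict — the dispatch function never changes.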

The API Endpoint

FastAPI's StreamingResponse makes it straightforward to pipe LLM output directly to the client.

# main.py
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, HttpUrl
 
from scraper import scrape_url
from summarizer import summarize, MODELS
 
app = FastAPI()
 
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["POST"],
    allow_headers=["*"],
)
 
 
class SummarizeRequest(BaseModel):
    url: HttpUrl
    model: str = "gpt-4o-mini"
 
 
@app.post("/api/summarize")
async def summarize_endpoint(req: SummarizeRequest):
    if req.model not in MODELS:
        raise HTTPException(400, f"Unsupported model. Choose from: {list(MODELS.keys())}")
 
    try:
        content = await scrape_url(str(req.url))
    except Exception:
        raise HTTPException(422, "Failed to fetch or parse the URL")
 
    if len(content.strip()) < 100:
        raise HTTPException(422, "Page has too little text content to summarize")
 
    return StreamingResponse(
        summarize(content, req.model),
        media_type="text/plain",
    )

Run it:

uvicorn main:app --reload --port 8000

The Next.js Frontend

Why the Vercel AI SDK?

You could just fetch the stream and manually read chunks. But the Vercel AI SDK gives you:

  • useCompletion — a React hook that manages streaming state, loading indicators, and abort controllers
  • Model switching — change the model parameter and the hook handles the rest
  • Built-in error handling — no manual ReadableStream wrangling

Route Handler (API Proxy)

The Next.js route handler proxies requests to FastAPI. This keeps the backend URL out of client-side code and lets you add auth or rate limiting later.

// app/api/summarize/route.ts
import { NextRequest } from "next/server";
 
export async function POST(req: NextRequest) {
  const body = await req.json();
 
  const response = await fetch("http://localhost:8000/api/summarize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
 
  if (!response.ok) {
    const error = await response.text();
    return new Response(error, { status: response.status });
  }
 
  // Forward the stream directly
  return new Response(response.body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

The Summarizer Component

This is where the Vercel AI SDK shines. The useCompletion hook manages the entire streaming lifecycle.

// components/summarizer.tsx
"use client";
 
import { useCompletion } from "ai/react";
import { useState } from "react";
 
const MODELS = [
  { id: "gpt-4o", label: "GPT-4o", provider: "OpenAI" },
  { id: "gpt-4o-mini", label: "GPT-4o Mini", provider: "OpenAI" },
  { id: "claude-sonnet-4-20250514", label: "Claude Sonnet", provider: "Anthropic" },
  { id: "claude-haiku-4-5-20251001", label: "Claude Haiku", provider: "Anthropic" },
  { id: "gemini-2.0-flash", label: "Gemini Flash", provider: "Google" },
];
 
export function Summarizer() {
  const [url, setUrl] = useState("");
  const [model, setModel] = useState("gpt-4o-mini");
 
  const { completion, complete, isLoading, error, stop } = useCompletion({
    api: "/api/summarize",
    body: { model },
  });
 
  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!url.trim()) return;
    await complete(url, { body: { url, model } });
  };
 
  return (
    <div className="mx-auto max-w-2xl space-y-6">
      <form onSubmit={handleSubmit} className="space-y-4">
        <div>
          <label htmlFor="url" className="block text-sm font-medium">
            Website URL
          </label>
          <input
            id="url"
            type="url"
            required
            value={url}
            onChange={(e) => setUrl(e.target.value)}
            placeholder="https://example.com/article"
            className="mt-1 w-full rounded-md border px-3 py-2"
          />
        </div>
 
        <div>
          <label htmlFor="model" className="block text-sm font-medium">
            Model
          </label>
          <select
            id="model"
            value={model}
            onChange={(e) => setModel(e.target.value)}
            className="mt-1 w-full rounded-md border px-3 py-2"
          >
            {MODELS.map((m) => (
              <option key={m.id} value={m.id}>
                {m.label} ({m.provider})
              </option>
            ))}
          </select>
        </div>
 
        <div className="flex gap-2">
          <button
            type="submit"
            disabled={isLoading}
            className="rounded-md bg-blue-600 px-4 py-2 text-white disabled:opacity-50"
          >
            {isLoading ? "Summarizing..." : "Summarize"}
          </button>
          {isLoading && (
            <button
              type="button"
              onClick={stop}
              className="rounded-md border px-4 py-2"
            >
              Stop
            </button>
          )}
        </div>
      </form>
 
      {error && (
        <div className="rounded-md bg-red-50 p-4 text-red-700" role="alert">
          {error.message}
        </div>
      )}
 
      {completion && (
        <article className="prose max-w-none rounded-md border p-6">
          {completion}
        </article>
      )}
    </div>
  );
}

Switching models is just changing a <select> value. The useCompletion hook passes it through as part of the request body. No provider-specific code on the frontend at all.

Model Switching in Practice

The key insight is that the Vercel AI SDK doesn't care what's behind the stream. As long as your endpoint returns a text stream, the hooks work. This means your backend can route to any provider — OpenAI, Anthropic, Google, a local Ollama instance — and the frontend code stays identical.

Here's the flow when a user switches from GPT-4o to Claude Sonnet:

  1. User selects "Claude Sonnet" from the dropdown
  2. model state updates to claude-sonnet-4-20250514
  3. User clicks "Summarize"
  4. useCompletion sends { url, model: "claude-sonnet-4-20250514" } to /api/summarize
  5. Next.js route handler proxies to FastAPI
  6. FastAPI routes to the Anthropic client based on the model name
  7. The response streams back through the same chain
  8. completion updates in real-time — no code changes needed

Adding a New Model

Want to add Llama via Ollama? Three steps:

1. Add the provider function in FastAPI:

import json  # decodes Ollama's newline-delimited JSON stream

import httpx

async def summarize_ollama(text: str, model: str):
    # timeout=None: local generation can easily outlast httpx's 5s default
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": f"{SYSTEM_PROMPT}\n\n{text}", "stream": True},
        ) as resp:
            async for line in resp.aiter_lines():
                if not line:
                    continue  # skip blank lines between JSON objects
                data = json.loads(line)
                if data.get("response"):
                    yield data["response"]

2. Register it in the model map:

MODELS["llama3.1"] = "ollama"

3. Add it to the frontend dropdown:

{ id: "llama3.1", label: "Llama 3.1", provider: "Local" },

That's it. The Vercel AI SDK picks it up automatically because the streaming interface hasn't changed.

Production Considerations

A few things to handle before shipping this for real:

  • Rate limiting — Add middleware on both the FastAPI and Next.js sides. You don't want someone scraping your LLM budget.
  • Caching — Store summaries keyed by (url, model). If someone summarizes the same article with the same model, serve the cached version.
  • Content length limits — The scraper already truncates at 12,000 characters, but you should also count tokens properly using tiktoken or the provider's tokenizer.
  • Error boundaries — Wrap the summarizer component in a React error boundary. LLM streams can fail mid-response.
  • Robots.txt respect — Check the target site's robots.txt before scraping. Some sites explicitly disallow bots.
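The caching idea above can be sketched as a thin wrapper that buffers the stream while forwarding it: the first request streams from the provider, and repeat requests for the same (url, model) pair serve the buffered text. An in-memory dict is shown for illustration — production would use Redis or similar with a TTL:

```python
import asyncio
from typing import AsyncIterator, Callable

# In-memory cache for illustration only; use Redis (with a TTL) in production
_cache: dict[tuple[str, str], str] = {}

async def cached_summarize(
    url: str,
    model: str,
    text: str,
    summarize_fn: Callable[[str, str], AsyncIterator[str]],
) -> AsyncIterator[str]:
    key = (url, model)
    if key in _cache:
        yield _cache[key]  # cache hit: serve the buffered summary in one chunk
        return
    chunks: list[str] = []
    async for chunk in summarize_fn(text, model):
        chunks.append(chunk)
        yield chunk  # still stream live to the first caller
    _cache[key] = "".join(chunks)  # buffer the complete summary for next time
```

One subtlety: a cache hit arrives as a single chunk rather than a token-by-token stream, which the useCompletion hook handles fine — it just renders all at once.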

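The robots.txt check is stdlib territory: urllib.robotparser parses the rules and answers can_fetch queries. A sketch, parsing inline rules for illustration — in the real scraper you'd fetch https://&lt;host&gt;/robots.txt before calling scrape_url:

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string. In practice, fetch the
    target host's /robots.txt and feed its body in here."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

rules = build_parser("User-agent: *\nDisallow: /private/\n")
rules.can_fetch("SummarizerBot", "https://example.com/private/page")  # False
rules.can_fetch("SummarizerBot", "https://example.com/article")       # True
```

Gate scrape_url on can_fetch and return a 403-style error when a site disallows bots.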
Wrapping Up

The combination of FastAPI + Next.js + Vercel AI SDK works well for any application where you want to give users model choice without writing provider-specific frontend code. FastAPI handles the backend complexity of multiple LLM providers. The Vercel AI SDK's useCompletion hook makes streaming feel like a simple state update. And the clean separation means you can swap, add, or remove models without touching the UI layer.

The pattern extends beyond summarizers — chatbots, code generators, translation tools — anywhere you want streaming LLM output with model flexibility, this architecture scales.
