Streaming AI responses in Next.js App Router using the Claude API

Most AI chat features look the same from the outside: you type a message, wait, and the full answer appears. That experience feels slow. Streaming fixes it -- tokens arrive as Claude generates them, the UI updates in real time, and the app feels responsive even for long answers.

This post covers the full pattern: a streaming API route using the Anthropic SDK, Server-Sent Events consumed by a custom hook, and the details that trip people up when they first wire it up.

The server route

Create app/api/ai/chat/route.ts. The route validates the request, calls the Claude API with streaming enabled, and pipes tokens to the response as SSE.

import { NextRequest } from "next/server";
import Anthropic from "@anthropic-ai/sdk";
import { getUserFromRequest } from "@/lib/auth";
import { HttpError, handleError } from "@/lib/errors";
 
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
 
export async function POST(req: NextRequest) {
  try {
    const user = await getUserFromRequest(req);
    const { message } = await req.json();
 
    if (!message || typeof message !== "string") {
      throw new HttpError(400, "message is required");
    }
 
    const stream = anthropic.messages.stream({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: message }],
    });
 
    const encoder = new TextEncoder();
    const readable = new ReadableStream({
      async start(controller) {
        for await (const event of stream) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            controller.enqueue(
              encoder.encode(
                `data: ${JSON.stringify({ text: event.delta.text })}\n\n`
              )
            );
          }
        }
        controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        controller.close();
      },
    });
 
    return new Response(readable, {
      headers: {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        Connection: "keep-alive",
      },
    });
  } catch (error: unknown) {
    return handleError(error);
  }
}

Three things worth noting:

anthropic.messages.stream() returns an async iterable -- iterate it to get token events.
content_block_delta with text_delta is the event type that carries actual text. Other event types (message start, ping, stop) can be ignored for a basic chat.
Wrap the iterable in a ReadableStream so Next.js flushes chunks as they arrive rather than buffering the full response.
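
One gap worth knowing about: in the route above, an error thrown mid-stream happens after the response has already been returned, so the outer try/catch never sees it. A minimal sketch of one alternative, assuming only the web-standard ReadableStream: a pull-based wrapper that forwards upstream errors to the reader and stops pulling when the client disconnects. The names sseFrame and toSseStream are hypothetical helpers, not part of the Anthropic SDK or Next.js.

```typescript
// Format one SSE frame: a "data:" line terminated by a blank line.
function sseFrame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Wrap an async iterable of text chunks in a byte stream of SSE frames.
function toSseStream(
  chunks: AsyncIterable<string>
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = chunks[Symbol.asyncIterator]();

  return new ReadableStream<Uint8Array>({
    // pull() runs only when the consumer wants more data, so a slow
    // client applies backpressure instead of filling memory.
    async pull(controller) {
      try {
        const result = await iterator.next();
        if (result.done) {
          controller.enqueue(encoder.encode("data: [DONE]\n\n"));
          controller.close();
          return;
        }
        controller.enqueue(encoder.encode(sseFrame({ text: result.value })));
      } catch (err) {
        // Surface upstream failures to the reader instead of hanging.
        controller.error(err);
      }
    },
    // Called when the consumer cancels: stop pulling from upstream.
    async cancel() {
      await iterator.return?.();
    },
  });
}
```

In the route, an async generator that yields event.delta.text for each text_delta event could feed toSseStream, and the result would be passed to new Response(...) exactly as before.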

The client hook

Do not use EventSource here -- it only issues GET requests and does not support custom request headers, so you can neither POST the message body nor attach a Bearer token. Use fetch with a manual reader instead.

// hooks/useAiChat.ts
"use client";
 
import { useState } from "react";
 
interface Message {
  role: "user" | "assistant";
  content: string;
}
 
export function useAiChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [streaming, setStreaming] = useState(false);
 
  async function send(input: string) {
    // Add the user message plus an empty assistant message to stream into.
    setMessages((prev) => [
      ...prev,
      { role: "user", content: input },
      { role: "assistant", content: "" },
    ]);
    setStreaming(true);

    try {
      const res = await fetch("/api/ai/chat", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${localStorage.getItem("token")}`,
        },
        body: JSON.stringify({ message: input }),
      });
      if (!res.ok || !res.body) {
        throw new Error(`Chat request failed with status ${res.status}`);
      }

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // A network chunk can end mid-line (or mid-character), so decode in
        // streaming mode and keep the unfinished tail for the next read.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const payload = line.slice(6);
          if (payload === "[DONE]") return;
          const { text } = JSON.parse(payload) as { text: string };
          setMessages((prev) => {
            const next = [...prev];
            next[next.length - 1] = {
              role: "assistant",
              content: next[next.length - 1].content + text,
            };
            return next;
          });
        }
      }
    } finally {
      // Runs on normal completion, on [DONE], and on errors.
      setStreaming(false);
    }
  }
 
  return { messages, streaming, send };
}

The functional setMessages updater matters here. Because multiple chunks can arrive before React re-renders, using the previous-state form ((prev) => ...) ensures each token is appended to the latest accumulated string rather than an outdated snapshot.
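
To see why the updater form matters, here is a small sketch that models queued state updates with a plain reduce. It is a simplified model of the behavior, not React's actual implementation, and appendToLast is a hypothetical helper that mirrors the updater body in the hook.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Pure version of the updater: returns a new array with `text`
// appended to the content of the last message.
function appendToLast(messages: Message[], text: string): Message[] {
  if (messages.length === 0) return messages;
  const next = [...messages];
  const last = next[next.length - 1];
  next[next.length - 1] = { ...last, content: last.content + text };
  return next;
}

const initial: Message[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "" },
];
const tokens = ["Hel", "lo"];

// Functional form: each queued updater receives the previous result.
const viaUpdaters = tokens
  .map((t) => (prev: Message[]) => appendToLast(prev, t))
  .reduce((state, update) => update(state), initial);

// Snapshot form: every update is computed from the same stale array,
// so the last write wins and earlier tokens are lost.
let viaSnapshot = initial;
for (const t of tokens) {
  viaSnapshot = appendToLast(initial, t);
}

console.log(viaUpdaters[1].content); // "Hello"
console.log(viaSnapshot[1].content); // "lo"
```

The snapshot version is what you would effectively get by calling setMessages([...messages, ...]) with the messages value captured when send started.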

Wiring it into a component

// components/AiChat.tsx
"use client";
 
import { useState } from "react";
import { useAiChat } from "@/hooks/useAiChat";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
 
export function AiChat() {
  const { messages, streaming, send } = useAiChat();
  const [input, setInput] = useState("");
 
  function handleSubmit(e: React.FormEvent) {
    e.preventDefault();
    if (!input.trim() || streaming) return;
    send(input.trim());
    setInput("");
  }
 
  return (
    <div className="flex flex-col gap-4">
      <div className="flex flex-col gap-2">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={
              msg.role === "user"
                ? "text-right"
                : "text-left text-muted-foreground"
            }
          >
            {msg.content}
          </div>
        ))}
        {streaming && (
          <div className="animate-pulse text-muted-foreground">...</div>
        )}
      </div>
      <form onSubmit={handleSubmit} className="flex gap-2">
        <Input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask Claude something..."
          disabled={streaming}
        />
        <Button type="submit" disabled={streaming}>
          Send
        </Button>
      </form>
    </div>
  );
}

Add prompt caching for long system prompts

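If your route includes a long system prompt, cache it. Prompts below the model's minimum cacheable length (1024 tokens on many Claude models) are not cached at all. The first call pays a premium to write the cache (about 25% over the base input price with the default 5-minute lifetime); cache hits cost about 10% of the base input price.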

const stream = anthropic.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: LONG_SYSTEM_PROMPT,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: message }],
});

No other change is needed -- prompt caching is generally available in the Messages API, so you do not have to set an anthropic-beta header.
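
To check whether the cache is actually being hit, you can inspect the usage block of the finished message. A minimal sketch, assuming the usage object carries cache_creation_input_tokens and cache_read_input_tokens as the Messages API documents; UsageLike and describeCacheUsage are hypothetical names, and the interface is declared locally so the snippet runs without the SDK.

```typescript
// Local stand-in for the usage fields relevant to caching (assumption:
// mirrors the shape of the API's usage object).
interface UsageLike {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Classify a response: served from cache, wrote a new cache entry, or neither.
function describeCacheUsage(usage: UsageLike): "hit" | "write" | "none" {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "none";
}

// In the route, after the stream has finished (assumes the SDK's
// finalMessage() helper on the stream object):
//   const final = await stream.finalMessage();
//   console.log(describeCacheUsage(final.usage));
```

If every request reports "none", the system prompt is probably below the minimum cacheable length for the model; if every request reports "write", something in the cached prefix is changing between calls.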

Next step: add credits

The most common follow-on is gating AI calls behind a credits balance. Add a credits integer column to your users table, then deduct before calling Claude:

// Cost per message, configurable through the environment (defaults to 10).
const cost = Number(process.env.AI_CREDITS_PER_MESSAGE ?? 10);
if (user.credits < cost) {
  throw new HttpError(402, "Insufficient credits");
}
await userRepo.deductCredits(user.id, cost);
// now call anthropic.messages.stream(...)

Fail fast before the Claude call -- you do not want to pay for Claude tokens on a request that then fails the credits check. Once credits are wired up, pair it with a Stripe one-time checkout to let users top up, and you have a complete monetized AI feature with a working streaming UI.
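
Two details the credits snippet leaves open are concurrent requests (two calls passing the balance check at once) and what happens to the deducted credits if the Claude call fails. A minimal sketch of one way to handle both, using a hypothetical in-memory CreditStore as a stand-in for the users table; withCredits and the refund-on-failure policy are assumptions for illustration, not something the snippet above prescribes.

```typescript
class InsufficientCreditsError extends Error {}

// Hypothetical in-memory stand-in for the users table.
class CreditStore {
  private balances = new Map<string, number>();

  constructor(initial: Record<string, number>) {
    for (const [userId, credits] of Object.entries(initial)) {
      this.balances.set(userId, credits);
    }
  }

  balance(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }

  // Check and deduct in one step so two requests cannot both pass the
  // check against the same balance.
  deduct(userId: string, cost: number): void {
    const current = this.balance(userId);
    if (current < cost) {
      throw new InsufficientCreditsError("Insufficient credits");
    }
    this.balances.set(userId, current - cost);
  }

  refund(userId: string, cost: number): void {
    this.balances.set(userId, this.balance(userId) + cost);
  }
}

// Deduct up front, run the call, and refund if the call throws.
async function withCredits<T>(
  store: CreditStore,
  userId: string,
  cost: number,
  call: () => Promise<T>
): Promise<T> {
  store.deduct(userId, cost);
  try {
    return await call();
  } catch (err) {
    store.refund(userId, cost);
    throw err;
  }
}
```

With a real database, the check-and-deduct step would typically be a single conditional update (deduct only where the balance is at least the cost) rather than a read followed by a write.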
