Most AI chat features look the same from the outside: you type a message, wait, and the full answer appears. That experience feels slow. Streaming fixes it -- tokens arrive as Claude generates them, the UI updates in real time, and the app feels responsive even for long answers.
This post covers the full pattern: a streaming API route using the Anthropic SDK, Server-Sent Events consumed by a custom hook, and the details that trip people up when they first wire it up.
Create app/api/ai/chat/route.ts. The route validates the request, calls the Claude API with streaming enabled, and pipes tokens to the response as SSE.
import { NextRequest } from "next/server";
import Anthropic from "@anthropic-ai/sdk";
import { getUserFromRequest } from "@/lib/auth";
import { HttpError, handleError } from "@/lib/errors";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function POST(req: NextRequest) {
  try {
    // Authenticate and validate before doing any paid work.
    const user = await getUserFromRequest(req);
    const { message } = await req.json();
    if (!message || typeof message !== "string") {
      throw new HttpError(400, "message is required");
    }

    const stream = anthropic.messages.stream({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: message }],
    });

    const encoder = new TextEncoder();
    const readable = new ReadableStream({
      async start(controller) {
        try {
          for await (const event of stream) {
            // Only text deltas carry visible output; skip everything else.
            if (
              event.type === "content_block_delta" &&
              event.delta.type === "text_delta"
            ) {
              controller.enqueue(
                encoder.encode(
                  `data: ${JSON.stringify({ text: event.delta.text })}\n\n`
                )
              );
            }
          }
          controller.enqueue(encoder.encode("data: [DONE]\n\n"));
          controller.close();
        } catch (err) {
          // The response has already started, so the outer catch
          // cannot turn this into an HTTP error. Fail the stream instead.
          controller.error(err);
        }
      },
      cancel() {
        // Stop generating if the client disconnects.
        stream.abort();
      },
    });

    return new Response(readable, {
      headers: {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        Connection: "keep-alive",
      },
    });
  } catch (error: unknown) {
    return handleError(error);
  }
}
Three things worth noting:
- anthropic.messages.stream() returns an async iterable; iterate it to get token events.
- content_block_delta with text_delta is the event type that carries actual text. Other event types (message start, ping, stop) can be ignored for a basic chat.
- The response body is a ReadableStream, so Next.js flushes chunks as they arrive rather than buffering the full response.

Do not use EventSource for authenticated streams. It only issues GET requests and does not support custom request headers, so you cannot attach a Bearer token. Use fetch with a manual reader instead.
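Reading the stream by hand means handling chunk boundaries yourself: a network chunk can end in the middle of an SSE event. A minimal sketch of a buffering parser follows, assuming the route's format of one `data: ` line per event terminated by a blank line. The name `createSseParser` is hypothetical, not part of any library.

```typescript
// Hypothetical helper: buffers decoded text chunks and returns the
// payloads of every SSE event that has been received in full.
function createSseParser() {
  let buffer = "";
  return {
    push(chunk: string): string[] {
      buffer += chunk;
      const events = buffer.split("\n\n");
      // The last piece is either "" or an incomplete event; keep it for later.
      buffer = events.pop() ?? "";
      const payloads: string[] = [];
      for (const event of events) {
        for (const line of event.split("\n")) {
          if (line.startsWith("data: ")) payloads.push(line.slice(6));
        }
      }
      return payloads;
    },
  };
}

// An event split across two chunks is only emitted once it is complete.
const parser = createSseParser();
const first = parser.push('data: {"text":"Hel');
const second = parser.push('lo"}\n\ndata: [DONE]\n\n');
console.log(first, second);
```

When feeding it from a fetch reader, decode each chunk with `decoder.decode(value, { stream: true })` so a multi-byte character split across chunks is not corrupted.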
// hooks/useAiChat.ts
"use client";
import { useState } from "react";

interface Message {
  role: "user" | "assistant";
  content: string;
}

export function useAiChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [streaming, setStreaming] = useState(false);

  // Append text to the last (assistant) message using the latest state.
  function appendToAssistant(text: string) {
    setMessages((prev) => {
      const next = [...prev];
      next[next.length - 1] = {
        role: "assistant",
        content: next[next.length - 1].content + text,
      };
      return next;
    });
  }

  async function send(input: string) {
    // Add the user message plus an empty assistant message to stream into.
    setMessages((prev) => [
      ...prev,
      { role: "user", content: input },
      { role: "assistant", content: "" },
    ]);
    setStreaming(true);
    try {
      const res = await fetch("/api/ai/chat", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${localStorage.getItem("token")}`,
        },
        body: JSON.stringify({ message: input }),
      });
      // Error responses are not SSE; surface them instead of parsing them.
      if (!res.ok || !res.body) {
        appendToAssistant(`Request failed (${res.status})`);
        return;
      }

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters split across chunks intact.
        buffer += decoder.decode(value, { stream: true });
        // Events end with a blank line; the last piece may be incomplete.
        const events = buffer.split("\n\n");
        buffer = events.pop() ?? "";
        for (const event of events) {
          if (!event.startsWith("data: ")) continue;
          const payload = event.slice(6);
          if (payload === "[DONE]") return;
          const { text } = JSON.parse(payload) as { text: string };
          appendToAssistant(text);
        }
      }
    } finally {
      // Runs on success, early return, and errors alike.
      setStreaming(false);
    }
  }

  return { messages, streaming, send };
}
The functional setMessages updater matters here. Because multiple chunks can arrive before React re-renders, using the previous-state form ((prev) => ...) ensures each token is appended to the latest accumulated string rather than an outdated snapshot.
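To see why the updater form is safe, it helps to model React's queue as a list of functions applied in order. The sketch below is a simplified model and not React itself; `ChatMessage` and `appendToLast` are hypothetical names introduced for illustration.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Returns a new array whose last message has `text` appended.
function appendToLast(messages: ChatMessage[], text: string): ChatMessage[] {
  if (messages.length === 0) return messages;
  const last = messages[messages.length - 1];
  return [...messages.slice(0, -1), { ...last, content: last.content + text }];
}

// Three tokens arrive before a render: each updater receives the result
// of the previous one, so no token is lost.
const queued = ["Hel", "lo", "!"].map(
  (token) => (prev: ChatMessage[]) => appendToLast(prev, token)
);
const initial: ChatMessage[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "" },
];
const result = queued.reduce((state, update) => update(state), initial);
console.log(result[1].content);
```

Had each update been computed from the same captured `messages` snapshot instead, all three would start from the empty string and only the last token would survive.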
// components/AiChat.tsx
"use client";
import { useState, type FormEvent } from "react";
import { useAiChat } from "@/hooks/useAiChat";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";

export function AiChat() {
  const { messages, streaming, send } = useAiChat();
  const [input, setInput] = useState("");

  function handleSubmit(e: FormEvent) {
    e.preventDefault();
    // Ignore empty input and block a second send while a reply is streaming.
    if (!input.trim() || streaming) return;
    send(input.trim());
    setInput("");
  }

  return (
    <div className="flex flex-col gap-4">
      <div className="flex flex-col gap-2">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={
              msg.role === "user"
                ? "text-right"
                : "text-left text-muted-foreground"
            }
          >
            {msg.content}
          </div>
        ))}
        {streaming && (
          <div className="animate-pulse text-muted-foreground">...</div>
        )}
      </div>
      <form onSubmit={handleSubmit} className="flex gap-2">
        <Input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask Claude something..."
          disabled={streaming}
        />
        <Button type="submit" disabled={streaming}>
          Send
        </Button>
      </form>
    </div>
  );
}
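If your route includes a long system prompt, cache it. The prompt has to meet a minimum cacheable length (1,024 tokens on many Claude models, more on some; check the documentation for the model you use). The first call writes the cache and costs somewhat more than an uncached call, about 25% above the base input price; later cache hits cost about 10% of the base input price.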
const stream = anthropic.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: LONG_SYSTEM_PROMPT,
      // Marks everything up to and including this block as cacheable.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: message }],
});
No other change is needed. Prompt caching is generally available in the Messages API, so current SDK versions do not require a beta header; the cache_control block is enough.
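A rough way to judge whether caching pays off is to compare input cost with and without it. The sketch below assumes a cache write costs 1.25x the base input price and a cache read 0.1x, and that every call after the first hits the cache before it expires. The multipliers, the function name `estimatePromptCost`, and the $3 per million tokens price are illustrative assumptions to check against current pricing.

```typescript
// Assumed multipliers relative to the base input price (verify before relying on them).
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Estimates the system-prompt input cost over `calls` requests, in dollars.
function estimatePromptCost(
  promptTokens: number,
  calls: number,
  pricePerMTok: number
): { uncached: number; cached: number } {
  const perCall = (promptTokens / 1e6) * pricePerMTok;
  const uncached = perCall * calls;
  // One cache write, then (calls - 1) cache reads.
  const cached =
    calls === 0
      ? 0
      : perCall * (CACHE_WRITE_MULTIPLIER + (calls - 1) * CACHE_READ_MULTIPLIER);
  return { uncached, cached };
}

// 2,000-token system prompt, 10 calls, hypothetical $3 per million input tokens.
const estimate = estimatePromptCost(2000, 10, 3);
console.log(estimate);
```

Under these assumptions caching is slightly more expensive for a single call and cheaper from the second call onward, since 1.25 + 0.1 is less than 2.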
The most common follow-on is gating AI calls behind a credits balance. Add a credits integer column to your users table, then deduct before calling Claude:
const cost = Number(process.env.AI_CREDITS_PER_MESSAGE ?? 10);
if (user.credits < cost) {
  throw new HttpError(402, "Insufficient credits");
}
// userRepo is your own data-access layer; deduct the same amount you checked.
await userRepo.deductCredits(user.id, cost);
// now call anthropic.messages.stream(...)
Fail fast before the Claude call: you do not want to pay for a Claude request on behalf of a user who cannot cover it. Once credits are wired up, pair them with a Stripe one-time checkout to let users top up, and you have a complete monetized AI feature with a working streaming UI.
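One design question the snippet above leaves open is what happens to the deducted credits if the Claude call then fails. A possible approach, sketched here with a hypothetical in-memory store standing in for the users table, is to check and deduct in one step and refund on error. `InMemoryCredits` and `withCredits` are illustrative names, and a real implementation would do the deduction atomically in the database.

```typescript
// Hypothetical in-memory stand-in for the credits column on the users table.
class InMemoryCredits {
  constructor(private balances: Record<string, number>) {}

  balance(id: string): number {
    return this.balances[id] ?? 0;
  }

  // Deducts only if the balance covers the cost; reports whether it did.
  tryDeduct(id: string, cost: number): boolean {
    const current = this.balance(id);
    if (current < cost) return false;
    this.balances[id] = current - cost;
    return true;
  }

  refund(id: string, amount: number): void {
    this.balances[id] = this.balance(id) + amount;
  }
}

// Charges before running `call`, and gives the credits back if it throws.
async function withCredits<T>(
  repo: InMemoryCredits,
  userId: string,
  cost: number,
  call: () => Promise<T>
): Promise<T> {
  if (!repo.tryDeduct(userId, cost)) {
    throw new Error("Insufficient credits");
  }
  try {
    return await call();
  } catch (err) {
    repo.refund(userId, cost);
    throw err;
  }
}
```

In the route, `call` would wrap the `anthropic.messages.stream(...)` setup, and the "Insufficient credits" error would map to the 402 response. Whether to refund when a stream fails partway through is a product decision this sketch does not settle.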