Streaming gives Claude a voice. Tool use gives Claude hands.
With streaming, Claude generates text and you display it token by token. With tool use, Claude decides to call a function -- a database lookup, a search, a calculation -- and your server runs it. The result comes back, Claude reads it, and continues. This loop repeats until Claude decides it is done.
The boilerplate ships with two AI routes: POST /api/ai/chat for streaming text, and POST /api/ai/agent for tool use. This post covers the agent route.
The agent endpoint is thin. It authenticates the user, parses the request body, and hands off to the service, which deducts credits and runs the loop:
// app/api/ai/agent/route.ts
import { NextRequest, NextResponse } from "next/server";
import { getUserFromRequest } from "@/lib/auth";
import { handleError } from "@/lib/errors";
import { agentService } from "@/modules/ai/ai.service";

export async function POST(req: NextRequest) {
  try {
    const user = await getUserFromRequest(req);
    const { messages, tools } = await req.json();
    const result = await agentService.run({ user, messages, tools });
    return NextResponse.json(result);
  } catch (error: unknown) {
    return handleError(error);
  }
}
The route never imports Anthropic directly. The agentic loop lives entirely in the service layer.
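The route forwards the parsed body to the service as-is. If you want to reject malformed bodies before the service runs, a small guard is one option. The sketch below is an assumption, not part of the boilerplate: `parseAgentBody` and its error messages are hypothetical, and it checks only the outer shape of `messages` and `tools`.

```typescript
// Hypothetical helper: checks the outer shape of the request body.
// It does not validate message content or tool schemas in depth.
type AgentBody = {
  messages: { role: "user" | "assistant"; content: unknown }[];
  tools: { name: string; description?: string; input_schema: object }[];
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseAgentBody(body: unknown): AgentBody {
  if (!isRecord(body)) throw new Error("Body must be a JSON object");
  const { messages, tools } = body;

  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error("messages must be a non-empty array");
  }
  for (const message of messages) {
    if (!isRecord(message) || (message.role !== "user" && message.role !== "assistant")) {
      throw new Error("Each message needs a role of 'user' or 'assistant'");
    }
    if (!("content" in message)) throw new Error("Each message needs content");
  }

  if (!Array.isArray(tools)) throw new Error("tools must be an array");
  for (const tool of tools) {
    if (!isRecord(tool) || typeof tool.name !== "string" || !isRecord(tool.input_schema)) {
      throw new Error("Each tool needs a name and an input_schema object");
    }
  }

  // The checks above establish the outer shape, so the cast is safe here.
  return { messages, tools } as AgentBody;
}
```

In the route, this would wrap the `req.json()` call, with the thrown error mapped to a 400 in whatever way your `handleError` expects.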
Tools are plain objects with a name, description, and JSON Schema for their inputs. Claude reads these and decides when to call them.
const searchPostsTool = {
  name: "search_posts",
  description:
    "Search published blog posts by keyword. Returns titles, slugs, and descriptions.",
  input_schema: {
    // `as const` keeps the literal type "object" so this value is assignable to the SDK's Tool type.
    type: "object" as const,
    properties: {
      query: {
        type: "string",
        description: "The search keyword or phrase",
      },
    },
    required: ["query"],
  },
};
The description is the contract. Write it precisely -- Claude uses it to decide when and how to call the tool. Vague descriptions produce wrong calls.
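To make the point about descriptions concrete, here is a hedged sketch of a second tool definition. The `get_user_stats` name and its `userId` input come from the dispatcher shown later in this post; the description wording and the local `ToolDefinition` type are assumptions for illustration.

```typescript
// Local stand-in for the SDK's Tool type so the snippet is self-contained.
type ToolDefinition = {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
};

// Too vague: Claude cannot tell what "stats" covers or whose they are.
const vagueDescription = "Gets stats.";

// More precise: says what it looks up, for whom, and when to call it.
// The wording is illustrative, not taken from the boilerplate.
const getUserStatsTool: ToolDefinition = {
  name: "get_user_stats",
  description:
    "Look up usage statistics for a single user by ID. " +
    "Use this when the question is about one specific user's activity. " +
    "Do not use it to search posts.",
  input_schema: {
    type: "object",
    properties: {
      userId: { type: "string", description: "The ID of the user to look up" },
    },
    required: ["userId"],
  },
};

console.log(getUserStatsTool.description.length > vagueDescription.length); // prints true
```

The second description tells Claude both when to call the tool and when not to, which is usually what separates two tools with overlapping names.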
Call Claude, check stop_reason, execute tools if needed, loop:
// modules/ai/ai.service.ts
import { claude } from "@/lib/claude";
import { HttpError } from "@/lib/errors";
import { userService } from "@/modules/user/user.service";
import type {
  MessageParam,
  Tool,
  ToolResultBlockParam,
  ToolUseBlock,
} from "@anthropic-ai/sdk/resources/messages";

const CREDITS_PER_CALL = Number(process.env.AI_CREDITS_PER_MESSAGE ?? 10);

export const agentService = {
  async run({
    user,
    messages,
    tools,
  }: {
    user: { id: string };
    messages: MessageParam[];
    tools: Tool[];
  }) {
    // Charge once, up front, before any tokens are spent.
    await userService.deductCredits(user.id, CREDITS_PER_CALL);

    let currentMessages: MessageParam[] = [...messages];

    while (true) {
      const response = await claude.messages.create({
        model: "claude-opus-4-8",
        max_tokens: 4096,
        tools,
        messages: currentMessages,
      });

      // Any stop reason other than "tool_use" ends the run.
      if (response.stop_reason !== "tool_use") {
        return { content: response.content, usage: response.usage };
      }

      // Run every requested tool in parallel. The type guard narrows each
      // block to ToolUseBlock, so no null placeholders are needed.
      const toolResults: ToolResultBlockParam[] = await Promise.all(
        response.content
          .filter((block): block is ToolUseBlock => block.type === "tool_use")
          .map(async (block) => {
            const output = await dispatchTool(block.name, block.input);
            return {
              type: "tool_result" as const,
              tool_use_id: block.id,
              content: JSON.stringify(output),
            };
          })
      );

      // Append the assistant turn and the tool results, then call Claude again.
      currentMessages = [
        ...currentMessages,
        { role: "assistant", content: response.content },
        { role: "user", content: toolResults },
      ];
    }
  },
};
The loop has one exit condition: stop_reason !== 'tool_use'. Do not break early. Claude may call several tools across multiple turns before producing a final answer.
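To see how the message history grows across turns, the loop can be exercised without calling the API. The sketch below is an illustration only: `fakeModel` and `fakeDispatch` are hypothetical stand-ins for Claude and for the tool dispatcher, the types are simplified versions of the SDK's, and everything is synchronous for brevity.

```typescript
// Simplified local types; the real SDK types carry more fields.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Block = TextBlock | ToolUseBlock | ToolResultBlock;
type Message = { role: "user" | "assistant"; content: string | Block[] };
type ModelResponse = {
  stop_reason: "tool_use" | "end_turn";
  content: (TextBlock | ToolUseBlock)[];
};

// Hypothetical scripted model: requests one search, then answers.
function fakeModel(messages: Message[]): ModelResponse {
  const last = messages[messages.length - 1];
  const hasToolResult =
    Array.isArray(last.content) && last.content.some((b) => b.type === "tool_result");
  if (!hasToolResult) {
    return {
      stop_reason: "tool_use",
      content: [
        { type: "tool_use", id: "toolu_1", name: "search_posts", input: { query: "stripe" } },
      ],
    };
  }
  return { stop_reason: "end_turn", content: [{ type: "text", text: "Found 1 post." }] };
}

// Hypothetical tool executor: echoes the call with a canned result.
function fakeDispatch(name: string, input: unknown): unknown {
  return { tool: name, input, results: [{ title: "Stripe webhooks", slug: "stripe-webhooks" }] };
}

function runLoop(initial: Message[]) {
  let messages: Message[] = [...initial];
  let turns = 0;
  while (true) {
    turns += 1;
    const response = fakeModel(messages);
    // Same single exit condition as the service: anything but "tool_use" ends the run.
    if (response.stop_reason !== "tool_use") {
      return { messages, final: response, turns };
    }
    const results: ToolResultBlock[] = response.content
      .filter((b): b is ToolUseBlock => b.type === "tool_use")
      .map((b) => ({
        type: "tool_result",
        tool_use_id: b.id,
        content: JSON.stringify(fakeDispatch(b.name, b.input)),
      }));
    messages = [
      ...messages,
      { role: "assistant", content: response.content },
      { role: "user", content: results },
    ];
  }
}

const trace = runLoop([{ role: "user", content: "Do we have any posts about Stripe?" }]);
console.log(trace.turns); // prints 2
console.log(trace.messages.map((m) => m.role).join(" -> ")); // prints "user -> assistant -> user"
```

Each tool round adds two messages: the assistant turn containing the `tool_use` blocks, and a user turn carrying the matching `tool_result` blocks keyed by `tool_use_id`.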
dispatchTool is a switch over tool names. Each branch calls the appropriate service function:
// Lives in modules/ai/ai.service.ts alongside agentService.
// postService must also be imported at the top of that file.
async function dispatchTool(name: string, input: unknown): Promise<unknown> {
  switch (name) {
    case "search_posts": {
      const { query } = input as { query: string };
      return postService.search(query);
    }
    case "get_user_stats": {
      const { userId } = input as { userId: string };
      return userService.getStats(userId);
    }
    default:
      throw new HttpError(400, `Unknown tool: ${name}`);
  }
}
Keep this function in the service file. Routes dispatch to services, services dispatch to tools, tools call other services. The layering stays intact.
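The `input as { query: string }` casts in `dispatchTool` trust that Claude's arguments match the schema. A runtime check is one way to fail loudly when they do not. The sketch below is an assumption layered on the post's code: `parseSearchInput` is a hypothetical helper, and a plain `Error` stands in for the boilerplate's `HttpError`.

```typescript
// Hypothetical guard for the search_posts tool input.
// Narrows `unknown` to { query: string } or throws.
function parseSearchInput(input: unknown): { query: string } {
  if (typeof input !== "object" || input === null) {
    throw new Error("search_posts input must be an object");
  }
  const query = (input as Record<string, unknown>).query;
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("search_posts input needs a non-empty string 'query'");
  }
  return { query };
}

console.log(parseSearchInput({ query: "webhooks" }).query); // prints "webhooks"
```

Inside the `search_posts` case, `const { query } = parseSearchInput(input);` would replace the cast.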
Deduct before the first Claude call, not after. If the user is out of credits, fail with a 402 before spending any tokens:
await userService.deductCredits(user.id, CREDITS_PER_CALL);
One deduction covers the entire agent run regardless of how many loop turns it takes. Agentic runs use more tokens than a single chat message, so tune AI_CREDITS_PER_MESSAGE in your env accordingly.
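The post does not show `userService.deductCredits`, so the following is only a sketch of the behavior described: check the balance, fail with a 402 if it is too low, otherwise subtract. The in-memory `balances` map, the local `HttpError` class, and the synchronous signature are all assumptions. The real service is awaited and is presumably backed by a database, where the check and the subtraction should happen atomically.

```typescript
// Local stand-in for the boilerplate's HttpError (assumed shape: status + message).
class HttpError extends Error {
  constructor(public readonly status: number, message: string) {
    super(message);
    this.name = "HttpError";
  }
}

// Hypothetical in-memory balances; a real implementation would use the database.
const balances = new Map<string, number>([
  ["user_1", 25],
  ["user_2", 5],
]);

function deductCredits(userId: string, amount: number): number {
  const balance = balances.get(userId) ?? 0;
  if (balance < amount) {
    // 402 Payment Required: reject before any Claude call is made.
    throw new HttpError(402, "Insufficient credits");
  }
  const remaining = balance - amount;
  balances.set(userId, remaining);
  return remaining;
}

console.log(deductCredits("user_1", 10)); // prints 15
```

Because the check throws before the balance is touched, a rejected run costs the user nothing.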
| | Chat (streaming) | Agent (tool use) |
|---|---|---|
| Response shape | SSE token stream | JSON when loop ends |
| Turns | One | One or many |
| Latency | Low | Higher |
| When to use | Conversational output | Tasks that need real data |
Tool use is slower and more expensive than streaming. Use it when Claude genuinely needs live data from your system -- not for generating text that needs no external context.
The pattern is always the same: call Claude, check stop_reason, execute tool calls, repeat. Wire the loop once in the service layer, add tool cases to dispatchTool as features grow, and keep the route as a thin entry point. Start with one tool Claude actually needs and extend from there.