Rate limiting protects your API from abuse, keeps costs predictable, and enforces fair usage per customer. In a Next.js SaaS running on serverless infrastructure, you need a strategy that works across multiple instances -- an in-memory map resets on every cold start and cannot coordinate across concurrent lambdas.
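This post shows how to build a fixed-window rate limiter backed by Neon DB and Drizzle ORM that integrates cleanly with the JWT auth pattern already in this boilerplate. Requests are counted in buckets aligned to the window size, which costs one upsert per request; the trade-off is that a client can send up to twice the limit in a short burst that straddles a window boundary.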
Add a rate_limits table to track request counts per key and time window:
// modules/rateLimit/rateLimit.schema.ts
import { pgTable, text, integer, timestamp, primaryKey } from 'drizzle-orm/pg-core';

export const rateLimitTable = pgTable(
  'rate_limits',
  {
    key: text('key').notNull(),
    windowStart: timestamp('window_start', { withTimezone: true }).notNull(),
    count: integer('count').notNull().default(0),
  },
  (table) => ({
    pk: primaryKey({ columns: [table.key, table.windowStart] }),
  })
);
Register the table in db/drizzle.ts, then run npm run db:generate && npm run db:migrate.
Create lib/rateLimit.ts. On each call it prunes stale rows, then atomically increments the counter for the current window:
// lib/rateLimit.ts
import { db } from '@/db/drizzle';
import { rateLimitTable } from '@/modules/rateLimit/rateLimit.schema';
import { and, eq, lt, sql } from 'drizzle-orm';

interface RateLimitOptions {
  key: string; // e.g. "user:abc123" or "ip:1.2.3.4"
  limit: number; // max requests allowed per window
  windowMs: number; // window size in milliseconds
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}

export async function checkRateLimit({
  key,
  limit,
  windowMs,
}: RateLimitOptions): Promise<RateLimitResult> {
  const now = new Date();
  // Align the window to a multiple of windowMs so every instance computes the same bucket
  const windowStart = new Date(Math.floor(now.getTime() / windowMs) * windowMs);
  const windowEnd = new Date(windowStart.getTime() + windowMs);

  // Prune this key's rows from earlier windows. The delete is scoped to the key:
  // a table-wide delete based on this call's windowMs would also remove
  // still-active rows of limiters that use a longer window (e.g. 15 minutes).
  await db
    .delete(rateLimitTable)
    .where(and(eq(rateLimitTable.key, key), lt(rateLimitTable.windowStart, windowStart)));

  // Insert or increment atomically
  const result = await db
    .insert(rateLimitTable)
    .values({ key, windowStart, count: 1 })
    .onConflictDoUpdate({
      target: [rateLimitTable.key, rateLimitTable.windowStart],
      set: { count: sql`${rateLimitTable.count} + 1` },
    })
    .returning({ count: rateLimitTable.count });

  const count = result[0]?.count ?? 1;

  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: windowEnd,
  };
}
The onConflictDoUpdate with count + 1 happens inside a single Postgres statement, so concurrent requests from the same user increment correctly without a race condition.
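The window arithmetic inside checkRateLimit can be looked at in isolation. The sketch below is not part of the boilerplate: windowBounds and slidingEstimate are hypothetical helper names. The first reproduces the bucket alignment used above. The second shows one common way to approximate a sliding window from two adjacent fixed-window counters, assuming you also read the previous window's count, which the helper above does not do.

```typescript
// Illustrative pure helpers (hypothetical names), no database access.

/** Start and end, in epoch milliseconds, of the fixed window containing nowMs. */
function windowBounds(nowMs: number, windowMs: number): { start: number; end: number } {
  const start = Math.floor(nowMs / windowMs) * windowMs;
  return { start, end: start + windowMs };
}

/**
 * Approximate the request count over the trailing windowMs milliseconds.
 * The previous window's count is weighted by the fraction of that window
 * which still overlaps the trailing interval; the current count is taken whole.
 */
function slidingEstimate(
  prevCount: number,
  currCount: number,
  nowMs: number,
  windowMs: number,
): number {
  const { start } = windowBounds(nowMs, windowMs);
  const elapsedFraction = (nowMs - start) / windowMs;
  return prevCount * (1 - elapsedFraction) + currCount;
}

// 125 s after the epoch with 60 s windows falls in the bucket [120 s, 180 s).
const bounds = windowBounds(125_000, 60_000);

// A quarter of the way into the current window, three quarters of the
// previous window's 20 requests still count: 20 * 0.75 + 5 = 20.
const estimate = slidingEstimate(20, 5, 135_000, 60_000);
```

The estimate assumes requests were spread evenly across the previous window, so it is an approximation rather than an exact count.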
Here is a protected AI chat endpoint limited to 20 requests per user per minute:
// app/api/ai/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { getUserFromRequest } from '@/lib/auth';
import { checkRateLimit } from '@/lib/rateLimit';
import { handleError } from '@/lib/errors';

export async function POST(req: NextRequest) {
  try {
    const user = await getUserFromRequest(req);

    const { allowed, remaining, resetAt } = await checkRateLimit({
      key: `chat:user:${user.id}`,
      limit: 20,
      windowMs: 60 * 1000,
    });

    if (!allowed) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Try again shortly.' },
        {
          status: 429,
          headers: {
            'X-RateLimit-Limit': '20',
            'X-RateLimit-Remaining': '0',
            'X-RateLimit-Reset': resetAt.toISOString(),
            'Retry-After': String(Math.ceil((resetAt.getTime() - Date.now()) / 1000)),
          },
        }
      );
    }

    // Your actual handler logic here...
    return NextResponse.json({ ok: true });
  } catch (error: unknown) {
    return handleError(error);
  }
}
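The X-RateLimit-* headers are a widely used convention rather than a formal standard. Together with the standard Retry-After header, they let clients back off gracefully without guessing.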
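To avoid repeating the header names and the hardcoded limit in every route, the header construction could be pulled into a small helper. This is a minimal sketch: rateLimitHeaders is a hypothetical name, not something the boilerplate ships, and it keeps the ISO timestamp format for X-RateLimit-Reset used in the route above.

```typescript
// Hypothetical helper: builds the rate limit headers from a checkRateLimit result.
interface RateLimitInfo {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}

function rateLimitHeaders(
  limit: number,
  info: RateLimitInfo,
  nowMs: number = Date.now(),
): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, info.remaining)),
    'X-RateLimit-Reset': info.resetAt.toISOString(),
  };
  if (!info.allowed) {
    // Retry-After is expressed in whole seconds; clamp so it is never negative.
    const seconds = Math.max(0, Math.ceil((info.resetAt.getTime() - nowMs) / 1000));
    headers['Retry-After'] = String(seconds);
  }
  return headers;
}

// Blocked request, 54.5 s before the window resets.
const blockedHeaders = rateLimitHeaders(
  20,
  { allowed: false, remaining: 0, resetAt: new Date(180_000) },
  125_500,
);
```

Because the helper only adds Retry-After when the request is blocked, the same object can also be attached to successful responses, which lets clients see their remaining budget before they hit the limit.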
For unauthenticated endpoints like /api/auth/login, rate limit by IP instead of user ID:
// Use || rather than ?? so an empty header value also falls back to 'unknown'
const ip =
  req.headers.get('x-forwarded-for')?.split(',')[0]?.trim() || 'unknown';

const { allowed } = await checkRateLimit({
  key: `login:ip:${ip}`,
  limit: 10,
  windowMs: 15 * 60 * 1000, // 15 minutes
});
On Vercel, x-forwarded-for is set by the edge network and contains the real client IP. On self-hosted infrastructure, only trust it when you control the upstream proxy.
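The header parsing is easy to get subtly wrong, so it may be worth isolating in a function you can test. The following is a sketch under the same assumption as the snippet above, namely that the first entry of x-forwarded-for is the client address; clientIpFromForwardedFor and loginKey are hypothetical names.

```typescript
// Hypothetical helpers: extract the client IP and build the rate limit key.

/** Returns the first address in an x-forwarded-for value, or 'unknown'. */
function clientIpFromForwardedFor(forwardedFor: string | null): string {
  const first = forwardedFor?.split(',')[0]?.trim();
  // An absent, empty, or whitespace-only header falls back to a shared bucket.
  return first ? first : 'unknown';
}

/** Key format matching the login example: "login:ip:<address>". */
function loginKey(forwardedFor: string | null): string {
  return `login:ip:${clientIpFromForwardedFor(forwardedFor)}`;
}

const viaProxy = clientIpFromForwardedFor('203.0.113.7, 10.0.0.1');
const missing = clientIpFromForwardedFor(null);
const blank = clientIpFromForwardedFor('   ');
```

Note that every request without a usable header shares the single 'unknown' bucket, so those clients are limited collectively rather than individually.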
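The inline delete adds an extra write to every request -- fine for low-traffic endpoints. For busy routes, remove it from checkRateLimit and move the cleanup to a cron job that runs once per hour instead. The 24-hour cutoff below is far longer than any window used in this post, so the job never touches an active counter: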
// app/api/cron/cleanup-rate-limits/route.ts
import { db } from '@/db/drizzle';
import { rateLimitTable } from '@/modules/rateLimit/rateLimit.schema';
import { lt } from 'drizzle-orm';

export async function GET() {
  const cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000);
  await db.delete(rateLimitTable).where(lt(rateLimitTable.windowStart, cutoff));
  return Response.json({ ok: true });
}
Then register it in vercel.json:
{
"crons": [{ "path": "/api/cron/cleanup-rate-limits", "schedule": "0 * * * *" }]
}
This pairs with the background jobs pattern from the cron post -- the same vercel.json block, same thin route handler structure.
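As written, the cleanup route is a public GET endpoint. Vercel's cron documentation describes sending an Authorization: Bearer header containing the CRON_SECRET environment variable when that variable is set on the project, so one option is to check it before deleting anything. The sketch below assumes that setup; isAuthorizedCron is a hypothetical helper name.

```typescript
// Hypothetical guard for cron routes, assuming a CRON_SECRET environment
// variable is configured and sent by the scheduler as a Bearer token.
function isAuthorizedCron(
  authorization: string | null,
  secret: string | undefined,
): boolean {
  // Fail closed: with no secret configured, nobody is authorized.
  if (!secret) return false;
  return authorization === `Bearer ${secret}`;
}

// Possible use inside the route handler (not executed here):
//
//   export async function GET(req: Request) {
//     if (!isAuthorizedCron(req.headers.get('authorization'), process.env.CRON_SECRET)) {
//       return new Response('Unauthorized', { status: 401 });
//     }
//     // ...run the cleanup...
//   }

const accepted = isAuthorizedCron('Bearer s3cret', 's3cret');
const rejected = isAuthorizedCron('Bearer wrong', 's3cret');
const unconfigured = isAuthorizedCron('Bearer undefined', undefined);
```

Failing closed when the secret is missing avoids accidentally accepting the literal string "Bearer undefined" in an environment where the variable was never set.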
Three pieces wire together into production-grade rate limiting: a rate_limits table with a composite primary key for shared state across serverless instances, a checkRateLimit helper that increments atomically with onConflictDoUpdate, and per-route config for the key prefix, limit, and window. Apply the user-keyed variant to AI and billing endpoints; apply the IP-keyed variant to login and registration. Both use the same helper -- only the key string changes.