Rate limiting in Next.js App Router -- per-user and per-IP protection for SaaS API routes

Rate limiting protects your API from abuse, keeps costs predictable, and enforces fair usage per customer. In a Next.js SaaS running on serverless infrastructure, you need a strategy that works across multiple instances -- an in-memory map resets on every cold start and cannot coordinate across concurrent lambdas.

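This post shows how to build a fixed-window rate limiter backed by Neon DB and Drizzle ORM that integrates cleanly with the JWT auth pattern already in this boilerplate. Requests are counted in clock-aligned buckets of windowMs milliseconds, which keeps the logic to a single upsert; the trade-off is that a burst straddling a bucket boundary can briefly reach up to twice the limit.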

The schema

Add a rate_limits table to track request counts per key and time window:

// modules/rateLimit/rateLimit.schema.ts
import { pgTable, text, integer, timestamp, primaryKey } from 'drizzle-orm/pg-core';
 
export const rateLimitTable = pgTable(
  'rate_limits',
  {
    key: text('key').notNull(),
    windowStart: timestamp('window_start', { withTimezone: true }).notNull(),
    count: integer('count').notNull().default(0),
  },
  (table) => ({
    pk: primaryKey({ columns: [table.key, table.windowStart] }),
  })
);

The fixed-window helper

Create lib/rateLimit.ts. On each call it prunes stale rows, then atomically increments the counter for the current window:

// lib/rateLimit.ts
import { db } from '@/db/drizzle';
import { rateLimitTable } from '@/modules/rateLimit/rateLimit.schema';
import { and, eq, lt, sql } from 'drizzle-orm';
 
interface RateLimitOptions {
  key: string;      // e.g. "user:abc123" or "ip:1.2.3.4"
  limit: number;    // max requests allowed per window
  windowMs: number; // window size in milliseconds
}
 
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}
 
export async function checkRateLimit({
  key,
  limit,
  windowMs,
}: RateLimitOptions): Promise<RateLimitResult> {
  const now = new Date();
  // Align the window to a multiple of windowMs so every instance agrees on it
  const windowStart = new Date(Math.floor(now.getTime() / windowMs) * windowMs);
  const windowEnd = new Date(windowStart.getTime() + windowMs);
 
  // Prune this key's rows older than one full window. The delete is scoped
  // to the key: without that filter, a 1-minute limiter would wipe the
  // still-active rows of a 15-minute limiter.
  await db
    .delete(rateLimitTable)
    .where(
      and(
        eq(rateLimitTable.key, key),
        lt(rateLimitTable.windowStart, new Date(now.getTime() - windowMs))
      )
    );
 
  // Insert or increment atomically
  const result = await db
    .insert(rateLimitTable)
    .values({ key, windowStart, count: 1 })
    .onConflictDoUpdate({
      target: [rateLimitTable.key, rateLimitTable.windowStart],
      set: { count: sql`${rateLimitTable.count} + 1` },
    })
    .returning({ count: rateLimitTable.count });
 
  const count = result[0]?.count ?? 1;
 
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: windowEnd,
  };
}

Drizzle compiles onConflictDoUpdate to a single INSERT ... ON CONFLICT DO UPDATE statement, so the increment and the read of the new count happen atomically in Postgres. Concurrent requests for the same key each get a distinct count without a race condition.
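The window arithmetic is easy to check in isolation. The sketch below is illustrative and not part of the boilerplate: windowBounds mirrors the bucketing used in checkRateLimit, and weightedCount is a hypothetical helper showing one common way to approximate a true sliding window by blending the previous bucket's count into the current one. It assumes you would also read the previous bucket's row, which the helper above does not do.

```typescript
// Hypothetical helpers for illustration; the names are not from the boilerplate.

interface WindowBounds {
  startMs: number; // inclusive start of the bucket
  endMs: number;   // exclusive end of the bucket (the reset time)
}

// Same bucketing as checkRateLimit: floor the timestamp to a multiple of windowMs.
function windowBounds(nowMs: number, windowMs: number): WindowBounds {
  const startMs = Math.floor(nowMs / windowMs) * windowMs;
  return { startMs, endMs: startMs + windowMs };
}

// Sliding-window approximation: weight the previous bucket by the fraction
// of it that still overlaps the trailing window ending at nowMs.
function weightedCount(
  prevCount: number,
  currCount: number,
  nowMs: number,
  windowMs: number,
): number {
  const { startMs } = windowBounds(nowMs, windowMs);
  const elapsedFraction = (nowMs - startMs) / windowMs;
  return prevCount * (1 - elapsedFraction) + currCount;
}

// 125 s after the epoch with a 60 s window falls in the bucket [120 s, 180 s).
const bounds = windowBounds(125000, 60000);
console.log(bounds.startMs, bounds.endMs); // 120000 180000

// Halfway through the bucket, half of the previous bucket still counts.
console.log(weightedCount(20, 4, 150000, 60000)); // 14
```

With plain fixed buckets, a client that sends 20 requests at the end of one minute and 20 more at the start of the next is never blocked. Comparing weightedCount against the limit instead of the raw count is one way to soften that, at the cost of an extra read per request.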

Applying it to an authenticated route

Here is a protected AI chat endpoint limited to 20 requests per user per minute:

// app/api/ai/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { getUserFromRequest } from '@/lib/auth';
import { checkRateLimit } from '@/lib/rateLimit';
import { handleError } from '@/lib/errors';
 
export async function POST(req: NextRequest) {
  try {
    const user = await getUserFromRequest(req);
 
    const { allowed, remaining, resetAt } = await checkRateLimit({
      key: `chat:user:${user.id}`,
      limit: 20,
      windowMs: 60 * 1000,
    });
 
    if (!allowed) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Try again shortly.' },
        {
          status: 429,
          headers: {
            'X-RateLimit-Limit': '20',
            'X-RateLimit-Remaining': '0',
            'X-RateLimit-Reset': resetAt.toISOString(),
            'Retry-After': String(Math.ceil((resetAt.getTime() - Date.now()) / 1000)),
          },
        }
      );
    }
 
    // Your actual handler logic here...
    return NextResponse.json(
      { ok: true },
      { headers: { 'X-RateLimit-Limit': '20', 'X-RateLimit-Remaining': String(remaining) } }
    );
  } catch (error: unknown) {
    return handleError(error);
  }
}

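The X-RateLimit-* headers are a widely used convention rather than a formal standard, and Retry-After is defined by the HTTP specification. Together they let clients back off gracefully without guessing.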
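If several routes need the same headers, building them in one place avoids drift between the 429 path and the success path. The following is a minimal sketch, assuming the RateLimitResult shape returned by checkRateLimit; rateLimitHeaders is a hypothetical helper name, and the nowMs parameter exists only to make the function testable.

```typescript
// Mirrors the result type returned by checkRateLimit.
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}

// Hypothetical helper; the name and signature are illustrative.
function rateLimitHeaders(
  limit: number,
  result: RateLimitResult,
  nowMs: number = Date.now(),
): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, result.remaining)),
    'X-RateLimit-Reset': result.resetAt.toISOString(),
  };
  if (!result.allowed) {
    // Retry-After is whole seconds; never advertise less than one second.
    const seconds = Math.ceil((result.resetAt.getTime() - nowMs) / 1000);
    headers['Retry-After'] = String(Math.max(1, seconds));
  }
  return headers;
}

// A blocked request 54.5 s before the window resets.
const blocked = rateLimitHeaders(
  20,
  { allowed: false, remaining: 0, resetAt: new Date(180000) },
  125500,
);
console.log(blocked['Retry-After']); // "55"
```

The returned object can be passed as the headers option of NextResponse.json on both the 429 response and the successful one.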

IP-based limiting for public routes

For unauthenticated endpoints like /api/auth/login, rate limit by IP instead of user ID:

const ip =
  req.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'unknown';
 
const { allowed } = await checkRateLimit({
  key: `login:ip:${ip}`,
  limit: 10,
  windowMs: 15 * 60 * 1000, // 15 minutes
});

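On Vercel, x-forwarded-for is set by the edge network and contains the real client IP. On self-hosted infrastructure, only trust it when you control the upstream proxy and that proxy overwrites the header; otherwise a client can send its own x-forwarded-for value and land in a fresh bucket on every request. The 'unknown' fallback puts every request without the header into one shared bucket, so missing headers are throttled together rather than let through.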
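The header parsing can be pulled into a small function so it is testable and handles edge cases the inline expression misses: an empty or whitespace-only header is not nullish, so the ?? fallback above would produce the key "login:ip:" instead of "login:ip:unknown". This is a sketch under that assumption; clientIpFromForwardedFor and loginRateLimitKey are hypothetical names.

```typescript
// Hypothetical helper: pick the client IP out of an X-Forwarded-For value.
function clientIpFromForwardedFor(header: string | null): string | null {
  if (!header) return null;
  // The header is a comma-separated list; the left-most entry is the original client.
  const first = header.split(',')[0]?.trim();
  return first ? first : null;
}

// Builds the same key format as the login example above.
function loginRateLimitKey(header: string | null): string {
  const ip = clientIpFromForwardedFor(header);
  // Requests without a usable IP deliberately share the "unknown" bucket.
  return `login:ip:${ip ?? 'unknown'}`;
}

console.log(loginRateLimitKey('203.0.113.7, 10.0.0.1')); // "login:ip:203.0.113.7"
console.log(loginRateLimitKey(null));                    // "login:ip:unknown"
```

In the route, the argument would be req.headers.get('x-forwarded-for').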

Keeping the table clean

The inline delete adds a second query to every request -- fine for low-traffic endpoints. For busy routes, drop it from checkRateLimit and move the cleanup to a cron job that runs once per hour and removes rows older than 24 hours:

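// app/api/cron/cleanup-rate-limits/route.ts
import { db } from '@/db/drizzle';
import { rateLimitTable } from '@/modules/rateLimit/rateLimit.schema';
import { lt } from 'drizzle-orm';
 
// Opt out of static caching so the handler runs on every invocation
// (GET route handlers were cached by default before Next.js 15).
export const dynamic = 'force-dynamic';
 
export async function GET() {
  // Assumes every rate limit window is shorter than 24 hours
  const cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000);
  await db.delete(rateLimitTable).where(lt(rateLimitTable.windowStart, cutoff));
  return Response.json({ ok: true });
}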

Then register it in vercel.json:

{
  "crons": [{ "path": "/api/cron/cleanup-rate-limits", "schedule": "0 * * * *" }]
}

This pairs with the background jobs pattern from the cron post -- the same vercel.json block, same thin route handler structure.
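As written, the cleanup route is reachable by anyone who knows the URL. Vercel documents a CRON_SECRET environment variable that it sends as a bearer token in the Authorization header of cron invocations; the sketch below assumes that setup. isAuthorizedCron is a hypothetical helper, not part of the boilerplate.

```typescript
// Hypothetical guard for cron routes. Assumes the platform sends
// "Authorization: Bearer <CRON_SECRET>" with each scheduled invocation.
function isAuthorizedCron(
  authHeader: string | null,
  secret: string | undefined,
): boolean {
  // Fail closed when the secret is not configured.
  if (!secret) return false;
  return authHeader === `Bearer ${secret}`;
}

console.log(isAuthorizedCron('Bearer s3cret', 's3cret')); // true
console.log(isAuthorizedCron(null, 's3cret'));            // false
```

In the route handler, this would be called as isAuthorizedCron(req.headers.get('authorization'), process.env.CRON_SECRET), returning a 401 response when it is false.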

The takeaway

Three pieces wire together into production-grade rate limiting: a rate_limits table with a composite primary key for shared state across serverless instances, a checkRateLimit helper that increments atomically with onConflictDoUpdate, and per-route config for the key prefix, limit, and window. Apply the user-keyed variant to AI and billing endpoints; apply the IP-keyed variant to login and registration. Both use the same helper -- only the key string changes.
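One way to keep the per-route config in a single place is a small policy table. This is a sketch only: the policy names and the rateLimitOptions helper are hypothetical, and the values are the ones used in the chat and login examples above.

```typescript
// Hypothetical per-route policy table; names are illustrative.
interface RateLimitPolicy {
  prefix: string;   // key prefix, e.g. "chat:user"
  limit: number;    // max requests allowed per window
  windowMs: number; // window size in milliseconds
}

type PolicyName = 'chat' | 'login';

const policies: Record<PolicyName, RateLimitPolicy> = {
  chat: { prefix: 'chat:user', limit: 20, windowMs: 60 * 1000 },
  login: { prefix: 'login:ip', limit: 10, windowMs: 15 * 60 * 1000 },
};

// Builds the options object that checkRateLimit expects.
function rateLimitOptions(name: PolicyName, subject: string) {
  const { prefix, limit, windowMs } = policies[name];
  return { key: `${prefix}:${subject}`, limit, windowMs };
}

console.log(rateLimitOptions('chat', 'abc123').key); // "chat:user:abc123"
```

A route would then call checkRateLimit(rateLimitOptions('chat', user.id)), so only the policy name and subject vary between routes.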
