7 min read · nodejs · performance · scalability

How to Handle 1M+ Requests/Second in Node.js

A practical guide to pushing Node.js beyond the usual comfort zone with load shedding, cluster-aware architecture, kernel tuning, proxy offload, and the operational discipline required to survive seven-figure request rates.

Rahul Gupta
Senior Software Engineer

Hitting 1M+ requests per second in Node.js is absolutely possible. Hitting it with predictable latency, graceful degradation, and an architecture your team can still operate six months later is the part that matters.

The first thing to get clear: a single Node.js process is not doing 1M RPS on its own. A serious setup reaches that number through horizontal concurrency, ruthless request-path simplification, aggressive offload to the edge, and disciplined backpressure. Node can own the control plane and still sit on the hot path for the data plane, but only if you stop treating it like a monolith.

1. Start with the real unit of scale

When teams say “our Node service should do 1M RPS”, they usually mean one of three different things:

  1. A single process should do 1M RPS. That is unrealistic for almost every non-trivial workload.
  2. A single host should do 1M RPS. That is sometimes possible for extremely small responses on tuned hardware.
  3. A fleet should sustain 1M RPS. That is the target most teams actually need.

For planning purposes, think in this model:

Text
global_rps = edge_cache_hits
           + load_balancer_terminated_fast_paths
           + node_fleet_handled_requests

If you need a million requests every second, you should expect a large chunk of those requests to die early:

  • at the CDN
  • at the L4/L7 proxy
  • in a local memory cache
  • in a rate limiter before application logic runs

If every request reaches your database or even your full Express/Fastify middleware chain, the architecture is already wrong.
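A rate limiter in front of application logic does not need to be fancy. As a minimal sketch, an in-process token bucket (with illustrative capacity numbers, not recommendations) is enough to refuse excess work before any routing or parsing happens:

```typescript
// Minimal token-bucket limiter: refuse work before application logic runs.
// Capacity and refill rate below are illustrative values.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryTake(now = Date.now()): boolean {
    // refill proportionally to elapsed time, capped at capacity
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(1000, 500); // burst of 1000, 500 req/s sustained
```

An in-process bucket only protects a single worker; at the edge you would back the same idea with a shared store keyed per client.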

2. Pick the right Node stack

At this scale, framework overhead matters. You do not want a deep middleware stack with dynamic object creation on every request.

My rule of thumb:

  • Use Fastify over Express for hot-path APIs
  • Prefer bare node:http or uWebSockets.js for ultra-thin edge-style handlers
  • Keep JSON schemas precompiled instead of validating ad hoc per request
  • Avoid runtime-heavy abstractions in the request path

A thin Fastify route is often enough:

TypeScript
import Fastify from "fastify";
 
const app = Fastify({
  logger: false,
  disableRequestLogging: true,
});
 
app.get("/healthz", async () => ({ ok: true }));
 
// `cache` is assumed to be an already-initialized client (an in-process map
// or a Redis wrapper) exposing an async get(id) -> value | undefined.
app.get("/lookup/:id", {
  schema: {
    params: {
      type: "object",
      properties: { id: { type: "string" } },
      required: ["id"],
    },
  },
}, async (request, reply) => {
  const data = await cache.get(request.params.id);
  if (!data) return reply.code(404).send({ error: "not_found" });
  return data;
});
 
await app.listen({ port: 3000, host: "0.0.0.0" });

That is already materially better than a generic stack with ten middlewares, string-based logging, and ORM work on every request.

3. Use every core, but isolate responsibilities

Node runs a single event loop per process. That means one HTTP worker per core for network handling, and separate execution domains for CPU-heavy work.

Use cluster or a process manager that gives you equivalent behavior:

TypeScript
import cluster from "node:cluster";
import os from "node:os";
 
if (cluster.isPrimary) {
  const workers = os.availableParallelism();
 
  for (let i = 0; i < workers; i++) {
    cluster.fork();
  }
 
  cluster.on("exit", (worker) => {
    console.error(`worker ${worker.process.pid} exited`);
    cluster.fork();
  });
} else {
  // startHttpServer() is your app bootstrap, e.g. the Fastify setup above
  await startHttpServer();
}

But don’t stop there. Separate workloads by type:

  • HTTP workers handle network I/O only
  • Worker threads handle CPU-bound transforms, compression, or crypto if unavoidable
  • Queues handle async work that does not belong in the request lifecycle

The easiest way to kill throughput is to mix low-latency request serving with expensive JSON reshaping, PDF generation, image work, or synchronous crypto in the same event loop.
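As a sketch of that separation, CPU-bound work can be pushed to a worker thread while the HTTP worker keeps serving I/O. The inline worker source and the hashing loop below are stand-ins for real transform code; in production you would point `Worker` at a separate file and run a small pool:

```typescript
import { Worker } from "node:worker_threads";

// Offload a CPU-bound transform to a worker thread so the HTTP event loop
// stays free for I/O. For simplicity this handles one request at a time.
const workerSource = `
  const { parentPort } = require("node:worker_threads");
  parentPort.on("message", (input) => {
    // stand-in for real CPU work (hashing, compression, transforms)
    let h = 0;
    for (let i = 0; i < input.length; i++) h = (h * 31 + input.charCodeAt(i)) | 0;
    parentPort.postMessage(h);
  });
`;

const cpuWorker = new Worker(workerSource, { eval: true });

function runCpuHeavy(input: string): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    cpuWorker.once("message", resolve);
    cpuWorker.once("error", reject);
    cpuWorker.postMessage(input);
  });
}
```

A real service would dispatch across a pool of such workers rather than a single shared one, but the shape is the same: post a message, await the result, never block the request loop.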

4. Make the hot path boring

The request path that sustains very high RPS is usually offensively simple:

  1. Accept request
  2. Parse minimal headers
  3. Authenticate cheaply
  4. Read from local or remote cache
  5. Serialize fixed-shape response
  6. Return

Everything else should be pushed away from the hot path.

Good signs:

  • Small payloads
  • Stable response shapes
  • Cacheable reads
  • Precomputed aggregates
  • No synchronous filesystem access
  • No per-request regex-heavy parsing

Bad signs:

  • ORMs doing lazy-loading under load
  • request-scoped dependency injection containers
  • dynamic feature flag evaluation from remote systems
  • constructing giant objects and deleting fields before response

At 1M+ RPS, “just one extra millisecond” is not a rounding error: it is a full second of aggregate compute for every thousand requests, which at a million requests per second means a thousand CPU-seconds of extra work every second.

5. Cache like you mean it

You do not scale to seven-figure RPS by making your origin faster. You scale there by ensuring the origin sees far fewer requests than users generate.

Think in layers:

  • CDN cache for public and semi-public content
  • reverse proxy cache for short-lived internal API responses
  • local in-process cache for tiny, immutable, frequently-read reference data
  • Redis or KeyDB for shared cacheable reads and counters

One useful pattern is stale-while-revalidate at the proxy or application layer:

TypeScript
type CacheEntry<T> = {
  value: T;
  expiresAt: number;  // fresh until this timestamp (ms)
  staleUntil: number; // past expiresAt, still servable while revalidating
};
 
function readWithStale<T>(entry: CacheEntry<T> | null, now = Date.now()) {
  if (!entry) return { hit: false, stale: false };
  if (now <= entry.expiresAt) return { hit: true, stale: false, value: entry.value };
  if (now <= entry.staleUntil) return { hit: true, stale: true, value: entry.value };
  return { hit: false, stale: false };
}

That pattern matters because a cache miss storm is one of the fastest ways to melt your entire fleet.
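One defense against miss storms is request coalescing (“single flight”): when many concurrent requests miss the same key, only the first one goes to the origin and the rest await the same promise. A minimal sketch, with `loadFromOrigin` standing in for your real backend read:

```typescript
// One in-flight origin read per key; concurrent misses share the promise.
const inFlight = new Map<string, Promise<unknown>>();

async function coalescedGet<T>(
  key: string,
  loadFromOrigin: (key: string) => Promise<T>,
): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  // register the load before anyone else can miss on the same key,
  // and always clean up so later misses can reload
  const p = loadFromOrigin(key).finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}
```

Combined with stale-while-revalidate, this turns a thousand simultaneous misses on a hot key into one origin request instead of a thousand.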

6. Treat backpressure as a feature

A system capable of 1M RPS does not try to serve every request equally. It decides what to drop, what to delay, and what to protect.

That means:

  • rate limits at the edge
  • bounded queues
  • connection caps
  • fast 429 and 503 responses
  • circuit breakers around dependencies

If Redis or your database begins timing out, the Node layer should degrade immediately instead of waiting for a full pile-up.

A simple adaptive shedder can be enough:

TypeScript
// getEventLoopP99Ms and getCpuUtilization are placeholders for your own
// metrics sources (e.g. monitorEventLoopDelay and process.cpuUsage sampling).
let rejectNewRequests = false;
 
setInterval(() => {
  const lag = getEventLoopP99Ms();
  const cpu = getCpuUtilization();
  rejectNewRequests = lag > 40 || cpu > 0.85;
}, 1000);
 
app.addHook("onRequest", async (_request, reply) => {
  if (!rejectNewRequests) return;
  // replying from an onRequest hook and returning reply short-circuits
  // the rest of the request lifecycle
  return reply.code(503).send({ error: "overloaded_try_again" });
});

That looks blunt, and it is. Under real overload, blunt is often better than elegant.
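The same philosophy applies to the circuit breakers mentioned above: when a dependency keeps failing, fail fast instead of queueing behind timeouts. A minimal consecutive-failure breaker, with illustrative thresholds:

```typescript
// After maxFailures consecutive failures, the breaker opens and calls fail
// fast for cooldownMs instead of piling up on a sick dependency.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures: number, private cooldownMs: number) {}

  async call<T>(fn: () => Promise<T>, now = Date.now()): Promise<T> {
    if (this.failures >= this.maxFailures && now - this.openedAt < this.cooldownMs) {
      throw new Error("circuit_open"); // fail fast, no dependency call
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the breaker
      return result;
    } catch (err) {
      this.failures++;
      this.openedAt = now;
      throw err;
    }
  }
}
```

After the cooldown elapses, the next call acts as a half-open probe: one real request goes through, and its outcome decides whether the breaker closes or re-opens.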

7. Offload compression, TLS, and connection churn

Do not make Node handle work your proxy is better at.

The high-throughput layout usually looks like:

Text
client
  -> CDN
  -> edge/load balancer
  -> NGINX / Envoy / HAProxy
  -> Node workers
  -> cache / queue / database

Let the fronting proxy absorb:

  • TLS termination
  • HTTP/2 or HTTP/3 session handling
  • gzip/brotli compression
  • request buffering where appropriate
  • connection pooling to the application tier
  • coarse rate limiting

Node should spend its budget on application semantics, not certificate handshakes and compression bookkeeping.

8. Fix your data layer before tuning Node

Most high-scale Node bottlenecks are not actually in Node.

They are:

  • a database doing point reads without the right index
  • a cache with low hit ratio
  • cross-region latency
  • noisy-neighbor effects in Kubernetes
  • oversized payloads
  • a chatty service-to-service graph

If every request triggers a database roundtrip, your maximum RPS is the database’s problem, not the runtime’s.

The core strategy is:

  1. Move repeated reads into cache
  2. Precompute expensive joins and aggregates
  3. Partition traffic by tenant, key, or geography
  4. Keep data close to where the request is served

If you have to choose between shaving 5% off Node runtime overhead and eliminating one backend hop, eliminate the hop every time.
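Step 1 usually reduces to a read-through helper: check the cache, hit the database only on a miss, and populate the cache on the way out. A sketch, assuming a hypothetical cache client with async get/set and an illustrative TTL:

```typescript
// Read-through: cache hit skips the database hop entirely; a miss loads
// from the database and writes back with a TTL.
async function readThrough<T>(
  key: string,
  loadFromDb: () => Promise<T>,
  cache: {
    get(k: string): Promise<T | null>;
    set(k: string, v: T, ttlMs: number): Promise<void>;
  },
  ttlMs = 30_000,
): Promise<T> {
  const cached = await cache.get(key);
  if (cached !== null) return cached; // hot path: no database roundtrip
  const fresh = await loadFromDb();
  await cache.set(key, fresh, ttlMs);
  return fresh;
}
```

Pair this with the coalescing pattern from earlier so that a cold key triggers one database load rather than one per concurrent request.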

9. Watch the event loop, not just latency

Latency dashboards alone hide the reason a Node service is degrading. Event loop lag tells you when the process is losing control of its scheduling budget.

TypeScript
import { monitorEventLoopDelay } from "node:perf_hooks";
 
// `metrics` is a placeholder for your stats client (StatsD, Prometheus, ...)
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();
 
setInterval(() => {
  // percentile() reports nanoseconds; convert to milliseconds
  metrics.gauge("node.event_loop.p99_ms", histogram.percentile(99) / 1e6);
  histogram.reset();
}, 5000);

Track at least:

  • requests per second
  • p50, p95, p99 latency
  • event loop lag
  • open connections
  • active handles
  • GC pause time
  • cache hit ratio
  • upstream dependency latency
  • rejection rate from load shedding

If you are not graphing rejected requests, you are blind to whether your overload controls are helping or merely hiding failure.

10. Tune the host, not just the app

At very high throughput, OS and network defaults matter.

Typical areas worth reviewing:

  • file descriptor limits
  • socket backlog
  • ephemeral port exhaustion
  • NIC offload settings
  • IRQ balancing
  • somaxconn
  • TCP TIME_WAIT behavior
  • container CPU throttling

For example, your server backlog and open file limits must align with real traffic:

Bash
# raise the per-process fd limit for this shell (persist via limits.conf)
ulimit -n 1048576
# deepen the accept queue; the app's listen() backlog must match to benefit
sysctl -w net.core.somaxconn=65535
# widen the ephemeral port range for outbound connections
sysctl -w net.ipv4.ip_local_port_range="2000 65000"

Do not cargo-cult kernel flags from random blog posts. Benchmark each change on hardware and traffic patterns that resemble production.

11. Benchmark with production realism

Most “Node handled 1M RPS” demos measure the easiest possible thing:

  • tiny static payload
  • no auth
  • no cache miss
  • no database
  • one route
  • localhost networking

Those benchmarks are useful only for upper-bound mechanics. They are not proof your service design is ready.

When load testing, vary:

  • payload sizes
  • hot keys versus cold keys
  • keep-alive versus connection churn
  • success versus failure paths
  • cache-hit and cache-miss ratios
  • multi-tenant noisy-neighbor scenarios

And test failure explicitly:

  1. Kill a cache node
  2. Add 50 ms latency to a dependency
  3. Reduce one availability zone
  4. Force a deploy during peak load

If the system only survives the happy path, it does not really survive 1M RPS.

12. A reference architecture that works

For a practical high-throughput Node deployment, this is the shape I would start from:

Text
CDN
  -> Envoy / NGINX layer with TLS termination and edge rate limiting
  -> Kubernetes service or L4 balancer
  -> Node Fastify pods, one worker per core
  -> Redis/KeyDB for hot reads and counters
  -> Kafka/NATS/SQS for non-blocking async work
  -> partitioned database tier for true source-of-truth reads

Operational rules:

  • stateless app pods
  • no in-memory sessions
  • no blocking work in handlers
  • all expensive side effects through queues
  • all hot reads cache-first
  • autoscaling driven by CPU, event-loop lag, and saturation signals together

This is not glamorous architecture. That is exactly why it scales.

13. The uncomfortable truth

The trick to handling 1M+ requests per second in Node.js is not a secret V8 flag or a different promise library.

It is mostly this:

  • simplify the request path
  • drop work early
  • push work to caches and proxies
  • scale horizontally
  • instrument overload before users find it
  • keep Node away from CPU-heavy and blocking tasks

Node is very good at high-concurrency I/O. It is unforgiving when teams ask it to behave like an all-purpose compute engine under extreme load.

Respect that boundary and it will scale much further than most teams expect.


If you’re designing for traffic in this range, start by measuring where requests die today: CDN, proxy, cache, origin, database. The path to 1M RPS is usually not “make Node faster.” It is “stop spending Node on work the rest of the stack should have absorbed already.”
