9 min read · golang · api-gateway · performance

Building an API Gateway in Go: A Thought Process Focused on Performance and Concurrency

A practical walkthrough of how to think about building an API gateway in Go when throughput, latency, concurrency, and operational simplicity matter more than framework fashion.

Rahul Gupta
Senior Software Engineer

If I had to build an API gateway in Go from scratch, I would not start with authentication, rate limiting, dashboards, or YAML parsers.

I would start with one much less glamorous question:

What is the hottest path in the system, and how do I stop myself from making it slow?

That is the right starting point because an API gateway sits in front of everything. Every extra allocation, every extra context switch, every accidental blocking operation, every unnecessary network hop multiplies across all requests.

So this post is not “how to build a full gateway product in 24 hours.” It is the thought process I would follow if the main requirement was:

Make it stupidly fast, concurrency-safe, and boring to operate.

Go is a very strong fit for this kind of system, but only if you design around the right constraints.

1. First principle: the gateway is on the critical path

A gateway is not a side service. It is the front door.

That means every request usually goes through:

  1. connection accept
  2. request parse
  3. route match
  4. policy checks
  5. upstream forwarding
  6. response write

Even if each step adds only a little overhead, the total cost becomes painful very quickly under load.

That is why my default rule would be:

  • keep the request path short
  • keep memory churn low
  • avoid blocking work in handlers
  • push non-essential work out of band

A gateway should not behave like an application server with twenty layers of business logic. It should behave like a traffic engine.

2. Why Go is such a strong choice here

Go gives you a set of properties that map very well to gateways:

  • fast networking primitives
  • cheap goroutines
  • good scheduler behavior
  • strong standard library support
  • simple concurrency model
  • solid observability ecosystem

You can absolutely build a gateway in other languages. But Go hits a very practical sweet spot:

  • easier to operate than a JVM-heavy stack
  • more concurrency-friendly than a lot of scripting runtimes
  • simpler memory model than C/C++

Most importantly, Go makes it straightforward to write code that is both high-throughput and understandable by regular backend engineers.

That matters more than people admit.

3. Before writing code, define what the gateway is responsible for

This is where teams get into trouble.

They start with “build an API gateway” and then keep stuffing features into it until it becomes a distributed monolith at the edge.

I would define a strict boundary early.

Core responsibilities:

  • routing
  • load balancing
  • retries and timeouts
  • TLS termination
  • auth hooks
  • rate limiting hooks
  • observability

Danger zone responsibilities:

  • business-rule orchestration
  • heavy payload transformations
  • report generation
  • custom per-tenant workflow engines
  • synchronous calls to five different side systems on every request

The more domain logic you pack into the gateway, the harder it becomes to keep latency predictable.

So the first design decision is not technical. It is architectural discipline.

4. The hot path should be mostly lock-free reads

One of the biggest design choices is how request-time configuration is accessed.

A gateway needs fast access to:

  • route tables
  • upstream cluster definitions
  • auth policy settings
  • timeout and retry policies
  • rate-limiting config

If every request has to acquire coarse locks to read this data, throughput will collapse under concurrency.

The ideal model is:

  • configuration updates are relatively infrequent
  • request reads are extremely frequent

So optimize for reads.

In Go, that usually means immutable snapshots plus atomic swap patterns, instead of constantly mutating shared maps under contention.

Example idea:

Go
import "sync/atomic"

// An immutable snapshot: build a new one, never mutate it in place.
type GatewayConfig struct {
    Routes   []Route
    Clusters map[string]Cluster
}

var activeConfig atomic.Pointer[GatewayConfig]

func GetConfig() *GatewayConfig {
    return activeConfig.Load()
}

// UpdateConfig publishes a fully built snapshot atomically.
func UpdateConfig(cfg *GatewayConfig) {
    activeConfig.Store(cfg)
}

That pattern is powerful because request handlers can read config without grabbing a big global lock on every request.

5. Route matching has to be fast, not just flexible

If your route lookup is slow, everything built on top of it is already compromised.

This is why I would think very carefully before supporting every fancy matching pattern on day one.

Start with the route types that matter most:

  • exact match
  • prefix match
  • host-based match
  • method-based match

And use data structures that keep lookup cheap.

For example:

  • hash maps for exact routes
  • trie-like structures for prefixes if the rule set is large
  • precompiled match trees instead of evaluating dynamic rule chains every time

A common mistake is storing routes in a slice and scanning them linearly on every request because “we only have 40 routes right now.” That becomes technical debt the first time the config becomes large or tenant-specific.

6. Upstream proxying is where performance wins or dies

Most gateway work is not just deciding where traffic should go. It is forwarding traffic efficiently.

This means transport configuration matters a lot.

In Go, I would spend real attention on:

  • connection pooling
  • keep-alive reuse
  • max idle conns
  • per-host pool tuning
  • dial timeouts
  • header handling
  • streaming behavior

A basic http.Transport setup can make a huge difference:

Go
transport := &http.Transport{
    MaxIdleConns:        10000,
    MaxIdleConnsPerHost: 1000,
    MaxConnsPerHost:     0, // 0 means no per-host cap
    IdleConnTimeout:     90 * time.Second,
    DisableCompression:  false,
    ForceAttemptHTTP2:   true,
}

client := &http.Client{
    Transport: transport,
    // Client.Timeout bounds the entire exchange, including reading
    // the body, so leave it off (or raise it) for streaming routes.
    Timeout: 3 * time.Second,
}

The exact numbers depend on your traffic shape, but the bigger idea is simple:

Do not let the gateway create needless connection churn.

If every forwarded request causes new upstream connections, you are burning performance for no good reason.

7. Goroutines are cheap, but not free

This is where Go gets misused.

People learn that goroutines are lightweight and conclude they can spawn them casually inside the request path for every side task.

Bad instinct.

In a gateway, goroutines should be used intentionally:

  • one per request is normal through the server model
  • async workers for background tasks are normal
  • bounded pools for expensive side jobs may be useful

But I would avoid spraying extra goroutines in hot handlers unless there is a clear need.

Why?

Because concurrency does not automatically mean speed.

Too much uncontrolled concurrency creates:

  • scheduler pressure
  • higher memory usage
  • harder cancellation handling
  • more complicated debugging

The correct question is not “Can this run concurrently?” It is “Should this be concurrent on the request path at all?”
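For the “bounded pools for expensive side jobs” case, a buffered channel works as a semaphore. A sketch (the `boundedPool` type is illustrative): a full pool sheds work instead of spawning unbounded goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// boundedPool caps how many side jobs run at once. When it is full,
// new work is shed rather than queued behind the request path.
type boundedPool struct {
	sem chan struct{}
	wg  sync.WaitGroup
}

func newBoundedPool(size int) *boundedPool {
	return &boundedPool{sem: make(chan struct{}, size)}
}

// TrySubmit runs job in a goroutine only if a slot is free;
// otherwise it reports that the job was shed.
func (p *boundedPool) TrySubmit(job func()) bool {
	select {
	case p.sem <- struct{}{}: // slot acquired before the goroutine starts
		p.wg.Add(1)
		go func() {
			defer func() { <-p.sem; p.wg.Done() }()
			job()
		}()
		return true
	default:
		return false // load-shed instead of piling up goroutines
	}
}

func (p *boundedPool) Wait() { p.wg.Wait() }

func main() {
	pool := newBoundedPool(2)
	block := make(chan struct{})
	pool.TrySubmit(func() { <-block })
	pool.TrySubmit(func() { <-block })
	fmt.Println(pool.TrySubmit(func() {})) // false: both slots busy, job shed
	close(block)
	pool.Wait()
}
```

Whether shed jobs should be dropped, counted, or queued elsewhere is a policy decision; the important property is that the bound is explicit.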

8. Timeouts, retries, and cancellation are first-class features

Gateways live in front of unreliable upstreams. So handling failure well is not optional.

I would consider these part of the first real version:

  • per-route timeout
  • connect timeout
  • upstream response timeout
  • retry budget
  • context propagation

In Go, context.Context should be part of the design from the start:

Go
// Derive from r.Context() so a client disconnect cancels upstream work too.
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
defer cancel()

req = req.WithContext(ctx)
resp, err := client.Do(req)

If an upstream hangs, the gateway should stop waiting. If the client disconnects, the gateway should stop work. If retries happen, they should be bounded and policy-driven.

Otherwise the system slowly turns into a resource leak generator under failure.

9. Rate limiting and auth should not poison the request path

Yes, gateways often need:

  • JWT validation
  • API key checks
  • quota enforcement
  • tenant policy enforcement

But these must be designed with hot-path cost in mind.

For example:

  • prefer local validation where possible
  • cache public keys carefully
  • avoid remote auth calls on every request unless absolutely necessary
  • keep rate-limiter lookups efficient and bounded

If the auth service becomes a blocking dependency for every request, you have simply moved the bottleneck.

The same applies to rate limiting. If every request needs multiple slow external checks, the gateway will collapse before the backend does.
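To make “efficient and bounded” concrete, here is a minimal in-process token bucket keyed by tenant. All names are illustrative; production code would more likely use golang.org/x/time/rate and shard the map to cut lock contention, but the hot-path shape is the same: one local lookup, no remote call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is refilled lazily on each check, so the hot path is a
// mutex-guarded map lookup plus a little arithmetic.
type bucket struct {
	tokens float64
	last   time.Time
}

type limiter struct {
	mu      sync.Mutex
	buckets map[string]*bucket
	rate    float64 // tokens added per second
	burst   float64 // maximum bucket size
}

func newLimiter(ratePerSec, burst float64) *limiter {
	return &limiter{buckets: make(map[string]*bucket), rate: ratePerSec, burst: burst}
}

func (l *limiter) Allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[key] = b
	}
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	l := newLimiter(1, 2) // 1 req/s steady, burst of 2
	fmt.Println(l.Allow("tenant-a"), l.Allow("tenant-a"), l.Allow("tenant-a"))
	// true true false: burst spent, third call is throttled
}
```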

10. Logging can destroy throughput if you are careless

This is one of the easiest self-inflicted wounds.

If the gateway writes huge synchronous logs on every request, you are paying for serialization and I/O right on the hot path.

I would keep request logging:

  • structured
  • minimal
  • non-blocking where possible
  • sampled when necessary

You do need visibility, but not at the cost of turning the gateway into a log formatting service.
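One way to keep log I/O off the hot path is to hand entries to a background writer over a bounded channel and drop (while counting) whatever does not fit. A sketch, with illustrative names throughout:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// asyncLog buffers entries for a background writer. When the buffer
// is full, the entry is counted as dropped instead of blocking the
// request handler on serialization or I/O.
type asyncLog struct {
	ch      chan string
	written atomic.Int64
	dropped atomic.Int64
	done    chan struct{}
}

func newAsyncLog(buf int) *asyncLog {
	l := &asyncLog{ch: make(chan string, buf), done: make(chan struct{})}
	go func() {
		defer close(l.done)
		for line := range l.ch {
			_ = line // real code would format and write the line here
			l.written.Add(1)
		}
	}()
	return l
}

func (l *asyncLog) Log(line string) {
	select {
	case l.ch <- line:
	default:
		l.dropped.Add(1) // shed instead of blocking the hot path
	}
}

// Close stops accepting entries and waits for the writer to drain.
func (l *asyncLog) Close() {
	close(l.ch)
	<-l.done
}

func main() {
	l := newAsyncLog(8)
	for i := 0; i < 1000; i++ {
		l.Log(fmt.Sprintf("request %d", i))
	}
	l.Close()
	fmt.Println(l.written.Load() + l.dropped.Load()) // 1000
}
```

The drop counter itself becomes a useful metric: it tells you when logging volume is exceeding what the gateway can afford.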

Metrics matter even more here:

  • RPS
  • p50, p95, p99 latency
  • upstream error rate
  • per-route volume
  • open connections
  • active goroutines
  • GC pause time

For a gateway, metrics usually tell you operational truth faster than verbose logs do.

11. Memory and allocation discipline matter a lot

High concurrency means small waste gets amplified.

Things I would watch carefully:

  • unnecessary string copies
  • repeated header cloning
  • per-request JSON work where not needed
  • large temporary buffers
  • frequent map allocations

This is why I would avoid a design where every policy stage allocates a brand new context object with copied request metadata unless there is a very strong reason.

Sometimes the real optimization is not some clever trick. It is simply:

  • fewer allocations
  • fewer object lifetimes
  • fewer copies

That translates directly into lower GC pressure.
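For the “large temporary buffers” item, the standard tool is sync.Pool. A small sketch (the `renderAccessLine` helper is illustrative): buffers are reused across requests instead of allocated fresh each time.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses byte buffers across requests so per-request
// formatting does not allocate a fresh backing array every time.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// renderAccessLine formats one access-log line using a pooled buffer.
func renderAccessLine(method, path string, status int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // return a clean buffer to the pool
		bufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "%s %s -> %d", method, path, status)
	return buf.String()
}

func main() {
	fmt.Println(renderAccessLine("GET", "/api/users", 200)) // GET /api/users -> 200
}
```

Pooling is only worth it on genuinely hot allocations; profile first, since the GC already handles small short-lived objects well.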

12. Config reloads must not stall traffic

Gateways need config updates:

  • new routes
  • changed upstreams
  • timeout policy updates
  • feature rollout changes

The mistake is applying config in a way that blocks live traffic or partially mutates shared state.

My preferred mental model would be:

  1. build a new config snapshot off the hot path
  2. validate it fully
  3. atomically swap it in
  4. let old requests finish on the old snapshot

This keeps traffic handling stable while config evolves.

That is much safer than “edit shared structures live and hope request handlers do not collide with updates.”
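The four steps above can be sketched with the same atomic-pointer pattern from earlier, plus validation before the swap. The `Reload` function and `Config` shape are illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Config is a simplified immutable snapshot.
type Config struct {
	Routes map[string]string // path prefix -> upstream
}

var active atomic.Pointer[Config]

// Reload builds and validates a full snapshot off the hot path, then
// swaps it in atomically. In-flight requests keep whatever pointer
// they already loaded, so they finish on the old snapshot.
func Reload(routes map[string]string) error {
	if len(routes) == 0 {
		return errors.New("refusing to apply empty route table") // step 2: validate fully
	}
	next := &Config{Routes: routes} // step 1: build off the hot path
	active.Store(next)              // step 3: atomic swap
	return nil                      // step 4: old snapshot stays valid for old readers
}

func main() {
	_ = Reload(map[string]string{"/api/": "api-v1"})
	old := active.Load() // a request holding this snapshot is unaffected below

	_ = Reload(map[string]string{"/api/": "api-v2"})
	fmt.Println(old.Routes["/api/"], active.Load().Routes["/api/"]) // api-v1 api-v2
}
```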

13. Concurrency safety is mostly about choosing the right shared state strategy

When people say “Go is good at concurrency,” they sometimes jump too quickly to mutexes and channels everywhere.

That is not the real lesson.

The real lesson is to be deliberate about shared state:

  • avoid sharing when possible
  • make hot-path state immutable
  • use channels for coordination when they actually fit
  • use locks narrowly, not as a blanket design

For a gateway, common shared state includes:

  • route config
  • connection health state
  • counters
  • limiter state
  • circuit breaker state

Each of these may need a different concurrency strategy.

If one giant global lock protects all of it, the system will absolutely show contention under load.
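Two of the cheapest strategies cover the counter and circuit-breaker cases: plain atomics for counters, and a small atomic state machine for breaker state. Neither touches a lock on the hot path. The names below are illustrative:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var totalRequests atomic.Int64 // counter: a lock-free atomic add

const (
	breakerClosed int32 = iota
	breakerOpen
)

var breakerState atomic.Int32 // circuit breaker: atomic state enum

func recordRequest() { totalRequests.Add(1) }

func allowRequest() bool { return breakerState.Load() == breakerClosed }

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); recordRequest() }()
	}
	wg.Wait()
	fmt.Println(totalRequests.Load(), allowRequest()) // 100 true

	breakerState.Store(breakerOpen) // health checker trips the breaker
	fmt.Println(allowRequest())     // false
}
```

Route config fits the snapshot-and-swap pattern from earlier, while limiter state usually wants a sharded map; the point is to pick per-state, not to reach for one global mutex.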

14. The best performance feature is saying no to unnecessary work

A high-performance gateway is not just one with fast code. It is one that refuses to do work it should never have taken on.

Examples:

  • reject malformed requests early
  • fail auth early
  • stop retrying hopeless upstreams
  • do not transform bodies unless required
  • do not buffer huge payloads unless required
  • offload analytics or async side effects out of band

The gateway should be ruthless about conserving compute on the critical path.

That is often more valuable than low-level micro-optimizations.

15. How I would phase the build

If I were building this seriously, I would phase it like this:

Phase 1: minimal fast core

  • HTTP server
  • route match
  • upstream forwarding
  • timeout support
  • basic metrics

Phase 2: production safety

  • retries
  • circuit breakers
  • health-aware upstream selection
  • structured access logs
  • graceful shutdown

Phase 3: policy layer

  • auth hooks
  • rate limiting
  • per-route policy config

Phase 4: dynamic control plane

  • hot config reload
  • distributed config sync
  • admin APIs
  • staged rollout support

This order matters.

Too many teams start with “let’s support every gateway feature” before proving the proxy core is fast and correct.

That is backwards.

16. The actual thought process

So if I compress the entire design mindset into a few lines, it would be this:

  1. Keep the request path tiny.
  2. Optimize for read-heavy traffic and immutable config snapshots.
  3. Reuse connections aggressively.
  4. Treat goroutines as a tool, not a magic performance button.
  5. Build around cancellation, timeouts, and failure from day one.
  6. Make observability cheap and useful.
  7. Avoid putting domain logic into the gateway.
  8. Add features only after the hot proxy path is already solid.

That is how you build a gateway that earns the right to sit in front of serious traffic.

Go helps a lot, but Go does not save you from bad architecture. It just gives you a runtime that rewards clear thinking around concurrency and hot-path cost.

And that is exactly what an API gateway needs.


If I had to summarize the whole thing brutally: building an API gateway in Go is not about showing off goroutines. It is about protecting the request path from everything that does not deserve to be on it. Once you internalize that, most of the design decisions become much clearer.
