9 min read · golang · api-gateway · performance

Building an API Gateway in Go: A Thought Process Focused on Performance and Concurrency

A practical walkthrough of how to think about building an API gateway in Go when throughput, latency, concurrency, and operational simplicity matter more than framework fashion.

Rahul Gupta
Senior Software Engineer

If I had to build an API gateway in Go from scratch, I would not start with authentication, rate limiting, dashboards, or YAML parsers.

I would start with one much less glamorous question:

What is the hottest path in the system, and how do I stop myself from making it slow?

That is the right starting point because an API gateway sits in front of everything. Every extra allocation, every extra context switch, every accidental blocking operation, every unnecessary network hop multiplies across all requests.

So this post is not “how to build a full gateway product in 24 hours.” It is the thought process I would follow if the main requirement was:

Make it stupidly fast, concurrency-safe, and boring to operate.

Go is a very strong fit for this kind of system, but only if you design around the right constraints.

1. First principle: the gateway is on the critical path

A gateway is not a side service. It is the front door.

That means every request usually goes through:

  1. connection accept
  2. request parse
  3. route match
  4. policy checks
  5. upstream forwarding
  6. response write

Even if each step adds only a little overhead, the total cost becomes painful very quickly under load.

That is why my default rule would be:

  • keep the request path short
  • keep memory churn low
  • avoid blocking work in handlers
  • push non-essential work out of band

A gateway should not behave like an application server with twenty layers of business logic. It should behave like a traffic engine.

2. Why Go is such a strong choice here

Go gives you a set of properties that map very well to gateways:

  • fast networking primitives
  • cheap goroutines
  • good scheduler behavior
  • strong standard library support
  • simple concurrency model
  • solid observability ecosystem

You can absolutely build a gateway in other languages. But Go hits a very practical sweet spot:

  • easier to operate than a JVM-heavy stack
  • more concurrency-friendly than a lot of scripting runtimes
  • simpler memory model than C/C++

Most importantly, Go makes it straightforward to write code that is both high-throughput and understandable by regular backend engineers.

That matters more than people admit.

3. Before writing code, define what the gateway is responsible for

This is where teams get into trouble.

They start with “build an API gateway” and then keep stuffing features into it until it becomes a distributed monolith at the edge.

I would define a strict boundary early.

Core responsibilities:

  • routing
  • load balancing
  • retries and timeouts
  • TLS termination
  • auth hooks
  • rate limiting hooks
  • observability

Danger zone responsibilities:

  • business-rule orchestration
  • heavy payload transformations
  • report generation
  • custom per-tenant workflow engines
  • synchronous calls to five different side systems on every request

The more domain logic you pack into the gateway, the harder it becomes to keep latency predictable.

So the first design decision is not technical. It is architectural discipline.

4. The hot path should be mostly lock-free reads

One of the biggest design choices is how request-time configuration is accessed.

A gateway needs fast access to:

  • route tables
  • upstream cluster definitions
  • auth policy settings
  • timeout and retry policies
  • rate-limiting config

If every request has to acquire coarse locks to read this data, throughput will collapse under concurrency.

The ideal model is:

  • configuration updates are relatively infrequent
  • request reads are extremely frequent

So optimize for reads.

In Go, that usually means immutable snapshots plus atomic swap patterns, instead of constantly mutating shared maps under contention.

Example idea:

Go
import "sync/atomic"

// An immutable snapshot: build a new one, never mutate it in place.
type GatewayConfig struct {
    Routes   []Route
    Clusters map[string]Cluster
}

var activeConfig atomic.Pointer[GatewayConfig]

func GetConfig() *GatewayConfig {
    return activeConfig.Load()
}

// UpdateConfig publishes a fully built snapshot atomically.
func UpdateConfig(cfg *GatewayConfig) {
    activeConfig.Store(cfg)
}

That pattern is powerful because request handlers can read config without grabbing a big global lock on every request.

5. Route matching has to be fast, not just flexible

If your route lookup is slow, everything built on top of it is already compromised.

This is why I would think very carefully before supporting every fancy matching pattern on day one.

Start with the route types that matter most:

  • exact match
  • prefix match
  • host-based match
  • method-based match

And use data structures that keep lookup cheap.

For example:

  • hash maps for exact routes
  • trie-like structures for prefixes if the rule set is large
  • precompiled match trees instead of evaluating dynamic rule chains every time

A common mistake is storing routes in a slice and scanning them linearly on every request because “we only have 40 routes right now.” That becomes technical debt the first time the config becomes large or tenant-specific.

6. Upstream proxying is where performance wins or dies

Most gateway work is not just deciding where traffic should go. It is forwarding traffic efficiently.

This means transport configuration matters a lot.

In Go, I would spend real attention on:

  • connection pooling
  • keep-alive reuse
  • max idle conns
  • per-host pool tuning
  • dial timeouts
  • header handling
  • streaming behavior

A basic http.Transport setup can make a huge difference:

Go
transport := &http.Transport{
    MaxIdleConns:        10000,
    MaxIdleConnsPerHost: 1000,
    MaxConnsPerHost:     0, // 0 means no per-host cap
    IdleConnTimeout:     90 * time.Second,
    DisableCompression:  false,
    ForceAttemptHTTP2:   true,
}

client := &http.Client{
    Transport: transport,
    // Client.Timeout bounds the entire exchange, including reading
    // the body, so leave it off (or raise it) for streaming routes.
    Timeout: 3 * time.Second,
}

The exact numbers depend on your traffic shape, but the bigger idea is simple:

Do not let the gateway create needless connection churn.

If every forwarded request causes new upstream connections, you are burning performance for no good reason.

7. Goroutines are cheap, but not free

This is where Go gets misused.

People learn that goroutines are lightweight and conclude they can spawn them casually inside the request path for every side task.

Bad instinct.

In a gateway, goroutines should be used intentionally:

  • one per request is normal through the server model
  • async workers for background tasks are normal
  • bounded pools for expensive side jobs may be useful

But I would avoid spraying extra goroutines in hot handlers unless there is a clear need.

Why?

Because concurrency does not automatically mean speed.

Too much uncontrolled concurrency creates:

  • scheduler pressure
  • higher memory usage
  • harder cancellation handling
  • more complicated debugging

The correct question is not “Can this run concurrently?” It is “Should this be concurrent on the request path at all?”
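For the “bounded pools for expensive side jobs” case, a buffered channel works as a semaphore. A sketch (the `boundedPool` type is illustrative): a full pool sheds work instead of spawning unbounded goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// boundedPool caps how many side jobs run at once. When it is full,
// new work is shed rather than queued behind the request path.
type boundedPool struct {
	sem chan struct{}
	wg  sync.WaitGroup
}

func newBoundedPool(size int) *boundedPool {
	return &boundedPool{sem: make(chan struct{}, size)}
}

// TrySubmit runs job in a goroutine only if a slot is free;
// otherwise it reports that the job was shed.
func (p *boundedPool) TrySubmit(job func()) bool {
	select {
	case p.sem <- struct{}{}: // slot acquired before the goroutine starts
		p.wg.Add(1)
		go func() {
			defer func() { <-p.sem; p.wg.Done() }()
			job()
		}()
		return true
	default:
		return false // load-shed instead of piling up goroutines
	}
}

func (p *boundedPool) Wait() { p.wg.Wait() }

func main() {
	pool := newBoundedPool(2)
	block := make(chan struct{})
	pool.TrySubmit(func() { <-block })
	pool.TrySubmit(func() { <-block })
	fmt.Println(pool.TrySubmit(func() {})) // false: both slots busy, job shed
	close(block)
	pool.Wait()
}
```

Whether shed jobs should be dropped, counted, or queued elsewhere is a policy decision; the important property is that the bound is explicit.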

8. Timeouts, retries, and cancellation are first-class features

Gateways live in front of unreliable upstreams. So handling failure well is not optional.

I would consider these part of the first real version:

  • per-route timeout
  • connect timeout
  • upstream response timeout
  • retry budget
  • context propagation

In Go, context.Context should be part of the design from the start:

Go
// Derive from r.Context() so a client disconnect cancels upstream work too.
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
defer cancel()

req = req.WithContext(ctx)
resp, err := client.Do(req)

If an upstream hangs, the gateway should stop waiting. If the client disconnects, the gateway should stop work. If retries happen, they should be bounded and policy-driven.

Otherwise the system slowly turns into a resource leak generator under failure.

9. Rate limiting and auth should not poison the request path

Yes, gateways often need:

  • JWT validation
  • API key checks
  • quota enforcement
  • tenant policy enforcement

But these must be designed with hot-path cost in mind.

For example:

  • prefer local validation where possible
  • cache public keys carefully
  • avoid remote auth calls on every request unless absolutely necessary
  • keep rate-limiter lookups efficient and bounded

If the auth service becomes a blocking dependency for every request, you have simply moved the bottleneck.

The same applies to rate limiting. If every request needs multiple slow external checks, the gateway will collapse before the backend does.
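To make “efficient and bounded” concrete, here is a minimal in-process token bucket keyed by tenant. All names are illustrative; production code would more likely use golang.org/x/time/rate and shard the map to cut lock contention, but the hot-path shape is the same: one local lookup, no remote call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is refilled lazily on each check, so the hot path is a
// mutex-guarded map lookup plus a little arithmetic.
type bucket struct {
	tokens float64
	last   time.Time
}

type limiter struct {
	mu      sync.Mutex
	buckets map[string]*bucket
	rate    float64 // tokens added per second
	burst   float64 // maximum bucket size
}

func newLimiter(ratePerSec, burst float64) *limiter {
	return &limiter{buckets: make(map[string]*bucket), rate: ratePerSec, burst: burst}
}

func (l *limiter) Allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[key] = b
	}
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	l := newLimiter(1, 2) // 1 req/s steady, burst of 2
	fmt.Println(l.Allow("tenant-a"), l.Allow("tenant-a"), l.Allow("tenant-a"))
	// true true false: burst spent, third call is throttled
}
```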

10. Logging can destroy throughput if you are careless

This is one of the easiest self-inflicted wounds.

If the gateway writes huge synchronous logs on every request, you are paying for serialization and I/O right on the hot path.

I would keep request logging:

  • structured
  • minimal
  • non-blocking where possible
  • sampled when necessary

You do need visibility, but not at the cost of turning the gateway into a log formatting service.
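One way to keep log I/O off the hot path is to hand entries to a background writer over a bounded channel and drop (while counting) whatever does not fit. A sketch, with illustrative names throughout:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// asyncLog buffers entries for a background writer. When the buffer
// is full, the entry is counted as dropped instead of blocking the
// request handler on serialization or I/O.
type asyncLog struct {
	ch      chan string
	written atomic.Int64
	dropped atomic.Int64
	done    chan struct{}
}

func newAsyncLog(buf int) *asyncLog {
	l := &asyncLog{ch: make(chan string, buf), done: make(chan struct{})}
	go func() {
		defer close(l.done)
		for line := range l.ch {
			_ = line // real code would format and write the line here
			l.written.Add(1)
		}
	}()
	return l
}

func (l *asyncLog) Log(line string) {
	select {
	case l.ch <- line:
	default:
		l.dropped.Add(1) // shed instead of blocking the hot path
	}
}

// Close stops accepting entries and waits for the writer to drain.
func (l *asyncLog) Close() {
	close(l.ch)
	<-l.done
}

func main() {
	l := newAsyncLog(8)
	for i := 0; i < 1000; i++ {
		l.Log(fmt.Sprintf("request %d", i))
	}
	l.Close()
	fmt.Println(l.written.Load() + l.dropped.Load()) // 1000
}
```

The drop counter itself becomes a useful metric: it tells you when logging volume is exceeding what the gateway can afford.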

Metrics matter even more here:

  • RPS
  • p50, p95, p99 latency
  • upstream error rate
  • per-route volume
  • open connections
  • active goroutines
  • GC pause time

For a gateway, metrics usually tell you operational truth faster than verbose logs do.

11. Memory and allocation discipline matter a lot

High concurrency means small waste gets amplified.

Things I would watch carefully:

  • unnecessary string copies
  • repeated header cloning
  • per-request JSON work where not needed
  • large temporary buffers
  • frequent map allocations

This is why I would avoid a design where every policy stage allocates a brand new context object with copied request metadata unless there is a very strong reason.

Sometimes the real optimization is not some clever trick. It is simply:

  • fewer allocations
  • fewer object lifetimes
  • fewer copies

That translates directly into lower GC pressure.
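For the “large temporary buffers” item, the standard tool is sync.Pool. A small sketch (the `renderAccessLine` helper is illustrative): buffers are reused across requests instead of allocated fresh each time.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses byte buffers across requests so per-request
// formatting does not allocate a fresh backing array every time.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// renderAccessLine formats one access-log line using a pooled buffer.
func renderAccessLine(method, path string, status int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // return a clean buffer to the pool
		bufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "%s %s -> %d", method, path, status)
	return buf.String()
}

func main() {
	fmt.Println(renderAccessLine("GET", "/api/users", 200)) // GET /api/users -> 200
}
```

Pooling is only worth it on genuinely hot allocations; profile first, since the GC already handles small short-lived objects well.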

12. Config reloads must not stall traffic

Gateways need config updates:

  • new routes
  • changed upstreams
  • timeout policy updates
  • feature rollout changes

The mistake is applying config in a way that blocks live traffic or partially mutates shared state.

My preferred mental model would be:

  1. build a new config snapshot off the hot path
  2. validate it fully
  3. atomically swap it in
  4. let old requests finish on the old snapshot

This keeps traffic handling stable while config evolves.

That is much safer than “edit shared structures live and hope request handlers do not collide with updates.”
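The four steps above can be sketched with the same atomic-pointer pattern from earlier, plus validation before the swap. The `Reload` function and `Config` shape are illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Config is a simplified immutable snapshot.
type Config struct {
	Routes map[string]string // path prefix -> upstream
}

var active atomic.Pointer[Config]

// Reload builds and validates a full snapshot off the hot path, then
// swaps it in atomically. In-flight requests keep whatever pointer
// they already loaded, so they finish on the old snapshot.
func Reload(routes map[string]string) error {
	if len(routes) == 0 {
		return errors.New("refusing to apply empty route table") // step 2: validate fully
	}
	next := &Config{Routes: routes} // step 1: build off the hot path
	active.Store(next)              // step 3: atomic swap
	return nil                      // step 4: old snapshot stays valid for old readers
}

func main() {
	_ = Reload(map[string]string{"/api/": "api-v1"})
	old := active.Load() // a request holding this snapshot is unaffected below

	_ = Reload(map[string]string{"/api/": "api-v2"})
	fmt.Println(old.Routes["/api/"], active.Load().Routes["/api/"]) // api-v1 api-v2
}
```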

13. Concurrency safety is mostly about choosing the right shared state strategy

When people say “Go is good at concurrency,” they sometimes jump too quickly to mutexes and channels everywhere.

That is not the real lesson.

The real lesson is to be deliberate about shared state:

  • avoid sharing when possible
  • make hot-path state immutable
  • use channels for coordination when they actually fit
  • use locks narrowly, not as a blanket design

For a gateway, common shared state includes:

  • route config
  • connection health state
  • counters
  • limiter state
  • circuit breaker state

Each of these may need a different concurrency strategy.

If one giant global lock protects all of it, the system will absolutely show contention under load.
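Two of the cheapest strategies cover the counter and circuit-breaker cases: plain atomics for counters, and a small atomic state machine for breaker state. Neither touches a lock on the hot path. The names below are illustrative:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var totalRequests atomic.Int64 // counter: a lock-free atomic add

const (
	breakerClosed int32 = iota
	breakerOpen
)

var breakerState atomic.Int32 // circuit breaker: atomic state enum

func recordRequest() { totalRequests.Add(1) }

func allowRequest() bool { return breakerState.Load() == breakerClosed }

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); recordRequest() }()
	}
	wg.Wait()
	fmt.Println(totalRequests.Load(), allowRequest()) // 100 true

	breakerState.Store(breakerOpen) // health checker trips the breaker
	fmt.Println(allowRequest())     // false
}
```

Route config fits the snapshot-and-swap pattern from earlier, while limiter state usually wants a sharded map; the point is to pick per-state, not to reach for one global mutex.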

14. The best performance feature is saying no to unnecessary work

A high-performance gateway is not just one with fast code. It is one that refuses to do work it should never have taken on.

Examples:

  • reject malformed requests early
  • fail auth early
  • stop retrying hopeless upstreams
  • do not transform bodies unless required
  • do not buffer huge payloads unless required
  • offload analytics or async side effects out of band

The gateway should be ruthless about conserving compute on the critical path.

That is often more valuable than low-level micro-optimizations.

15. How I would phase the build

If I were building this seriously, I would phase it like this:

Phase 1: minimal fast core

  • HTTP server
  • route match
  • upstream forwarding
  • timeout support
  • basic metrics

Phase 2: production safety

  • retries
  • circuit breakers
  • health-aware upstream selection
  • structured access logs
  • graceful shutdown

Phase 3: policy layer

  • auth hooks
  • rate limiting
  • per-route policy config

Phase 4: dynamic control plane

  • hot config reload
  • distributed config sync
  • admin APIs
  • staged rollout support

This order matters.

Too many teams start with “let’s support every gateway feature” before proving the proxy core is fast and correct.

That is backwards.

16. The actual thought process

So if I compress the entire design mindset into a few lines, it would be this:

  1. Keep the request path tiny.
  2. Optimize for read-heavy traffic and immutable config snapshots.
  3. Reuse connections aggressively.
  4. Treat goroutines as a tool, not a magic performance button.
  5. Build around cancellation, timeouts, and failure from day one.
  6. Make observability cheap and useful.
  7. Avoid putting domain logic into the gateway.
  8. Add features only after the hot proxy path is already solid.

That is how you build a gateway that earns the right to sit in front of serious traffic.

Go helps a lot, but Go does not save you from bad architecture. It just gives you a runtime that rewards clear thinking around concurrency and hot-path cost.

And that is exactly what an API gateway needs.


If I had to summarize the whole thing brutally: building an API gateway in Go is not about showing off goroutines. It is about protecting the request path from everything that does not deserve to be on it. Once you internalize that, most of the design decisions become much clearer.
