Nginx 502 Bad Gateway Fix: Root Causes and Solutions

Nginx returns 502 Bad Gateway when it cannot get a valid response from the upstream server. The fix depends on the exact error in the log: the four root causes below each need a different solution.

TL;DR: Read the Error Log First

tail -f /var/log/nginx/error.log | grep -i "upstream\|connect"

| Error in log | Cause | Fix |
| --- | --- | --- |
| `connect() failed (111: Connection refused)` | Upstream not running or port exhaustion | Start app or add keepalive |
| `upstream timed out (110)` | Upstream too slow | Tune `proxy_read_timeout` |
| `no live upstreams while connecting` | All upstream servers failed | Check all backend health |
| `upstream sent invalid header` | Upstream returned a malformed response, often after a crash | Check app logs |
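The lookup in the table can be sketched as a small shell helper. This is a minimal illustration: the function name `classify_502` and the cause labels it prints are made up for this example, and only the four log strings from the table are recognized.

```shell
# classify_502 LINE
# Map one nginx error-log line to the likely cause from the table.
# The labels printed here are illustrative names, not nginx output.
classify_502() {
    case "$1" in
        *"(111: Connection refused)"*)    echo "upstream-down-or-port-exhaustion" ;;
        *"upstream timed out"*)           echo "upstream-too-slow" ;;
        *"no live upstreams"*)            echo "all-upstreams-failed" ;;
        *"upstream sent invalid header"*) echo "bad-upstream-response" ;;
        *)                                echo "unknown" ;;
    esac
}
```

To summarize recent errors, each line of the log can be fed through it, for example: `tail -n 200 /var/log/nginx/error.log | while IFS= read -r line; do classify_502 "$line"; done | sort | uniq -c`.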

Fix 1: Upstream Not Running

# Is the app running?
systemctl status myapp
ss -tlnp | grep :8080

# Not running → start it
systemctl start myapp

# Test directly
curl -sf http://127.0.0.1:8080/health && echo "OK"
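After starting the app it may take a few seconds before it accepts connections, so a single curl can fail spuriously. A rough sketch of a retry loop follows; the function name `wait_for_upstream` is hypothetical, and the `/health` path is assumed to exist as in the commands above.

```shell
# wait_for_upstream URL [TRIES] [DELAY_SECONDS]
# Poll URL with curl until it answers without an HTTP error, or give up
# after TRIES attempts. curl -f treats HTTP status >= 400 as a failure.
wait_for_upstream() {
    url=$1
    tries=${2:-10}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$tries" ]; do
        if curl -sf --max-time 2 -o /dev/null "$url"; then
            echo "upstream healthy after $i attempt(s)"
            return 0
        fi
        # Sleep only between attempts, not after the last one
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    echo "upstream still failing after $tries attempt(s)" >&2
    return 1
}
```

Typical use would be `systemctl start myapp && wait_for_upstream http://127.0.0.1:8080/health`.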

Fix 2: Port Exhaustion (Most Common Under Load)

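502 errors that only appear under high traffic are very often port exhaustion caused by missing upstream keepalive: without it, nginx opens a new TCP connection for every request, and each closed connection lingers in TIME-WAIT.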

# Check TIME-WAIT count while under load
# (ss -s prints the state as "timewait", without a hyphen)
watch -n 0.5 'ss -s | grep -i timewait'
# If it keeps climbing above 5000: this is your problem

Fix — add keepalive to your nginx upstream block:

upstream app_backend {
    server 127.0.0.1:8080;
    keepalive 64;               # idle keepalive connections per worker
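    keepalive_requests 1000;    # recycle a connection after this many requests (nginx 1.15.3+)
    keepalive_timeout 75s;      # close idle upstream connections after 75s (nginx 1.15.3+)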
}

server {
    location / {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;         # REQUIRED for keepalive
        proxy_set_header Connection ""; # REQUIRED: clear Connection: close
    }
}

Both proxy_http_version 1.1 and the cleared Connection header are mandatory. By default nginx speaks HTTP/1.0 to the upstream and sends Connection: close, so without them keepalive is configured but silently ignored.
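Because a missing directive fails silently, a quick text check of the config can help. The following is a minimal sketch, not a real nginx parser: the function name `check_upstream_keepalive` is hypothetical, and the scan ignores block scoping, trailing comments, and the `''` spelling of the empty header value.

```shell
# check_upstream_keepalive: read nginx config text on stdin and report which
# of the three keepalive-related directives are absent.
check_upstream_keepalive() {
    conf=$(grep -v '^[[:space:]]*#')   # drop full-line comments
    missing=""
    # "keepalive <number>" (does not match keepalive_requests / keepalive_timeout)
    printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*keepalive[[:space:]]+[0-9]+' \
        || missing="$missing keepalive"
    printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*proxy_http_version[[:space:]]+1\.1[[:space:]]*;' \
        || missing="$missing proxy_http_version"
    printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*proxy_set_header[[:space:]]+Connection[[:space:]]+""[[:space:]]*;' \
        || missing="$missing proxy_set_header"
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

It can be run against the full effective config with `nginx -T 2>/dev/null | check_upstream_keepalive`. A result of `ok` only means the directives appear somewhere in the config, not that they apply to the right location block.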

Fix 3: Upstream Timeout

tail /var/log/nginx/error.log | grep "timed out"
# upstream timed out (110: Connection timed out) while reading response header

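The app is alive but slow. Nginx usually reports this case as 504 Gateway Timeout rather than 502, but the diagnosis is the same. Fix by tuning timeouts: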

location /api/ {
    proxy_pass http://app_backend;
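    proxy_connect_timeout  5s;   # max time to establish the upstream connection (default 60s)
    proxy_read_timeout    60s;   # max gap between two reads from upstream, not total response time
    proxy_send_timeout    30s;   # max gap between two writes to upstream (default 60s)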
}

Also investigate why the app is slow — check app logs, database query times, and downstream dependencies.

Fix 4: Worker Connection Limit

# Error in nginx log:
# worker_connections are not enough while connecting to upstream

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;    # nginx default is 512; many packaged configs ship 768 or 1024
    use epoll;                  # Linux only; nginx picks the best method on its own if omitted
    multi_accept on;
}
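When sizing worker_connections, keep in mind that the limit counts every connection a worker holds, including those to upstreams, so a proxied request uses two. A back-of-the-envelope sketch follows; the helper name `max_proxied_clients` and the 4-worker figure are assumptions for illustration.

```shell
# max_proxied_clients WORKER_PROCESSES WORKER_CONNECTIONS
# Each proxied request holds two connections in a worker (client + upstream),
# so the rough ceiling on concurrent proxied clients is half the total.
max_proxied_clients() {
    echo $(( $1 * $2 / 2 ))
}

max_proxied_clients 4 4096   # prints 8192
```

With `worker_processes auto`, the first argument is the number of CPU cores nginx detects. `worker_rlimit_nofile` should stay above `worker_connections`, since each connection needs a file descriptor.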

Full Diagnostic Workflow

# 1. Read exact error
tail -f /var/log/nginx/error.log

# 2. Is upstream alive?
curl -sf http://127.0.0.1:8080/health

# 3. Are there TIME-WAIT sockets?
ss -s | grep -i timewait

# 4. Check nginx limits
nginx -T | grep -E "worker_connections|worker_processes"

# 5. Test config after changes
nginx -t && systemctl reload nginx

Real Example: 0.3% Error Rate Under Load

# Error log during load test:
connect() failed (111: Connection refused) while connecting to upstream,
client: 10.0.1.42, upstream: "http://127.0.0.1:8080/api/v2/events"

The upstream (Node.js) was running and healthy, and a direct curl worked fine. But at 200 req/s, 0.3% of requests returned 502.

watch -n 0.5 'ss -s'
# TCP: TIME-WAIT 14823 and climbing

Root cause: no keepalive in upstream block. Nginx opened a new TCP connection for every request. At 200 req/s, TIME-WAIT sockets exhausted the ephemeral port range.

Fix: added keepalive 64 to upstream block with proxy_http_version 1.1. TIME-WAIT dropped to under 50. 502 errors disappeared.
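The arithmetic behind this example can be sketched roughly. It assumes one new connection per request, nginx and the app on the same host (as with 127.0.0.1 here), and the Linux TIME-WAIT duration of 60 seconds. The helper names are hypothetical, and 32768-60999 is the stock Linux ephemeral range, which may differ from the host in the example.

```shell
# time_wait_estimate REQ_PER_SEC [TIME_WAIT_SECONDS]
# Without keepalive every request leaves one socket in TIME-WAIT on the host.
# Linux keeps it for 60 s, so the steady-state count is about rate * 60.
time_wait_estimate() {
    echo $(( $1 * ${2:-60} ))
}

# ephemeral_ports LOW HIGH: number of local ports in the range (inclusive).
ephemeral_ports() {
    echo $(( $2 - $1 + 1 ))
}

time_wait_estimate 200        # prints 12000
ephemeral_ports 32768 60999   # prints 28232
```

The estimate of about 12,000 is in the same range as the observed 14,823. Compare it against the range actually configured on the host (`cat /proc/sys/net/ipv4/ip_local_port_range`); bursts above the average rate and a narrower range both bring exhaustion closer.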

Common Mistakes

Restarting nginx without reading the error log. The error message tells you exactly what's wrong. Read it before changing anything.

Adding keepalive without proxy_http_version 1.1. This is the most common mistake. The keepalive directive does nothing unless nginx speaks HTTP/1.1 to the upstream and the Connection header is cleared.

Setting proxy_read_timeout too low. Default is 60s. If your app has legitimate slow requests (reports, exports), dropping it to 5s creates gateway errors (typically 504s) that weren't there before.

Not testing upstream health directly. Always run curl http://127.0.0.1:<port>/health from the nginx server before blaming nginx.

Conclusion

Read the exact error from /var/log/nginx/error.log first. Connection refused under load = missing keepalive. Timed out = slow upstream or wrong timeout value. After any nginx config change, always run nginx -t before systemctl reload.
