Docker Ate My Disk: Fixing Log Rotation Before It Kills Production
How a single verbose container filled a 500GB disk in 72 hours, and the exact daemon.json config that stops it from ever happening again.
3am, Disk Full, Everything Down
The alert came in at 3:17am: disk utilization at 100%, multiple services failing. SSH into the host:
df -h
# Filesystem Size Used Avail Use% Mounted on
# /dev/sda1 500G 500G 0 100% /
Find the culprit:
du -sh /* 2>/dev/null | sort -rh | head -10
# 487G /var
du -sh /var/* 2>/dev/null | sort -rh | head -5
# 487G /var/lib/docker
du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -5
# 484G /var/lib/docker/containers
There it was. Docker container logs: unrotated, unbounded, growing forever.
ls -lah /var/lib/docker/containers/a3f9b1c.../
# -rw-r----- 1 root root 484G Nov 3 03:17 a3f9b1c...-json.log
484 gigabytes. One log file. One container. 72 hours of verbose output with no rotation configured.
The Emergency Fix
You can't just rm the file while Docker holds it open: the inode stays allocated, so the space is not released until the handle closes. The right move:
# Truncate the file (Docker keeps the handle, disk space is freed immediately)
truncate -s 0 /var/lib/docker/containers/<container-id>/<container-id>-json.log
Services came back up within seconds. But this was a symptom, not the problem.
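Why truncating works can be seen on a scratch file. The sketch below is an illustration under one assumption: the writer holds the file open in append mode, as a log writer typically does. The function name demo_truncate and the temp file are made up for the demo; nothing here touches a real Docker log.

```shell
#!/bin/bash
# Sketch: truncate a file while a writer still holds it open in append mode.
# Uses a scratch file from mktemp, not a real Docker log.
demo_truncate() {
  local f
  f=$(mktemp)
  exec 3>>"$f"                 # open fd 3 for append, like a long-running log writer
  printf 'old log line\n' >&3  # 13 bytes written before the truncate
  truncate -s 0 "$f"           # size drops to 0; fd 3 stays valid
  printf 'new\n' >&3           # append mode: lands at the new end of the file
  exec 3>&-                    # close the writer
  wc -c < "$f" | tr -d ' '     # print the final size in bytes
  rm -f "$f"
}
```

Calling demo_truncate prints 4, the size of the single line written after the truncate. With a plain > redirect instead of >>, the next write would land at the writer's old offset and leave a sparse hole before it, which is why the append mode matters.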
Why This Happens
By default, Docker's json-file logging driver has no size limit and no rotation. Every byte your container writes to stdout/stderr goes into that file and stays there until the container is removed. On a verbose app, that's a disaster.
The dangerous default:
{
  "log-driver": "json-file"
}
That's it. No max-size. No max-file. No expiry. Pure chaos at scale.
The Permanent Fix: daemon.json
Edit /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5",
    "compress": "true"
  }
}
This sets a global default for all containers on the host:
- max-size: 50m rotates the log when it reaches 50MB
- max-file: 5 keeps 5 log files (250MB max total per container)
- compress: true gzips rotated files to save space
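The per-container bound multiplies across the host. A quick sketch of the arithmetic, where the function name and the fleet size of 40 containers are hypothetical:

```shell
#!/bin/bash
# Worst-case json-file log footprint on one host, in MB (before compression).
# Arguments: max-size in MB, max-file count, number of containers.
max_log_footprint_mb() {
  local size_mb=$1 files=$2 containers=$3
  echo $(( size_mb * files * containers ))
}

max_log_footprint_mb 50 5 40   # prints 10000 (about 10GB for 40 containers)
```

Sizing the limits this way against the disk you actually have is a rough check, since compression of rotated files will usually bring the real number down.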
Restart Docker to apply:
systemctl restart docker
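Warning: Restarting the daemon stops every running container on the host unless live-restore is enabled. Do this during a maintenance window or roll it out carefully. The new defaults apply only to containers created after the restart; existing containers keep their old logging settings until they are recreated.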
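Log options are fixed when a container is created, so containers that predate the daemon.json change keep running without limits. One way to spot them, as a rough sketch: the function name is made up, and it assumes the docker CLI is available and that a missing max-size key in a container's log config means no limit was set.

```shell
#!/bin/bash
# Sketch: list running containers whose log config has no max-size option.
# Requires the docker CLI and permission to talk to the daemon.
containers_without_log_limits() {
  local cid info
  docker ps -q | while IFS= read -r cid; do
    # Prints e.g.: /web json-file {"max-file":"5","max-size":"50m"}
    info=$(docker inspect --format '{{.Name}} {{.HostConfig.LogConfig.Type}} {{json .HostConfig.LogConfig.Config}}' "$cid")
    case "$info" in
      *'"max-size"'*) ;;                    # limit present, nothing to report
      *) echo "no max-size: ${info#/}" ;;   # strip the leading slash from the name
    esac
  done
}
```

Run containers_without_log_limits after the restart. It also lists containers on drivers that do not use max-size at all, so read the driver column before recreating anything.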
Per-Container Override in Compose
For containers that need different limits, set it per-service in docker-compose.yml:
services:
  app:
    image: my-app:latest
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "10"
        compress: "true"
  debug-service:
    image: my-debug:latest
    logging:
      driver: json-file
      options:
        max-size: "200m" # verbose in dev, generous limit
        max-file: "3"
Monitoring Disk Usage by Container
Add this to your monitoring toolkit:
#!/bin/bash
# check-docker-logs.sh
# Alerts when any container log exceeds threshold
THRESHOLD_GB=5
find /var/lib/docker/containers -name '*-json.log' | while IFS= read -r logfile; do
  # du -BG rounds up to whole gigabytes
  size_gb=$(du -BG "$logfile" | awk '{print $1}' | tr -d 'G')
  # Path is /var/lib/docker/containers/<id>/<id>-json.log, so field 6 is the ID
  container_id=$(echo "$logfile" | cut -d'/' -f6 | cut -c1-12)
  if [ "$size_gb" -gt "$THRESHOLD_GB" ]; then
    echo "ALERT: Container $container_id log is ${size_gb}GB"
    docker inspect --format='{{.Name}}' "$container_id" 2>/dev/null
  fi
done
Run it from cron every 30 minutes until you have proper observability set up.
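du -BG rounds up to whole gigabytes, so a 1MB log reports as 1G. If exact sizes matter, a byte-level check is a possible alternative. This sketch assumes GNU stat (the -c %s form, which reports apparent size); the function name log_exceeds_bytes is hypothetical.

```shell
#!/bin/bash
# Sketch: exact-size check for a single log file using stat instead of du -BG.
# Returns 0 when the file is strictly larger than the threshold in bytes,
# 1 when it is not, and 2 when the file cannot be read.
log_exceeds_bytes() {
  local logfile=$1 threshold_bytes=$2 size_bytes
  size_bytes=$(stat -c %s "$logfile") || return 2   # GNU stat: size in bytes
  [ "$size_bytes" -gt "$threshold_bytes" ]
}
```

A 5GB check would then look like: log_exceeds_bytes "$logfile" $((5 * 1024 * 1024 * 1024)) && echo "ALERT: $logfile"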
Consider a Centralized Logging Driver
For production, json-file with rotation is a band-aid. The real solution is shipping logs somewhere:
{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "localhost:24224",
    "fluentd-async": "true",
    "tag": "docker.{{.Name}}"
  }
}
Or use the loki driver if you're in the Grafana ecosystem. Central log aggregation means:
- No local disk pressure from logs
- Queryable log history across all containers
- Retention policies enforced centrally
Quick Reference
| Problem | Fix |
|---|---|
| Log file too large right now | truncate -s 0 /path/to/container.log |
| Global log limits | Edit /etc/docker/daemon.json |
| Per-container limits | Use logging: block in compose |
| Ongoing monitoring | Script + cron or Prometheus node exporter |
Don't wait for 3am to learn this one.
Centralized Log Collection: The Right Long-Term Fix
json-file with rotation is a band-aid. For production Docker fleets, you want logs shipped off the host entirely: no local disk pressure, queryable history, and retention policies enforced centrally.
Option 1: Fluentd / Fluent Bit
{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "localhost:24224",
    "fluentd-async": "true",
    "tag": "docker.{{.Name}}"
  }
}
Fluent Bit is lighter than Fluentd. Use it as a DaemonSet on Kubernetes or as a sidecar on each Docker host.
Option 2: Grafana Loki
{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "http://loki:3100/loki/api/v1/push",
    "loki-external-labels": "host={{.Host}},container={{.Name}}"
  }
}
Requires the Loki Docker driver plugin:
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
Option 3: AWS CloudWatch (for ECS/EC2)
{
  "log-driver": "awslogs",
  "log-opts": {
    "awslogs-group": "/docker/myapp",
    "awslogs-region": "us-east-1",
    "tag": "{{.Name}}"
  }
}
Monitoring Log Size in Production
Add this to your monitoring regardless of which driver you use; containers without a local log file are simply skipped:
#!/bin/bash
# /usr/local/bin/docker-log-monitor.sh
# Alerts when a running container's local log file exceeds the threshold
THRESHOLD_GB=2
docker ps -q | while IFS= read -r cid; do
  NAME=$(docker inspect --format='{{.Name}}' "$cid" | tr -d '/')
  LOG_PATH=$(docker inspect --format='{{.LogPath}}' "$cid")
  # LogPath can be empty for non-json-file drivers; -f then skips the container
  if [ -f "$LOG_PATH" ]; then
    SIZE_MB=$(du -m "$LOG_PATH" | cut -f1)
    if [ "$SIZE_MB" -gt $((THRESHOLD_GB * 1024)) ]; then
      echo "ALERT: Container $NAME log is ${SIZE_MB}MB"
    fi
  fi
done
Run from cron every 15 minutes. Alert before the disk fills, not after.
FAQ
Does max-size apply to existing containers?
No. Log driver options apply at container creation time. You must recreate existing containers for the new limits to take effect. For docker-compose, run docker-compose down && docker-compose up -d after updating logging: config.
What happens if I truncate a log file while Docker is running?
truncate -s 0 /path/to/container.log frees disk space immediately. Docker keeps the file handle open and continues writing; because the log is opened in append mode, the next write lands at the new end of the file, offset 0. This is safe as an emergency fix, though everything already in the file is lost. The permanent fix is rotation configuration.
Can I set different log limits for different services in Compose?
Yes. Per-service logging: blocks override the global daemon.json defaults. See the "Per-Container Override in Compose" section above.
What is the difference between json-file and local log driver?
local is a newer driver that uses a more efficient binary format and rotates and compresses log files by default. docker logs still works with it, but the files on disk use an internal format that external tools are not meant to read directly. Use json-file with rotation for most cases: it is more portable and compatible with log shipping tools.
Related reading: Linux Log Analysis: How to Debug Issues Like a Senior Engineer (once logs are centralized, how to query them effectively). Linux Debugging Tools Every Engineer Should Know (a broader debugging toolkit).