Docker Ate My Disk: Fixing Log Rotation Before It Kills Production

How a single verbose container filled a 500GB disk in 72 hours, and the exact daemon.json config that stops it from ever happening again.

October 3, 2024·4 min read·Damon

3am, Disk Full, Everything Down

The alert came in at 3:17am: disk utilization at 100%, multiple services failing. SSH into the host:

df -h
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/sda1       500G  500G     0 100% /

Find the culprit:

du -sh /* 2>/dev/null | sort -rh | head -10
# 487G  /var
du -sh /var/* 2>/dev/null | sort -rh | head -5
# 487G  /var/lib/docker
du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -5
# 484G  /var/lib/docker/containers

There it was. Docker container logs — unrotated, unbounded, growing forever.

ls -lah /var/lib/docker/containers/a3f9b1c.../
# -rw-r----- 1 root root 484G Oct 3 03:17 a3f9b1c...-json.log

484 gigabytes. One log file. One container. 72 hours of verbose output with no rotation configured.
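
That's a remarkable sustained write rate. A quick back-of-envelope check in shell (integer arithmetic, so the result is rounded down):

```shell
#!/bin/sh
# Rough growth rate: 484 GB written over 72 hours
BYTES=$((484 * 1024 * 1024 * 1024))
HOURS=72
echo "$((BYTES / HOURS / 1024 / 1024)) MB/hour"   # ~6.7 GB/hour, just under 2 MB/s sustained
```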

The Emergency Fix

You can't just rm the file while Docker holds it open — the name disappears, but the daemon keeps writing to the open file handle, so the space isn't freed until the container stops. The right move:

# Truncate the file (Docker keeps the handle, disk space is freed immediately)
truncate -s 0 /var/lib/docker/containers/<container-id>/<container-id>-json.log

Services came back up within seconds. But this was a symptom, not the problem.
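
If it isn't obvious which container is the offender, you can rank every container log by size before reaching for truncate. A minimal sketch — it assumes Docker's default data root, and accepts an alternate root as the first argument:

```shell
#!/bin/sh
# Rank container log files by size, largest first.
# Defaults to Docker's standard data root; pass an alternate root as $1.
LOG_ROOT="${1:-/var/lib/docker/containers}"
find "$LOG_ROOT" -name '*-json.log' -exec du -h {} + 2>/dev/null | sort -rh | head -5
```

The directory name each log lives in is the full container ID, which you can feed straight to docker inspect.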

Why This Happens

By default, Docker's json-file logging driver has no size limit and no rotation. Every byte your container writes to stdout/stderr goes into that file and stays there forever. On a verbose app, that's a disaster.

The dangerous default:

{
  "log-driver": "json-file"
}

That's it. No max-size. No max-file. No expiry. Pure chaos at scale.

The Permanent Fix: daemon.json

Edit /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5",
    "compress": "true"
  }
}

This sets the default for every container created on the host from this point on:

  • max-size: 50m — rotate when the log hits 50MB
  • max-file: 5 — keep 5 rotated files (250MB max total per container)
  • compress: true — gzip rotated files to save space

Restart Docker to apply:

systemctl restart docker

Warning: This restarts all running containers (unless live-restore is enabled), and the new defaults only apply to containers created after the restart — existing containers keep their old logging config until they're recreated. Do this during a maintenance window or roll it out carefully.
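
A malformed daemon.json stops dockerd from starting at all, so it's worth a syntax check before the restart. A small sketch, assuming python3 is available on the host:

```shell
#!/bin/sh
# Validate a daemon.json file before restarting the daemon.
# Returns nonzero (and prints a warning) when the JSON is malformed.
check_daemon_json() {
  file="${1:-/etc/docker/daemon.json}"
  if python3 -m json.tool "$file" > /dev/null 2>&1; then
    echo "daemon.json OK"
  else
    echo "daemon.json invalid; not restarting" >&2
    return 1
  fi
}

# Usage: check_daemon_json && systemctl restart docker
```

Chaining with && means the restart only happens when the file actually parses.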

Per-Container Override in Compose

For containers that need different limits, set it per-service in docker-compose.yml:

services:
  app:
    image: my-app:latest
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "10"
        compress: "true"

  debug-service:
    image: my-debug:latest
    logging:
      driver: json-file
      options:
        max-size: "200m"   # verbose in dev, generous limit
        max-file: "3"
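
If most services share the same limits, Compose's extension fields plus a YAML anchor keep the logging block in one place (service names here are illustrative):

```yaml
x-logging: &default-logging
  driver: json-file
  options:
    max-size: "50m"
    max-file: "5"

services:
  app:
    image: my-app:latest
    logging: *default-logging
  worker:
    image: my-worker:latest
    logging: *default-logging
```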

Monitoring Disk Usage by Container

Add this to your monitoring toolkit:

#!/bin/bash
# check-docker-logs.sh
# Alerts when any container log exceeds threshold

THRESHOLD_GB=5

find /var/lib/docker/containers -name '*-json.log' | while read -r logfile; do
  # du -BG rounds up to whole gigabytes, which is fine for alerting
  size_gb=$(du -BG "$logfile" | awk '{print $1}' | tr -d 'G')
  container_id=$(echo "$logfile" | cut -d'/' -f6 | cut -c1-12)

  if [ "$size_gb" -gt "$THRESHOLD_GB" ]; then
    echo "ALERT: Container $container_id log is ${size_gb}GB"
    docker inspect --format='{{.Name}}' "$container_id" 2>/dev/null
  fi
done

Run it from cron every 30 minutes until you have proper observability set up.
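
Wiring it into cron is one line. The /etc/cron.d path and the script's install location below are assumptions; adjust to wherever you put the script:

```
# /etc/cron.d/check-docker-logs — runs every 30 minutes, tags output for syslog
*/30 * * * * root /usr/local/bin/check-docker-logs.sh 2>&1 | logger -t docker-log-check
```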

Consider a Centralized Logging Driver

For production, json-file with rotation is a band-aid. The real solution is shipping logs somewhere:

{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "localhost:24224",
    "fluentd-async": "true",
    "tag": "docker.{{.Name}}"
  }
}

Or use the loki driver if you're in the Grafana ecosystem. Central log aggregation means:

  • No local disk pressure from logs
  • Queryable log history across all containers
  • Retention policies enforced centrally
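
If you go the Loki route, the daemon.json shape is similar. This assumes the Grafana Loki driver plugin has already been installed (e.g. docker plugin install grafana/loki-docker-driver --alias loki) and that Loki is listening on its default port:

```json
{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "http://localhost:3100/loki/api/v1/push"
  }
}
```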

Quick Reference

Problem                          Fix
Log file too large right now     truncate -s 0 /path/to/container.log
Global log limits                Edit /etc/docker/daemon.json
Per-container limits             Use the logging: block in compose
Ongoing monitoring               Script + cron, or Prometheus node exporter

Don't wait for 3am to learn this one.