top Command Linux: Real-World Guide to CPU and Process Monitoring
Learn how to use the top command on Linux for production troubleshooting — reading CPU, load average, memory metrics, interpreting process states, and diagnosing high CPU and stuck processes in real incidents.
TL;DR
- top shows a live, auto-refreshing view of CPU, memory, and running processes
- Load average (top-right) is the most important number to understand: it tells you if the system is overloaded
- %CPU in top is a recent interval average, more accurate than ps for current spikes
- Press 1 inside top to see per-core CPU breakdown
- Press M to sort by memory, P to sort by CPU (default)
- D state processes in top indicate I/O problems; kill -9 will not help
- Use top to watch behavior over time; use ps for snapshots you can grep
- For interactive process management at scale, consider htop: easier navigation, better visuals
Introduction: When top Is the Right Tool
A service is degraded. CPU is high. Something is consuming resources and you do not know what. You SSH in and run top.
The top command on Linux gives you an auto-refreshing view of system-wide CPU and memory usage, plus a live process list sorted by resource consumption. It is the closest thing to a real-time health dashboard available from a terminal.
top answers a different question than ps. Where ps gives you a snapshot — a fixed capture of process state — top shows you what is happening right now and how it changes over time. That distinction determines which tool to reach for first.
This guide covers how engineers actually use top during production incidents: reading the header correctly, sorting processes, interpreting CPU and memory metrics, and diagnosing the problems that show up most often on real servers.
Authority Note
I have used top during incidents involving runaway Java GC threads, NFS-induced D-state storms, and multi-tenant servers where one process quietly consumed 80% of CPU for hours before anyone noticed. The interpretation tricks in this guide come from those incidents, not from documentation.
What Is the top Command in Linux?
top (table of processes) reads /proc continuously and updates its display every 3 seconds by default. It shows two sections:
- System summary — CPU, memory, load average, uptime, task counts
- Process list — sorted by CPU by default, showing per-process resource usage
# Launch top
top
# Launch and immediately sort by memory
top -o %MEM
# Monitor a specific PID
top -p 1234
# Monitor multiple PIDs
top -p 1234,5678,9012
# Run in batch mode (for scripting)
top -b -n 1 | head -30
Unlike ps, top is interactive. You can sort, filter, kill, and renice processes without leaving the view.
How to Use top to Check CPU Usage
Reading the CPU line
The first thing to look at after opening top is the CPU line:
%Cpu(s): 12.3 us, 2.1 sy, 0.0 ni, 84.1 id, 1.2 wa, 0.0 hi, 0.3 si, 0.0 st
| Field | Meaning | What triggers it |
|---|---|---|
| us | User space CPU | Your application code |
| sy | Kernel/system CPU | System calls, I/O operations |
| ni | Niced processes | Processes with adjusted priority |
| id | Idle | How much CPU is free |
| wa | I/O wait | Waiting on disk, NFS, network I/O |
| hi | Hardware interrupts | Network cards, disks generating interrupts |
| si | Software interrupts | Kernel softirq processing |
| st | Steal time | VM hypervisor taking CPU from this VM |
Reading this in production:
- High us → your application is using CPU. Find which process.
- High wa → I/O bottleneck. Check disk, NFS, network. Process count in D state will be high.
- High sy → lots of system calls. Could be excessive file I/O, context switching, or network traffic.
- High st → your VM's hypervisor is stealing CPU. This is infrastructure-level, not application-level.
- Low id → system is under load. How low depends on your workload.
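The reading rules above can be sketched as a small shell function. This is only an illustration: the name classify_cpu_line and the thresholds (wa above 10, st above 5, us above 70, sy above 30) are assumptions chosen for the example, not values that top defines.

```shell
# classify_cpu_line: read a "%Cpu(s):" summary line on stdin and print a
# one-word hint. Thresholds are illustrative assumptions; tune per workload.
classify_cpu_line() {
  awk '
    /^%Cpu/ {
      # Strip the label, then split "12.3 us, 2.1 sy, ..." into value/name pairs.
      sub(/^[^:]*:/, "")
      n = split($0, parts, ",")
      for (i = 1; i <= n; i++) {
        split(parts[i], kv, " ")   # kv[1] = value, kv[2] = field name
        v[kv[2]] = kv[1] + 0
      }
      if      (v["wa"] > 10) print "io-wait"
      else if (v["st"] > 5)  print "steal"
      else if (v["us"] > 70) print "user-cpu"
      else if (v["sy"] > 30) print "system-cpu"
      else                   print "ok"
    }'
}
```

A possible live use is `top -b -n 2 -d 1 | grep '^%Cpu' | tail -n 1 | classify_cpu_line`. Taking the second frame is a precaution, since on some top versions the first batch frame may not reflect the current interval.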
See per-core CPU (the most useful key in top)
By default, top shows aggregate CPU across all cores. Press 1 to toggle per-core view:
%Cpu0 : 95.0 us, 3.0 sy, 0.0 ni, 2.0 id, 0.0 wa
%Cpu1 : 2.1 us, 1.5 sy, 0.0 ni, 96.4 id, 0.0 wa
%Cpu2 : 1.8 us, 1.2 sy, 0.0 ni, 97.0 id, 0.0 wa
%Cpu3 : 2.0 us, 1.1 sy, 0.0 ni, 96.9 id, 0.0 wa
This matters: A single-threaded process can peg one core at 100% while the system aggregate shows only 25% on a 4-core server. Without per-core view, you might not notice the problem.
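To spot a pegged core without reading every row, a sketch like the following can scan the per-core lines. It assumes one core per line in the layout shown above (some top versions pair two cores per line), and the name hot_cores and the default 10% idle cutoff are made up for this example.

```shell
# hot_cores: read per-core "%CpuN :" lines on stdin and print the cores whose
# idle percentage is below a threshold (default 10, an assumed cutoff).
hot_cores() {
  awk -v limit="${1:-10}" '
    /^%Cpu[0-9]+/ {
      core = $1                      # e.g. "%Cpu0"
      sub(/^%Cpu/, "", core)
      for (i = 2; i < NF; i++) {
        # Each value is followed by its field name, e.g. "2.0 id,"
        if ($(i + 1) ~ /^id,?$/) {
          if ($i + 0 < limit) print "core " core ": " $i "% idle"
        }
      }
    }'
}
```

Fed the four sample lines above, it reports only core 0, which is the single-threaded bottleneck the aggregate view hides.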
Understanding Key Metrics in top
Load Average
load average: 2.41, 1.87, 1.23
Three numbers: 1-minute, 5-minute, 15-minute averages.
Load average represents the average number of processes either running or waiting to run; on Linux it also counts processes in uninterruptible (D-state) sleep. On a single-core system, load average of 1.0 means the CPU is exactly at capacity. On a 4-core system, load average of 4.0 means all cores are fully occupied.
How to interpret it:
Number of CPU cores: 4
Load 0.8 → 20% utilization — healthy
Load 4.0 → 100% utilization — at capacity
Load 6.0 → 150% utilization — overloaded, processes queuing
Load 12.0 → 300% utilization — severely overloaded
# Check core count
nproc
grep -c ^processor /proc/cpuinfo
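The arithmetic behind the table above is just load divided by core count. A minimal sketch, with load_pct as a hypothetical helper name:

```shell
# load_pct: print load as a whole-number percentage of core count.
# Usage: load_pct <load> <cores>
load_pct() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", load / cores * 100 }'
}

# Live usage (Linux): the first field of /proc/loadavg is the 1-minute load.
# load_pct "$(cut -d " " -f1 /proc/loadavg)" "$(nproc)"
```

For example, `load_pct 6.0 4` prints 150, matching the "overloaded" row.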
Reading the trend:
- Load rising (1-min > 15-min): system is getting more stressed
- Load falling (1-min < 15-min): system is recovering
- Load stable high: sustained overload — not a spike
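The trend rules can be expressed as a comparison of the 1-minute and 15-minute values. In this sketch the function name load_trend and the 10% tolerance band (so that tiny differences read as "stable") are assumptions added for illustration.

```shell
# load_trend: compare 1-minute and 15-minute load averages.
# Usage: load_trend <1min> <15min>; the 10% tolerance band is an assumption.
load_trend() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if      (a > b * 1.1) print "rising"
    else if (a < b * 0.9) print "falling"
    else                  print "stable"
  }'
}

# Live usage (Linux):
# read -r one five fifteen rest < /proc/loadavg
# load_trend "$one" "$fifteen"
```

A "stable" result still needs to be read against the core count: stable at 0.5 and stable at 12 are very different situations.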
I/O wait and load average: D-state processes (waiting on I/O) count toward load average even though they are not consuming CPU. A system with load 8 on 4 cores might have 4 processes running and 4 stuck on I/O — not a CPU problem, a storage problem.
Memory section
MiB Mem : 15258.9 total, 8124.3 free, 4212.8 used, 2921.8 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 9847.2 avail Mem
| Field | What it means |
|---|---|
| total | Physical RAM installed |
| free | Unused RAM |
| used | RAM actively in use by processes |
| buff/cache | RAM used by kernel buffers and page cache |
| avail Mem | RAM available for new processes (free + reclaimable cache) |
Critical insight: avail Mem is the number that matters, not free.
Linux deliberately uses free RAM for disk cache (buff/cache). That cache is reclaimable — the kernel releases it when a process needs RAM. A server with free: 200MB and avail Mem: 8GB is not running low on memory. A server with free: 200MB and avail Mem: 300MB is.
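For scripts and alerts, the same figure is exposed as MemAvailable in /proc/meminfo. The sketch below turns it into a percentage of total RAM; avail_pct is a hypothetical helper name, and it reads meminfo-style text on stdin so it can be tried on saved output as well as the live file.

```shell
# avail_pct: read /proc/meminfo-style text on stdin and print MemAvailable
# as a whole-number percentage of MemTotal.
avail_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%.0f\n", avail / total * 100 }'
}

# Live usage (Linux):
# avail_pct < /proc/meminfo
```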
Swap usage:
- Any active swap usage (used > 0) means the system has needed more RAM than is physically available at some point
- Growing swap usage during an incident = memory pressure increasing
- If swap is full and avail Mem is near zero, the system will start OOM-killing processes
Process list columns
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 appuser 20 0 512000 98304 12288 R 45.2 0.6 1:23.45 myapp
5678 root 20 0 89344 12288 8192 S 0.3 0.1 0:00.12 sshd
| Column | What it means |
|---|---|
| PID | Process ID |
| PR | Scheduling priority (lower = higher priority) |
| NI | Nice value (-20 to 19, lower = higher priority) |
| VIRT | Virtual memory, same as VSZ in ps, often misleading |
| RES | Resident memory, actual physical RAM used |
| SHR | Shared memory (shared libraries, etc.) |
| S | Process state (R=running, S=sleeping, D=uninterruptible sleep, usually I/O wait, Z=zombie) |
| %CPU | CPU usage in the last interval (not lifetime average) |
| %MEM | Physical RAM percentage |
| TIME+ | Total CPU time consumed (minutes:seconds.hundredths) |
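Key insight: %CPU in top covers the last refresh interval, not the process lifetime. This makes it much more useful than ps for catching current spikes: a process that just started pegging CPU shows around 90% in top on the next refresh, while ps, which divides total CPU time by the time the process has been running, may still report a low figure.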
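The interval calculation can be approximated by hand from two samples of a process's CPU tick counter. This is a sketch of the idea rather than top's actual implementation; interval_cpu_pct is a hypothetical name, and the default of 100 ticks per second is an assumption that should be replaced with the output of getconf CLK_TCK.

```shell
# interval_cpu_pct: percentage of one core used between two samples.
# Usage: interval_cpu_pct <ticks_before> <ticks_after> <seconds> [ticks_per_sec]
interval_cpu_pct() {
  awk -v t0="$1" -v t1="$2" -v secs="$3" -v hz="${4:-100}" \
    'BEGIN { printf "%.1f\n", (t1 - t0) / hz / secs * 100 }'
}

# Live usage (Linux): utime and stime are fields 14 and 15 of /proc/<pid>/stat.
# A process name containing spaces shifts the field positions, so treat this
# as a sketch.
# t0=$(awk '{ print $14 + $15 }' /proc/1234/stat); sleep 3
# t1=$(awk '{ print $14 + $15 }' /proc/1234/stat)
# interval_cpu_pct "$t0" "$t1" 3 "$(getconf CLK_TCK)"
```

A process that accumulated 270 ticks over a 3-second window at 100 ticks per second works out to 90.0, regardless of how idle it was for the hours before.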
Interactive Keys: Using top Efficiently
You do not need to exit and re-run top to change what you see.
| Key | Action |
|---|---|
| 1 | Toggle per-core CPU view |
| P | Sort by CPU (default) |
| M | Sort by memory (RES) |
| T | Sort by TIME+ (total CPU consumed) |
| k | Kill a process (prompts for PID and signal) |
| r | Renice a process |
| f | Field selector: add/remove columns |
| u | Filter by user |
| o | Filter by condition (e.g. %CPU>10) |
| H | Toggle thread view (shows individual threads) |
| d | Change refresh interval |
| q | Quit |
| W | Save current configuration |
The o filter is underused. In a busy system with many processes, you can filter to only show what you care about:
# Inside top, press 'o', then type:
%CPU>5.0 # only show processes using >5% CPU
USER=appuser # only show processes owned by appuser
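The o filter only exists in interactive mode. For scripts, a rough batch-mode equivalent of the %CPU filter can be built with awk. This is a sketch under assumptions: top_cpu_over is a hypothetical name, and the column positions are looked up from the header row because the field layout can be customized.

```shell
# top_cpu_over: read `top -b` output on stdin and print PID, %CPU, COMMAND for
# processes above a %CPU threshold (default 5.0). Column positions are looked
# up from the header row rather than hard-coded.
top_cpu_over() {
  awk -v min="${1:-5.0}" '
    $1 == "PID" {
      for (i = 1; i <= NF; i++) {
        if ($i == "%CPU")    cpu = i
        if ($i == "COMMAND") cmd = i
      }
      next
    }
    # Only rows that start with a numeric PID are process rows.
    cpu && ($1 ~ /^[0-9]+$/) && ($cpu + 0 > min) { print $1, $cpu, $cmd }'
}

# Live usage:
# top -b -n 1 | top_cpu_over 5.0
```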
Real Troubleshooting Scenarios
Scenario 1: High CPU — Finding the Culprit
Load average is 7.8 on a 4-core server. Users are reporting slowness.
top
Press P (should be default) to sort by CPU. Look at the top 3–5 processes.
PID USER %CPU %MEM COMMAND
8823 app 89.4 2.1 java
8901 app 4.2 0.3 java
1 root 0.1 0.0 systemd
PID 8823 is consuming 89% CPU. Now check if it is sustained:
# Watch this specific PID
top -p 8823
- If %CPU stays high: sustained load. Check application logs for what it is processing
- If %CPU fluctuates: bursty. Check TIME+ to see how long it has been active
- Press H inside top to see per-thread breakdown for that process
# Outside top: see what the process is actually doing
cat /proc/8823/wchan # what kernel function it is in
strace -p 8823 -c -e trace=all # summary of syscalls
For deeper process investigation beyond top, see strace, lsof, and ss: The Trio That Solves Every Mystery.
Scenario 2: High Load but Low CPU Usage
Load average is 12.0 but CPU %us is only 20% and %id is 60%. This is counterintuitive.
The answer is %wa (I/O wait):
%Cpu(s): 20.1 us, 5.2 sy, 0.0 ni, 60.0 id, 14.3 wa, 0.0 hi, 0.3 si
High %wa + high load + low CPU = I/O bottleneck. Processes are queued waiting for disk or NFS.
# Confirm: look for D-state processes in top
# Press 'o' inside top, type: S=D
# Or from command line:
ps -eo pid,stat,comm | awk '$2 ~ /^D/'
# Identify what the D-state process is waiting on
cat /proc/<pid>/wchan
# nfs_... = NFS problem
# ext4_... = local disk problem
# jbd2_... = journal I/O
This is not a CPU problem. Killing processes or adding application servers will not help. Fix the I/O source.
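The D-state check and the wchan lookup can be combined. In the sketch below, d_state is a hypothetical helper that filters ps output by the STAT column only, which avoids false matches on command names; the live loop is shown as comments because its output depends on the machine.

```shell
# d_state: read `ps -eo pid,stat,comm` output on stdin and print "PID COMMAND"
# for rows whose STAT column starts with D (uninterruptible sleep).
d_state() {
  awk '$2 ~ /^D/ { print $1, $3 }'
}

# Live usage: pair each D-state PID with the kernel function it is blocked in.
# ps -eo pid,stat,comm | d_state | while read -r pid comm; do
#   printf '%s %s %s\n' "$pid" "$comm" "$(cat "/proc/$pid/wchan" 2>/dev/null)"
# done
```

If most of the listed processes share the same wchan prefix, that points at a single I/O source rather than many unrelated problems.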
Scenario 3: Memory Pressure Building
avail Mem is dropping over hours. Swap usage is increasing.
top
Press M to sort by RES (resident memory). Look at the top consumers.
PID USER %CPU %MEM RES COMMAND
2341 app 2.1 28.4 4.3g java
4521 mysql 0.8 15.2 2.3g mysqld
7823 app 1.2 8.1 1.2g node
Compare RSS across processes of the same type:
# Are all java instances using similar memory?
ps aux | grep -v grep | grep java | awk '{print $2, $6}' | sort -k2 -rn
If one instance has significantly higher RSS than others of the same type, that is your leak candidate.
# Track it over time
while true; do
echo "$(date): $(ps -p 2341 -o rss= | tr -d ' ') KB"
sleep 60
done | tee /tmp/mem_track.log
For a detailed ps-based memory tracking workflow, see ps Command Linux: The Engineer's Troubleshooting Guide.
Scenario 4: Zombie Process Accumulation
In the top header line:
Tasks: 287 total, 3 running, 281 sleeping, 0 stopped, 3 zombie
Three zombies. In top's process list, sort and look for S = Z:
# Press 'o' inside top, type: S=Z
Or from outside:
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
Zombies cannot be killed — they are already dead. Find the parent and investigate why it is not reaping its children.
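Since the parent is what needs attention, it helps to group zombies by PPID. A minimal sketch, with zombie_parents as a hypothetical helper name:

```shell
# zombie_parents: read `ps -eo pid,ppid,stat,comm` output on stdin and print
# "<count> <ppid>" for each parent that has zombie children, busiest first.
zombie_parents() {
  awk '$3 ~ /^Z/ { n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Live usage:
# ps -eo pid,ppid,stat,comm | zombie_parents
# ps -p <ppid> -o pid,comm,args    # then inspect the parent at the top of the list
```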
Common Mistakes Engineers Make with top
Mistake 1: Panicking at high memory used in the header.
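On older top versions the used field includes buff/cache, which the kernel reclaims freely. Current procps-ng versions report buff/cache separately, but a low free figure triggers the same false alarm. Always look at avail Mem: that is what is actually available for new processes. A server showing 200 MB free of 16 GB with 12 GB avail Mem is not in memory trouble.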
Mistake 2: Treating %CPU in top as instantaneous.
%CPU is averaged over the refresh interval (default 3 seconds). It is more current than ps's lifetime average, but it is not a real-time per-second reading. A brief 100ms spike will be diluted across the interval. For genuine spike detection, reduce the interval with d or use tools like perf and sar.
Mistake 3: Ignoring %wa (I/O wait).
Engineers focus on %us and miss that 20% %wa is causing load average of 8. High I/O wait means processes are queuing for disk or network I/O — the CPU is not the problem, the storage is. Adding CPU cores or killing application processes will not fix it.
Mistake 4: Not pressing 1 to see per-core breakdown.
On a multi-core server, aggregate CPU is often misleading. One thread pegging a single core shows as 25% on a 4-core system. That 25% might not look alarming — but that thread is completely bottlenecked and all work assigned to it is queued.
Mistake 5: Using top to kill processes in production.
The k key in top sends a signal to a process. The problem: you are working with a constantly-refreshing display, and process list position shifts as you type. It is easy to kill the wrong PID. Use kill or pkill from the command line where you can confirm the exact PID before acting.
Mistake 6: Confusing load average with CPU percentage.
Load average of 4.0 on a 4-core server does not mean 100% CPU. It means there are 4 processes on average either running or waiting (including I/O-waiting processes). CPU at 100% utilization would show id near 0 — that is a different and complementary signal.
top vs ps: When to Use Which
Both tools read /proc and report process information. The right tool depends on what question you are asking.
| Situation | Use |
|---|---|
| Need to watch CPU/memory changing in real time | top |
| Need to find which process just spiked | top (sorts by current interval) |
| Need to grep, pipe, or script against process list | ps |
| Need to see full command line with arguments | ps -ef |
| Need to trace parent-child relationships (PPID) | ps -ef |
| Need to capture process state at an exact moment | ps |
| Need to kill or renice interactively | top (though kill/renice are safer) |
| Need to monitor a single PID over time | top -p <pid> |
| Need to filter processes interactively | top with o filter |
| Need sorted output for a report or postmortem | ps -eo <fields> --sort=<key> |
General rule: Use top first to get situational awareness. Once you have identified the problem process, switch to ps for detailed inspection and to strace/lsof for deeper investigation.
For a complete ps workflow guide, see ps Command Linux: The Engineer's Troubleshooting Guide.
top vs htop: Brief Comparison
htop is a modern alternative to top with a better interface and more features. It is not installed by default but is available in all major Linux package managers.
# Install htop
apt install htop # Ubuntu/Debian
dnf install htop # RHEL/Fedora
| Feature | top | htop |
|---|---|---|
| Default install | ✅ Always available | ❌ Requires install |
| Per-core CPU bars | Toggle with 1 | Visual bars by default |
| Mouse support | ❌ | ✅ |
| Kill/renice UI | Basic | Visual with confirmation |
| Process tree view | V key | Built-in toggle |
| Color coding | Minimal | Full color |
| Filter/search | o key | / key, simpler |
| Scrolling | Limited | Full horizontal/vertical |
Use top when: You are on a system without htop, in a restricted environment, or in a script with -b batch mode.
Use htop when: You have it available and you prefer a more navigable interface.
Quick Reference
# ── LAUNCH ──────────────────────────────────────────────────────
top # standard launch
top -p 1234 # monitor specific PID
top -u appuser # filter by user at launch
top -b -n 1 # batch mode, single snapshot (for scripting)
top -o %MEM # launch sorted by memory
# ── INTERACTIVE KEYS ────────────────────────────────────────────
1 # toggle per-core CPU view
P # sort by CPU
M # sort by memory
T # sort by total CPU time
k # kill process
r # renice process
H # toggle thread view
u # filter by user
o # filter by condition (%CPU>5.0, USER=root, etc.)
f # field selector
d # change refresh interval
q # quit
W # save configuration
# ── READ THE HEADER ─────────────────────────────────────────────
# Load average > number of cores = overloaded
# %wa > 10% = I/O bottleneck, check disk/NFS
# %us high = application CPU
# %sy high = system calls, context switching
# avail Mem low = memory pressure (not 'free')
# Zombie count > 5 = parent process problem
Conclusion
The top command on Linux is the fastest way to answer "what is happening on this system right now?" It gives you CPU utilization, memory pressure, load average, and a live process list in a single view.
The engineers who use top effectively know how to read the header correctly — understanding load average relative to core count, distinguishing I/O wait from CPU usage, and recognizing that avail Mem matters more than free. They press 1 to see per-core breakdown. They use the o filter instead of scrolling through hundreds of processes.
top is situational awareness. ps is surgical investigation. Use both, in that order.
Learn the header first. Learn the CPU breakdown. Learn what load average actually means on your hardware. Those three things alone will make you faster at diagnosing the majority of Linux production incidents.
Related reading: ps Command Linux: The Engineer's Troubleshooting Guide — for snapshots, scripting, and process tree investigation. strace, lsof, and ss: The Trio That Solves Every Mystery — for the next layer of investigation after top identifies the culprit.