Linux & DevOps Troubleshooting Blog
Practical guides for Linux engineers — NGINX debugging, process troubleshooting, CIS hardening, and production incident response.
Linux High CPU Usage: Step-by-Step Troubleshooting Guide
Step-by-step guide to diagnosing Linux high CPU usage — using ps, top, and htop to identify the culprit, distinguish user vs kernel vs I/O wait CPU, and resolve the issue in production.
Top Linux Debugging Tools Every Engineer Should Know
The essential Linux debugging tools for production troubleshooting — ps, top, htop, lsof, strace, iotop, vmstat, dmesg, and more — with real use cases and a comparison table.
htop vs top: Which Should You Use in Production?
htop vs top — a practical comparison for Linux engineers. When to use each, key differences in UI and usability, performance overhead, and real production scenarios where one beats the other.
CIS RHEL Level 1 Hardening: What Actually Breaks in Production
CIS RHEL Level 1 hardening guide for production Red Hat systems — what breaks, what to apply first, and how to avoid SSH lockouts, auditd disk exhaustion, and PAM-related service outages.
How to Check Running Processes in Linux: Complete Guide
How to check running processes in Linux using ps, top, and htop — with filtering techniques, real troubleshooting workflows, and common mistakes engineers make when investigating process issues.
Check Open Ports in Linux: ss vs netstat Explained
How to check open ports in Linux using ss and netstat — with real troubleshooting scenarios, filtering techniques, and a clear comparison of when to use each tool.
CIS Windows Server Level 1 Hardening: What Actually Matters in Production
CIS Windows Server Level 1 hardening in production — what breaks, what to apply first, and how to avoid NTLM lockouts, audit log disk exhaustion, and service account outages.
CIS Level 1 Ubuntu Hardening: A Field-Tested Production Guide
CIS Level 1 Ubuntu hardening guide covering filesystem, SSH, sysctl, and audit logging — with real production pitfalls, configs, and a compliance checklist. Tested in enterprise environments.
Linux TIME_WAIT Explained: Why It Causes Connection Failures and How to Fix It
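Sockets lingering in TIME_WAIT can exhaust ephemeral ports and make outbound connections fail under load (typically with EADDRNOTAVAIL, "Cannot assign requested address", rather than ECONNREFUSED), even when your app is healthy. Learn what TIME_WAIT is, how to detect port exhaustion with ss and netstat, and the sysctl fixes that resolve it.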