Senior L3 Infrastructure & Security Engineer

Linux Troubleshooting & Performance — From Production

If you've ever stared at a server with 100% CPU, a 502 waterfall, or a hung process that won't die — this blog is for you. Real incidents, real commands, real fixes. No padding.

Read the guides →

Start troubleshooting

About Damon

Senior Technical Support Engineer · L3

OPSWAT · Ho Chi Minh City

7+ years working on Linux systems in production — from running server fleets at an Anti-DDoS company to L3 support for enterprise deployments at OPSWAT, where the bugs are always someone else's OS-level problem.

Most of what I write comes from incidents that took too long to diagnose the first time. The goal is to make that diagnosis faster for you.

Full background →

What This Blog Covers

Written for engineers who are mid-incident, not mid-tutorial. Every post has the exact commands and the reasoning behind each one — not just the fix, but why it works.

▸ Linux performance troubleshooting: CPU, memory, load average, I/O wait
▸ Process debugging: ps, top, strace, lsof, process states (D, Z, R)
▸ NGINX production issues: 502 errors, upstream keepalive, SSL hardening
▸ Security hardening: CIS benchmarks for Ubuntu, RHEL, Windows Server
▸ Incident response: log analysis, root cause, postmortem workflows

Common searches that land here: linux high cpu usage, debug linux server, load average explained, nginx 502 under load, linux process monitoring.

Start Here

The three guides most engineers need first

all articles →

Featured

Linux Performance Troubleshooting: Complete Engineer's Guide

CPU, memory, I/O, process states — the full diagnostic workflow with real commands and decision trees. Start here when a server is slow and you don't know why.

Read article →

Featured

NGINX 502 Bad Gateway Under Load: Causes, Debugging, and Fixes

The most common NGINX failure pattern in production. Covers port exhaustion, missing keepalive, and proxy timeouts — with the exact config that fixes each one.

Read article →

Featured

Linux Security Hardening: CIS Benchmarks for Production

CIS Level 1 hardening for Ubuntu, RHEL, and Windows Server. What each control does, what breaks in production, and how to apply it safely.

Read article →

Browse by Topic

Grouped by what you are actually trying to do

Linux Commands

ps, top, htop, strace, ss — the tools you reach for first

Troubleshooting

High CPU, memory leaks, zombie processes, 502 errors

Monitoring & Debugging

strace, lsof, auditd, log analysis

Security & Infrastructure

CIS hardening, firewall, Docker, NGINX config

All Topics

linux troubleshooting · nginx debugging · security hardening · infrastructure

view all →

#NGINX #Linux #Docker #Networking #Debugging #Security #Logs #Monitoring #SSL/TLS #Firewall

Latest Articles

Most recent Linux and DevOps troubleshooting guides

all articles →

NGINX 502 Bad Gateway Under Load: Root Causes and Fixes

NGINX 502 errors under load are almost never a simple app crash. This guide covers the real root causes — connection backlog overflow, keepalive misconfiguration, ephemeral port exhaustion — with diagnostic commands and config fixes from production incidents.

#nginx #debugging

12 min read

Log Analysis for Security Investigations: Windows Event Logs and Web Server Access Logs

A practical guide to log analysis for security investigations — Windows Event Viewer, critical Event IDs, Apache access log parsing, and the Linux command-line tools that make manual log analysis fast and effective.

#security #linux

9 min read

Diamond Model of Intrusion Analysis: 4 Core Components Explained (2026)

A technical breakdown of the Diamond Model of Intrusion Analysis — adversary, victim, capability, and infrastructure — with real attack examples, meta-features, and how it compares to the Cyber Kill Chain and MITRE ATT&CK.

#cybersecurity #threatintel

19 min read

Cyber Kill Chain: All 7 Phases Explained with Real Attack Examples (2026)

A technical deep-dive into the Cyber Kill Chain — all 7 phases mapped with real attacker techniques, detection indicators, and defensive controls. Includes a full real-world attack walkthrough and Kill Chain vs MITRE ATT&CK comparison.

#cybersecurity #threatintel

20 min read

How to Trace Route in Linux: traceroute Examples

Use traceroute in Linux to diagnose network path issues — read hop output, interpret timeouts, use TCP mode to bypass firewalls, and identify where packets are being dropped.

#linux #networking

5 min read

All Linux & DevOps articles →

Tools & Resources

Beyond the blog: scripts, CLI tools, and guides built from the same production experience. Anything I end up doing manually three times becomes a tool.

CLI Tools

sys-monitor and seo-pro-audit — open-source utilities from this blog

Browse tools →

Troubleshooting Guides

Deep-dive walkthroughs for the incidents that take the longest to debug

Start with Linux →

Security Hardening

CIS benchmark implementation guides for Ubuntu, RHEL, and Windows Server

Read the guide →