Incident Report: A Blameless Post-Mortem of 2025
Why we need to stop making "Resolutions" and start writing "Incident Reports."
If there is one thing we architects respect more than a perfectly designed system, it’s a well-documented Post-Mortem.
When a production database crashes or a deployment fails, we don’t stand around pointing fingers. We don’t say, “Well, the server was just lazy” or “The load balancer didn’t have enough willpower.”
We write a Blameless Post-Mortem. We look at the system, identify the bottleneck, and engineer a fix so it doesn’t happen again.
Yet, on December 30th-31st, we tend to do the exact opposite with our lives. We look back at 2025, identify where we failed (our “outages”), and we blame the operator.
“I was too lazy.”
“I wasn’t disciplined enough.”
This year, I want you to try something different. Put down the “New Year’s Resolutions” list and pick up an Incident Report. Let’s debug 2025.
Phase 1: Identify the “Major Outage”
Every system has downtime. Look back at your year. What was your SEV-1 incident?
Was it high latency? (Did you burn out and your productivity slowed to a crawl?)
Was it a deployment failure? (Did you try to launch a side project or learn a new language, but rolled it back after two weeks?)
Was it a connection timeout? (Did you lose touch with family or friends because the “network” was congested with work?)
Action: Pick one specific failure. Don’t judge it. Just log it.
Example Log: “I planned to read 12 technical books this year. I only read two.”
Phase 2: The Root Cause Analysis (RCA)
In engineering, we use the “5 Whys” to find the root cause. If you blame yourself (“I was lazy”), you aren’t fixing the system. You are just yelling at the server.
Let’s apply the 5 Whys to the example above (here the root cause surfaces after four):
Why did I only read two books? Because I never had time in the evenings.
Why didn’t I have time? Because I was working until 9 PM every night.
Why was I working late? Because I said ‘yes’ to every meeting request.
Why did I say yes? Because I don’t have a filtering mechanism for my calendar.
Root Cause: The system (your schedule) lacked Rate Limiting. It wasn’t a discipline problem; it was a traffic control problem.
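To make the rate-limiting idea concrete, here is a minimal sketch of a weekly meeting budget. Everything in it is an assumption for illustration: the MeetingRateLimiter class, the max_hours budget, and the numbers are hypothetical, not a real calendar API.

```python
from collections import defaultdict


class MeetingRateLimiter:
    """Hypothetical helper: accept at most `max_hours` of meetings per week."""

    def __init__(self, max_hours: float = 10.0) -> None:
        self.max_hours = max_hours
        # week number -> hours of meetings already accepted
        self.booked: dict[int, float] = defaultdict(float)

    def request(self, week: int, hours: float) -> bool:
        """Return True and book the meeting if it fits the weekly budget."""
        if self.booked[week] + hours > self.max_hours:
            return False  # over budget: reject instead of working late
        self.booked[week] += hours
        return True


# Assumed example: a 3-hour weekly budget and four incoming requests.
limiter = MeetingRateLimiter(max_hours=3)
decisions = [limiter.request(week=1, hours=h) for h in (1, 1.5, 1, 0.5)]
print(decisions)  # [True, True, False, True]
```

The third request is rejected because it would push the week past the budget, while the shorter fourth one still fits. The point is that the "no" comes from the system, not from willpower in the moment.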
Phase 3: Implementing Guardrails (The Fix)
Resolutions rely on willpower. Willpower is like RAM—it’s volatile and clears when you sleep. Systems rely on architecture.
Once you know the Root Cause, you don’t promise to “try harder.” You install a Guardrail or a Circuit Breaker.
If the issue was Burnout: Implement a Circuit Breaker. If you work past 7 PM three days in a row, the breaker trips: strict ban on laptops for the weekend.
If the issue was Overcommitment: Implement Throttling. Reject all non-critical requests that don’t align with your core OKRs.
If the issue was Health: Remove the Single Point of Failure. If your gym habit relies on waking up at 5 AM (and you hate mornings), that’s a brittle design. Redundancy is key—move the workout to lunch.
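The burnout circuit breaker above can be sketched in a few lines. This is only an illustration under stated assumptions: the WorkdayCircuitBreaker class and its fields are hypothetical names, and the 7 PM cutoff and three-day threshold are taken from the example in the text.

```python
from dataclasses import dataclass


@dataclass
class WorkdayCircuitBreaker:
    """Hypothetical helper: trips after `threshold` consecutive late days."""

    cutoff_hour: float = 19.0  # 7 PM, as in the example above
    threshold: int = 3         # three days in a row
    consecutive_late: int = 0
    tripped: bool = False

    def record_day(self, stop_hour: float) -> bool:
        """Log the hour work ended; return True once the breaker has tripped."""
        if stop_hour > self.cutoff_hour:
            self.consecutive_late += 1
        else:
            self.consecutive_late = 0  # an on-time day breaks the streak
        if self.consecutive_late >= self.threshold:
            self.tripped = True
        return self.tripped

    def reset(self) -> None:
        """Close the breaker again, e.g. after the laptop-free weekend."""
        self.consecutive_late = 0
        self.tripped = False


# Assumed example week: stop times in 24-hour format.
breaker = WorkdayCircuitBreaker()
results = [breaker.record_day(h) for h in (20, 18, 20.5, 21, 19.5)]
print(results)  # [False, False, False, False, True]
```

One on-time day resets the streak, so the breaker only trips on the fifth day, after three late evenings in a row. Once tripped, it stays open until reset() is called, which mirrors the "no laptops this weekend" rule.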
The Deployment Plan for 2026
As we roll over into the new year, remember that you are a complex, distributed system. You have limited compute power and limited storage.
If 2025 had outages, that’s okay. That is expected behavior for any system under heavy load. The goal isn’t to be perfect; the goal is to be resilient.
Don’t just wish for a better year. Architect one.
Happy New Year in advance,
Amit Raghuvanshi




Really well written, the analogy of SRE tasks to real life was astounding to read through 😄
And happy new year to you, Amit and all my fellow learners!