Ep #47: System Design Roadmap: Your Step-by-Step Journey
A roadmap of essential topics — from networking basics to real-world case studies — adapted from roadmap.sh.
Breaking down the complex components of System Design
By Amit Raghuvanshi | The Architect’s Notebook
🗓️ Oct 07, 2025 · Free Post
Why I Wrote This Article
I’ve often received questions like “Where should I start my System Design journey?” or “How do I make sense of such a vast topic?”
That’s exactly why I wrote this piece - to help you build a solid foundation and navigate System Design with clarity.
You can use the roadmap at the end to track your learning journey step by step. It’s designed to guide you from fundamentals to advanced concepts.
Through my blog, I’m also doing deep dives into real-world system design case studies — covering critical topics, architectural decisions, and trade-offs that matter in practice. If you’re serious about mastering System Design, consider upgrading to a paid subscription to be part of this journey.
With 10+ years of industry experience, I share not just theory but insights drawn from real projects and hard-learned lessons — so you can learn faster and design better.
What is System Design?
System design is like creating a blueprint for a complex software system, similar to how an architect plans a house before building it. It’s about figuring out how all the pieces of a system (software, hardware, databases, and networks) work together to achieve a goal, like running an app or website smoothly.
Think of it as planning a big party:
Goal: Everyone has fun, eats, and dances without chaos.
Components: Food tables, music system, guest list app.
Plan: Where to place tables, how to serve food fast, how to avoid music cutting out if the speaker fails.
In system design, the goal is to make sure the system (like an app or website) works well, even when thousands or millions of people use it. It involves:
Functional Requirements: What the system does (e.g., WhatsApp lets you send messages or photos).
Non-Functional Requirements: How it behaves (e.g., it’s fast, reliable, and doesn’t crash).
Key Elements of System Design
To design a system, you need to think about several parts:
Components: These are the building blocks of the system, like servers, databases, or storage.
Example: In WhatsApp, components include:
Messaging servers: Handle sending and receiving messages.
Databases: Store your chats.
Media storage: Save photos and videos.
APIs: Let the app talk to the servers.
Interfaces: These define how components communicate.
Example: WhatsApp uses REST APIs (for fetching data) and WebSockets (for real-time messaging).
Data Flow: This is how information moves through the system.
Example: When you send a message on WhatsApp:
Your phone sends the message to a server.
The server stores it in a database temporarily.
The server pushes it to the recipient’s phone instantly.
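The three steps above can be sketched in a few lines of Python. This is a toy, in-process model that I'm using only to make the data flow concrete: the MessageServer class, its method names, and the callback-based "push" are all assumptions for illustration, not how WhatsApp is actually built.

```python
from collections import defaultdict, deque


class MessageServer:
    """Toy in-process stand-in for a messaging server (hypothetical design)."""

    def __init__(self):
        self.pending = defaultdict(deque)   # recipient -> messages awaiting delivery
        self.online = {}                    # recipient -> callback that "pushes" to the device

    def connect(self, user, on_message):
        """Register a device and flush anything stored while it was offline."""
        self.online[user] = on_message
        while self.pending[user]:
            on_message(self.pending[user].popleft())

    def disconnect(self, user):
        self.online.pop(user, None)

    def send(self, sender, recipient, text):
        """Step 1: receive the message. Step 2: store it. Step 3: push if possible."""
        message = {"from": sender, "to": recipient, "text": text}
        self.pending[recipient].append(message)      # temporary storage
        push = self.online.get(recipient)
        if push is not None:
            push(self.pending[recipient].popleft())  # delivered, so drop the stored copy


server = MessageServer()
inbox = []
server.send("alice", "bob", "hi")           # bob is offline: the message is held
server.connect("bob", inbox.append)         # bob comes online: the held message is pushed
server.send("alice", "bob", "are you there?")
```

Storing before pushing is the important part: if the recipient is offline, nothing is lost, and the message goes out as soon as their device reconnects.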
Beyond Just Diagrams
System design isn’t just about drawing boxes and arrows (like a flowchart). It’s about making trade-offs: choosing one option over another based on what’s best for the system. Here are common trade-offs:
SQL vs. NoSQL Databases:
SQL: Organized, like a neat spreadsheet (good for structured data, like user profiles).
NoSQL: Flexible, like a folder of notes (better for unstructured data, like posts or comments).
Example: Instagram has reportedly used Cassandra (a NoSQL store) for high-volume data such as feeds and activity, because it scales well for large, write-heavy workloads.
Monolith vs. Microservices:
Monolith: One big program with everything (simpler to start but hard to scale).
Microservices: Many small programs working together (harder to manage but easier to scale).
Example: Netflix uses microservices. Each feature (streaming, recommendations) runs separately, so one part can grow without affecting others.
In-memory vs. Disk-based Caching:
In-memory (like Redis): Super fast but expensive and limited in size.
Disk-based: Slower but cheaper and can store more.
Example: Twitter uses Redis to cache tweets for fast loading.
👉 Twitter’s evolution:
Started as a monolith → scaled into microservices → introduced event-driven architecture for real-time timelines.
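To make the in-memory vs. disk-based caching trade-off concrete, here is a minimal sketch of the cache-aside pattern with a small, size-limited cache. Everything in it is an assumption for illustration: LRUCache and load_tweet are hypothetical names, a plain dict stands in for the slower disk or database, and a real deployment would typically use something like Redis rather than a Python object.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Small in-memory cache with a size limit and per-entry time-to-live."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), least recently used first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]          # expired: treat as a miss
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self.clock() + self.ttl, value)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used entry


def load_tweet(tweet_id, cache, slow_store, stats):
    """Cache-aside read: try memory first, fall back to the slower store."""
    cached = cache.get(tweet_id)
    if cached is not None:
        stats["hits"] += 1
        return cached
    stats["misses"] += 1
    value = slow_store[tweet_id]          # stands in for a disk or database read
    cache.put(tweet_id, value)
    return value


slow_store = {1: "first tweet", 2: "second tweet", 3: "third tweet"}
cache = LRUCache(capacity=2, ttl_seconds=60)
stats = {"hits": 0, "misses": 0}
for tweet_id in (1, 2, 1, 3, 2):
    load_tweet(tweet_id, cache, slow_store, stats)
```

The capacity limit is the "expensive and limited in size" part of the trade-off: with room for only two entries, reading tweet 3 pushes tweet 2 out, so the next read of tweet 2 has to go back to the slower store.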
The Non-Functional Pillars
Non-Functional Requirements (NFRs) define the quality attributes, system characteristics, and operational constraints that determine how well a system performs its intended functions. Unlike functional requirements that describe what a system does, NFRs specify how the system should behave under various conditions.
NFRs are often referred to as “quality attributes” or “system qualities” and include aspects such as performance, security, scalability, maintainability, and availability.
These qualities make a system great, even if they’re not about what it does. They’re the “behind-the-scenes” features that keep users happy.
Scalability: Can the system handle more users or data?
Vertical Scaling: Add power to one server (like upgrading a computer’s RAM).
Horizontal Scaling: Add more servers (like hiring more chefs for a busy restaurant).
Example: Instagram uses database sharding (splitting data across servers) and CDNs (Content Delivery Networks) to deliver photos quickly worldwide.
Reliability: Does the system stay correct even if something fails?
Example: Uber replicates trip data across multiple data centers. If one goes down, the ride continues because another copy of the data is still available.
Performance: Is the system fast and efficient?
Example: WhatsApp uses a customized protocol derived from XMPP (for real-time messaging) and client-side caching (storing chats on your phone) for fast messaging.
Maintainability: Is the system easy to update or fix?
Example: Google Search splits its system into modules (indexing, crawling, ranking). Engineers can update one part without touching the rest.
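The database sharding mentioned under Scalability can be sketched as a simple hash-based router. This is one common approach among several and not a description of Instagram's actual scheme; shard_for and ShardedStore are hypothetical names, and plain dicts stand in for separate database servers.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Each dict plays the role of one database server."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})
sizes = [len(shard) for shard in store.shards]   # roughly even spread across shards
```

One caveat with plain modulo routing: changing the number of shards remaps most keys. Techniques such as consistent hashing exist to reduce that reshuffling.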
Security: Protecting the System
Security ensures the system is safe from attacks and protects user data. It’s like locking the doors and windows of your house.
Authentication: Verify users are who they say they are (e.g., passwords, two-factor authentication).
Authorization: Control what users can do (e.g., only you can edit your profile).
Encryption: Protect data in transit (e.g., HTTPS) and at rest (e.g., encrypted databases or disks). Passwords are a special case: store them as salted hashes rather than encrypting them.
Example: WhatsApp uses end-to-end encryption, so only you and the recipient can read your messages, not even WhatsApp’s servers.
Why it matters: A security breach (like a hacker stealing data) can ruin trust and cost millions. For example, Equifax’s 2017 data breach exposed sensitive data of about 147 million people after a known vulnerability in a web framework went unpatched.
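Here is a minimal sketch of the authentication and authorization ideas above, using only Python's standard library. The function names and the iteration count are my own assumptions for illustration. Production systems usually rely on a vetted auth library or identity provider rather than hand-rolled code like this.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware and threat model


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted hash with PBKDF2; store the salt and hash, never the password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Authentication: recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def can_edit_profile(requesting_user: str, profile_owner: str) -> bool:
    """Authorization: only the owner may edit their own profile."""
    return requesting_user == profile_owner


salt, stored = hash_password("correct horse battery staple")
```

Note the split: verify_password answers "are you who you say you are?", while can_edit_profile answers "are you allowed to do this?". Keeping the two checks separate makes each easier to reason about.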
Cost Optimization: Saving Money
Good system design balances performance with cost. It’s like planning a party on a budget - great food, but not caviar for everyone.
Efficient Resource Use: Use only what you need (e.g., scale servers up/down based on traffic).
Cloud Services: Use cost-effective options like AWS Lambda (pay-per-use) instead of always-on servers.
Example: Netflix uses AWS auto-scaling to add servers during peak hours (like new show releases) and remove them when traffic drops, saving money.
Why it matters: Poor design can lead to skyrocketing costs. For example, a badly designed system might use 100 servers when 10 would do.
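As a rough illustration of scaling servers up and down with traffic, the sketch below sizes a fleet so that average CPU moves toward a target. It is a simplified rule of my own, not the algorithm AWS or Netflix uses, and the function name, the 60% target, and the server limits are all assumptions.

```python
import math


def desired_servers(current, avg_cpu_percent, target_cpu_percent=60,
                    min_servers=2, max_servers=100):
    """Scale the fleet so average CPU moves toward the target, within fixed bounds."""
    if current < 1:
        raise ValueError("current must be at least 1")
    wanted = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    return max(min_servers, min(max_servers, wanted))


peak = desired_servers(current=10, avg_cpu_percent=90)    # busy evening: scale out
quiet = desired_servers(current=10, avg_cpu_percent=15)   # overnight lull: scale in
```

The min_servers floor keeps some capacity for sudden spikes, and the max_servers ceiling caps the bill if a bug or an attack drives load up.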
Testing and Monitoring: Keeping the System Healthy
Testing ensures the system works as expected, and monitoring catches issues before they affect users. It’s like a doctor checking your health regularly.
Testing:
Unit Tests: Check individual components (e.g., does the login function work?).
Integration Tests: Ensure components work together.
Load Tests: Simulate millions of users to check scalability.
Example: Netflix’s Chaos Monkey randomly shuts down servers to test if the system stays reliable.
Monitoring:
Track metrics like response time, error rates, or server usage.
Use tools like Prometheus (metrics collection and alerting) and Grafana (dashboards) to spot problems in real time.
Example: Amazon monitors its shopping cart system to catch slowdowns before they frustrate users.
Why it matters: Without testing and monitoring, small issues (like a slow database) can become big problems (like a website crash during Black Friday sales).
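The monitoring idea above (track error rates, alert early) can be sketched as a sliding-window check. ErrorRateMonitor is a hypothetical class written for this post. In practice a tool like Prometheus would collect the metric and evaluate the alert rule for you.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the last N requests and flag when the error rate crosses a threshold."""

    def __init__(self, window_size=100, threshold=0.05):
        self.outcomes = deque(maxlen=window_size)  # True means the request failed
        self.threshold = threshold

    def record(self, failed: bool):
        self.outcomes.append(failed)   # the oldest outcome drops off automatically

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window_size=10, threshold=0.2)
for _ in range(10):
    monitor.record(failed=False)
healthy = monitor.should_alert()
for _ in range(3):
    monitor.record(failed=True)      # three failures push out three old successes
degraded = monitor.should_alert()
```

A fixed-size window means the monitor reacts to recent behavior only, so an outage from last week does not keep the alert firing today.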
Common Design Patterns
Design patterns are proven solutions to common problems, like recipes for your favorite dishes. They save time and ensure reliability.
Load Balancer Pattern: Distribute traffic across servers to avoid overload.
Example: Google uses load balancers to send search queries to available servers.
CQRS (Command Query Responsibility Segregation): Separate read (query) and write (command) operations for efficiency.
Example: A Twitter-style feed fits CQRS well, because reading timelines (which must be fast) can be handled separately from posting tweets (which must be consistent).
Event-Driven Architecture: Systems react to events (like a user clicking “buy”).
Example: Amazon’s shopping cart uses events (add item, remove item) to update your cart in real time.
Why it matters: Patterns help you avoid reinventing the wheel and build systems faster and more reliably.
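To show what the Load Balancer pattern boils down to, here is a minimal round-robin sketch that skips unhealthy servers. The class, its method names, and the server names are assumptions for illustration. Real load balancers add health probes, weighting, connection tracking, and much more.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        for _ in range(len(self.servers)):   # at most one full rotation
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [balancer.next_server() for _ in range(4)]
balancer.mark_down("app-2")
after_failure = [balancer.next_server() for _ in range(4)]
```

Once app-2 is marked down, traffic simply alternates between the two remaining servers, which is the overload protection and fault tolerance the pattern is meant to provide.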
Real-World Applications of System Design
Here’s how big companies use system design:
Facebook’s Newsfeed:
Shows posts in real time.
Uses fan-out-on-write (posts sent to friends’ feeds instantly) and fan-out-on-read (fetches posts when you open the app).
Redis caching for speed, Graph APIs for fetching connections.
Amazon’s Shopping Cart:
Keeps your cart consistent across devices.
Uses event sourcing: Every action (add/remove item) is saved as an event, so the cart can be rebuilt if needed.
YouTube:
Streams videos smoothly worldwide.
Uses distributed storage, CDNs, and adaptive bitrate streaming (adjusts video quality based on internet speed).
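The event sourcing idea from the shopping cart example can be sketched as replaying a log of events to rebuild state. The event names and fields below are my own assumptions for illustration, not Amazon's actual schema.

```python
from collections import Counter


def rebuild_cart(events):
    """Replay an ordered event log to recover the current cart contents."""
    cart = Counter()
    for event in events:
        kind, item, quantity = event["type"], event["item"], event.get("quantity", 1)
        if kind == "item_added":
            cart[item] += quantity
        elif kind == "item_removed":
            cart[item] = max(0, cart[item] - quantity)   # never go below zero
        else:
            raise ValueError(f"unknown event type: {kind}")
    return {item: count for item, count in cart.items() if count > 0}


event_log = [
    {"type": "item_added", "item": "book", "quantity": 2},
    {"type": "item_added", "item": "pen"},
    {"type": "item_removed", "item": "book", "quantity": 1},
    {"type": "item_removed", "item": "pen"},
]
cart = rebuild_cart(event_log)
```

Because the log is the source of truth, you can also replay only part of it (say, the first two events) to see what the cart looked like at an earlier point in time.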
Why System Design Matters
A poorly designed system might work for 1,000 users but crash at 1 million. Good system design ensures:
Efficiency at scale: Fast responses, low costs.
Fault tolerance: Automatic recovery from failures.
User experience: Consistent, fast service worldwide.
Security: Protects user data and trust.
Cost savings: Avoids wasteful spending.
Example: Netflix’s Chaos Engineering (deliberately breaking servers) ensures users can watch shows even if parts of the system fail.
Learning Focus (for Students + Interviews)
When preparing for system design interviews, focus on:
Scalability: Sharding, load balancing, queue systems.
Reliability: Replication, leader election, consensus algorithms (Raft, Paxos).
Performance: Caching (Redis, Memcached), CDNs, asynchronous processing.
Maintainability: Microservices, modular APIs, CI/CD pipelines.
Security: Authentication, authorization, encryption.
Cost Optimization: Auto-scaling, efficient resource use.
Testing/Monitoring: Load testing, real-time alerts.
Example Exercise:
Design a Ride-sharing system (Uber/Lyft) → Consider:
Real-time location updates: Use WebSockets for driver/rider tracking.
Matching algorithm: Pair riders with drivers quickly using location data.
Data consistency: Ensure payments and driver availability are accurate.
Reliability: Switch to backup data centers if one fails.
Security: Encrypt payment data and verify user identities.
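For the matching step in the exercise above, a reasonable first draft is "pick the closest available driver within some radius". The sketch below does exactly that with the haversine formula. The data layout, the 5 km limit, and the function names are assumptions. A production system would use a geospatial index and factor in things like ETA and traffic rather than scanning every driver.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def match_driver(rider, drivers, max_km=5.0):
    """Return the id of the closest available driver within max_km, or None."""
    best_id, best_distance = None, max_km
    for driver in drivers:
        if not driver["available"]:
            continue
        distance = haversine_km(rider["lat"], rider["lon"], driver["lat"], driver["lon"])
        if distance <= best_distance:
            best_id, best_distance = driver["id"], distance
    return best_id


rider = {"lat": 12.9716, "lon": 77.5946}
drivers = [
    {"id": "d1", "lat": 12.9800, "lon": 77.6000, "available": True},
    {"id": "d2", "lat": 12.9720, "lon": 77.5950, "available": False},  # closest, but busy
    {"id": "d3", "lat": 12.9750, "lon": 77.5960, "available": True},
    {"id": "d4", "lat": 13.2000, "lon": 77.9000, "available": True},   # too far away
]
matched = match_driver(rider, drivers)
```

In an interview, this linear scan is a good starting point to then critique: it is simple and correct, but it does not scale to a city full of drivers, which naturally leads into a discussion of spatial indexing.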
Career Impact
System Design separates junior engineers (feature builders) from senior engineers/architects (system thinkers).
In interviews → Determines if you can design beyond code.
In jobs → Impacts cost, reliability, and scalability of real systems.
Example:
At Amazon, designing a scalable recommendation engine affects millions of shoppers. If designed smartly, it increases engagement → revenue → company growth.
Summary
System Design is about planning and building systems that are scalable, reliable, performant, maintainable, secure, and cost-effective.
It’s more than diagrams - it’s about trade-offs (e.g., SQL vs. NoSQL, monolith vs. microservices).
Key techniques: Caching, load balancing, sharding, CDNs, encryption, auto-scaling, monitoring.
Real-world examples: WhatsApp’s chats, Netflix’s streaming, Uber’s rides.
It’s the skill behind apps that handle millions of users, ensuring they’re fast, safe, and reliable.
Here is the roadmap to start your System Design journey (taken from roadmap.sh). Use the link below to access it:
https://roadmap.sh/system-design
If you have any questions about the membership or content suggestions, feel free to reply or leave a comment below. Thanks for your support.
Amit Raghuvanshi
Author, The Architect’s Notebook
Stay tuned 👀