DETAILED GUIDE: Improve Your System Design Interview Skills

The tech recession was beginning, so it was a difficult time to find a new job.

I remember the stress and anxiety I experienced when studying both system design and the system design interview a couple of months ago (and yes, there’s a difference between the two). I hadn’t done much substantial system design work at my prior company, and I’d never done a system design interview before.

I realized how important the system design interview was going to be for my targeted software engineer role. Without a decent system design interview performance, I was likely going to be rejected or down-leveled.

Unlike preparing for the coding interview, where information is abundant about every topic and the solutions to the problems are more straightforward, learning about system design and preparing for the system design interview is a different beast.

In system design interviews, there isn’t a single optimal solution (unlike in most coding interview problems), there’s a nearly infinite depth and breadth of information to understand, and the information online about the system design interview isn’t as abundant as it is for other interviews.

In this article, I’m breaking down some of the biggest lessons and tips that I learned from studying system design over a period of 3 to 4 months (without prior knowledge). The results of my studying and learnings helped me land a mid-level software engineer role at my current job, during one of the most difficult hiring periods for tech workers (aka hiring freezes and layoffs in Q3 2022).

With the system design tips in this article, I hope to short-circuit a lot of your struggle with the lessons from my struggles. So, let’s begin!

Make sure to BOOKMARK this page to come back to later.
This is a lot of valuable information, in my opinion anyways.

Learn The Fundamental Components of System Design 

If you don’t understand what the fundamental components of distributed system design are, then you’re priming yourself to not understand anything. These fundamental system design components can be used together to create solutions to system design interview questions.

The fundamental components of system design are servers, databases, caches, message queues, load balancers, blob storage, CDNs, rate-limiters, polling, and WebSockets. This is based on patterns I’ve seen in the most popular types of system design interview questions (which are covered later in this article).

Here’s the truth about these components that I wish I knew when I was starting out with system design: each of them has far more depth than you need to understand initially to make the best use of your studying. For instance, a whole series of university courses could be created just about databases.

Before you dive into any of these components in detail, I think it’s important to have at least a working high-level understanding of these topics.


Servers

Servers are the basic building blocks of system design. They handle tasks like orchestrating API requests as services and tasks like processing data as workers for message queues.

From my experience in researching the system design interview, these servers tend to be stateless and scaled horizontally.

What Are Stateful vs. Stateless Servers

Stateful servers are servers that hold data, such as a user’s login session, on a particular server. This means that the user’s requests must always be routed to that particular server; otherwise, their data isn’t available to them. In practice, this routing is known as sticky sessions. Issues arise when a server goes offline or a new one is added, because additional logic is needed to handle user routing and state recovery/replication.

Stateless servers are servers that do not contain data for requests. Instead, they rely on other services or databases to contain the state, and it’s fetched during runtime. The main advantage to this is that you can add and remove servers based on the system’s load, and it doesn’t matter which server a user is accessing because the servers don’t contain a state.

When to Scale Servers Horizontally vs. Vertically

Assuming that servers are stateless, it’s almost always preferred to scale horizontally.

Scaling horizontally means just adding more servers. In theory, this allows near-unlimited scaling on the server side. The cost is the time (which is often negligible) spent making network calls to retrieve request state that isn’t located on the server.

Scaling vertically means buying and upgrading to more powerful machines. This becomes disproportionately more expensive, with substantially diminishing returns on computational power. Additionally, there are upper-bound limits to how powerful a machine can get.

Scaling servers vertically can be used in certain situations like developing one-off systems, prototypes, or developing within a startup where a handful of very powerful servers are enough for an MVP. Vertical scaling can also be used (to an extent) with horizontal scaling for an optimal value per-cost ratio.

Load Balancers

At a high level, load balancers are easy to understand as a topic. Load balancers, as the name suggests, route requests to multiple endpoints (ideally in a balanced way). In order to have a highly available system, you’ll need multiple servers and databases with load balancers routing traffic to them.

The fun with load balancers comes with defining routing policies. 

Simple Load Balancing with Round Robin

Round Robin routing essentially routes each request to a server in a circular-looping order. If the servers are stateless and the requests are relatively fast, then this could be a simple and effective choice.
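To make the circular-looping order concrete, here’s a minimal Python sketch of round robin. The class name and server names are made up for illustration, and a real load balancer would also deal with health checks and concurrent requests.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() loops over the list forever: a, b, c, a, b, c, ...
        self._servers = cycle(list(servers))

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
first_five = [balancer.pick() for _ in range(5)]
```

Notice that the balancer never looks at the request itself, which is why this only works well when the servers are stateless.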

Issues with Load Balancer Hashing Algorithms System Design

Hashing is probably what most people would do when picking a load-balancing routing policy. Basically, you hash the request into a number and mod it by the number of servers to pick a server to assign a request.

There are actually a lot of issues that could come up if you were doing something like this for sticky sessions. One of the issues is the hashing function not being uniform (effectively making the load unevenly distributed). Another one of the issues is that you essentially have to remap all the requests to a new server if you add a new server or a server goes down.
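The remapping issue is easier to see with numbers. Here’s a rough Python sketch of hash-and-mod routing that measures how many keys land on a different server after one server is added. The helper names, the key format, and the choice of SHA-256 as the hash are my own assumptions for the example.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a string to an integer that is the same on every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(key: str, servers: list) -> str:
    """Hash the key, then mod by the number of servers."""
    return servers[stable_hash(key) % len(servers)]


def fraction_remapped(keys, before, after) -> float:
    """Share of keys that map to a different server after the change."""
    moved = sum(1 for k in keys if pick_server(k, before) != pick_server(k, after))
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
four_servers = ["s0", "s1", "s2", "s3"]
five_servers = four_servers + ["s4"]

remapped = fraction_remapped(keys, four_servers, five_servers)
```

With a reasonably uniform hash, going from four servers to five should move roughly four out of every five keys, since a key only stays put when its hash gives the same remainder for both 4 and 5. For sticky sessions, that means most users would lose their session on every scaling event.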

Why Consistent Hashing is Better Than Regular Hashing

Consistent hashing is a way to evenly distribute load across a changing number of servers with minimal readjusting of the request-to-server mappings. This is a large topic that deserves its own deep dive, but at a high level, it handles servers being added or removed much better than standard request hashing does.

Depending on the consistent hashing algorithm, you can also add tunability to the hashing function to route more requests to specific servers over others.
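Here’s a minimal Python sketch of one common way to do consistent hashing: a hash ring with virtual nodes. This is just one variant, and the class name, the SHA-256 hash, and the `weight` parameter are assumptions I picked for illustration. Each server is placed on the ring many times, and a request goes to the first server position at or after the request’s own hash.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    def __init__(self, vnodes_per_weight: int = 100):
        self._vnodes = vnodes_per_weight
        self._ring = []  # sorted list of (position, server)

    def add(self, server: str, weight: int = 1) -> None:
        # A higher weight means more positions on the ring, so more traffic.
        for i in range(self._vnodes * weight):
            bisect.insort(self._ring, (ring_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def pick(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no servers on the ring")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (ring_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing()
for name in ["s0", "s1", "s2", "s3"]:
    ring.add(name)

keys = [f"user-{i}" for i in range(2_000)]
before = {k: ring.pick(k) for k in keys}
ring.add("s4")
after = {k: ring.pick(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The useful property is that the only keys that move are the ones that now belong to the new server. Every other mapping is left untouched, which is not the case with hash-and-mod routing.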


Databases

As I’ve mentioned before, databases in system design interviews are a really heavy topic. There’s far more to discuss than you could ever cover in a single interview.

For the purposes of a system design interview, the most important discussions I found helpful include:

  1. Choosing a SQL vs. noSQL database
  2. How to shard a database across many instances
  3. How to ensure data availability and durability

How Important is The CAP Theorem for System Design

The CAP Theorem states that you can only have two of three desired qualities within a system. The desired qualities of a system are being highly consistent, highly available, and highly partition tolerant. In practice, you can’t afford to not be partition tolerant (the P in CAP), so you have the choice of picking high consistency (the C) or high availability (the A).

The CAP theorem is a guiding principle for a system’s design and is very important to understand before you look into what kind of database you should use. The choice of availability vs. consistency could help you choose SQL vs. noSQL databases.

An example of a highly consistent system (CP) is a banking system. If we need to deduct $5 from one person and send that $5 to another person, we can’t have an inaccurate record of how much either person has before the transaction occurs.

We need to ensure that the person sending the $5 has enough money to do so. In computing how much the person receiving the $5 at the end should have, we need to ensure that the math of the person’s existing balance + $5 is accurate.

Typically, to achieve highly consistent systems you have to trade off high availability as the CAP theorem suggests. In practice, this means that when the transaction is going through between either of the people involved, another transaction cannot go through. This essentially means that the system is “frozen” until a transaction goes through to ensure data accuracy. This is a simplistic way of explaining it.

Conversely, an example of a highly available system is something like timelines on Facebook or Instagram. If someone uploads a post, we don’t necessarily need it to show up for everyone immediately. A small delay before everyone can read the latest posts is acceptable in this case.

However, we want to ensure that the platform is always able to accept new posts and have its posts readable. We want the posts system overall to be highly available (AP) for the users over highly consistent (CP).

SQL vs. noSQL in a System Design Interview

When you begin to talk about databases within a system design interview, this is probably the first topic you’ll talk about after understanding the data you’ll be storing. 

Should you use a SQL or noSQL database within a system design interview? How should you know?

I’d use SQL databases if you have input data that can be modeled with a clear schema, needs ACID database guarantees, and/or needs to have complex queries run on it. Otherwise, noSQL databases tend to provide better out-of-the-box solutions for scaling a database out horizontally.

Fundamentally, you can use either a SQL or noSQL document database for any problem. However, you’ll run into certain issues.

For instance, you can manually enforce a schema and write data into a noSQL document database like MongoDB. However, that would require some additional code and logic on your part. It’s like forcing a square peg through a round hole. You could probably do it with enough effort, but you didn’t have to do it to begin with.

Something that’s also worth mentioning about noSQL databases is the fact that there are a lot of different types of them compared to SQL databases.

Popular noSQL database types include:

  • Document Databases (similar to storing JSON file content)
  • Graph Databases (optimal for storing and querying data like friend graphs)
  • Columnar Databases (great for analytics and data warehousing)

Of course, like everything, there’s a lot of nuance to what I just said that’ll become apparent with more research. However, I think that I laid out enough of the breadth of SQL vs. noSQL database knowledge that would be expected to be known within a system design interview.

Sharding Databases in a System Design Interview

So, what is sharding?

Sharding a database is the act of dividing either the rows or columns of a database into multiple databases. 99% of the time, the word sharding in reference to databases means sharding by rows. This is formally called horizontal sharding.

You’re forced to shard databases at scale because all the data will not fit within a single database. Could you imagine the data for all Google search queries being stored in a single MySQL server? That also implies that it’s on one massive computer.

Sharding horizontally is done with a value called a sharding key. The sharding key is used along with a rule to split the data up. For instance, if I was storing data for comments on YouTube, I could pick a sharding key of someone’s username. The rules associated with this could be: all comments for usernames that start with an ‘a’ are in a single database, and all comments for usernames that start with ‘b’ through ‘e’ are stored in another database.

So, what are some possible sharding keys?

The sharding key could be something like a username, a user’s geographic location, or a unique ID assigned to a particular user. Sometimes, a combination of sharding keys is used if the amount of data that is stored is massive.

It’s important to understand that picking a poor sharding key could be detrimental to a system, especially within the context of a system design interview. 

It could cause an issue called a hotspot problem. In the case of sharding by username, you could have a concentration of maybe 80% of all the comments being posted by people with a username that starts with ‘a’. This means that the database storing the records for usernames that start with an ‘a’ could become overloaded because it’s a hotspot.
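Here’s a small Python sketch of the username rule and the hotspot it can create. The shard names, the third letter range (‘f’ through ‘z’), and the sample usernames are all made up for illustration.

```python
from collections import Counter

# (first letter from, first letter to, shard name). Only the first two
# ranges come from the example above; the third is an assumption.
SHARD_RULES = [
    ("a", "a", "shard-0"),
    ("b", "e", "shard-1"),
    ("f", "z", "shard-2"),
]


def shard_for(username: str) -> str:
    """Pick a shard from the first letter of the username (the sharding key)."""
    first = username[:1].lower()
    for low, high, shard in SHARD_RULES:
        if low <= first <= high:
            return shard
    return "shard-other"  # digits, symbols, empty usernames, etc.


# A skewed sample: most commenters happen to have names starting with 'a'.
commenters = ["alice", "adam", "amy", "anna", "bob", "zed"]
load = Counter(shard_for(name) for name in commenters)
```

In this sample, four of the six commenters land on `shard-0` while the other shards each hold one, which is the hotspot problem in miniature.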

Ensuring High Availability and Durability in System Design Interviews

In the context of a system design interview, for a database, you generally need to ensure it has high availability (AP in terms of CAP theorem) and durability. This is done through database replication.

When a database is replicated, the replica can serve as a failover database if the primary database goes down. All traffic that would’ve been served from the original database could be routed to the secondary database. This is a primary-secondary failover database setup.

Usually, with this setup, the primary database takes in all the new writes and the secondary (or failover) database either synchronously or asynchronously receives the replicated data from the primary. Both of these replication methods have their own pros and cons.

Synchronous vs. Asynchronous Replication for System Design Interviews

If a secondary database synchronously receives data from the primary, it essentially ensures that the data that’s written to the primary database is durable.

That means that once it’s written to the database, it helps in ensuring that the data will not be destroyed or disappear. However, this synchronous replication causes writes to become slower because they must finish replicating on the secondary database before the transaction is finalized.

The data durability benefits are probably the most appealing reason to use synchronous replication; however, achieving real durability with it can be taxing in practice. If you’re replicating data for durability, you’d ideally replicate it to another geographical region because of things like natural disasters and local disasters.

Synchronous replication over a network will clog a database at scale if not properly thought about within a system.

If a secondary database asynchronously receives data from the primary, it ensures that writes to the primary database are fast because the primary doesn’t need to wait for the write to be completed on the secondary database before finalizing its transactions.

In this case, if the primary database were to go down, there’s a chance that not all of the data was able to be copied to the secondary database. While most of the data would hopefully be copied over already, some of the writes that were written to the primary database and not the secondary could potentially be lost forever.

Choosing the type of replication (synchronous vs. asynchronous) for databases could be tricky because of the balance between durability and speed. The choice you make will depend on the requirements for the system you’re designing. Sometimes, a hybrid solution could be used as well.

It’s possible to have more than one replica. For example, we could have a single primary database and two secondary databases. A geographically semi-close database could be used to synchronously replicate data, and a geographically distant database could receive its replica data asynchronously.

This blend of techniques of replicating data across databases helps ensure that the data that is written is durable and highly available because at least two databases will have a perfect copy of the data, and one database will be geographically distant in case something happens to one of the other databases (and hopefully not both if they’re spread out enough).
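To show the difference between the two replication modes, here’s a toy Python model of a primary with one synchronous and one asynchronous replica. Everything here is a stand-in: plain dictionaries play the role of databases, and the `ship_backlog` method is a hypothetical name for whatever background process ships writes to the far replica.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    def __init__(self, sync_replicas, async_replicas):
        self.data = {}
        self.sync_replicas = sync_replicas
        self.async_replicas = async_replicas
        self.backlog = deque()  # writes not yet sent to async replicas

    def write(self, key, value):
        self.data[key] = value
        # Synchronous: the write is not acknowledged until these finish.
        for replica in self.sync_replicas:
            replica.apply(key, value)
        # Asynchronous: remember the write and send it later.
        self.backlog.append((key, value))
        return "ack"

    def ship_backlog(self):
        """Stand-in for the background replication process."""
        while self.backlog:
            key, value = self.backlog.popleft()
            for replica in self.async_replicas:
                replica.apply(key, value)


near, far = Replica(), Replica()
primary = Primary(sync_replicas=[near], async_replicas=[far])
primary.write("balance:alice", 95)
far_before_shipping = dict(far.data)  # still empty at this point
primary.ship_backlog()
```

If the primary died right after the `write` call, the near replica would already have the data, while the far replica would be missing anything still sitting in the backlog.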

Handling Read-Heavy Systems in System Design Interviews

Based on the requirements for a system, you may be able to identify that a system is going to be a read-heavy system. If a system is read-heavy, you’ll want to make sure you have an appropriate number of read replicas.

Read replicas are copies of a database that can only be read from. Read replicas improve the throughput of a system, especially if it’s read-heavy. Read requests that could potentially overload a primary database are offloaded to read replica databases, and more replicas can be added as the read load grows.

In the event the primary database which allows writes goes down, there are methods like leader elections that can take place to promote a single read replica to become the primary database instead and accept writes.

While there are advantages to using read replicas within a read-heavy system, it’s important to understand some key trade-offs. Aside from adding more technical complexity, read replicas may not have all the updated information and could be out of sync with both the primary database and each other.

This can cause bizarre issues, like a user feeling that they’ve gone back in time after performing an action: they first read from a database that was up to date and then, on a subsequent database call, read from an outdated read replica.

This is another reason it’s important to understand the CAP theorem, given a system design problem. If you need strong consistency for the database, massive amounts of read replicas that receive their data asynchronously may not be the best choice for designing your system.


Caches

Caches are an important concept within system design to understand because they have the ability to dramatically improve the latency of read requests. In the context of caching data for a database, caches also provide a way to reduce the read load on the database.

Caches achieve lower latency reads by storing and retrieving data from memory instead of from disk (like in databases). Cache reads are typically orders of magnitude faster than database reads.

When caching data, it’s important to think about the cache eviction policy, updating the cache, and data consistency.

Why a Good Cache Eviction Policy is Important

One of the limitations of using a cache instead of using a database is the fact that RAM is more expensive to buy than disk space. As a result, compared to the amount of disk space you’ll typically allot towards a database, the amount of memory allocated to support a cache will be substantially less.

This is why a cache eviction policy is necessary. The cache eviction policy is about what you want to do when the cache becomes full. Which existing cache data will you throw out in favor of newer records?

Here are a couple of cache eviction policies that are popular:

  1. Least recently used (LRU) — which is probably the most popular
  2. Least frequently used (LFU)
  3. First-in first-out (FIFO)

While the caching eviction policies are worth a read, I think it’s best to go with LRU for the purposes of a system design interview because of its battle-tested history and performance compared to the others. It would still be worth mentioning some of the other policies and why something like FIFO could have worse performance.
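Since LRU is the policy I’d default to, here’s a minimal Python sketch of it built on `collections.OrderedDict`. The class and method names are my own, and a production cache would also need things like thread safety and expiry.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Reading counts as a use, so move the key to the "newest" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Drop the entry at the "oldest" end.
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # the cache is full, so "b" is evicted
```

A FIFO cache would have evicted "a" here even though it was just read, which is the kind of difference worth calling out in an interview.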

Updating a Cache in a System Design Interview

Because a cache (in the context of caching database data) will have different content than the database it’s caching data for, it’s important to understand when and how to update a cache upon a new write to a database. 

The main two ways to handle updating information within a cache are through write-through and write-back policies.

Write-through caching occurs upon writing to the database. This update method updates both the cache and database synchronously. While slower, it ensures that the data is durable because it’s persisted to the database before the write completes.

Write-back caching also occurs upon a write, but this update method only updates the cache at first, and the database is updated later. This is faster than write-through caching, but the data could be lost if the cache fails before the database is updated, so it’s not always preferred.
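Here’s a rough Python sketch contrasting the two policies. Plain dictionaries stand in for the cache and the database, and the `flush` method is my assumption for how a write-back cache eventually pushes its changes to the database.

```python
class WriteThroughCache:
    def __init__(self, database: dict):
        self.database = database
        self.cache = {}

    def write(self, key, value):
        # Both stores are updated before the write returns.
        self.database[key] = value
        self.cache[key] = value


class WriteBackCache:
    def __init__(self, database: dict):
        self.database = database
        self.cache = {}
        self.dirty = set()  # keys changed in the cache but not yet persisted

    def write(self, key, value):
        # Only the cache is updated, which is why this is faster.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist pending changes. Anything still dirty when the cache
        # crashes would be lost.
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()


through_db, back_db = {}, {}
through = WriteThroughCache(through_db)
back = WriteBackCache(back_db)

through.write("likes:42", 10)
back.write("likes:42", 10)
back_db_before_flush = dict(back_db)  # the database has not seen the write yet
back.flush()
```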

When to Not Use a Cache Within System Design Interviews

As mentioned previously, a cache is good for reducing user read request latency as well as reducing the load on a database. However, when is it preferable to not use a cache?

It’s important to remember that a cache is typically only good for read-intensive systems which read the same data, very frequently. Otherwise, the cache will actually slow down the system as an unnecessary redundant layer for the database.

Alternative Uses for Caches Within System Design Interviews

Caches don’t only have to be used as lower latency buffers for databases. They can also be used in a more abstract sense. You can utilize a cache within a system design interview to hold things like a user’s precomputed timeline feed, which is fetched upon a user request.

The timeline could be generated through background services and the caching aspect of the system would ensure that the user would get their timeline in a low-latency way.

Another use case for a cache could be for creating a throttler. Throttlers need to be able to handle lots of fast reads and writes which is what storing data in memory can support. Depending on the type of throttler, it may not be that big of a deal if the cache’s throttling data is lost as a tradeoff for performance.
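As an illustration of the throttler idea, here’s a minimal Python sketch of a fixed-window throttler that keeps its counters in memory. Fixed-window counting is just one of several throttling strategies, and the class name, parameters, and injectable clock are assumptions I made for the example. In a real system the counters would usually live in a shared cache rather than in one process.

```python
import time


class FixedWindowThrottler:
    """Allows at most `limit` requests per client in each time window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counts = {}  # client_id -> (window index, requests seen)

    def allow(self, client_id: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._counts.get(client_id, (window, 0))
        if seen_window != window:
            # A new window has started, so the counter resets.
            seen_window, count = window, 0
        if count >= self.limit:
            self._counts[client_id] = (seen_window, count)
            return False
        self._counts[client_id] = (seen_window, count + 1)
        return True


# A fake clock makes the behaviour easy to follow without sleeping.
now = [0.0]
throttler = FixedWindowThrottler(limit=2, window_seconds=1.0, clock=lambda: now[0])
decisions = [throttler.allow("client-1") for _ in range(3)]
now[0] = 1.5  # move into the next window
decision_after_reset = throttler.allow("client-1")
```

If these counters were lost in a crash, the worst outcome is that a few clients get a fresh allowance early, which is the tradeoff mentioned above.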

There are a lot of cases where you can use a cache if you think about it beyond just databases. Caches are fundamentally readable and writable stores for data in memory.

Message Queues

Message queues act as buffers for data to be processed. Publishers push messages onto topics within a queue, and subscribers poll the messages from topics they subscribe to within the queue. 

This architecture decouples the execution of a task and makes an application more resilient to failures if either the publishers or subscribers aren’t able to do work.

When to Use a Message Queue Within System Design

Whenever you’re orchestrating tasks that don’t necessarily need to execute synchronously, a message queue could be a prime candidate.

An example of a good time to use a message queue would be uploading a video to a website like YouTube. Here, the user makes a request to upload their video and video metadata. Once the data is uploaded, processing tasks can be published into queues. The tasks are processed asynchronously, but the user will receive a notification of a successful upload request right away.

From that point, the uploader and viewers of the video must wait for the YouTube backend to process the upload asynchronously, with the request passing through many message queues that handle the video, audio, and metadata.

Another example of a good time to use a queue could be when creating a backend for a globally used chatting system. Imagine how many messages will pass through a global chat system on average vs. on New Year’s Day. The backend could get overwhelmed with spiky chat loads and could use a message queue as a buffer for the messages. 

If many messages are building up, more subscribers (otherwise known as worker nodes) can be instantiated to handle the growing load. The important benefit of using the message queue is that the messages aren’t lost. In the worst case, they’re only delayed as opposed to being lost if a system was overloaded.
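Here’s a small Python sketch of the publisher and worker pattern using the standard library’s `queue.Queue` and threads. An in-process queue is only a stand-in for a real message broker (there are no topics or persistence here), and the function and task names are made up.

```python
import queue
import threading


def run_workers(tasks, num_workers: int):
    """Publish tasks onto a queue and let worker threads drain it."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this worker
                task_queue.task_done()
                break
            with results_lock:
                results.append(f"processed:{task}")
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # The publisher only enqueues work; it never waits on any one task.
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)

    task_queue.join()  # block until every queued item has been handled
    for thread in threads:
        thread.join()
    return results


processed = run_workers(["video", "audio", "metadata"], num_workers=2)
```

Raising `num_workers` is the in-process equivalent of spinning up more subscribers when messages start piling up.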

Blob Storage

With blob storage, you’re able to store binaries within a distributed system. The word BLOB in blob storage stands for binary large object. Blob storage services are typically managed by cloud providers like Amazon Web Services or Microsoft Azure. Their blob storage solutions are called Amazon S3 and Azure Blob Storage, respectively.

The data stored within blob storage services is replicated. For example, Amazon S3 Standard (their standard S3 service) stores data redundantly across at least three availability zones. Additionally, you can have policies on the blob storage service to duplicate the data into another blob storage service in another geographical region asynchronously.

This would prevent the data from being lost if a single availability zone were destroyed in the real world due to something like a natural disaster. In the event of a region going down, you can have your data accessible within a separate geographically separated area. As a result, blob storage solutions are highly available and highly durable.

In terms of what to expect within a system design interview, this component is relatively simple to understand. 

Blob Storage vs. Database Storage Within System Design Interviews

While you can technically store binaries like images and files within traditional databases such as MySQL or MongoDB, it’s more optimal to use blob storage because blob storage is optimized for storing unstructured binary data.

Traditional databases are not the optimal solution for storing binaries (for many cases) in a distributed system.

How to Use Blob Storage Within a System Design Interview

So by now, you should be able to understand when to use blob storage within a system design interview. Whenever a user is doing something like uploading a video, images, or files into your system, you should store those files in blob storage (oversimplifying here, typically you’d have some sort of post-processing).

But how do you maintain the relationship between the rest of the system’s data and the uploaded binaries?

A common solution for this, used within both system design interviews and real systems, is to write metadata to a database that contains a URL for the blob storage file location. This way, the metadata can always be used to find the uploaded file.
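Here’s a minimal Python sketch of that metadata pattern. Two dictionaries stand in for the blob storage service and the database, and the URL format, field names, and function names are hypothetical.

```python
import uuid

blob_store = {}   # stand-in for blob storage: URL -> raw bytes
metadata_db = {}  # stand-in for a database table: file ID -> metadata row


def upload_file(owner: str, filename: str, content: bytes) -> str:
    """Store the bytes in blob storage and a pointer to them in the database."""
    file_id = str(uuid.uuid4())
    blob_url = f"https://blobs.example.com/{owner}/{file_id}/{filename}"
    blob_store[blob_url] = content
    metadata_db[file_id] = {
        "owner": owner,
        "filename": filename,
        "size_bytes": len(content),
        "blob_url": blob_url,  # the link between the database and the blob
    }
    return file_id


def download_file(file_id: str) -> bytes:
    """Look up the metadata first, then follow its URL to the blob."""
    row = metadata_db[file_id]
    return blob_store[row["blob_url"]]


file_id = upload_file("alice", "cat.png", b"\x89PNG...")
content = download_file(file_id)
```

The database row stays small and queryable while the large binary lives where it’s cheapest to store.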

This is a simplistic base solution for being able to store binary data within a system that can be built upon for more complex systems.

Content Distribution Networks (CDNs)

Content distribution networks (CDNs), as the name implies, are geographically distributed networks that distribute content throughout the world. They’re used to lower the latency of retrieving static assets, globally. You can think of CDNs as distributed caches for static content.

A user will try to read static content from a CDN edge node that’s closest geographically to them (for the lowest latency). If the static content isn’t in the edge node yet, that user will have a slow read for that static content from the system’s source. 

Usually, that’ll signal to the system to propagate the requested static content to the originally used CDN’s edge node. From this point, subsequent reads from that edge node (from the same and future users near it) will be fast because the static content is cached there.

Note: Content Distribution Networks (CDNs) are usually used closely with blob storage solutions. Blob storage is usually the persistent storage solution chosen for the static assets which are pushed and pulled into the CDN (as cached data) depending on the system’s design.

Should a CDN’s TTL Be Long or Short?

CDNs allow you to configure something called a Time To Live, otherwise known as TTL. This is how long a specific piece of static content should live within the CDN before it’s marked as outdated. TTLs for CDNs commonly range in length from one day to a week, depending on how often the content changes.

Having too long of a length for a CDN TTL could cause a lot of data to be stored within the CDN that’s not accessed. This is going to generate more cost for a system’s design because you’re paying for unused storage. Additionally, the data could be stale or outdated which could affect the user experience.

Alternatively, having too short of a length for a CDN TTL could cause the CDN to be useless in practice. A short TTL causes the CDN to invalidate the static content frequently, and each invalidation requires a new pull from the original system’s source. This would result in higher latency for the users and a poor user experience.
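To see how the TTL drives this behaviour, here’s a toy Python model of a single CDN edge node. The class name, the `origin` callable, and the injectable clock are assumptions for illustration, and real CDNs have far more going on.

```python
import time


class EdgeNode:
    """Caches content from an origin and re-pulls it once the TTL expires."""

    def __init__(self, origin, ttl_seconds: float, clock=time.monotonic):
        self._origin = origin  # callable: path -> content
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}       # path -> (content, expires_at)
        self.origin_fetches = 0

    def get(self, path: str):
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]  # fast path: served from the edge
        # Slow path: cache miss or expired entry, so pull from the origin.
        content = self._origin(path)
        self.origin_fetches += 1
        self._cache[path] = (content, now + self._ttl)
        return content


now = [0.0]
edge = EdgeNode(origin=lambda path: f"contents of {path}",
                ttl_seconds=60, clock=lambda: now[0])

edge.get("/logo.png")  # first read: pulled from the origin
edge.get("/logo.png")  # second read: served from the edge
fetches_within_ttl = edge.origin_fetches
now[0] = 61.0          # the TTL has passed
edge.get("/logo.png")  # expired: pulled from the origin again
```

With a very short TTL, almost every read would take the slow path, and with a very long one, stale entries would sit in `_cache` long after anyone needs them.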

When to Use Polling vs. WebSockets

When I was beginning to study system design, I remember struggling immensely to understand this topic. When should I use short polling vs. long polling vs. WebSockets? Also, what’s the setup to even understand when to use either?

Choosing between short polling, long polling, and WebSockets within a system design is primarily a decision about how to transfer data from a server to a client. The client can always send data to the server through requests, but this problem revolves around the server(s) having and wanting to send data to the client.

We will take a look at the best cases to use short polling, long polling, and WebSockets during a system design interview through a couple of examples:

  • A payment submission confirmation system (think PayPal, Stripe, ApplePay)
  • A social media timeline (think Twitter, Instagram, or TikTok)
  • A gaming server (think League of Legends, Call of Duty, or Minecraft)

What is Short Polling

Short polling is intuitive to understand, and I would imagine it’s how most people would intuitively approach the server to client communication problems. 

Short polling is telling the client to continue to request information from a server in timed intervals until the server has the data the client is requesting.

Again, intuitively that probably makes sense, but there’s a lot to consider in terms of this ever being an optimal strategy for communicating information from a server to a client in practice.

Why You Should Never Use Short Polling

In short, you never want to use short polling to transfer data from a server to a client. Short polling is expensive for both the server and client to maintain because it wastes resources the majority of the time as long as the data isn’t present in the server yet.

Let’s go over a couple of the example cases we mentioned before for why you should never use short polling for them.

Why Short Polling Wastes Resources – Example 1

In the example of a payment submission system, let’s say a user submits a payment and is waiting for the server to tell them whether their payment was successful or not.

Short polling, in this case, would ask the server through requests maybe every one or two seconds basically saying, “Is the payment done processing yet? Is the payment done processing yet? Is the payment done processing yet?” And this would happen until the server responds yes or no. 

There are so many wasted requests here that could overload a server if you had millions of concurrent users doing the same thing on a global scale. Also, as a nit, you’re not getting back a confirmation the exact moment the payment is processed because you’re querying for the answer every one or two seconds (which isn’t efficient).

Why Short Polling Is Expensive – Example 2

In the example of a social media timeline, you can probably imagine the problem. Whenever a client is connected to the server, the client is always asking “is there any new content to load into my feed?”

As a metaphor, this is like the prior example on steroids. Every user on the social media site is constantly making requests every 1 or 2 seconds for updates on their individual timelines. 

If a social media site had 100 million active users located throughout the globe, this would result in 50 to 100 million requests being sent to the system servers every second. The vast majority of the requests would return unnecessary and wasted responses (aka return back “no updates”).
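The arithmetic behind that estimate is simple enough to write down. Here’s a tiny Python sketch, with a made-up function name, assuming every active user polls at the same fixed interval.

```python
def polling_requests_per_second(active_users: int, poll_interval_seconds: float) -> float:
    """Each user sends one request per interval."""
    return active_users / poll_interval_seconds


active_users = 100_000_000
slowest = polling_requests_per_second(active_users, poll_interval_seconds=2)
fastest = polling_requests_per_second(active_users, poll_interval_seconds=1)
```

Polling every 2 seconds gives 50 million requests per second, and polling every second gives 100 million.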

There are much more efficient ways to handle this at scale.

Why Short Polling Doesn’t Scale – Example 3

A gaming server that used short polling to manage transferring data between users of the same game would be a nightmare.

Aside from being expensive, gaming servers tend to need to send real-time data, which is different from the prior examples. At the very least, each of the short polling requests in this example will return useful information instead of being wasted.

Short polling for data that needs to be real-time isn’t efficient in terms of resources and isn’t effective in terms of results. 

If you’re polling every 1 to 2 seconds for data from the other players, the game would look laggy from each player’s perspective. If you decrease the polling interval to half a second or a quarter of a second, your game wouldn’t scale well because of all the requests hitting your servers across all games.

When to Use Long Polling

Long polling keeps a single connection alive while waiting for a response from a server. The connection is terminated when a response is received back from the server or when the connection times out (which typically initiates another long poll connection).

Long polling is best used to retrieve a small set of information that’ll be available from a server.

A Perfect Use Case of Long Polling – Example 1

For the example of a payment processing system to return the results of a pending payment, long polling could be a great solution.

Essentially, a long poll connection is initiated by the client, and that connection stays alive until the server responds with the status of the payment. Alternatively, the connection could time out, and the client could start a new connection.
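Here’s a rough sketch of that flow in Python, using a thread event in place of a real network connection. The `PendingPayment` class and `long_poll` function are hypothetical names for illustration only. The point is that the request is held open until the result exists or the timeout passes.

```python
import threading


class PendingPayment:
    """Hypothetical server-side record of a payment that's still processing."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self.status = "pending"

    def complete(self, status: str) -> None:
        # Called by the payment processor once the result is known.
        self.status = status
        self._done.set()

    def wait_for_result(self, timeout_seconds: float) -> str:
        # The "held open" request: block until done or until the timeout.
        finished = self._done.wait(timeout=timeout_seconds)
        return self.status if finished else "timeout"


def long_poll(payment: PendingPayment, timeout_seconds: float,
              max_attempts: int = 5) -> tuple[str, int]:
    """Reconnect after each timeout; return (status, number of requests made)."""
    for attempt in range(1, max_attempts + 1):
        result = payment.wait_for_result(timeout_seconds)
        if result != "timeout":
            return result, attempt
    return "gave up", max_attempts


payment = PendingPayment()
# Simulate the processor finishing 0.2 seconds from now.
threading.Timer(0.2, payment.complete, args=("success",)).start()
status, requests_made = long_poll(payment, timeout_seconds=5.0)
print(status, requests_made)  # success 1
```

One request was enough, and the client heard back as soon as the payment finished instead of on the next polling tick.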

Another Decent Use Case for Long Polling – Example 2

In the example of sending social media timeline updates to a user, long polling could also be used here.

Instead of spamming network calls asking the server if there are updates, the server will respond back whenever an update is available. This is a significant improvement over short polling.

A Bad Example of Using Long Polling – Example 3

For the example of a gaming server for a real-time game, long polling still isn’t a good solution here. Actually, it could arguably be no different than using short polling because of the frequency of the data being sent. 

The server essentially always has an update for the clients.

So what is a good solution for client and server communication for a real-time application like a gaming server?

What are WebSockets

A WebSocket is a communication protocol that can be used to create a 2-way communication channel between a client and a server. You can think of this as setting up two unidirectional pipes for transferring data between both a client and a server.

This means that if the client has data it wants to send to the server, it can send it. If the server has data it wants to send to the client, it can send it as well.
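As a mental model only, the two pipes can be simulated in Python with two `asyncio` queues. This is not a real WebSocket (there’s no network, handshake, or framing here), and the function names and messages are made up. It just shows that either side can send without waiting to be asked.

```python
import asyncio


async def client(to_server: asyncio.Queue, to_client: asyncio.Queue, log: list) -> None:
    # The client can send whenever it wants...
    await to_server.put("player moved left")
    # ...and independently receive whatever the server pushes.
    log.append(("client got", await to_client.get()))


async def server(to_server: asyncio.Queue, to_client: asyncio.Queue, log: list) -> None:
    # The server pushes data without waiting for a request.
    await to_client.put("enemy spawned")
    log.append(("server got", await to_server.get()))


async def main() -> list:
    to_server: asyncio.Queue = asyncio.Queue()  # pipe 1: client -> server
    to_client: asyncio.Queue = asyncio.Queue()  # pipe 2: server -> client
    log: list = []
    await asyncio.gather(
        client(to_server, to_client, log),
        server(to_server, to_client, log),
    )
    return log


log = asyncio.run(main())
print(sorted(log))
```

Both sides end up with a message from the other, and neither had to poll for it.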

WebSockets vs. Polling

If WebSockets are like opening two pipes for data to flow between a client and a server, polling can be described as having a single messenger bird that has to fly between the client and server to deliver information.

In short polling, the messenger bird flies with data from the client to the server. If the server doesn’t have data to give to the messenger bird, then another messenger bird must be sent from the client at a later time.

In long polling, the messenger bird flies from the client to the server but will wait for a message from the server. A new messenger bird isn’t necessary because the current messenger bird can just wait for the server to have data before flying back to the client with the data.

When to use WebSockets in System Design

Essentially, you use WebSocket connections within a system’s design whenever you need bi-directional communication between clients and servers. Examples of this include gaming servers, chat applications, and sharing live location data for ride-sharing apps.

Taking a look at the prior examples, using a WebSocket connection for a payment system or a timeline may not make the most sense. The payment system only returns a single result. Additionally, the timeline example only needs to send information in one direction, so WebSockets aren’t necessary for that case either.

Rate Limiters

Rate limiting is an important concept to understand for system design. Rate limiters are middleware that provide a system with logical ways to constrain the amount of traffic going into it.

Without proper rate limiting, services within your system could become overloaded by malicious actors or unintentionally. Proper rate limiting allows you to logically constrain your system to understand how much data can flow throughout your system.

From my experience, this is one of the “nice to have” features to discuss within a system design interview if you have time, in many cases. Other components of the system design interview should probably be covered in more detail than rate limiting. 

That said, discussing rate limiting can only provide you with more positive system design interview signals.

There are various ways to set up rate limiting based on your system’s design. Some of the things you could rate limit on include:

  1. Max requests per IP address per 5 seconds (this is a bit broad though)
  2. Max writes on a social media timeline, per logged-in user, per day (this seems reasonable)
  3. Max number of requests from an internal service per minute (to protect internal services from overloading each other)
  4. Max number of images that can be uploaded, per user, on an image uploading site, per hour (to prevent too many images from being stored within a system)

These are just a couple of ideas that could be worth mentioning during a system design interview or used as the basis for other throttling limits.
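To show what one of these rules could look like in code, here’s a minimal fixed-window rate limiter in Python. It’s a sketch under simple assumptions: a single process, in-memory counters, and a clock value passed in by the caller. The class name and the example IP address are hypothetical. Fixed window is only one of several common algorithms (token bucket and sliding window are others).

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per key within each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> number of requests seen in that window
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        if self.counts[(key, window)] >= self.limit:
            return False  # over the limit for this window
        self.counts[(key, window)] += 1
        return True


# Hypothetical rule: max 3 requests per IP address per 5 seconds.
limiter = FixedWindowRateLimiter(limit=3, window_seconds=5)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0, 3.0, 5.5)]
print(results)  # [True, True, True, False, True]
```

The fourth request is rejected because it lands in the same 5-second window as the first three, and the fifth is allowed because a new window has started. A production version would also need to expire old counters and share state across servers, which this sketch skips.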


Monitoring

Monitoring a system is important because it allows you to observe a system’s health. This is especially important when something in the system breaks and requires remediation. Without monitoring, you’ll struggle to debug issues and you’ll miss the chance to catch preventable ones.

Monitoring typically includes logging data and alerting based on certain monitor metrics.

Looking back at my own performance during system design interviews, I think that monitoring was one of my weakest parts. The interviewers had to ask me about what I thought about monitoring for pretty much all my system design interviews, and I think it would’ve been much better if I brought it up.

Learn from my mistakes. Monitoring is important for the system design interview and worth talking about once your end-to-end system is designed.

What Should You Monitor In a System Design Interview

Fundamentally, every component of a system should be monitored. However, each component could have different metrics to monitor. For example, for a cluster of servers running a service, you may want to monitor things such as CPU usage, memory used, and QPS. This could help you decide whether to add or remove servers.
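As a small illustration of alerting on those server metrics, here’s a Python sketch. The threshold values, the `ServerMetrics` fields, and the host name are all assumptions for the example. Real thresholds depend on the system.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    host: str
    cpu_percent: float
    memory_percent: float
    qps: float


# Hypothetical alert thresholds; real values depend on the system.
THRESHOLDS = {"cpu_percent": 80.0, "memory_percent": 90.0, "qps": 5000.0}


def check_alerts(metrics: ServerMetrics) -> list[str]:
    """Return one alert message per metric that exceeds its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = getattr(metrics, name)
        if value > limit:
            alerts.append(f"{metrics.host}: {name}={value} exceeds {limit}")
    return alerts


sample = ServerMetrics(host="api-1", cpu_percent=93.5, memory_percent=61.0, qps=1200.0)
print(check_alerts(sample))  # ['api-1: cpu_percent=93.5 exceeds 80.0']
```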

For monitoring something like data within a database, it might be helpful to think about things from a business perspective. What kind of data should be monitored going into the database? If you’re working with financial data in a financial system, maybe you want to monitor, log, and potentially alert on the amounts of money being sent by each user to detect fraud.

You might want another monitor with alerting on a database from a technical perspective to know if a database is getting near its full capacity. This could bring to light a hotspot problem that nobody knew existed (as we discussed before).

If you’re using a message queue, maybe there’s a certain type of task that keeps retrying. You would definitely want an alert to trigger in this case, especially if it’s a mission-critical task that’s not processing correctly.

Every component can be monitored from both a technical and business perspective within a system’s design. It’s probably worth mentioning the highest priority, mission-critical monitoring you can do within a system design interview.

Learn The Top 10 System Design Interview Question Templates

Phew, so you got through the fundamentals section! Congrats, it was a doozy! But it all gets easier from here once you have a decent understanding of the fundamentals.

Now, we’re going to go over the top system design interview questions I think are relevant to learn. These questions cover a good breadth of patterns that are fair game to be asked within a system design interview, so you must understand them at least at a basic level.

You need to understand both the feature itself, through requirements gathering, and the sample implementations. If an interviewer asks you to design a group chat application like Telegram, and you don’t understand the features of Telegram, then you’re already setting yourself up for failure.

If I’m being honest, you could probably just look up video solutions on YouTube for all these topics because they’re popular system design interview questions.

Design a TinyURL Service

TinyURL is a service that converts a long URL into a tiny URL, as the name suggests. The main issue here is understanding the different ways of generating tiny URLs and serving them to users quickly.

In a non-distributed system setting, this is simple. In a distributed system, issues come up with making sure you’re not storing duplicated shortened URLs and also still have a highly available system. This problem is a bit too thin to be asked alone within a system design interview, but it’s a nice problem to understand thoroughly as a beginner.

The concept of generating unique ids within a distributed system in this problem could apply to other problems which are more intensive.
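One common approach (not the only one) is to take a unique numeric ID and encode it in base 62 to get a short string. Here’s a minimal Python sketch. It assumes the unique ID already exists, for example from a counter or an ID-generation service, which is the hard part in a distributed setting and isn’t shown here.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(number: int) -> str:
    """Turn a non-negative unique ID into a short string."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the numeric ID from a short string."""
    number = 0
    for char in code:
        number = number * 62 + ALPHABET.index(char)
    return number


print(encode_base62(125))   # 21
print(decode_base62("21"))  # 125
```

Seven base-62 characters cover 62^7 IDs, which is roughly 3.5 trillion, so short codes go a long way.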

Design a Distributed Web Crawler

It’s important to understand that some system design questions don’t necessarily deal with user requests. A great example of this is a distributed web crawler which navigates the internet for information.

If I were asked to design a distributed web crawler and hadn’t practiced or seen an example of it, I’d surely fail the interview upon being given the problem prompt. There are a lot of use cases for web crawlers, and a lot of ways to structure them in a distributed manner.

Design Instagram Newsfeed

This is the standard system design interview problem I think everyone should understand inside and out. Within many of the possible system design solutions, the majority of the fundamental system design components will be used.

Some key patterns to look out for within this problem include potentially precomputing the feeds for each user, and how to update the feeds for each user. In addition to that, you want to try to avoid a hotspot problem for accessing and storing data.

Design an Airline Ticket Reservation System

I won’t name the company, but I was asked a really similar question to this. I completely bombed it because I’d never seen an example of it before, so I didn’t understand the fundamental problem behind the question. So, don’t be me.

If I had time to think about the problem independently, I could figure out some logical solutions. However, with the pressure of a live interview, I took a loss on this one.

The main problem here is understanding that tickets for airlines are first reserved before they’re purchased. You need a way to represent this within a system that is also supposed to be highly available. If you’re doing anything with reservations before purchasing something, that implies that the system needs to be a highly consistent system as well because you don’t want two people to be able to buy or reserve the same ticket.
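To make the reserve-then-purchase idea concrete, here’s a single-process Python sketch with time-limited holds. Everything here (the `SeatInventory` class, the 10-minute hold, the seat and user names) is a hypothetical illustration. A real system would need to enforce the same check-and-set atomically in a database or distributed lock rather than with an in-memory lock.

```python
import threading


class SeatInventory:
    """Hypothetical single-node seat store with time-limited holds."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self._lock = threading.Lock()
        self._holds: dict[str, tuple[str, float]] = {}  # seat -> (user, expiry time)
        self._sold: dict[str, str] = {}                 # seat -> user

    def reserve(self, seat: str, user: str, now: float) -> bool:
        with self._lock:  # the check and the write must happen atomically
            if seat in self._sold:
                return False
            holder = self._holds.get(seat)
            if holder is not None and holder[1] > now and holder[0] != user:
                return False  # someone else has an unexpired hold
            self._holds[seat] = (user, now + self.hold_seconds)
            return True

    def purchase(self, seat: str, user: str, now: float) -> bool:
        with self._lock:
            holder = self._holds.get(seat)
            if holder is None or holder[0] != user or holder[1] <= now:
                return False  # no valid hold for this user
            del self._holds[seat]
            self._sold[seat] = user
            return True


inventory = SeatInventory(hold_seconds=600)
results = [
    inventory.reserve("12A", "alice", now=0),    # alice holds the seat
    inventory.reserve("12A", "bob", now=10),     # rejected: alice's hold is active
    inventory.reserve("12A", "bob", now=700),    # alice's hold expired at 600
    inventory.purchase("12A", "alice", now=710), # rejected: the hold is bob's now
    inventory.purchase("12A", "bob", now=720),   # bob buys the seat
]
print(results)  # [True, False, True, False, True]
```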

The main issue relates to the CAP theorem, as we discussed above. You can optimize a system for either CP or AP, but this problem demands the whole thing: CAP. It’s a tradeoff nightmare if you’re unprepared, and that’s before you even get into the implementation.

Design Google Search Autocomplete System

This problem is about designing the autocomplete feature of Google Search. Whenever a user types a key, there are suggestions for the completion of the search query.

This problem involves representing autocomplete queries in an efficient way, potentially pre-computing results, and caching results. More abstractly, this problem also is called the top-K problem if you wanted to look up variations of it.
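One way to represent the queries efficiently is a trie (prefix tree) that stores how often each query was searched. Here’s a minimal Python sketch. The class names and sample queries are made up, and it walks the whole subtree on every lookup, which a real system would avoid by pre-computing and caching the top results per prefix.

```python
import heapq


class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.count = 0  # how many times the query ending here was searched


class Autocomplete:
    def __init__(self) -> None:
        self.root = TrieNode()

    def record(self, query: str, times: int = 1) -> None:
        node = self.root
        for char in query:
            node = node.children.setdefault(char, TrieNode())
        node.count += times

    def suggest(self, prefix: str, k: int) -> list[str]:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        # Collect every recorded query under the prefix, then keep the top k.
        found: list[tuple[int, str]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count > 0:
                found.append((current.count, text))
            for char, child in current.children.items():
                stack.append((child, text + char))
        return [text for _, text in heapq.nlargest(k, found)]


autocomplete = Autocomplete()
autocomplete.record("system design", times=50)
autocomplete.record("system of a down", times=30)
autocomplete.record("systemd", times=20)
autocomplete.record("syntax error", times=40)
print(autocomplete.suggest("sys", k=2))  # ['system design', 'system of a down']
```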

Design Yelp’s Find K Nearest Restaurants

For review websites like Yelp, you’re usually able to find the nearest K businesses of a certain type by giving an input location. Typically, a user will be using the site on a mobile device and be in a stationary location when the query is being run.

This problem is tricky because it’s hard to represent location data in a queryable way. Most people (like me, at first glance) would represent a location as a latitude and longitude pair stored within a standard SQL or NoSQL database.

Storing the data would be highly efficient and simple. However, querying for this data within a massive database would be a computational nightmare.

This problem introduces ways of dealing with 2-dimensional data (in this case representing location) in an efficient manner through the use of geohashing or data structures like quadtrees. They sound like complex topics, but they’re actually quite simple to understand.
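To show why geohashing is simpler than it sounds, here’s a minimal Python sketch of the standard geohash encoding. It repeatedly halves the longitude and latitude ranges, records one bit per split, and turns every 5 bits into a base-32 character. The function name and default precision are my own choices for the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(latitude: float, longitude: float, precision: int = 6) -> str:
    """Encode a point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_longitude = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, longitude) if use_longitude else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash(57.64911, 10.40744, precision=3))  # u4p
print(geohash(57.65, 10.41, precision=3))        # u4p
```

The two nearby points share the same prefix, which is the useful property. Points in the same cell share a prefix, so a prefix lookup on an indexed string column finds nearby candidates. Two close points can still fall on opposite sides of a cell boundary, so neighboring cells usually need to be checked as well.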

Design Telegram

This problem revolves around instant messaging. It could just as well have been about designing Facebook Messenger or WhatsApp.

There are interesting design patterns to learn from this system which essentially requires messages to be delivered between users in a fast way. This is a prime example of a problem where WebSockets could be a great option for connecting users to each other through servers.

Design Uber

Designing a system like Uber or GrubHub is a challenging system design problem. It incorporates real-time communication between drivers and customers in terms of messaging and location data. In addition, it also usually involves a matching system for making orders.

Depending on your interviewer and how well you prepare, this problem can be extended to be infinitely more complex.

Learn What to Expect In a System Design Interview

Now that we’ve discussed both the fundamental components of system design and top system design interview questions, we need to discuss the system design interview itself.

For the most part, you’ll be expected to direct the entire interview process yourself. While the system design interview might be pitched as a collaborative experience, from my experience, it really isn’t. It’s the interviewer taking notes in silence while you’re talking 95% of the time.

The system design interview is a relatively open-ended interview. You could technically discuss anything you want. However, as usual in the free market, efficient strategies for directing the interview have started to form.

In a system design interview, you’ll want to clarify the problem, define the problems you’ll be solving, understand the constraints and requirements, roughly discuss data models and APIs, and then design a complete high-level system.

How Long is a System Design Interview?

It’s important to remember that a system design interview is roughly 45 minutes in total. That means you’ll typically have 30 to 40 minutes to design your system after accounting for time for questions and introductions.

How to Begin a System Design Interview

I think a lot of people will overlook preparing for the beginning of a system design interview. In terms of preparation, much of anyone’s focus will likely be the high-level design of a system, and that does make sense.

However, it’s important to understand that the system design interviewer will be evaluating you on much more than just your technical ability. You have to demonstrate that you can comprehend the problem while demonstrating consideration for the system as if you were developing a real system.

Clarifying The Problem and Picking The Focus

The first thing that’s understood as universally great advice is to clarify the problem statement. The problem statement will usually be broad. Even when it doesn’t seem broad, it is. So, make sure to ask clarifying questions, especially if you don’t understand the question.

Along with clarifying the problem, you’ll want to identify the constraints of the system and understand its users.

Here are some decent base questions to ask, but you should try to ask more specific questions if you can:

  1. How many daily active users does this system/website have?
  2. How many people will be reading timelines? (basically questioning all interactions)
  3. What’s the average number of posts a user will make per day?
  4. Should I focus on just text messaging instead of features to allow users to upload images and videos?
  5. What’s the max number of friends a user can have?

From my experience, you’d want to ask the questions from the perspective of the interviewer being a product manager. That means that your questions towards your interviewer should be product focused instead of developer-focused.

Gathering Requirements

After you’re done clarifying the problem and picking the focus of your system design interview, you’ll want to gather the functional and non-functional requirements.

What are functional and non-functional requirements?

Functional requirements are requirements that typically deal with the functionality of your system from a user perspective. In other words, it’s a focus on the behaviors.

Examples of functional requirements include:

  1. The ability to log in
  2. The ability for one user to message another user
  3. A user should be able to see a timeline
  4. A user should be able to make a post

Non-functional requirements are requirements for the system that are operational instead of behavioral.

Examples of non-functional requirements include:

  1. Low latency
  2. High availability
  3. High consistency
  4. Fault tolerance
  5. Reliable
  6. Scalable
  7. Extendable
  8. Secure

As a tip or cheat code, non-functional requirements almost always include low latency, fault tolerance, reliability, and scalability. Because of the CAP theorem, the debate is usually over high availability vs. high consistency.

Once you have the requirements, you’ll want to draft out a rough data model and perform back-of-the-envelope calculations for estimating the system’s specific requirements for things like storage and bandwidth.

The calculations can be rounded heavily. They don’t have to be perfect, but they should be accurate within 1 or 2 orders of magnitude. I’ve seen people in mock interviews and YouTube videos do the back-of-the-envelope calculations just to do them (because it’s a part of the current meta/process). However, I think there’s value within these calculations if you can use them to help direct your system’s design. These estimations would be used within a real system after all.
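Here’s what one of these rough estimates could look like, written out in Python so the arithmetic is easy to follow. Every input number is an assumption I picked for illustration (in an interview, you’d get them from your clarifying questions).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumed inputs for a hypothetical text-post feature.
daily_active_users = 100_000_000
posts_per_user_per_day = 2
bytes_per_post = 1_000        # roughly 1 KB of text and metadata
reads_per_write = 100         # read-heavy system

posts_per_day = daily_active_users * posts_per_user_per_day
write_qps = posts_per_day / SECONDS_PER_DAY
read_qps = write_qps * reads_per_write
storage_per_day_gb = posts_per_day * bytes_per_post / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1000

print(f"write QPS: ~{write_qps:,.0f}")                    # ~2,315
print(f"read QPS: ~{read_qps:,.0f}")                      # ~231,481
print(f"storage per day: ~{storage_per_day_gb:,.0f} GB")  # ~200 GB
print(f"storage per year: ~{storage_per_year_tb:,.0f} TB")  # ~73 TB
```

In an interview you’d round these to something like 2,000 writes per second, 200,000 reads per second, and under 100 TB per year. Numbers like that already tell you something: the read path needs caching and replication far more than the write path does.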

How to Create a High-Level Design For System Design

In terms of performing well on the high-level design portion of the system design interview, you’ll want to make sure that you mind your time and have a complete end-to-end solution done by the end of the interview.

How to Stay Organized Within The High-Level System Design

My best advice is to make sure you start small and scale up as necessary.

This means that if you’re developing a Facebook post and timeline system, you’ll want to start with a user accessing an API that hits one server. From there, you add a database. From there you add multiple servers behind a load balancer. From there, you’ll want to make the database highly available. You probably get the point.

I find this very helpful because it allows you to identify all the bottlenecks and address them one by one while making your system more and more complex over time. It essentially starts as a complete system but improves over time towards the requirements you’ve discussed with your interviewer.

Why You Should Always Be Talking Within a System Design Interview

Fundamentally, the interviewer is trying to evaluate you based on signals. If you draft a perfect system from the start but don’t vocalize your thought process and tradeoffs you made in your head, you could fail the system design interview because of the lack of a positive signal.

Yes, you can pause and think. That’s different.

But you should be discussing every tradeoff you’re making, especially if it’s between multiple sub-system designs. 

Also, you’ll want to make sure you’re not lazy with your explanations, especially if you understand what you’re doing.

For instance, don’t add a load balancer between the user and multiple servers and say, “Here I’ll add a load balancer for a lot of the servers and use consistent hashing,” before then talking about your database. Why is the load balancer necessary (even if you think it’s obvious)? Also, why are you using consistent hashing?

As an interviewer, I wouldn’t be sure whether you know why you’re doing what you’re saying. If I were a nice interviewer, I would pause and ask you. However, I personally think it would be better if the interviewer didn’t have to ask you at all.

It’s just something to make sure you’re aware of moving forward because it could ruin your interview performance when you would’ve otherwise aced the system design interview.
