Practice Coding and System Design mock interviews with Senior Software Engineers from Silicon Valley

Tuesday, February 11, 2020

The Retry Pattern in System Design and Architecture interviews

The Retry pattern enables an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation. This can improve the stability of the application. Explaining the Retry pattern together with the Circuit Breaker pattern come up often during the System Design interview. For infrastructure interviews at Google, Facebook, Amazon, etc. being familiar with these patterns is a strong requirement.

Context and problem

An application that communicates with elements running in the cloud has to be sensitive to the transient faults that can occur in this environment. Faults include the momentary loss of network connectivity to components and services, the temporary unavailability of a service, or timeouts that occur when a service is busy.

It is common to be asked to handle such faults during the system design interview.

These faults are typically self-correcting, and if the action that triggered a fault is repeated after a suitable delay it's likely to be successful. For example, a database service that's processing a large number of concurrent requests can implement a throttling strategy that temporarily rejects any further requests until its workload has eased. An application trying to access the database might fail to connect, but if it tries again after a delay it might succeed.

System Design options and tradeoffs

In the cloud, transient faults aren't uncommon and an application should be designed to handle them elegantly and transparently. This minimizes the effects faults can have on the business tasks the application is performing.

If an application detects a failure when it tries to send a request to a remote service, it can handle the failure using the following strategies:

  • Cancel. If the fault indicates that the failure isn't transient or is unlikely to be successful if repeated, the application should cancel the operation and report an exception. For example, an authentication failure caused by providing invalid credentials is not likely to succeed no matter how many times it's attempted.
  • Retry. If the specific fault reported is unusual or rare, it might have been caused by unusual circumstances such as a network packet becoming corrupted while it was being transmitted. In this case, the application could retry the failing request again immediately because the same failure is unlikely to be repeated and the request will probably be successful.
  • Retry after delay. If the fault is caused by one of the more commonplace connectivity or busy failures, the network or service might need a short period while the connectivity issues are corrected or the backlog of work is cleared. The application should wait for a suitable time before retrying the request.

For the more common transient failures, the period between retries should be chosen to spread requests from multiple instances of the application as evenly as possible. This reduces the chance of a busy service continuing to be overloaded. If many instances of an application are continually overwhelming a service with retry requests, it'll take the service longer to recover.

If the request still fails, the application can wait and make another attempt. If necessary, this process can be repeated with increasing delays between retry attempts, until some maximum number of requests have been attempted. The delay can be increased incrementally or exponentially, depending on the type of failure and the probability that it'll be corrected during this time.

The following diagram can be used in a system design whiteboard interview to illustrate invoking an operation in a hosted service using this pattern. If the request is unsuccessful after a predefined number of attempts, the application should treat the fault as an exception and handle it accordingly.

The application should wrap all attempts to access a remote service in code that implements a retry policy matching one of the strategies listed above. Requests sent to different services can be subject to different policies. Some vendors provide libraries that implement retry policies, where the application can specify the maximum number of retries, the time between retry attempts, and other parameters.

An application should log the details of faults and failing operations. This information is useful to operators. If a service is frequently unavailable or busy, it's often because the service has exhausted its resources. You can reduce the frequency of these faults by scaling out the service. For example, if a database service is continually overloaded, it might be beneficial to partition the database and spread the load across multiple servers.

Issues and considerations to discuss in the system design interview

You should consider the following points when deciding how to implement this pattern.

The retry policy should be tuned to match the business requirements of the application and the nature of the failure. For some noncritical operations, it's better to fail fast rather than retry several times and impact the throughput of the application. For example, in an interactive web application accessing a remote service, it's better to fail after a smaller number of retries with only a short delay between retry attempts, and display a suitable message to the user (for example, “please try again later”). For a batch application, it might be more appropriate to increase the number of retry attempts with an exponentially increasing delay between attempts.

An aggressive retry policy with minimal delay between attempts, and a large number of retries, could further degrade a busy service that's running close to or at capacity. This retry policy could also affect the responsiveness of the application if it's continually trying to perform a failing operation.

If a request still fails after a significant number of retries, it's better for the application to prevent further requests going to the same resource and simply report a failure immediately. When the period expires, the application can tentatively allow one or more requests through to see whether they're successful. We will discuss this strategy called the Circuit Breaker system design pattern in a future blog post.

Consider whether the operation is idempotent. If so, it's inherently safe to retry. Otherwise, retries could cause the operation to be executed more than once, with unintended side effects. For example, a service might receive the request, process the request successfully, but fail to send a response. At that point, the retry logic might re-send the request, assuming that the first request wasn't received. For example, creating an account or processing a payment are not idempotent.

A request to a service can fail for a variety of reasons raising different exceptions depending on the nature of the failure. Some exceptions indicate a failure that can be resolved quickly, while others indicate that the failure is longer lasting. It's useful for the retry policy to adjust the time between retry attempts based on the type of the exception.

Consider how retrying an operation that's part of a transaction will affect the overall transaction consistency. Fine tune the retry policy for transactional operations to maximize the chance of success and reduce the need to undo all the transaction steps.

Ensure that all retry code is fully tested against a variety of failure conditions. Check that it doesn't severely impact the performance or reliability of the application, cause excessive load on services and resources, or generate race conditions or bottlenecks.

Implement retry logic only where the full context of a failing operation is understood. For example, if a task that contains a retry policy invokes another task that also contains a retry policy, this extra layer of retries can add long delays to the processing. It might be better to configure the lower-level task to fail fast and report the reason for the failure back to the task that invoked it. This higher-level task can then handle the failure based on its own policy.

It's important to log all connectivity failures that cause a retry so that underlying problems with the application, services, or resources can be identified.

Investigate the faults that are most likely to occur for a service or a resource to discover if they're likely to be long lasting or terminal. If they are, it's better to handle the fault as an exception. The application can report or log the exception, and then try to continue either by invoking an alternative service (if one is available), or by offering degraded functionality. For more information on how to detect and handle long-lasting faults, look up the Circuit Breaker system design pattern.

When to use this in the system design interview

Use this pattern when an application could experience transient faults as it interacts with a remote service or accesses a remote resource. These faults are expected to be short lived, and repeating a request that has previously failed could succeed on a subsequent attempt.

This pattern might not be useful:

  • When a fault is likely to be long lasting, because this can affect the responsiveness of an application. The application might be wasting time and resources trying to repeat a request that's likely to fail.
  • For handling failures that aren't due to transient faults, such as internal exceptions caused by errors in the business logic of an application.
  • As an alternative to addressing scalability issues in a system. If an application experiences frequent busy faults, it's often a sign that the service or resource being accessed should be scaled up.
Discussing the the retry and circuit breaker patterns comes up frequently in system design and architecture interviews at Google, Microsoft, Facebook, Twitter, Uber and Amazon.

Sunday, February 9, 2020

Binary Search

In this blog post we will not explain what binary search is or how it works. Instead, we will focus on how to detect a binary search solution in a coding interview question and how to implement it properly.

When to binary search in a coding interview?

We can use binary search almost every time we deal with a monotonic function. A monotonic function is a function between ordered sets that preserves or reverses the original order.

A function f(x) = y that satisfies f(x') <= f(x'') for any x' < x'' is monotonic. We used "<=" as the order relationship, but we can also use ">=" or a custom defined order relationship.

f(x) = 3x+1 is monotonically increasing.
f(x) = -x+5 is monotonically decreasing.
a sorted array arr[] = {2, 5, 9, 10, 12, 17} is a monotonically increasing discrete function!

We should consider using binary search whenever we are given a sorted input, a timeline, or generally speaking, a monotonic function.

Some coding interview practice questions could look like this:

Given a sorted array of integers A and a random number Q, find out if Q exists in A.
Find the minimum of a given U-shaped function or the peak in a mountain array.

The textbook binary search is flawed

Let's start with the simple problem of finding a certain number Q in the sorted array A. In the programming interview, most candidates will implement something like this:

def find(arr, q):
  left = 0
  right = len(arr)
  while (left < right):
    mid = (left + right) / 2
    if arr[mid] == q:
      return true
    if arr[mid] > q:
      right = mid - 1
      left = mid + 1
  if right >= 0 and arr[right] == q:
    return true
  if arr[left] == q:
    return true
  return false

The problem with this implementation is that at the end of the while loop we don't know precisely if left == right or left > right. We also don't know if we are still within the boundaries of the array. Let's take a few examples:

[left, right] = [3, 4] => mid = 3
we may end up with [left, right] = [3, 2]

[left, right] = [0, 1] => mid = 0
we may end up with [left, right] = [0, -1]

This is where almost every candidate struggles in the programming interview, and, in most cases fails to handle all the possible edge cases.

Lower bound / upper bound

If, instead of asking to find an exact value Q, we now ask for an upper bound or lower bound (nearest value in the array that is >= Q or <= Q) then we end up with even more complex edge cases.

Specifically, when we move the pointers to mid-1 or mid+1, we leave arr[mid] out of the potential solutions space. But arr[mid] may still be a viable solution. In our interviewing experience, almost every attempt the candidate made to address this has resulted in an infinite loop.

Also, once we are out of the while loop, we won't be able to tell whether arr[left], arr[right], or neither of them is the value we are looking for.

The source of all these corner cases comes from the way we move the pointers:

left = mid + 1
right = mid - 1

Let's try to fix this so that we never leave mid out and we never end up with the confusing end state where left > right.

  • First, keep in mind that whenever we move the pointers, our goal is to generate a smaller interval than the original [left, right]. This is obvious when left and right are far apart, but can be confusing when they get very close.
  • Second, the new binary search interval should be roughly half the size of the previous one (to achieve the logarithmic time complexity).
  • Third, notice that the following relationship stands no matter what:

    left <= mid < right
    or even better:
    left <= mid < mid + 1 <= right

So, if we split the [left, right] interval into [left, mid] and [mid+1, right], then we check all 3 conditions from above, and, according to the third condition, it's guaranteed that left <= right always, and left == right at the end!

Let's implement it:

def find(arr, q):
  left = 0
  right = len(arr)
  while (left < right):
    mid = (left + right) / 2
    if arr[mid] <= q:
      right = mid
      left = mid + 1

  # at this point we know that left == right
  return arr[left] == q

Short, clean, without edge cases! Easy to adapt when looking for an upper or lower bound. Precisely what we need in a software engineering coding interview.

Of course, we can still check and return early if we find arr[mid] == q. However, this is an optimistic optimization and only provides a real benefit in particularly lucky situations.

def find(arr, q):
  left = 0
  right = len(arr)
  while (left < right):
    mid = (left + right) / 2
    if arr[mid] == q:
      return true
    if arr[mid] < q:
      right = mid
      left = mid + 1
  return arr[left] == q

Binary search coding interview practice questions

  • Find first or last occurrence of a given number in a sorted array
  • Count occurrences of a number in a sorted array with duplicates
  • Find smallest missing element from a sorted array
  • Search an element in a circular sorted array

Thursday, January 23, 2020

Gateway Routing system design pattern

Routing requests to multiple services using a single endpoint. This pattern is useful when you wish to expose multiple services on a single endpoint and route to the appropriate service based on the request. The pattern may come in handy in system design interview problems.

Context and problem

When a client needs to consume multiple services, setting up a separate endpoint for each service and having the client manage each endpoint can be challenging. For example, an e-commerce application might provide services such as search, reviews, cart, checkout, and order history. Each service has a different API that the client must interact with, and the client must know about each endpoint in order to connect to the services. If an API changes, the client must be updated as well. If you refactor a service into two or more separate services, the code must change in both the service and the client.

Despite the differences between them, all these API endpoints have some common requirements like a secure connection, session and/or authentication token management, rate limiting, request tagging and header validations, etc.


Place a gateway in front of a set of applications, services, or deployments. Use application Layer 7 routing to route the request to the appropriate instances.

With this system design pattern, the client application only needs to know about and communicate with a single endpoint. If a service is consolidated or decomposed, the client does not necessarily require updating. It can continue making requests to the gateway, and only the routing changes.

A gateway also lets you abstract backend services from the clients, allowing you to keep client calls simple while enabling changes in the backend services behind the gateway. Client calls can be routed to whatever service or services need to handle the expected client behavior, allowing you to add, split, and reorganize services behind the gateway without changing the client.

Diagram of the Gateway Routing pattern

This system design pattern can also help with deployment, by allowing you to manage how updates are rolled out to users. When a new version of your service is deployed, it can be deployed in parallel with the existing version. Routing lets you control what version of the service is presented to the clients, giving you the flexibility to use various release strategies, whether incremental, parallel, or complete rollouts of updates. Any issues discovered after the new service is deployed can be quickly reverted by making a configuration change at the gateway, without affecting clients.

Issues and considerations

  • The gateway service may introduce a single point of failure. Ensure it is properly designed to meet your availability requirements. Consider resiliency and fault tolerance capabilities when implementing.
  • The gateway service may introduce a bottleneck. Ensure the gateway has adequate performance to handle load and can easily scale in line with your growth expectations.
  • Perform load testing against the gateway to ensure you don't introduce cascading failures for services.
  • Gateway routing is level 7. It can be based on IP, port, header, or URL.

When to use this system design pattern

Use this system design pattern when:
  • A client needs to consume multiple services that can be accessed behind a gateway.
  • You wish to simplify client applications by using a single endpoint.
  • You need to route requests from externally addressable endpoints to internal virtual endpoints, such as exposing ports on a VM to cluster virtual IP addresses.
  • This system design pattern may not be suitable when you have a simple application that uses only one or two services.


The gateway is commonly implemented as a proxy layer at the same level where the load balancer sits. In Kubernetes for example you could use an Ingress controller as a gateway service.

Using Nginx as the router, the following is a simple example configuration file for a server that routes requests for applications residing on different virtual directories to different machines at the back end.

server { listen 80; server_name; location /app1 { proxy_pass; } location /app2 { proxy_pass; } location /app3 { proxy_pass; } }

To better prepare for the system design interview, we recommend reading these articles about industry proven implementations of the gateway system design pattern at scale:

Fibonacci and Matrix Exponentiation

Problem statement
The Fibonacci numbers, commonly denoted Fn form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is
for n > 1.
The beginning of the sequence is thus:

A common coding interview question is to compute the n-th Fibonacci number.

Naive solution

A naive way of implementing F(n) is to use recursion:

def fibonacci(n):
  if n <= 1:
    return n
  return fibonacci(n - 1) + fibonacci(n - 2)

Such an implementation leads to an exponential runtime. The issue is that we compute the same values over and over again. Let's take a look at the recursion tree:

We can quickly estimate an upper bound for the time complexity as O(2^n). A tight bound Θ((1+sqrt(5)/2)^n) = (Fib(n)) can be determined by using generating functions and the golden ratio.

Linear solution

We can easily improve the previous solution by using memorization to avoid recomputing the same value multiple times:

fib = {
  0: 0,
  1: 1

def fibonacci(n):
  if n in fib:
    return fib[n]

  fib[n] = fibonacci(n - 1) + fibonacci(n - 2)
  return fib[n]

The time complexity is now linear O(n), but we also using O(n) memory. Let's rewrite the solution iteratively:

def fibonacci(n):
  fib = [0, 1]
  for i in xrange(2, n + 1):
    fib.append(fib[i - 1] + fib[i - 2])
  return fib[n]

Note how at each step we only need to look at the previous two values. Thus we can achieve the same result with constant O(1) additional memory:

def fibonacci(n):
  if n <= 1:
    return n

  tmp0 = 0
  tmp1 = 1
  for i in xrange(2, n + 1):
    curr = tmp0 + tmp1
    tmp0 = tmp1
    tmp1 = curr
  return tmp1

Optimal solution

The linear solution is the one described in most coding interview training resources. But can we do better? As it turns out we can actually solve the problem in logarithmic time. Let's see how.

We will rewrite the Fibonacci formula using simple matrix algebra:

which is also equivalent to

now all we need to do is compute the matrix exponentiation which can be implemented in O(logN) time. The pseudocode implementation is not too difficult either:

power(result_matrix, base_matrix, exponent):
  if exponent == 1:
    result_matrix = base_matrix.copy()

  if exponent % 2 == 1:
    power(result_matrix, base_matrix, exponent - 1)
    result_matrix = result_matrix * base_matrix
    power(result_matrix, base_matrix, exponent / 2)
    result_matrix = result_matrix * result_matrix

  base_matrix = [ [1,1], [1, 0] ]
  result_matrix = []
  power(result_matrix, base_matrix, n)

  return result_matrix[0][1]

Solving linear recurrence problems

linear recurrence relation is an equation that defines the n^\text{th} term in a sequence in terms of the k previous terms in the sequence. The recurrence relation is in the form:
Where each c_i is a constant coefficient.

The Fibonacci problem is a particular case of a linear recurrence of a 2nd degree with both coefficients equal to 1. The matrix exponentiation solution can be used in solving any linear recurrence problems. For example if we had to solve:


then we can build the matrix:

[  6,  -12,  8  ]   [ Xn-1 ]   [ Xn   ]
[  1,   0,    0 ] x [ Xn-2 ] = [ Xn-1 ]
[  0,   1,    0 ]   [ Xn-3 ]   [ Xn-2 ]

For a recurrence of k-th degree the matrix multiplication takes O(k^3) and the exponentiation takes O(logN) time, therefore we can solve the general problem in O(k^3 * logN) time.


The logarithmic solution is only viable when we need to compute a single value for a given n. If, for example, we are asked to compute all the Fibonacci numbers up to and including n, then the linear solution is the obvious right choice.

Saturday, January 18, 2020

How relevant are the coding interview questions?

"Have you ever used any of these fancy algorithms in any of your industry roles?" - It's a question that comes up every now and then. In most of the cases, the answer is "No".

So then, why do we ask such tough algorithmic problems? It turns out, there is a strong correlation between doing well in the coding interview and being a successful software engineer. Here's a few reasons why:

  • Someone who can quickly solve such a complicated problem will find it easy to deal with the daily small tasks.
  • The debugging skills required during the coding interview are transferrable to industry programming.
  • Good test coverage and edge case handling is critical in both situations.
  • Comparing different approaches in terms of time and space complexity is something you will frequently encounter in your career.
  • Writing simple, readable, nicely spaced and properly aligned code is a must when working in a large project.
  • Your interaction with the interviewer translates into your ability to pair program.
  • the list goes on...
But why an algorithmic question? Why not a different type of question? Wouldn't a small app/project/task be more relevant in this context?

This has a lot to do with the cost per hire. Large companies spend over 50k USD per hire. Think 50 phone screens and maybe 10 on-site interview rounds only to get one engineer to pass the bar and accept the offer. It all adds up in terms of costs: recruiting efforts, engineering time, travel expenses, lodging, etc. We need a quick, yet good enough way to identify potential hires from prospects. The concept of a 40-minute algorithmic interview proved to be quite effective and this model has been widely adopted.

Unfortunately, the correlation between a good problem solver and a solid engineer only works one-way. There are many amazing engineers who simply do not perform well in solving algorithmic questions. This is where the process gets really flawed. An experienced interviewer should be able to pick up on this, maybe twist the question a bit, and still get enough relevant signals to make a hire/no hire decision. At the same time, a less experienced interviewer may be tempted to simply reject the candidate.

Is the current format the best option for the coding interview? Maybe. Maybe not. It's certainly far from perfect. But it's good enough and it is cost effective. And it will continue to play an important role in the early stages of the interview process. So make sure to take some time to brush up on those CS Fundamentals and do some coding interview practice every once in a while.