Today, we are going to learn how to design Dropbox (or Google Drive, or OneDrive).

I can guarantee that most of you have used at least one of the services mentioned above. Just in case you don't know, these services store your data on remote servers so you can access it across devices.
- What is Dropbox?
- Dropbox – System Design Journey
- Dropbox – System Design – Current
What is Dropbox?
Dropbox gives you convenient access to your files from anywhere in the world. You can collaborate with friends, family, and coworkers from any device, and quickly send any file, large or small, to anyone, even if they don't have a Dropbox account. Your files are kept private behind multiple layers of protection, from a service trusted by millions. You can also manage tasks, track file updates, and stay in sync with your teams and clients.
Some key design constraints:
- A client that uses minimal resources.
- Low-bandwidth users and mobile app constraints (imagine 256 kbps on mobile data).
- Huge write volume, with a read-to-write ratio of almost 1:1 and a multi-petabyte cache.
- ACID requirements.
Dropbox – System Design Journey
Suppose it's 2007. How are you going to design it?
Dropbox – System Design – Initial High-Level Design
Don't laugh at it; that's how you start a start-up. This is the simplest design you will ever see: the files are stored on a single local server. Please respect the humble beginnings of the company.
So what is going to break once we get more users? To be frank, the design above is neither reliable nor able to handle a multitude of users, but the first thing to break is storage: we are going to run out of it. How do we solve that? We add space by using a third-party service, specifically Amazon S3.
Meta Servers and Block Servers
As you can see, the client is polling the server (notice the arrow points in one direction only). This is not good practice and increases the load on the server, and that, my friend, is our next problem to solve. How did we solve it? We added a notification server, as you can see below, so the server can push notifications to the end user instead. We also split the server into two parts: one in managed hosting that handles all the metadata calls (meta servers), and another in AWS for all the file uploads (block servers). The meta server is also responsible for figuring out the changeset for different clients and broadcasting it to them via the notification service.
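The push model described above can be sketched in-process. The class and method names here are purely illustrative, not Dropbox's actual API; a real notification server would hold long-poll HTTP connections open per client rather than in-memory queues:

```python
import queue


class NotificationServer:
    """Toy stand-in for the notification service: each connected client
    gets a queue, and the meta server pushes changesets into it instead
    of clients repeatedly polling the server for updates."""

    def __init__(self):
        self.clients = {}

    def connect(self, client_id):
        # A client "connection" is just a queue it blocks on.
        inbox = queue.Queue()
        self.clients[client_id] = inbox
        return inbox

    def broadcast(self, changeset):
        # Called by the meta server after it computes a new changeset.
        for inbox in self.clients.values():
            inbox.put(changeset)


server = NotificationServer()
inbox = server.connect("laptop-1")
server.broadcast({"path": "/notes.txt", "version": 2})
```

The client then consumes `inbox.get()` as its "push" channel, so no polling loop hammers the server.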
Our block servers could call the database directly, but the geographical distance between the two introduced a little latency. The DB queries also grew more complex over time, with a bunch of complicated stored procedures, functions, and so on. The other issue is that a single database is tough to scale, so the sensible thing is to add a cache somewhere.
To resolve these issues, we added a bunch of meta servers and block servers plus a load balancer to reduce latency, routed all DB calls through the meta servers, and added a cache server.
Dropbox – System Design – Launched High-Level Design
Caching and ACID transactions
The fundamental architecture is the same and hasn't changed since. The only difference is that there are now multiple instances of the database, Memcache, other servers, and load balancers. On top of it sit services such as sync, which we will come to later on.
Memcache also had to be modified here because of how Memcached handles failure: if one server is down, requests move to another server. That is great for availability, but not so great for consistency, because one client may believe a server is down while another does not. This can cause inconsistency, and we had to solve that as well.
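A common way to tame this failover behavior is consistent hashing, so every key has a well-defined owner and only the failed server's keys move. This is a minimal sketch (no virtual nodes, hypothetical server names), not Dropbox's actual memcache routing:

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring: when a server drops out, only the
    keys that server owned are remapped; everything else stays put."""

    def __init__(self, servers):
        self.ring = sorted((self._h(s), s) for s in servers)

    @staticmethod
    def _h(key):
        # Any stable hash works; md5 gives a well-spread integer.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # The owner is the first server clockwise from the key's hash.
        hashes = [h for h, _ in self.ring]
        i = bisect.bisect(hashes, self._h(key)) % len(self.ring)
        return self.ring[i][1]

    def remove(self, server):
        self.ring = [(h, s) for h, s in self.ring if s != server]
```

With naive modulo hashing, removing one server reshuffles almost every key; with the ring, clients that never touched the dead server keep a consistent view.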
Since our firm uses Python, it has to live with the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. The other concern was the load balancers: each one is a potential single point of failure, so we need the layer to stay highly available. How do we do it? We keep a standby copy of each load balancer; if one dies, its copy takes over, and we retain high availability.
Also, a Least Recently Used (LRU) eviction policy is used for the cache. It lets us discard the keys that were fetched least recently.
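A minimal LRU cache along these lines can be built on Python's `OrderedDict`; this is a sketch of the eviction policy, not the production cache:

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the LRU key
```

Every `get` refreshes a key's recency, so hot keys stay cached while cold ones fall off the end.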
The next problem was building a distributed database. We needed a system that guarantees a transaction either completes fully or not at all, adhering to ACID properties, so we built it from the ground up, generally using a two-phase commit to coordinate the participating nodes.
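A toy illustration of the two-phase commit protocol (all class and method names hypothetical): the coordinator commits only if every participant votes yes in the prepare phase, otherwise everyone aborts.

```python
class Participant:
    """Toy 2PC participant: votes in the prepare phase, then commits
    or aborts based on the coordinator's decision."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if we can guarantee the commit.
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every vote is yes."""
    if all(p.prepare() for p in participants):  # phase 1: prepare/vote
        for p in participants:                  # phase 2: commit
            p.commit()
        return True
    for p in participants:                      # phase 2: abort all
        p.abort()
    return False
```

The price of this guarantee is blocking: if the coordinator dies between the phases, prepared participants are stuck holding locks, which is why real systems add recovery logs on top.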
Dropbox – System Design – Current
As mentioned above, the system needs to cope with a massive volume of read and write operations, and their ratio will remain nearly 1:1. Hence, while designing the system, we should focus on optimizing the data flow between client and server.
We use a chunker service to divide files into 4 MB chunks to avoid duplication. Each chunk is mapped to a hash, and if two chunks produce the same hash they are stored against the same object ID in S3.
For example, if a 1 GB file is modified three times, a naive client would re-upload the full file each time, transferring 3 GB. The chunker ensures that only the edited chunks are transferred: 4 MB × 3 = 12 MB, so 3 GB is reduced to 12 MB.
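The chunk-and-hash idea can be sketched as follows. SHA-256 here merely stands in for whatever hash the real service uses, and the chunk size is a parameter so the example stays small:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, as described above


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Split a byte buffer into fixed-size chunks and hash each chunk."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def changed_chunks(old, new):
    """Indices of chunks that differ (or are new) and must be uploaded."""
    return [i for i in range(len(new))
            if i >= len(old) or old[i] != new[i]]
```

Because only the differing indices are uploaded, an edit in the middle of a large file costs one chunk's worth of bandwidth instead of the whole file.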
The watcher monitors the workspace for file changes: updating, creating, and deleting files and folders. The watcher notifies the indexer about the changes.
The indexer listens for events from the watcher and updates the client metadata database with information about the chunks of the modified file.
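A toy indexer reacting to watcher events might look like this; the event names and the in-memory dict are illustrative stand-ins for the real event protocol and the client metadata database:

```python
class Indexer:
    """Toy client metadata store mapping each path to the ordered list
    of chunk hashes for that file. A real indexer would persist this in
    the client metadata database rather than a dict."""

    def __init__(self):
        self.metadata = {}

    def on_event(self, event, path, chunk_hashes=None):
        # The watcher calls this with "created", "updated", or "deleted".
        if event in ("created", "updated"):
            self.metadata[path] = list(chunk_hashes)
        elif event == "deleted":
            self.metadata.pop(path, None)
```

Comparing the stored hash list against a freshly computed one is exactly how the client decides which chunks to upload.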
This is itself a very detailed topic, but in a nutshell, we deploy exact replicas of our service in data centers across multiple geographic locations. So if one goes down, a replica is there to serve in its place.
Also, if you are in an interview, remember that page load time is not that important here. What matters is that the service running behind the scenes is available, reliable, and scalable.