System Design (Jargon)
In this blog, we break down the tricky terms and jargon of system design, making it easy for you to understand how digital systems are put together.
Table of contents
- HLD (High-Level Design)
- LLD (Low-Level Design)
- Monolithic Architecture
- Distributed System
- Latency
- Throughput
- Availability
- Replication vs Redundancy
- CAP Theorem
- Lamport (Logical clock)
- Scalability (Horizontal vs Vertical)
- Load Balancer
- Caching
- Cache Eviction
- ACID
- File Based Database Management System (DBMS)
- Relational Based Database System (RDBMS)
- NO-SQL DATABASE
- Polyglot Persistence
- Normalization and Denormalization
- Indexing in Database
- Synchronous Communication
- Asynchronous Communication
- Message Based Communication
- Web Server
- Communication protocols
- API
- SOA (Service Oriented Architecture)
- Authentication and Authorization
- Forward and Reverse Proxy
- Bibliography
System design, in simple terms, refers to the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements.
There are two levels of system design; let's discuss each of them:
HLD (High-Level Design)
Describes the main components that will be developed for the resulting product: the system architecture, databases, services and processes, the relationships between modules, and the design and features.
LLD (Low-Level Design)
Describes the design of each element mentioned in the HLD in detail: classes, interfaces, relationships between different classes, and the actual logic of the various components.
Now, let's delve into various aspects of System design, such as Monolithic architecture, microservices, latency, sync-async, throughput, proxy, authentication, CAP, replication, redundancy, etc.
Monolithic Architecture
Architecture is the internal design details for building the application. The most common architecture is Monolithic. Consider a web application; if all three tiers (front-end, back-end, and data storage) are created and deployed in the same place, it is termed as a monolithic app, i.e., a single code base. It is less complex, easy to understand, and has higher productivity. It is also known as a centralized system.
Advantages:
Fewer network calls as all modules are in the same place.
Easier to secure.
Integration testing is easier.
Disadvantages:
Single Point of Failure.
Even if a single module is updated, the whole system has to be updated.
Tightly coupled; all modules generally must use the same language and framework, otherwise issues arise.
Distributed System
A distributed system is a collection of multiple individual systems connected through a network that share resources, communicate, and coordinate to achieve common goals. It also supports replication of the individual system, so if one fails, its copy is ready to be in use, i.e., it is partition-tolerant.
Advantages:
Horizontally scalable: work can be distributed across different servers/machines rather than increasing the configuration of one main machine, which can only be upgraded up to a point.
No Single Point of Failure (SPOF).
Low latency (if I'm in India and other users are in the USA, we can spin up a server near the USA for those users).
Disadvantages:
Complex.
Requires Management (load balancers).
Difficult to secure (many nodes/servers).
Latency
Latency typically refers to the delay or time lag introduced at different stages of processing or communication within a system. There are mainly two types: Network Latency and Computational/Process latency. In monolithic apps, there is no network latency since all components are in a single code base, while in distributed systems, latency will be network latency + computational latency.
To reduce latency, we can use caching (Redis) or CDN.
A CDN is a geographically distributed network of proxy servers whose objective is to serve the most frequently accessed data or static content by storing it on CDN servers close to users, while caching is the storing of information on a computer for a period of time so it can be served without recomputation.
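To make the caching idea concrete, here is a minimal sketch of an in-memory cache with a per-entry time-to-live (TTL), assuming a hypothetical `TTLCache` class; real deployments would use Redis or Memcached instead:

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry time-to-live (illustrative sketch)."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # entry expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=30)
cache.set("user:42", {"name": "Asha"})
print(cache.get("user:42"))  # served from memory, no database round trip
```

On a hit, the response never touches the database, which is exactly how caching cuts computational latency.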
Throughput
Throughput is the volume of work or information flowing through a system, a process flow rate measured in bits per second (bps).
The throughput of a distributed system is higher than that of a monolith, since in distributed systems jobs can be divided among several individual machines and the system can be horizontally scaled. Causes of low throughput are latency, protocol overhead (to-and-fro connections with the server), and congestion (too many requests at a time). Throughput can be improved by using a CDN, caching, distributed systems, load balancers, and upgrading resources.
Availability
Availability refers to the degree to which a system, service, or component is operational and accessible when needed. It is one of the key attributes of a system's performance and reliability for a better UX. For example, when CBSE announces high school results, the site doesn't open for many users, making a bad UX. Monolithic apps are poor when it comes to availability because of a single point of failure, while distributed systems have the ability to make replicas so when one system goes down, its replica comes into play, increasing overall application availability. So, increase availability by using replication, distributed systems, and applying redundancy.
Replication vs Redundancy
Replication involves creating identical copies of data or services in different locations to enhance performance and availability. It keeps data in sync and is useful in databases. Redundancy, on the other hand, is about having backup or duplicate components within a system to ensure fault tolerance and continuous operation in case of failures.
In a way, we can say: Replication = Redundancy + synchronization.
Types of redundancy => Active and Passive.
Active redundancy involves multiple parallel components actively working together to share the load, allowing seamless operation even if one component fails. In contrast, passive redundancy involves standby components that remain inactive until needed, minimizing system load during normal operation but requiring a switchover process when the primary component fails.
Also, active-passive replication has one active node handling operations while others remain passive until the active node fails, at which point one of the passive nodes takes over to ensure system continuity.
Consistency
Consistency ensures that all nodes in the distributed system see the same data at the same time, regardless of which node a client interacts with, i.e., preventing dirty reads. Example use cases: booking tickets, stock transactions, etc.
Monolithic apps are natively consistent, while consistency is an issue in distributed systems, where there are many servers and it may take time for changes to propagate from one server to another. Factors improving consistency: improving network bandwidth, pausing reads until replicas catch up, and replication based on distance (trying to place servers closer together).
Three common consistency models are:
1. Strong Consistency: no read operation is allowed until all systems are updated.
2. Eventual Consistency: reads are not halted; some clients may see old data, but all systems are eventually updated.
3. Weak Consistency: there is no guarantee that all systems will ever be updated.
CAP Theorem
The CAP theorem is a fundamental principle in distributed systems that states that it is impossible for a distributed system to simultaneously provide all three of the following guarantees: Consistency, Availability, and Partition tolerance.
The system requirement should define which two properties should be chosen over the rest. Generally, only CP and AP are considered because if the system is not partition-tolerant, it can lead to a single point of failure which can be even more dangerous.
Consistency (C):
Banking Systems
Booking Systems (e.g., Flight Reservations)
Inventory Management Systems
Availability (A):
Social Media Platforms (e.g., Facebook, Twitter)
Gaming Systems (e.g., Online Multiplayer Games)
E-commerce Product Browsing
Lamport (Logical clock)
Lamport timestamps are a method for assigning a unique identifier to events in a distributed system. These timestamps provide a partial ordering of events across different processes. Lamport introduced the concept of logical clocks as a means to order events in a distributed system without relying on synchronized physical clocks.
Lamport's logical clocks and timestamps are employed in distributed databases and systems to manage concurrency control and ensure consistency.
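Lamport's two rules (increment on every local event; on receive, take the max of your clock and the message's timestamp, plus one) can be sketched in a few lines. The class and variable names here are illustrative, not from any particular library:

```python
class LamportClock:
    """Logical clock per Lamport's rules: tick on local/send events,
    and on receive take max(local, received) + 1."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: increment before any local event or message send.
        self.time += 1
        return self.time

    def receive(self, sent_timestamp):
        # Rule 2: merge the sender's timestamp into the local clock.
        self.time = max(self.time, sent_timestamp) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()      # A sends a message stamped 1
b.receive(t_send)      # B merges: max(0, 1) + 1 = 2
print(a.time, b.time)  # 1 2
```

The resulting timestamps guarantee that if event X causally precedes event Y, then timestamp(X) < timestamp(Y); the converse does not hold, which is why the ordering is only partial.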
Scalability (Horizontal vs Vertical)
Vertical Scaling: Think of vertical scaling as adding more power to your existing machine. It's like upgrading your computer by adding more CPU, memory, or storage to a single device. Vertical scaling makes your machine more powerful but usually has limits on how much you can upgrade.
Horizontal Scaling: Horizontal scaling is like getting more machines to share the load. Instead of making one machine more powerful, you add more machines to your setup. Each machine handles a part of the overall workload, and by adding more machines, you can handle more traffic or data. It's a way to scale by distributing the work across multiple devices. Drawback: security and management overhead increase.
Load Balancer
Load Balancing is the process of efficient distribution of network traffic across all nodes in a distributed system. It also ensures health checks, high scalability, high throughput and high availability. It is used in Microservices or DS architecture.
Round Robin: Distributes incoming requests equally among servers in a rotation.
Least Connections: Directs traffic to the server with the fewest active connections.
Least Response Time: Routes requests to the server with the quickest response time.
IP Hash: Uses a hash of the client's IP address to determine the server.
Random Selection: Randomly selects a server for each new request.
Weighted Round Robin: Assigns different weights to servers based on their capacity, influencing the distribution of requests accordingly.
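Three of the strategies above can be sketched in plain Python; the server names and connection counts are made up for illustration:

```python
import hashlib
import itertools

servers = ["app-1", "app-2", "app-3"]  # hypothetical backend pool

# Round Robin: cycle through the pool in a fixed rotation.
rr = itertools.cycle(servers)

# IP Hash: hash the client IP so the same client always lands on the same server.
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Least Connections: pick the server currently holding the fewest open connections.
connections = {"app-1": 12, "app-2": 3, "app-3": 7}
least_loaded = min(connections, key=connections.get)

print(next(rr), next(rr))  # app-1 app-2
print(least_loaded)        # app-2
print(ip_hash("203.0.113.9") == ip_hash("203.0.113.9"))  # True: sticky routing
```

IP Hash gives "sticky" sessions for free, which matters when servers hold per-client state; Least Connections adapts to uneven request durations, which plain Round Robin ignores.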
Caching
Caching involves storing copies of frequently accessed data to reduce response time and improve system performance. It is used for read-intensive workloads and static content.
Web Browser Cache: Browsers store recently visited web pages, images, and scripts locally, reducing the need to re-download them on subsequent visits.
Content Delivery Network (CDN): CDNs cache and distribute content (images, videos) across multiple servers globally, delivering it from a server geographically closer to the user for faster access.
Database Caching: Systems cache query results or frequently accessed data to avoid repeated expensive database queries, improving overall application performance.
Object Caching (e.g., Memcached, Redis): Dedicated caching systems like Memcached or Redis store frequently accessed objects or data in memory, providing fast access to applications.
Content Caching in Proxy Servers: Proxy servers cache content from web servers, serving it directly to users and reducing the load on the origin server.
Application-Level Caching: Applications may cache computations or frequently accessed data to avoid redundant processing, enhancing responsiveness.
In-memory caching systems like Memcached and external/distributed caches like Redis enhance performance by storing frequently accessed data in the system's memory. Memcached is straightforward, excelling at caching key-value pairs for fast retrieval, while Redis provides additional features such as data persistence and advanced data structures, making it versatile for various use cases, including caching, messaging, and real-time analytics in distributed environments.
Cache Eviction
Cache eviction is the process of removing or replacing items from a cache to make room for new data. When the cache reaches its capacity, eviction policies determine which items to discard to accommodate incoming data. Common eviction strategies include Least Recently Used (LRU), where the least recently accessed items are removed, and First-In-First-Out (FIFO), where the oldest items are evicted. Eviction helps maintain cache efficiency by ensuring that the most relevant and frequently accessed data is retained.
LRU (Least Recently Used): Removes the least recently accessed item.
LFU (Least Frequently Used): Evicts the least frequently accessed item.
MRU (Most Recently Used): Discards the most recently accessed item.
LIFO (Last In, First Out): Removes the last item that was added.
FIFO (First-In, First-Out): Evicts the oldest item.
RR (Random Replacement): Randomly selects an item for eviction.
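The most common of these policies, LRU, can be sketched with Python's `OrderedDict`, which remembers insertion order and lets us move a key to the "most recent" end on every access:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction sketch: when full, drop the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # capacity exceeded: "b" is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

Swapping the eviction line is all it takes to get FIFO (`popitem(last=False)` without `move_to_end`) or LIFO (`popitem(last=True)`), which is why these policies are usually pluggable in real caches.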
ACID
ACID is an acronym representing essential properties that guarantee the reliability and consistency of transactions in a database.
Atomicity ensures that a transaction is treated as a single, indivisible unit—either all its operations succeed, or none of them do. Consistency ensures that a transaction brings the database from one valid state to another, adhering to predefined rules. Isolation prevents interference between concurrent transactions, ensuring they run independently. Finally, Durability ensures that once a transaction is committed, its effects persist permanently, surviving system failures.
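Atomicity is the easiest of the four to demonstrate. The sketch below uses Python's built-in `sqlite3`, whose connection context manager commits on success and rolls back on an exception; the account names and amounts are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute(
            "UPDATE accounts SET balance = balance - 80 WHERE name = 'alice'")
        # Simulate a crash midway through the transfer, before crediting bob.
        raise RuntimeError("crash before crediting bob")
except RuntimeError:
    pass

# Atomicity: the half-finished transfer was rolled back, both balances intact.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 100, 'bob': 50}
```

Without the transaction, alice would have been debited 80 with bob never credited; atomicity guarantees the database never exposes that intermediate state.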
File Based Database Management System (DBMS)
A File-Based Database Management System (DBMS) is an early form of data storage and management where data is organized and stored in files. In this system, each application program has its own set of files for managing data, and there is little to no data independence between different applications. It faces challenges of security, data redundancy and speed.
File-based DBMS served as a precursor to more advanced database systems, such as Relational Database Management Systems (RDBMS), which introduced the concept of relational tables, normalization, and a standardized query language (SQL) to address the limitations of file-based approaches.
Relational Based Database System (RDBMS)
It's software that performs data operations on a relational database. Data is represented in the form of tables, the relationships between tables are represented by foreign keys, and operations like storing, managing, querying, and retrieving data are performed.
Horizontal scalability is difficult in RDBMS because if a user makes an order that involves updating both the 'Users' and 'Orders' tables, ensuring that these updates are transactionally consistent across different nodes poses a challenge.
| Advantages | Disadvantages |
| --- | --- |
| Security | Rigid schema |
| Concurrency control | Difficult horizontal scalability |
| Relationships | Performance at scale |
| Structured Query Language (SQL) | High cost |
| Data independence | Limited flexibility |
| Data integrity | Learning curve |
NO-SQL DATABASE
NoSQL (Not Only SQL) databases are a category of database management systems that provide a flexible and scalable approach to handling large volumes of unstructured or semi-structured data. Unlike traditional SQL databases, NoSQL databases are designed to be schema-less, allowing for easier scalability and better performance in certain use cases.
Types of NoSQL Databases:
Document-Oriented Databases (e.g., MongoDB):
Characteristics: Store data as documents, typically in JSON or BSON format. Each document is a set of key-value pairs. It combines relationship concepts from RDBMS and dynamic schema from NoSQL.
Use Cases: Well-suited for content management systems, blogging platforms, and applications with dynamic schemas.
Key-Value Stores (e.g., Redis, DynamoDB):
Characteristics: Simple key-value pairs where each key is associated with a value. Offers fast access and retrieval.
Use Cases: Caching, session storage, real-time analytics.
Column-Family Stores (e.g., Apache Cassandra, HBase):
Characteristics: Organize data into columns instead of rows, suitable for handling large amounts of sparse data.
Use Cases: Time-series data, analytics, machine learning, monitoring systems.
Graph Databases (e.g., Neo4j, Amazon Neptune):
Characteristics: Designed for representing and traversing relationships between data entities using graph structures.
Use Cases: Social networks, fraud detection, recommendation engines.
Pros:
Scalability: NoSQL databases are generally more scalable horizontally, allowing for easier distribution across multiple servers.
Flexibility: NoSQL databases are schema-less, providing flexibility to accommodate evolving data structures without requiring a predefined schema.
Performance: Well-suited for read and write-intensive workloads, providing faster performance for certain use cases.
Variety of Models: Different types of NoSQL databases cater to various data models, allowing developers to choose the most appropriate one for specific application requirements.
Example use cases:
- Machine learning => columnar database (Cassandra)
- Shopping cart => simplest, key-value (Redis)
- Google Maps => graph DB
- LinkedIn => graph DB
- Scorecard => key-value (Redis)
- Payments => high consistency needed, so RDBMS
Polyglot Persistence
If an application uses more than one type of database, it is said to use polyglot persistence.
Normalization and Denormalization
Normalization in Database: Normalization is the process of organizing data in a relational database to reduce redundancy and improve data integrity. It involves decomposing tables into smaller, related tables, minimizing data duplication, and ensuring that each piece of information is stored in only one place.
Denormalization in Database: Denormalization is the reverse process of normalization, where data is intentionally reintroduced to tables to improve query performance. It involves combining tables or adding redundant data to simplify complex queries, trading off some level of data redundancy, memory, increasing complexity, slow write operation for faster data retrieval. Denormalization is often used in situations where read performance is a higher priority than minimizing storage space and update anomalies.
Indexing in Database
Indexing is a technique to improve database performance by reducing the number of disk accesses when a query is run. It creates a lookup table with the column and the pointer to the memory location of the row containing this column.
Example:
Tool: MySQL, PostgreSQL.
Scenario: Indexing the "username" column in a user table for faster retrieval of user data.
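The scenario above can be reproduced end to end with SQLite (via Python's built-in `sqlite3`): `EXPLAIN QUERY PLAN` shows a full table scan before the index exists and an index search afterwards. The table contents and index name are illustrative, and the exact plan wording varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)",
                 [(f"user{i}",) for i in range(1000)])

query = "SELECT * FROM users WHERE username = 'user500'"

# Without an index, the planner must scan the whole table.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()
print(before)  # plan detail mentions a full table SCAN

# Create an index on the looked-up column.
conn.execute("CREATE INDEX idx_username ON users (username)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()
print(after)   # plan detail now mentions idx_username
```

The index is exactly the lookup table described above: a sorted structure mapping `username` values to row locations, so the engine seeks instead of scanning all 1000 rows.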
Synchronous Communication
In synchronous communication, it's akin to asking a question and waiting for an instant answer before continuing the discussion, similar to a phone conversation where each person responds immediately after the other. It is necessary when consistency is crucial, for example, in stock market transactions, bank payments, ticket booking, and real-time decision making.
Asynchronous Communication
Asynchronous communication is often used to describe how processes or functions communicate with each other without waiting for the completion of each other. It is necessary when computation takes a lot of time, such as data fetching from APIs, scalability of applications (notifications), and avoiding cascading failures.
Message Based Communication
In message-based communication, the client sends a request in the form of a message and receives a response in the form of a message. It is asynchronous, allowing the client to continue its process without waiting. Terms include Producer (request sender), Consumer (request receiver), and Agent (communication channel, like a queue FIFO).
P2P (Peer-to-Peer):
Direct communication between two nodes.
Decentralized and bidirectional.
Pub/Sub (Publish-Subscribe):
Publishers produce messages.
Subscribers express interest and receive relevant messages.
Tools:
Apache Kafka:
Distributed streaming platform.
High-throughput, fault-tolerant, and scalable.
Suited for real-time event streaming.
RabbitMQ:
Message broker implementing AMQP.
Supports versatile messaging patterns.
Used for task distribution, microservices, and background jobs.
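The pub/sub pattern itself fits in a few lines. This is a toy in-process sketch with a made-up `Broker` class; real brokers like Kafka and RabbitMQ add persistence, acknowledgements, and network transport on top of the same idea:

```python
from collections import defaultdict

class Broker:
    """Toy in-process message broker illustrating publish-subscribe."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        # A consumer registers interest in a topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The producer never knows who is listening; the broker fans out.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
inbox = []
broker.subscribe("orders", inbox.append)         # consumer
broker.publish("orders", {"id": 1, "sku": "X"})  # producer
print(inbox)  # [{'id': 1, 'sku': 'X'}]
```

Note the decoupling: the producer publishes to a topic name, not to a consumer, which is what lets subscribers be added or removed without touching the producer.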
Web Server
Tools or programs that help keep the web application always up and running. (source: MDN)
Communication protocols
Communication protocols ensure that devices or applications can understand and interpret the data they receive from each other. These protocols can be implemented in hardware, software, or a combination of both.
Types of Communication Models:
Push Model:
Description: In a push model, data is sent from a server to a client without the client explicitly requesting it. The server "pushes" updates to clients as soon as they are available.
Use Case: Real-time notifications, live updates in web applications.
Pull Model:
Description: In a pull model, the client requests data from the server when it needs it. The server doesn't send data until explicitly requested by the client.
Use Case: Traditional web browsing where a user initiates requests by clicking links or submitting forms.
Long Polling:
Description: Long polling is a technique where the client sends a request to the server, and the server holds the request open until new data is available or a timeout occurs. The connection is then closed, and the client immediately opens a new one.
Use Case: Real-time chat applications, where instant message delivery is crucial.
WebSocket:
Description: WebSocket is a full-duplex communication protocol that provides a persistent connection between a client and a server. It allows bidirectional communication, enabling both the client and server to send messages independently.
Use Case: Real-time online gaming, financial trading platforms, chat applications.
Server-Sent Events (SSE):
Description: SSE is a unidirectional communication channel from the server to the client, allowing the server to push updates to the client over a single HTTP connection. The connection remains open for the duration of the event stream.
Use Case: Streaming live updates such as news feeds, stock tickers.
API
An API is a set of rules and tools that allows different software applications to communicate with each other. It defines the methods and data formats that applications can use to request and exchange information.
REST: Architectural style using standard HTTP methods, statelessness, and resource-based interactions.
GraphQL: Query language for APIs, allowing clients to request specific data.
gRPC: RPC framework using HTTP/2, Protocol Buffers, and supporting bidirectional streaming.
REST: social media platforms, e-commerce transactions, mobile application APIs.
GraphQL: content-rich interfaces, multiple devices with varied data needs, collaborative editing applications.
gRPC: microservices communication, real-time applications (chat, notifications), large-scale distributed systems.
SOA (Service Oriented Architecture)
Service-Oriented Architecture (SOA) is an architectural style that structures software as a collection of loosely coupled services. These services communicate with each other through well-defined interfaces, typically over a network.
Microservices: an evolved version of SOA.
Microservices is an architectural style that structures an application as a collection of small, independent services, each focused on a specific business capability. These services are developed, deployed, and scaled independently, and they communicate with each other through well-defined APIs.
Pros of Microservices:
Scalability: Can scale individual services independently.
Flexibility: Supports diverse technology stacks for different services.
Autonomy: Allows for independent development and deployment.
Resilience: Faults in one service don't necessarily affect others.
Reusability: Encourages reuse of services across applications.
Cons of Microservices:
Complexity: Increased complexity in managing distributed systems.
Data Management: Challenges in handling data consistency across services.
Deployment Overhead: Individual service deployments may be more frequent.
Communication Overhead: Increased communication between services.
Monitoring and Debugging: Requires sophisticated monitoring and debugging tools.
Authentication and Authorization
Authentication:
Definition: Authentication is the process of verifying the identity of a user, system, or application. It ensures that the entity trying to access a resource is who it claims to be.
Example: Logging into a system using a username and password.
Authorization:
Definition: Authorization is the process of granting or denying access rights and permissions to authenticated users or systems. It determines what actions an authenticated entity is allowed to perform.
Example: Allowing a user to view, edit, or delete specific files based on their role or permissions.
Types of Authentications:
Basic Auth: Simple, but credentials are sent with each request.
OAuth: Delegated authorization for third-party access.
Token-Based Auth: Securely passes identity and permissions with each request.
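Token-based auth rests on one idea: the server signs the identity claims it hands out, so it can later verify them without a session lookup. Below is a simplified, JWT-like sketch using only Python's standard library; the secret, payload fields, and helper names are illustrative, not a production implementation:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # hypothetical signing key, kept server-side

def issue_token(payload: dict) -> str:
    """Encode the payload and append an HMAC signature over it."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token: str):
    """Recompute the signature; reject the token if it doesn't match."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: token was tampered with or forged
    return json.loads(base64.urlsafe_b64decode(body))

token = issue_token({"user": "asha", "role": "editor"})
print(verify_token(token))  # {'user': 'asha', 'role': 'editor'}
```

Because the claims travel inside the token, any server holding the secret can authorize the request, which is what makes token-based auth a natural fit for stateless, horizontally scaled services.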
Forward and Reverse Proxy
Forward Proxy:
Role: Acts on behalf of clients.
Example Tool: Squid.
Use Case: Users accessing external resources through a central gateway.
Reverse Proxy:
Role: Acts on behalf of servers.
Example Tool: Nginx.
Use Case: Distributes client requests among backend servers, improves security, and performs load balancing.
I hope you enjoyed the blog, and that all the terms and jargon regarding system design are now clear to you. Whether you're exploring monolithic architectures or navigating the complexities of latency and consistency, understanding how all the pieces fit together is the key. Now, I strongly suggest building or replicating a real-world application to better understand how systems are designed; you may refer to blogs or video tutorials for guidance.