
Explore how distributed systems scale by running processes on different machines that communicate over a network, share a common state, and deliver seamless cloud-based services.
Explore core concepts of distributed systems, defining nodes and clusters, and design a leader election algorithm using ZooKeeper with ephemeral and persistent znodes, ensuring high availability and automatic re-election.
Configure and start ZooKeeper on your machine, then use the ZooKeeper command-line interface to visualize and debug znodes with ls, create, get, and rmr.
Explore ZooKeeper's threading model, with I/O and event threads, and learn to connect a Java Maven app to ZooKeeper using the Java API, handle connection events, and debug with Log4j.
Implement the leader election algorithm using ZooKeeper ephemeral sequential znodes, determine the leader by the smallest znode, and package and test a distributed app with a standalone jar.
Discover how watchers and triggers in ZooKeeper enable failure detection, avoid the herd effect, and support a robust leader election strategy.
Finish upgrading the leader election algorithm to be fault tolerant and scalable, using watchers and predecessor znodes to reelect leaders during failures and ensure continuous cluster availability.
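The election rule described above (the smallest sequential znode is the leader, and every other node watches only its immediate predecessor) can be sketched without a ZooKeeper connection. This is a minimal illustration, not the course's implementation: the class name ElectionLogic and the znode names such as "c_0000000002" are hypothetical, and the list of children is assumed to have been fetched from the election znode already. Plain string sorting works here because ZooKeeper zero-pads the sequence suffix to ten digits and all children are assumed to share one prefix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: pure election logic over znode names, no ZooKeeper client.
final class ElectionLogic {

    // The node whose znode name sorts first is the leader.
    static boolean isLeader(List<String> children, String self) {
        return Collections.min(children).equals(self);
    }

    // Each non-leader watches only the znode directly before its own,
    // so a single failure wakes up one node instead of the whole herd.
    static Optional<String> predecessorToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int index = sorted.indexOf(self);
        if (index < 0) {
            throw new IllegalArgumentException("own znode not found: " + self);
        }
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }
}
```

When the watched predecessor disappears, a node would fetch the children again and rerun both checks; it either becomes the leader or starts watching its new predecessor.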
Implement dynamic service registry and discovery with Apache ZooKeeper, using a persistent service registry znode and sequential nodes, watchers, and getData to adapt to cluster changes.
Implement a ZooKeeper-based service registry and discovery, storing host:port in ephemeral znodes, updating cluster addresses, and integrating with leader election via callbacks for scalable, fault-tolerant cluster management.
Explore the four TCP/IP layers from data link to application and trace the journey of a message between two machines, including IP addresses, ports, TCP vs UDP, and HTTP usage.
Learn how HTTP enables node-to-node communication in distributed systems by detailing HTTP request and response structures, GET and POST methods, headers, and status codes.
Build an HTTP server in Java using standard libraries, with a /status health check and a /task POST endpoint that multiplies numbers using big integers, then add custom headers and test with curl.
Build an HTTP client using the JDK 11 API, enabling asynchronous requests and connection pooling to send tasks to multiple workers, analyze traffic with Wireshark, and aggregate their results.
Examine message delivery semantics in distributed systems, including at-most-once and at-least-once patterns. Use idempotent design and a monotonic sequence number to enable retries and prevent duplicates.
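As a rough sketch of how a monotonic sequence number makes retries safe under at-least-once delivery, the receiver below remembers the highest sequence number it has applied per sender and ignores anything at or below it. The class name IdempotentReceiver, the deposit operation, and the sender ids are all hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical receiver that applies each (sender, sequence) message at most once.
final class IdempotentReceiver {
    // Highest sequence number already applied, per sender id.
    private final Map<String, Long> lastApplied = new HashMap<>();
    private long balance = 0;

    // Returns true if the deposit was applied, false if it was a duplicate or stale retry.
    synchronized boolean deposit(String senderId, long sequence, long amount) {
        long last = lastApplied.getOrDefault(senderId, -1L);
        if (sequence <= last) {
            return false; // already seen: a retry must not change the state again
        }
        balance += amount;
        lastApplied.put(senderId, sequence);
        return true;
    }

    synchronized long balance() {
        return balance;
    }
}
```

With this check in place, a sender that never received an acknowledgement can resend the same message with the same sequence number without the deposit being counted twice.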
Learn to serialize and deserialize complex data objects in distributed systems using JSON, Java serialization, and Protocol Buffers, with explanations of their pros, cons, and suitable use cases.
Explore the TF-IDF algorithm for scoring document relevance in a distributed search system. Learn term frequency, inverse document frequency, and a parallel coordinator–worker architecture.
Implement TF-IDF by computing term frequency and inverse document frequency to score and rank documents in a sequential, single-machine setting, preparing for distributed use.
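A single-machine TF-IDF scorer can be sketched in a few lines. The sketch assumes one common variant of the formulas: term frequency is the term's count divided by the document's word count, and inverse document frequency is log10 of the number of documents divided by the number of documents containing the term. The course may use a different variant, and the class and method names here are hypothetical.

```java
import java.util.List;

// Hypothetical sequential TF-IDF scorer; documents are pre-split into words.
final class TfIdf {

    // Share of the document's words that equal the term (case-insensitive).
    static double termFrequency(List<String> words, String term) {
        if (words.isEmpty()) {
            return 0.0;
        }
        long count = words.stream().filter(w -> w.equalsIgnoreCase(term)).count();
        return (double) count / words.size();
    }

    // log10(total documents / documents containing the term); 0 if no document has it.
    static double inverseDocumentFrequency(List<List<String>> documents, String term) {
        long containing = documents.stream()
                .filter(d -> d.stream().anyMatch(w -> w.equalsIgnoreCase(term)))
                .count();
        return containing == 0 ? 0.0 : Math.log10((double) documents.size() / containing);
    }

    // Document score for a query: sum of tf * idf over the query terms.
    static double score(List<String> document, List<List<String>> documents, List<String> queryTerms) {
        double total = 0.0;
        for (String term : queryTerms) {
            total += termFrequency(document, term) * inverseDocumentFrequency(documents, term);
        }
        return total;
    }
}
```

Only the term frequency depends on a single document, which is what makes it easy to compute in parallel on separate workers, while the inverse document frequency needs counts from the whole collection.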
Explore data partitioning strategies for scalable TF-IDF in a distributed system, partitioning by documents or terms, and parallelizing term frequency computations while the leader coordinates with ZooKeeper and the service registry.
Build a distributed search worker node that processes coordinator tasks, computes TF-IDF scores on documents, and communicates via Java serialization over HTTP within a leader-election-aware, ZooKeeper-based service registry.
Explore building a distributed TF-IDF search cluster in Java using Protocol Buffers for front-end-to-coordinator communication, worker task distribution, and score-based document ranking for cloud deployment.
Finish a distributed search by building a user-facing web app that takes input, runs a parallel TF-IDF with Protocol Buffers, and returns filtered, score-normalized results via JSON.
A load balancer distributes traffic across a server cluster, prevents bottlenecks and single points of failure, and supports auto scaling, health checks, and hardware or software options.
Explore load balancing strategies and algorithms: round-robin, weighted round robin, source IP hash, least connection, and weighted response time, to optimize traffic and maintain session stickiness.
Compare transport layer and application layer load balancing in distributed systems, detailing how layer 4 forwards TCP packets while layer 7 inspects HTTP headers, URLs, methods, and cookies for routing.
Practice load balancing with HAProxy, exploring HTTP (layer 7) and TCP modes, using round-robin and weighted strategies, implementing health checks, content-based routing, and a live admin page for monitoring.
Learn to run HAProxy with Docker on any platform by building and deploying three web app containers and an HAProxy container via Docker Compose, with port mappings and health checks.
Explore how message brokers decouple services, enable asynchronous communication, and support publish/subscribe and distributed queue patterns for scalable, fault-tolerant distributed systems.
Explore how Apache Kafka acts as a distributed streaming platform with topics, partitions, offsets, and consumer groups for scalable, fault-tolerant publish/subscribe and distributed queuing.
Explore how Apache Kafka scales as a distributed system via topic partitioning across brokers, enabling performance, fault tolerance, and durability through leader and follower replication, log persistence, and ZooKeeper coordination.
Install, configure, and run a Kafka cluster with ZooKeeper, starting from a single broker and expanding to multiple brokers, then test with the console producer and consumer and explore replication and failover.
Learn to build a Kafka producer with the Java API, configure bootstrap servers and serializers, and send messages to a distributed topic using explicit partitions, keys, or round-robin.
Build a Java Kafka consumer, explore partition load balancing in a consumer group, and test publish/subscribe patterns, with manual offset commits to guard against losing unprocessed messages.
Understand why distributed storage is essential for availability, scalability, and fault tolerance, and compare filesystem storage with relational and non-relational databases, including atomic, consistent, isolated, and durable (ACID) transactions.
Scale databases horizontally via sharding to partition data into shards across nodes and improve latency and throughput. Explore hash-based and range-based strategies for SQL and NoSQL, plus concurrency and distributed transactions.
Apply consistent hashing by mapping keys and nodes to a shared hash space on a ring to enable dynamic sharding with minimal record movement, using virtual nodes for load balancing.
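The ring described above can be sketched with a sorted map: each node is placed at several positions (its virtual nodes), and a key belongs to the first node position at or after the key's hash, wrapping around to the start of the ring. This is an illustrative sketch under stated assumptions, not the course's code: the class name, the "#i" suffix for virtual nodes, and the choice of the first 8 bytes of an MD5 digest as the hash are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent-hash ring with virtual nodes.
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Place the node at several ring positions to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove all of the node's positions; only its keys move to other nodes.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // First node position at or after the key's hash, wrapping around the ring.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Assumption: first 8 bytes of MD5 as the ring position; any well-spread hash would do.
    private static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long result = 0;
            for (int i = 0; i < 8; i++) {
                result = (result << 8) | (digest[i] & 0xFF);
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Removing a node only deletes that node's positions, so every key that mapped to a surviving node keeps its owner, which is the "minimal record movement" property.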
Explore how database replication enables high availability, scalability, and fault tolerance using master/slave and master/master architectures, and how quorum consensus guarantees strict consistency with flexible read and write quorums.
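The quorum condition behind "flexible read and write quorums" is usually stated as R + W > N: with N replicas, if every write reaches W of them and every read consults R of them, the two sets must overlap in at least one replica that holds the latest write. A small sketch of that check follows; the class and method names are hypothetical.

```java
// Hypothetical quorum configuration for N replicas, read quorum R and write quorum W.
final class QuorumConfig {
    private final int replicas;
    private final int readQuorum;
    private final int writeQuorum;

    QuorumConfig(int replicas, int readQuorum, int writeQuorum) {
        if (replicas < 1 || readQuorum < 1 || writeQuorum < 1
                || readQuorum > replicas || writeQuorum > replicas) {
            throw new IllegalArgumentException("quorums must be between 1 and the replica count");
        }
        this.replicas = replicas;
        this.readQuorum = readQuorum;
        this.writeQuorum = writeQuorum;
    }

    // True when every read set overlaps every write set, i.e. R + W > N.
    boolean readsSeeLatestWrite() {
        return readQuorum + writeQuorum > replicas;
    }
}
```

For five replicas, R = 3 and W = 3 satisfies the condition, and so does R = 1 with W = 5 (fast reads, slow writes), while R = 2 and W = 2 does not.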
Explore MongoDB, a scalable NoSQL document-oriented database that stores JSON documents with auto-generated IDs, and learn core CRUD operations, basic querying, and local setup with the Mongo shell.
Learn to scale MongoDB with data replication by creating replica sets, understanding primary and secondary roles, and configuring write concerns and read preferences for reliability and performance.
Launch a replicated MongoDB cluster with a replica set and the MongoDB Java driver, then test a Java client that enrolls students into course collections and simulate a node failure.
Scale distributed MongoDB with data sharding by implementing hash-based and range-based strategies, choosing shard keys, and balancing chunks. Learn router, config server, and replica set roles for reliable horizontal scaling.
Launch a multi-shard MongoDB cluster with config servers and mongos, shard movies by name using a range-based strategy and users by id with a hash-based strategy, and study balancer behavior.
Discover cloud computing fundamentals, from infrastructure as a service to platform as a service, and learn how multi-region, fault-tolerant architectures with scaling, load balancing, and storage enable scalable distributed systems.
Deploy a distributed app on cloud compute instances with instance templates, including uploading a jar to storage, configuring VMs, and bootstrapping via startup scripts for automated deployment.
Deploy a multi-zone instance group with autoscaling and auto-healing, using health checks to detect unhealthy instances and automatically grow or shrink the cluster to maintain availability.
Launch and distribute your app across multiple regions with a global load balancer and static IP address, routing to the nearest region for low latency and high availability.
Have you always wanted to build software that reaches millions of users and impacts people's lives?
Have you been wondering how modern companies:
Handle massive amounts of internet traffic and transactions?
Securely store billions of our photos, videos, and other data?
Provide an impeccable user experience and high performance 24/7 all around the globe?
Then you are in the perfect place!
In this course you will:
Master the theory of Distributed Systems, Distributed Computing and modern Software Architecture
Gain the practical skills necessary to build Distributed Applications and Parallel Algorithms, focusing on Java-based technologies
Deploy groups of distributed Java applications on the Cloud
Scale Distributed Databases to store petabytes of data
Build Highly Scalable and Fault Tolerant Distributed Systems
Along the way, you will learn modern technologies like:
Apache Kafka
Apache ZooKeeper
MongoDB
HAProxy
JSON
Java HTTP Server and Client
Protocol Buffers
Google Cloud Platform
And many others
By the end of the course you will:
Apply best practices for building and architecting real-life Distributed Systems
Scale your Distributed System to handle billions of transactions per day
Deploy your distributed application on the Cloud
Choose the right technologies for your use case and Software Architecture
Use modern Java-based techniques to store and handle large amounts of data
So what are you waiting for?
Join us today on this incredible journey!
FAQ
- What do I need to know to join the course?
Basic knowledge of Java will suffice. Knowing the fundamentals of Multithreading and Concurrency may help but is not required.
- Will this course help me in System Design Interviews?
Yes. Distributed Systems questions are frequently asked during System Design Interviews, especially by large companies that operate on a massive scale. The skills you will learn in this course will help you in your career, both while interviewing and while working on real projects.
- Do I need to pay for any software or Cloud account?
No. All the technologies covered in the course are free and open-source. The lectures on the cloud don't require you to pay for anything. If you want to follow along, all cloud vendors provide free-tier accounts to play around with and practice for free. Please follow the specific cloud vendor's documentation for guidance.
- Can I run and develop a Distributed System locally on my personal computer?
Yes. You can develop and run a distributed system on your computer and you don't need to buy any additional hardware. Generally, most distributed computing development is done on a single computer before it goes to QA and production.
- Is this the right course for me if I want to become a Software Architect or Technical Lead?
Yes. This is the right place for you to gain the practical Software Architecture and Distributed Computing skills you need to become a Software Architect or Technical Lead. Thanks to the advancement of Cloud Computing, most companies today run distributed systems and deploy them on the cloud, so the skills taught in this course are critical to being a successful Software Architect in the modern era.