Building a Production-Grade Highly Available Kubernetes Cluster

A free video tutorial from Gourav Shah
Premium Instructor | 65k+ students | DevOps Trainer and Author
Rating: 4.2 out of 5 instructor rating
15 courses
67,668 students

Learn more from the full course

Complete Kubernetes Tutorial by School of Devops®

Mastering container orchestration with Kubernetes one step at a time. Prepare for the CKA exam.

22:23:57 of on-demand video • Updated August 2020

Understand the need for a container orchestration system and the key features of Kubernetes
Install and configure a Kubernetes cluster
Create Deployments with ReplicaSets and set up highly available, self-healing application infrastructure
Set up service discovery and load balancing with Kubernetes Services; understand service networking
Manage different types of workloads with DaemonSets, StatefulSets, CronJobs and Jobs
Understand how persistent storage works with PVs, PVCs, StorageClasses and dynamic provisioners
Set up autoscaling with the Horizontal Pod Autoscaler
Create RBAC policies, Roles/ClusterRoles and bindings
English [Auto]
In the previous lesson I talked about Kubernetes architecture. In this one, I'm going to talk about a highly available production infrastructure design for Kubernetes: how do we set it up at production scale, keeping high availability, redundancy and fault tolerance in mind at the control plane level as well? The first thing you need for a highly available design is multiple masters. Multiple masters also mean multiple etcd instances. Etcd is important because it holds all the configuration of your cluster and its control plane, and that information needs to be replicated; that's what an etcd cluster does. When you create that cluster, there's a formula: 2n + 1, where n is the number of node failures you want to tolerate. If you want to survive the failure of one node, that's 2 × 1 + 1 = 3, a three-node cluster. With a three-node cluster you can tolerate one node going down. If you want to tolerate two node failures, you need a five-node cluster. You also need a quorum, so you generally use odd numbers: three, five, seven. Three is the minimum, and seven is a good practical upper bound even for a really large cluster; anything beyond that adds a lot of control plane overhead, because etcd uses a protocol called Raft that generates a lot of network traffic between members. So you have multiple masters, but at any given point in time only one will be the leader; the others will be followers. If the leader goes down, another member takes over based on an election and voting. You should definitely look up Raft, the consensus protocol, and how it really works.
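The 2n + 1 sizing rule above can be sketched as a quick calculation. This is purely illustrative; the function names are mine, not anything from Kubernetes or etcd:

```python
def quorum(members: int) -> int:
    """etcd needs a majority of members to agree before committing a write."""
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster still has a quorum."""
    return members - quorum(members)

def members_needed(failures: int) -> int:
    """The 2n + 1 rule: to tolerate n failures you need 2n + 1 members."""
    return 2 * failures + 1

# A 3-node cluster tolerates 1 failure; a 5-node cluster tolerates 2.
print(fault_tolerance(3), fault_tolerance(5))  # 1 2
# A 4-node cluster tolerates no more than a 3-node one,
# which is why odd sizes are preferred.
print(fault_tolerance(4))                      # 1
```

Notice that adding a fourth node raises the quorum from two to three without improving fault tolerance, which is exactly why odd cluster sizes are the convention.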
Raft is a distributed consensus-building protocol, and that's what Kubernetes, through etcd, uses internally. That's the master side. Once you set up multiple masters, you send traffic to them through a load balancer. That's a good thing to have for two reasons. First, from the API server's point of view, if one master is down the load balancer points you to the other, alternate nodes. Second, it distributes the request load: reads can be served by the followers as well, while writes are automatically redirected by the other masters to the leader. Technically you can communicate with any of the masters, even if you have three, and internally the decision making and everything else is taken care of. So you put them behind a load balancer, and that serves as the interface. When you connect, you connect to that load balancer using kubectl, a UI, your own API client, or any other external tool. Not only that, the worker nodes can communicate with the masters through the same load balancer. That's better, because if a master goes down a worker node doesn't need to keep track of which node is the master or the leader; it talks to the load balancer, and the load balancer redirects to the currently available masters. That's the Kubernetes side, and that's your highly available architecture. Now, when you think about a production design with Kubernetes, it's not just the Kubernetes components you need to think about; you also need a few additional entities. What are those? Let's have a look.
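To make the load balancer's role concrete, here is a toy round-robin balancer over control-plane endpoints. This is a sketch with made-up addresses; a real setup would use something like HAProxy or a cloud load balancer with health checks against the API server:

```python
import itertools

class ControlPlaneBalancer:
    """Toy round-robin load balancer for kube-apiserver endpoints.
    Unhealthy masters are skipped, so clients and worker nodes can keep
    talking to one stable address even when a master is down."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.healthy = set(self.endpoints)
        self._rr = itertools.cycle(self.endpoints)

    def mark_down(self, endpoint):
        self.healthy.discard(endpoint)

    def mark_up(self, endpoint):
        self.healthy.add(endpoint)

    def pick(self):
        # Scan at most one full cycle looking for a healthy endpoint.
        for _ in range(len(self.endpoints)):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy control-plane endpoints")

# Hypothetical master addresses behind one virtual IP.
lb = ControlPlaneBalancer(["10.0.0.11:6443", "10.0.0.12:6443", "10.0.0.13:6443"])
lb.mark_down("10.0.0.12:6443")
assert lb.pick() != "10.0.0.12:6443"  # a failed master is never returned
```

This mirrors the behavior described above: kubectl and the kubelets keep using one address, and failover is the balancer's problem, not theirs.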
The first thing you probably need is storage. Storage should typically be external, outside of these nodes. You can use network storage, or a storage provisioner that creates drives dynamically and attaches them to the nodes and, in turn, to the containers. If you are running on a cloud, you can leverage the cloud's storage, either block storage or object storage, create volumes, and attach them. Or you can use NFS, ScaleIO, GlusterFS, or OpenEBS; there are many different storage solutions available, and they give you high availability so your data can survive a node or pod failure as well. So you typically need fault-tolerant storage too in a production environment. What else do you need? You also need a way to collect logs and put them in a central location. You can use a commercial tool like Splunk, or the ELK Stack, which is an open source alternative for centralized log storage, analysis and management. You definitely need it, because your pods will come and go; they are ephemeral. So when you want that data to be kept and analyzed, you need to store it externally in a centralized log management system. The third thing is monitoring. You also need to set up monitoring, and there are tools such as Prometheus which work very well with Kubernetes: Prometheus can pull the data, and you can visualize it using Grafana and tools like that. Having a monitoring stack definitely helps you get an idea of what's happening, and you can set up alerting for any of the events. A lot of monitoring is done by Kubernetes at the pod level, but you may also want to monitor at the service level, with tracing.
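Dynamic provisioning, mentioned above, can be modeled in a few lines. This is a toy model, not the real StorageClass machinery; names like ToyProvisioner are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PersistentVolume:
    name: str
    size_gib: int
    bound_claim: Optional[str] = None

class ToyProvisioner:
    """Toy dynamic provisioner: each claim triggers creation of a new
    volume and binds it, much as a StorageClass provisioner creates a
    cloud disk on demand when a PersistentVolumeClaim appears."""

    def __init__(self):
        self.volumes = {}

    def provision(self, claim_name: str, size_gib: int) -> PersistentVolume:
        pv = PersistentVolume(
            name=f"pv-{claim_name}",
            size_gib=size_gib,
            bound_claim=claim_name,
        )
        self.volumes[pv.name] = pv
        return pv

prov = ToyProvisioner()
pv = prov.provision("db-data", 20)
print(pv.name, pv.size_gib)  # pv-db-data 20
```

The point of the model: the application asks for capacity via a claim, and the volume behind it is created and attached automatically, outside the lifecycle of any single node or pod.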
There are additional tools for tracing between microservices and diagnosing performance issues, and anything else you want to monitor at the system level or the container level, you can cover with these monitoring tools. So that's typically my recommendation for a highly available production environment with Kubernetes.
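As a small illustration of the monitoring side mentioned above, this renders a metric sample in the Prometheus text exposition format, the plain-text format Prometheus scrapes from /metrics endpoints. The metric name and label values here are just examples:

```python
def prometheus_sample(name, labels, value):
    """Render one sample in the Prometheus text exposition format:
    metric_name{label="value",...} sample_value"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

line = prometheus_sample(
    "container_memory_usage_bytes",
    {"namespace": "prod", "pod": "web-0"},
    123456,
)
print(line)
# container_memory_usage_bytes{namespace="prod",pod="web-0"} 123456
```

Exporters emit lines like this, Prometheus pulls them on a schedule, and Grafana queries the resulting time series for dashboards and alerts.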