Building a production-grade Highly Available Kubernetes Cluster

Gourav Shah
A free video tutorial from Gourav Shah
Premium Instructor | 45k+ students | DevOps Trainer and Author
4.2 instructor rating • 14 courses • 49,823 students

Learn more from the full course

Complete Kubernetes Tutorial by School of Devops®

Mastering container orchestration with Kubernetes one step at a time. Prepare for CKA Exam

22:23:57 of on-demand video • Updated August 2020

  • Understand the need for a Container Orchestration System and Key Features of Kubernetes
  • Install and configure a Kubernetes Cluster
  • Create deployments with ReplicaSets and set up highly available, self-healing application infrastructure
  • Set up service discovery and load balancing with Kubernetes Services, and understand service networking
  • Manage different types of workloads with DaemonSets, StatefulSets, CronJobs and Jobs
  • Understand how persistent storage works with PVs, PVCs, StorageClasses and dynamic provisioners
  • Set up autoscaling with the Horizontal Pod Autoscaler
  • Create RBAC Policies, Roles/ClusterRoles and Bindings
In the previous lesson I talked about Kubernetes architecture. In this one, I'm going to talk about highly available production infrastructure design for Kubernetes: how do we go about setting it up at production scale, keeping high availability, redundancy, and fault tolerance in mind at the Kubernetes control plane level as well.

So the first thing that you need in order to have a highly available design is multiple masters. Multiple masters also come with multiple etcd members. etcd is important because it holds all the configuration of your cluster, the state and all that information, and that needs to be replicated, and that's what etcd does. When you create an etcd cluster, there's a formula of 2n + 1. So if you want high availability and you want to afford the failure of one node, you create 2 × 1 + 1, that is, a three-node cluster. So if you have a three-node cluster, you can tolerate one node failure. If you want to tolerate two node failures, you need a five-node cluster. You also need a quorum, so you use odd numbers: three, five, seven. Three is the minimum. Seven is a good number if you have a really large cluster. Anything beyond that really has a lot of overhead on the control plane, because it uses a protocol called Raft, and that means a lot of network traffic going in and out between the nodes.

Right. So you need to have multiple masters. At any given point in time, only one will be the leader. You should also look at the protocol and how it works: there will be only one leader, and the rest will be followers. If the leader goes down, there will be another one which takes over, based on an election and voting and all that. Right. So you should definitely look up Raft as a protocol and how it really works: the Raft consensus protocol.
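The 2n + 1 arithmetic above can be checked with a few lines of Python. This is only a sketch of the majority math, not anything etcd itself exposes; the function names are made up for illustration.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority can still be formed."""
    return members - quorum(members)


def members_needed(failures: int) -> int:
    """The 2n + 1 rule: members required to survive `failures` failures."""
    return 2 * failures + 1


if __name__ == "__main__":
    # Print the quorum size and fault tolerance for cluster sizes 1 through 7.
    for size in range(1, 8):
        print(f"{size} members: quorum={quorum(size)}, "
              f"tolerates {tolerated_failures(size)} failure(s)")
```

Running it shows why odd sizes are preferred: a four-member cluster tolerates one failure, the same as a three-member cluster, so the extra member adds traffic without adding fault tolerance.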
Raft is a distributed consensus-building protocol, and that's what Kubernetes uses internally, by way of etcd. That's on the master side.

Once you set up multiple masters, you can send the traffic to those masters using a load balancer. Typically, that's a good thing to have, because it offers you two things. One: if a master is down and you have a load balancer, it will point you to the two other, alternate nodes. That's from the API server's point of view. Second: it will distribute the load of the requests, because a lot of reads can be served from the followers as well. But when it comes to writes, they will always be redirected automatically by the other masters to the leader. So technically you can communicate with any of the masters, even when you have three, and internally the decision making and everything else will be taken care of. So you can put them behind a load balancer, and that serves as an interface. When you connect, you can connect to that load balancer using your UI, or even your own API agent or client, or any other external tool.

And not only that, even the worker nodes can communicate with the masters using the same load balancer. That's better, because then, in case a master goes down, your worker nodes don't need to keep track of which one is the master, which one is the leader, and so on. They can communicate with the load balancer, and the load balancer will redirect them to the currently available masters. That's on the Kubernetes side, and that's your highly available architecture.

Now, in addition to that, when you think about a production design with Kubernetes, it's not just the Kubernetes components that you need to think about. You also need to have, let's say, a few additional entities. Let's have a look at those. The first thing that you possibly need is storage. Storage should typically be external, outside of these nodes.
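To make the load balancer's failover behaviour for the masters concrete, here is a toy sketch in Python. It is not how a real load balancer (or Kubernetes) is implemented: the `ApiServerBalancer` class, the `master-N:6443` endpoint names, and the health-check callback are all hypothetical, and a production setup would use a real load balancer with proper health checks against the API servers.

```python
from itertools import cycle
from typing import Callable, Iterable


class ApiServerBalancer:
    """Round-robin over master endpoints, skipping any that fail a health check."""

    def __init__(self, endpoints: Iterable[str], is_healthy: Callable[[str], bool]):
        self._endpoints = list(endpoints)
        self._is_healthy = is_healthy          # hypothetical health probe
        self._ring = cycle(self._endpoints)    # endless round-robin iterator

    def pick(self) -> str:
        """Return the next healthy endpoint, or raise if none is available."""
        # Try each endpoint at most once per call.
        for _ in range(len(self._endpoints)):
            candidate = next(self._ring)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy master available")


if __name__ == "__main__":
    masters = ["master-1:6443", "master-2:6443", "master-3:6443"]  # hypothetical
    down = {"master-2:6443"}  # pretend this master has failed
    balancer = ApiServerBalancer(masters, lambda endpoint: endpoint not in down)
    for _ in range(4):
        print(balancer.pick())
```

Clients and worker nodes only ever talk to the balancer, so a failed master is skipped without any of them having to know which master is currently the leader.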
You can use network storage. You can have a storage provider which can dynamically create drives and attach them to the containers. If you are running on a cloud, you can leverage cloud storage, either the block storage or the object storage that the cloud provides: create volumes and attach them to the nodes, and then internally to the containers. Or you can also use [unclear], or you can use GlusterFS or OpenEBS. There are so many different storage solutions available. A lot of these will give you high availability, and they can survive a node or pod failure as well. So you typically need storage too in a production environment.

What else do you need? You also need a way to get the logs and put them in a central location. Typically, you can either use a tool like Splunk, which is a commercial tool, or you can use the ELK Stack, which is an open source alternative for centralized log storage, analysis, and management. Right. And you definitely need it, because you're going to have pods which come and go. Those are ephemeral. So when you actually want that data to be stored and want to be able to analyze it, you need to store it externally, in a centralized log management system.

The third thing is monitoring. You also need to set up monitoring. There are tools such as Prometheus, which works very well with Kubernetes and can pull the data. It can integrate with other data stores, such as Elasticsearch, and you can visualize the data using Grafana and tools like that. Right. So having a monitoring stack definitely helps you get an idea of what's happening, and you can send alerts for any of the events. A lot of monitoring is done by Kubernetes itself, at the pod level. But you may also want to monitor things like service tracing.
So you may want to know that there are additional tools for tracing between services, for performance issues, and for anything else that you want to monitor at the system level as well. And at the container level, if you want to, you can do that with these monitoring tools. So that's typically my recommendation for a highly available production environment with Kubernetes.
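As one concrete piece of the setup described above, external, dynamically provisioned storage is usually requested through a PersistentVolumeClaim. The sketch below builds such a manifest in Python and prints it as JSON. The claim name `app-data`, the storage class `fast-external`, and the size are assumptions for illustration only; the storage class has to match one that your own storage provider defines.

```python
import json


def build_pvc(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Mounted read-write by a single node at a time.
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical class name; the provisioner behind it creates the volume.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    manifest = build_pvc("app-data", "fast-external", "10Gi")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.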