High Availability
This guide covers the configuration items needed to improve the availability and service continuity of applications deployed in cegedim.cloud managed Kubernetes clusters.
It is a must-read to ensure your Disaster Recovery Strategy is aligned with the cegedim.cloud compute topology.
How to configure deployments to leverage HA capabilities
Once your Kubernetes cluster is configured to run with the High Availability (HA) topology, some configuration best practices are required to allow your applications:
to run simultaneously on all datacenters of the region
to have sufficient capacity in every datacenter in case of a disaster affecting one of them
Deployment Configuration
As a reminder, the nodes of the Kubernetes clusters are distributed across 3 availability zones (AZs) and 2 datacenters:
AZ "A" and "B" are running on the primary datacenter
AZ "C" is running on the secondary datacenter
Replica Count
For stateless services that support scaling, the best practice is to run at least 3 pods:
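For example, a minimal Deployment sketch (names, labels and image are placeholders to adapt to your application):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # hypothetical name
spec:
  replicas: 3                   # one pod per AZ once the anti-affinity below is configured
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0   # placeholder image
```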
Anti-Affinity
These 3 (or more) pods need to be properly configured so that at least one pod runs in each Availability Zone:
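A sketch of the corresponding pod anti-affinity, to be placed in the pod template of the Deployment above (the `app: my-app` label is a placeholder):

```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: my-app     # placeholder label, match your pod labels
            topologyKey: failure-domain.beta.kubernetes.io/zone   # see the note below about topology.kubernetes.io/zone
```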
We are using preferredDuringSchedulingIgnoredDuringExecution
and not requiredDuringSchedulingIgnoredDuringExecution
because we want this requirement to be "soft": Kubernetes is then still allowed to schedule multiple pods on the same AZ if you run more replicas than there are AZs, or if a zone fails.
In Kubernetes 1.20, failure-domain.beta.kubernetes.io/zone will be deprecated; the new topology key will be topology.kubernetes.io/zone
Spread Constraints for Pods
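Topology spread constraints are another way to balance pods evenly across zones. A minimal sketch, assuming the same placeholder `app: my-app` label:

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1                            # at most 1 pod of difference between zones
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway     # "soft" constraint, like preferred anti-affinity
      labelSelector:
        matchLabels:
          app: my-app                       # placeholder label
```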
How to size your HA Cluster
If you are using the High Availability cluster topology, your objective is to deploy resilient applications in case of a datacenter failure.
This page describes best practices for sizing the worker nodes of each Availability Zone where your workloads are running.
Thinking about the "Worst Case Scenario"
As a reminder, the Kubernetes cluster is deployed across 3 availability zones and 2 datacenters. In the worst case scenario, only 1 AZ will keep running if the primary datacenter suffers a major disaster.
That is the hypothesis to use when determining the "nominal" sizing, which answers the question: "If the primary datacenter fails, how much CPU / RAM capacity do I need to keep my application working?"
Defining Nominal Capacity
Understanding your requirements
To determine this capacity, and therefore the worker nodes to deploy in the "C" Availability Zone (how many, and with which resources), you will need 3 parameters:
| Parameter | Description |
| --- | --- |
| Minimum Business Continuity Objective (MBCO) | Like RTO / RPO, the MBCO is a major parameter when sizing your DRP. In short, it is the percentage of your deployed application's capacity that is required to keep your business up and running. Depending on how you sized your workloads across the 3 AZs and on the performance you consider sufficient, it can be 30%, 50% or 100%. For example, with an application running 3 replicas of 4 GB RAM (one per AZ), the MBCO can be set very differently depending on whether you need all 3 replicas, 2, or just 1 to keep the service usable. The choice is yours! |
| Pods Resources Requests | Pods are deployed using Kubernetes resource requests and limits. As Kubernetes will try to reschedule your pods in case of an outage, requests are an absolutely major parameter to manage: if AZ "C" does not have enough resources to satisfy all the requests of the desired pods, Kubernetes will not schedule them, and your applications may not be available! To inspect your requests, you can run the first command shown below the table. |
| Resources Usage | Knowing the requests ensures Kubernetes will deploy as many pods as you want, but what about the capacity your pods actually consume? This also has to be taken into account to get a picture of the "raw" resources your applications need. To check your pods' current usage, you can run the second command shown below the table. |
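The exact commands are a matter of preference; as a sketch, assuming `kubectl` access to the cluster and the metrics-server addon for usage metrics:

```bash
# Per-pod CPU / memory requests across all namespaces
kubectl get pods --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

# Current CPU / memory usage per pod (requires metrics-server)
kubectl top pods --all-namespaces
```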
Calculate Sizing
You then have two ways to calculate the sizing:
At the "Cluster Level" granularity: if you are just starting the process and your workloads do not have much complexity or variability, use this approach:
Determine a global MBCO across all deployments
Sum all pods resource requests into a single number
Sum all pods resource usage into a single number
At the "Pod Level" granularity: If you want the sizing to be fitted perfectly and you have time to, take the time to determine those parameters for each deployment in your Kubernetes Cluster, because MBCO may vary ! For example :
A web application will require a MBCO with 100%
A cache will require a MBCO of 50%
A "nice-to-have" feature, or an internal tool can be 0%
The "Cluster Level" calculation is not accurate enough to be absolutely certain that cluster will be appropriately sized. Just know about it, and evaluate if it's worth taking the risk.
In any case, this sizing has to be reassessed regularly, depending on the new deployments or rescaling performed in your daily operations.
Using "Cluster Level" Granularity
If you have summed all requests and usage, and determined the MBCO at the "cluster" level, you can use this simple formula to calculate the required sizing for AZ "C" in the secondary datacenter:
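As an illustration only (the notation below is an assumption, not an official formula), the idea can be written as:

$$
\text{Sizing}_{AZ\text{-}C} \;\geq\; \text{MBCO} \times \max\left(\sum_{\text{pods}} \text{requests},\ \sum_{\text{pods}} \text{usage}\right)
$$

For example, with an MBCO of 50%, 48 GB of total RAM requests and 36 GB of actual usage, AZ "C" should provide at least 0.5 × 48 = 24 GB of allocatable RAM, plus system overhead.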
Using "Pod" Granularity
If you have determined a per-deployment MBCO, you will have to calculate your sizing with a slightly more complex formula:
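Again as an illustrative sketch, summing per-deployment terms:

$$
\text{Sizing}_{AZ\text{-}C} \;\geq\; \sum_{d \,\in\, \text{deployments}} \text{MBCO}_d \times \max\left(\text{requests}_d,\ \text{usage}_d\right)
$$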
Adjust your deployment descriptors
Once you have calculated your MBCO, it is important to leverage Kubernetes capabilities (QoS, and especially PodDisruptionBudget) so that your deployments follow your decision.
Adjust your cluster sizing
Use ITCare or request help from our support to size your cluster.
How to use Kubernetes QoS and Guaranteed Availability
During this phase, you'll need to prioritize your assets and define the components that are essential to the availability of your services.
Once deployed, it is a good idea to observe the resource consumption of your workloads to understand their actual utilization.
You can access your metrics via the Rancher user interface or via a client like Lens.
Quality of service
In Kubernetes, there are 3 classes of QoS:
Guaranteed
Burstable
BestEffort
For critical workloads, you can use the "Guaranteed" QoS class, which simply means setting resource limits equal to resource requests:
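For instance, a container spec sketch with limits equal to requests (names and values are placeholders):

```yaml
containers:
  - name: my-app                 # placeholder container name
    image: registry.example.com/my-app:1.0.0
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "500m"              # limits equal to requests => Guaranteed QoS
        memory: "1Gi"
```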
For less critical workloads, you can use the "Burstable" QoS class, where resource requests and limits are both defined but the requests are set below the limits:
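A corresponding Burstable sketch (placeholder values):

```yaml
containers:
  - name: my-app
    image: registry.example.com/my-app:1.0.0
    resources:
      requests:
        cpu: "250m"
        memory: "512Mi"
      limits:
        cpu: "1"                 # limits higher than requests => Burstable QoS
        memory: "1Gi"
```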
Pod disruption budget
The pod disruption budget lets you configure your fault tolerance and the number of failures your application can withstand before becoming unavailable.
Stateless
With a stateless workload, the aim is to have a minimum number of pods available at all times. To achieve this, you can define a simple pod disruption budget:
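A minimal sketch, assuming the placeholder `app: my-app` label and at least 3 replicas:

```yaml
apiVersion: policy/v1            # policy/v1beta1 on clusters older than 1.21
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb               # hypothetical name
spec:
  minAvailable: 2                # always keep at least 2 pods running
  selector:
    matchLabels:
      app: my-app                # placeholder label
```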
Stateful
With a stateful workload, the aim is to cap the maximum number of pods unavailable at any time, in order to maintain quorum for example:
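A minimal sketch for a StatefulSet carrying the placeholder `app: my-statefulset` label:

```yaml
apiVersion: policy/v1            # policy/v1beta1 on clusters older than 1.21
kind: PodDisruptionBudget
metadata:
  name: my-statefulset-pdb       # hypothetical name
spec:
  maxUnavailable: 1              # never lose more than 1 pod at a time, preserving quorum
  selector:
    matchLabels:
      app: my-statefulset        # placeholder label
```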
Horizontal Pod Autoscaler
In high traffic scenarios, the Horizontal Pod Autoscaler can automatically adjust the number of replicas based on observed resource usage:
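A sketch using the autoscaling/v2 API (autoscaling/v2beta2 on older clusters), targeting the placeholder Deployment `my-app`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                 # placeholder Deployment name
  minReplicas: 3                 # keep at least one pod per Availability Zone
  maxReplicas: 9
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out above 70% average CPU usage
```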