High Availability

This guide goes through all the configuration items needed to improve the availability and service continuity of your applications deployed in cegedim.cloud managed Kubernetes clusters.

This is a "must-read" guide to make sure your Disaster Recovery Strategy matches the cegedim.cloud compute topology.

How to configure deployments to leverage HA capabilities

Once your Kubernetes cluster is configured to run using the High Availability (HA) topology, a few configuration best practices are required to allow your applications:

  • to run simultaneously in all datacenters of the region

  • to have sufficient capacity in every datacenter if one of them suffers a disaster

Deployment Configuration

As a reminder, the nodes of the Kubernetes clusters are distributed across 3 availability zones (AZ) and 2 datacenters:

  • AZs "A" and "B" run in the primary datacenter

  • AZ "C" runs in the secondary datacenter

Replica Count

For stateless services that support scaling, the best practice is to run at least 3 pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    name: myapp
spec:
  replicas: 3

Anti-Affinity

These 3 or more pods need to be configured so that at least one pod runs in each Availability Zone:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep-nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: "failure-domain.beta.kubernetes.io/zone"
      containers:
      - name: web-app
        image: nginx

We use preferredDuringSchedulingIgnoredDuringExecution rather than requiredDuringSchedulingIgnoredDuringExecution because we want this rule to be "soft": Kubernetes can then schedule several pods in the same AZ when you run more replicas than there are AZs, or when a zone fails.

Note: failure-domain.beta.kubernetes.io/zone is a deprecated label. From Kubernetes 1.20 onwards, use topology.kubernetes.io/zone as the topology key.

Spread Constraints for Pods
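Topology spread constraints are another way to distribute pods evenly across Availability Zones. The manifest below is a minimal sketch, not a validated configuration: it reuses the dep-nginx Deployment from the anti-affinity example and assumes that nodes carry the topology.kubernetes.io/zone label. ScheduleAnyway keeps the constraint "soft", in the same spirit as the preferred anti-affinity rule.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep-nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      topologySpreadConstraints:
      # Allow at most a difference of 1 pod between any two zones
      - maxSkew: 1
        # Assumption: nodes are labelled with this zone key
        topologyKey: topology.kubernetes.io/zone
        # Soft constraint: still schedule if the skew cannot be respected
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: nginx
      containers:
      - name: web-app
        image: nginx
```

With DoNotSchedule instead of ScheduleAnyway the constraint becomes "hard", and pods may stay Pending when a zone is lost.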

How to size your HA Cluster

If you are using the High Availability cluster topology, your objective is to deploy resilient applications in case of a datacenter failure.

This page describes some best practices for sizing the worker nodes of each Availability Zone where your workloads run.

Thinking about the "Worst Case Scenario"

As a reminder, the Kubernetes cluster is deployed across 3 availability zones and 2 datacenters. In the worst case scenario, a major disaster in the primary datacenter, only 1 AZ keeps running.

This is the hypothesis to use when determining the "nominal" sizing, which can be summed up as: "If the primary datacenter fails, how much CPU / RAM capacity do I need to keep my application working?"

Defining Nominal Capacity

Understanding your requirements

To determine this capacity, and therefore the worker nodes to deploy in Availability Zone "C" (how many, and with which resources), you need the 3 parameters described below:

Minimum Business Continuity Objective (MBCO)

Like RTO and RPO, MBCO is a major parameter for sizing your DRP.

To sum up, it is the percentage of your deployed application's capacity that is required to keep your business up and running.

Depending on how you sized your workloads for 3 AZs, and on the performance level you consider sufficient, it can be 30%, 50% or 100%.

For example, if you have an application with 3 replicas of 4 GB RAM each, one per AZ, you can set the MBCO very differently:

  • 33%

    • having only one pod running during the outage is sufficient, because performance will be OK

    • you can take the risk of having no redundancy during the outage period

  • 66%

    • either a minimum of 2 pods is required to get acceptable performance

    • and/or you don't want to risk a failure if the only pod left fails

  • 100%

    • you absolutely need a minimum of 3 pods to run the service with nominal performance

The choice is yours!

Pods Resources Requests

Pods are deployed using Kubernetes resource requests and limits.

As Kubernetes will try to reschedule your pods in case of an outage, requests are a crucial parameter to manage.

If AZ "C" does not have enough resources to satisfy the requests of all the pods to be rescheduled, Kubernetes will not deploy them, and your applications may not be available!

To list your current requests and limits, you can run this command:

kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"-"}{range .spec.containers[*]}{"/"}{.name}{";"}{.resources.requests.cpu}{";"}{.resources.limits.cpu}{";"}{.resources.requests.memory}{";"}{.resources.limits.memory}{"\n"}{end}{"\n"}{end}' | grep -v -e '^$'

Resources Usage

Requests tell you whether Kubernetes will be able to schedule as many pods as you want, but what about the capacity your pods actually use? This also has to be taken into account to get a picture of the "raw" resources your applications require.

To determine that, run this command to see your pods' current usage: kubectl top pod

Calculate Sizing

You then have two ways to calculate the sizing:

  • At the "Cluster Level" granularity: if you are just beginning the process and your workloads are not very complex or variable, use this:

    • Determine a global MBCO across all deployments

    • Sum all pod resource requests to get a single number

    • Sum all pod resource usages to get a single number

  • At the "Pod Level" granularity: if you want the sizing to fit precisely and you have the time, determine those parameters for each deployment in your Kubernetes cluster, because the MBCO may vary! For example:

    • A web application may require an MBCO of 100%

    • A cache may require an MBCO of 50%

    • A "nice-to-have" feature or an internal tool can be 0%

The "Cluster Level" calculation is not accurate enough to be absolutely certain that the cluster will be appropriately sized. Be aware of this, and evaluate whether the risk is worth taking.

In any case, this sizing has to be reassessed regularly, as new deployments and rescaling happen during your daily operations.

Using "Cluster Level" Granularity

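If you have summed all requests and usage, and you have determined the MBCO at the "cluster" level, you can use this simple formula to calculate the sizing required for AZ "C" in the secondary datacenter:

required_capacity = MBCO * max(requests, usage)

Apply it separately to CPU and to memory. For example, with hypothetical totals of 48 GB of RAM requested and 36 GB used, and an MBCO of 50%, AZ "C" needs 0.5 * max(48, 36) = 24 GB of RAM.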

Using "Pod" Granularity

If you have determined a per-deployment MBCO, calculate the sizing by applying the same formula to each pod and summing the results:

required_capacity = sum(pod.MBCO * max(pod.requests, pod.usage))

Adjust your deployment descriptors

Once you have determined your MBCO, use Kubernetes capabilities (QoS classes and, above all, PodDisruptionBudget) so that your deployments enforce that decision.
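As an illustration, an MBCO can be carried into a PodDisruptionBudget as a percentage. This is a minimal sketch under assumptions: the name myapp-mbco-pdb and the app: myapp label are hypothetical, and the 66% figure is taken from the MBCO example above. Kubernetes rounds a percentage minAvailable up, so with 3 replicas this keeps at least 2 pods available. Keep in mind that a PodDisruptionBudget only limits voluntary disruptions such as node drains; it does not replace sufficient capacity in AZ "C".

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  # Hypothetical name, for illustration only
  name: myapp-mbco-pdb
spec:
  # MBCO of 66%: with 3 replicas, rounded up to 2 pods
  minAvailable: "66%"
  selector:
    matchLabels:
      # Assumption: the pods of the deployment carry this label
      app: myapp
```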

Adjust your cluster sizing

Use ITCare or request help from our support to size your cluster.

How to use Kubernetes QoS and Guaranteed Availability

During this phase, you need to prioritize your assets and identify the components that are essential to the availability of your services.

To understand your resource utilization, observe the resource consumption of your workloads once they are deployed.

You can access your metrics via the Rancher user interface or via a client such as Lens.

Quality of service

Kubernetes has 3 QoS classes:

  • Guaranteed

  • Burstable

  • BestEffort

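For critical workloads, you can use the "Guaranteed" QoS class, which is obtained by setting CPU and memory limits equal to the requests for every container of the pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp        # placeholder container name
        image: nginx       # placeholder image
        resources:
          limits:
            memory: "200Mi"
            cpu: "700m"
          requests:
            memory: "200Mi"
            cpu: "700m"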

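For less critical workloads, you can use the "Burstable" QoS class, which applies when requests are set without matching limits (here, requests only):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp        # placeholder container name
        image: nginx       # placeholder image
        resources:
          requests:
            memory: "200Mi"
            cpu: "700m"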

Pod disruption budget

The pod disruption budget lets you configure your fault tolerance and the number of failures your application can withstand before becoming unavailable.

Stateless

With a stateless workload, the aim is to have a minimum number of pods available at all times. To achieve this, you can define a simple pod disruption budget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp

Stateful

With a stateful workload, the aim is to cap the number of pods that can be unavailable at any time, for example in order to maintain quorum:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp

Horizontal Pod Autoscaler

High traffic scenarios:
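For traffic peaks, a HorizontalPodAutoscaler can add replicas on top of the baseline. The manifest below is a minimal sketch, not a recommended setting: it assumes the myapp Deployment from the earlier examples, a metrics server available in the cluster, and the autoscaling/v2 API (Kubernetes 1.23 or later). The name myapp-hpa, the replica bounds and the 70% CPU target are illustrative values.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  # Hypothetical name, for illustration only
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  # Keep at least one pod per Availability Zone
  minReplicas: 3
  # Illustrative upper bound; size it against your cluster capacity
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Percentage of the CPU request, averaged over the pods
        averageUtilization: 70
```

CPU utilization is computed relative to the containers' CPU requests, so the target Deployment must define them. Any extra replicas also count towards the requests that AZ "C" has to absorb during an outage.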
