K8s - Get started

Getting started

Connect to Rancher UI

cegedim.cloud uses Rancher as its Kubernetes platform management tool.

Rancher handles ITCare SSO authentication: the login and password are the same as for ITCare.

Rancher Instances

Depending on your cluster location, Rancher is reachable at the following URLs:

In ITCare, you can find your cluster URL on the cluster detail page:

Connect to Rancher

Rancher will ask for authentication at first login: simply click on "Login with Keycloak".

Then you will be redirected to the standard login process:

Once logged in, you should see a screen listing all the clusters you have access to:

If the UI gets stuck on "Loading" after logging in, please try to connect to:

If you don't see your cluster in the cluster list on first login, log out and log in again.

Manage your preferences

You can manage your UI preferences (dark theme, number of rows per table, etc.) by setting up your user preferences. Please refer to the full documentation here:

Configure kubectl

To connect to the cluster using the CLI, you have two options:

  • using a regular remote kubectl

  • using the Rancher online kubectl

Both are available from the "Cluster" page in Rancher. There are two ways of doing that:

Using the kubectl configuration file

Once on the cluster homepage you can download the "Kubeconfig File":

Or just copy the content of "Kubeconfig File":

If you don't have kubectl, we highly suggest installing it on your administration workstation by following this document.

This configuration can be merged with other kubectl configurations.

The authentication can be shared with any cluster managed by the same Rancher instance.
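
For example, here is a minimal way to merge the downloaded file with an existing configuration and switch contexts (the file path and the context name are placeholders):

# Merge the downloaded kubeconfig (placeholder path) with the existing one
export KUBECONFIG=~/.kube/config:~/Downloads/rancher-cluster.yaml
kubectl config view --flatten > ~/.kube/config.merged
mv ~/.kube/config.merged ~/.kube/config
unset KUBECONFIG

# List available contexts and switch to the Rancher-managed cluster
kubectl config get-contexts
kubectl config use-context my-cluster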

Using the web CLI

Once on the cluster home page, you can use the web CLI by clicking on the icon below:

This should launch a web shell like this one :

This shell is temporary: any changes made inside it will be discarded once the window is closed. The session might also get disconnected if no input/output is observed.

Get an API Token

The token management UI is accessible right beneath the user avatar:

Token scopes

There are two scopes:

  • no-scope: global scope, used to interact with the global Rancher API

  • cluster-scoped: token dedicated to accessing a specific cluster

A cluster-scoped token is required to use Helm 3. This means you need one token per cluster in your CI/CD pipelines.
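
As an illustration, here is a minimal sketch of a CI/CD job building a dedicated kubeconfig from a cluster-scoped token for Helm 3 (the API server URL, cluster ID and token value are placeholders):

# Build a standalone kubeconfig from a cluster-scoped token (all values are placeholders)
kubectl config set-cluster my-cluster --kubeconfig=ci-kubeconfig.yaml \
  --server=https://k8s-eb.cegedim.cloud/k8s/clusters/c-xxxxx
kubectl config set-credentials ci-user --kubeconfig=ci-kubeconfig.yaml \
  --token='token-tttt:token-of-doom'
kubectl config set-context ci-context --kubeconfig=ci-kubeconfig.yaml \
  --cluster=my-cluster --user=ci-user
kubectl config use-context ci-context --kubeconfig=ci-kubeconfig.yaml

# Helm 3 can then target this cluster only
helm --kubeconfig=ci-kubeconfig.yaml list --all-namespaces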

Token lifecycle

Tokens can have different lifecycles:

  • a token can have an unlimited lifespan, in which case it follows the lifecycle of the account attached to it

  • a token can have a specific lifetime

Nodes management

Scale cluster

You can use ITCare to add nodes to or remove nodes from your cluster.

Manage Namespaces

Understanding Project - A Rancher concept

Rancher manages namespaces via Projects, a concept that exists only in Kubernetes clusters managed by Rancher.

A Project is not a native Kubernetes resource.

By default, a Kubernetes cluster is provisioned with 2 projects:

  • System: containing core components' namespaces such as kube-system, etc.

  • Default: containing the "default" namespace

Users are free to create more Projects if needed.

Based on the Project level, Rancher offers built-in automation such as access rights granting, network isolation, etc.

Users are strongly encouraged to classify each namespace into a Project.

How to properly create a namespace

  • Switch to project view

  • Create a new namespace from project view

  • Insert a unique name, fill in other fields if needed, and click on "Create"

If you create a namespace with the Kubernetes CLI (e.g. kubectl), the created namespace will be moved into the project containing the "default" namespace (which is, by default, the project named Default).
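
For illustration, the sketch below creates a namespace from the CLI and then moves it into another Project using the annotation Rancher relies on (the project ID c-xxxxx:p-xxxxx is a placeholder, visible via "View in API" on the project):

# Create the namespace from the CLI: it initially lands in the Default project
kubectl create namespace my-app

# Move it into another Project by setting the Rancher project annotation (placeholder ID)
kubectl annotate namespace my-app field.cattle.io/projectId=c-xxxxx:p-xxxxx --overwrite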

Rights Management

cegedim.cloud recommends and officially supports managing access rights via AD groups.

Only AD groups starting with G_EMEA_* and G_K8_* are known by Rancher.

By default, when a cluster is created:

  • The standard user role is given to the group G_K8_<CLUSTER_NAME>_USERS, which contains the power users of the related Cloud

  • The admin role is given to the group G_K8_<CLUSTER_NAME>_ADMINS, which is empty by default and can be populated with competent and certified users via an ITCare ticket to the AD support team.

For instance, if user user1@cegedim.com needs standard user access to the cluster test-preprod, they need to request that user1@cegedim.com be added to the AD group named G_K8_TEST_PREPROD_USERS.

When users create a new Project, as its default owner they are free to bind any role to any AD group within the scope of this project.

If the Rancher predefined roles cannot fulfill your needs, please contact the admins of your cluster to configure a custom RoleBinding or ClusterRoleBinding.
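
For reference, a namespace-scoped binding for an AD group could look like the following sketch (namespace, binding name and group name are placeholders; the exact group principal mapping depends on the Rancher / Keycloak setup):

kubectl apply -f - <<EOF
# Minimal sketch: bind the built-in "view" ClusterRole to an AD group in one namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: viewers-binding
  namespace: my-app
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: G_EMEA_DUPER_GROUP
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
EOF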

Manage Rights

Project Level Rights Management

In this part we will assume that rights are given to a group, not to a user. There are two ways to manage rights on a project: the UI or the API.

One of the highest roles you can assign is "Project Admin".

Using the UI

Edit a project that you own or on which the project creator has given you sufficient rights.

Select the group and the role in the form.

Please note that only groups starting with G_EMEA_* and G_K8_* are known by Rancher.

Using the API

Using the API is pretty straightforward. You will first need some parameters:

  • Getting Project ID

To get the project ID, you can use the API explorer: simply use the "View in API" button.

curl --request GET \
 --url https://k8s-eb.cegedim.cloud/v3/projects \
 --header 'authorization: Bearer token-tttt:token-of-doom'
  • Getting Role ID

You might not be allowed to list role IDs through the UI, but you can get them through this API request:

curl --request GET \
--url 'https://rancher-staging.cegedim.cloud/v3/roleTemplates?name=Project%20Admin' \
--header 'authorization: Bearer token-tttt:token-of-doom'
  • Granting the rights

Using your API token you can make a single POST request to create the role binding:

curl --request POST \
--url https://k8s-eb.cegedim.cloud/v3/projectRoleTemplateBindings \
--header 'authorization: Bearer token-tttt:token-of-doom' \
--header 'content-type: application/json' \
--data '{
"projectId": "c-6t7f4:p-d43l6",
"namespaceId":"",
"groupPrincipalId":"keycloakoidc_group_group://G_EMEA_DUPER_GROUP",
"roleTemplateId": "project-owner"
}'
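
To avoid copying IDs by hand, both lookups can be scripted, for instance with jq (the project name "My Project" is a placeholder; field names are assumed to follow the standard Rancher v3 API response format):

# Resolve the project and role template IDs, then reuse them in the POST request above
PROJECT_ID=$(curl -s --header 'authorization: Bearer token-tttt:token-of-doom' \
  'https://k8s-eb.cegedim.cloud/v3/projects?name=My%20Project' | jq -r '.data[0].id')
ROLE_ID=$(curl -s --header 'authorization: Bearer token-tttt:token-of-doom' \
  'https://k8s-eb.cegedim.cloud/v3/roleTemplates?name=Project%20Admin' | jq -r '.data[0].id')
echo "projectId=$PROJECT_ID roleTemplateId=$ROLE_ID"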

Compatibility check before upgrading Kubernetes version

Context

Kubernetes resource API versions can be deprecated and even removed when upgrading the Kubernetes version.

Things can break if you have resources whose apiVersion is removed in the new Kubernetes version.

To avoid this risk, one solution is to use the "kubent" tool to check compatibility.

Kubent detects deprecated objects in a Kubernetes cluster. You should migrate/modify the detected resources before upgrading the Kubernetes version.

Quick start

To install kubent:

sh -c "$(curl -sSL 'https://git.io/install-kubent')"

To detect deprecated objects that will be removed in the newer Kubernetes version:

kubent [--context my-cluster]

An example of the output:

...
__________________________________________________________________________________________
>>> Deprecated APIs removed in 1.22 <<<
------------------------------------------------------------------------------------------
KIND                       NAMESPACE      NAME                                   API_VERSION                         REPLACE_WITH (SINCE)
Ingress                    <undefined>     toto                                 networking.k8s.io/v1beta1           networking.k8s.io/v1 (1.19.0)
...

In this tutorial, if your cluster is planned to be upgraded to Kubernetes version 1.22, you should migrate your Ingress resource named "toto" from API version networking.k8s.io/v1beta1 to networking.k8s.io/v1 before the upgrade.
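
As an illustration, here is a minimal sketch of what that migration looks like for an Ingress (host, service name and port are placeholders):

kubectl apply -f - <<EOF
# Sketch of the "toto" Ingress rewritten for networking.k8s.io/v1
apiVersion: networking.k8s.io/v1        # was networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: toto
spec:
  rules:
  - host: toto.example.com
    http:
      paths:
      - path: /
        pathType: Prefix                # mandatory field in v1
        backend:
          service:                      # was "serviceName" / "servicePort" in v1beta1
            name: toto-service
            port:
              number: 80
EOF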

This migration might imply modifying some extra fields of the resources. Please refer to the official documentation:

Kubent might fail to retrieve some information (e.g. the namespace of an Ingress); feel free to file an issue with the maintainer: https://github.com/doitintl/kube-no-trouble/issues

Configure Log forwarding and centralization to OpenSearch

Context

In this example we will configure log forwarding from a Kubernetes cluster to an OpenSearch cluster.

  • The OpenSearch cluster in this example is my-cluster.es.cegedim.cloud

  • The Cluster Output name is my-cluster-output

Banzai Logging Configuration

In Rancher, under Logging > ClusterOutput and Logging > Output, edit the YAML configuration and change the following:

    port: 443
    log_es_400_reason: true # useful to debug when no log is received in OpenSearch / ELK
    suppress_type_name: true
    # Add this if the target is an Open Search cluster
    default_elasticsearch_version: "7"
    verify_es_version_at_startup: false

ClusterFlow/ClusterOutput comes with a lot of trouble when sending logs to an OpenSearch / ELK cluster: conflicts of the expected value type for the same key (e.g. a value changing from "long" to "string" will be rejected by the OpenSearch / ELK cluster).

This can happen if you have Dynatrace activated.

Here are full examples of the ClusterOutput/Output spec for Elasticsearch and OpenSearch:

ClusterOutput/Output for OpenSearch
spec:
  elasticsearch:
    buffer:
      chunk_limit_size: 256m
      compress: gzip
      flush_at_shutdown: true
      flush_interval: 60s
      flush_mode: interval
      flush_thread_count: 2
      queue_limit_length: 96
      retry_forever: true
      type: file
    flatten_hashes: true
    host: {{ ELK URL }}
    include_tag_key: true
    log_es_400_reason: true
    logstash_format: true
    logstash_prefix: {{ prefix }}
    password:
      valueFrom:
        secretKeyRef:
          key: password
          name: {{ ELK credentials }}
    port: 443
    reconnect_on_error: true
    reload_connections: false
    reload_on_failure: true
    request_timeout: 30s
    scheme: https
    ssl_verify: false
    suppress_type_name: true
    user: {{ username }}
    default_elasticsearch_version: "7"
    verify_es_version_at_startup: false
ClusterOutput/Output for Elastic
spec:
  elasticsearch:
    buffer:
      chunk_limit_size: 256m
      compress: gzip
      flush_at_shutdown: true
      flush_interval: 60s
      flush_mode: interval
      flush_thread_count: 2
      queue_limit_length: 96
      retry_forever: true
      type: file
    flatten_hashes: true
    host: {{ ELK URL }}
    include_tag_key: true
    log_es_400_reason: true
    logstash_format: true
    logstash_prefix: {{ prefix }}
    password:
      valueFrom:
        secretKeyRef:
          key: password
          name: {{ ELK credentials }}
    port: 443
    reconnect_on_error: true
    reload_connections: false
    reload_on_failure: true
    request_timeout: 30s
    scheme: https
    ssl_verify: false
    suppress_type_name: true
    user: {{ username }}

Migration recommendations

There are 2 options:

  • Migration to Flow/ClusterOutput: pushes all namespaces' logs to the same OpenSearch index

  • Migration to Flow/Output: pushes each namespace's logs separately to a dedicated OpenSearch index

The recommendation is to migrate to "Flow/Output", possibly even with a dedicated OpenSearch index for very important applications.

Migration from ClusterFlow/ClusterOutput to Flow/ClusterOutput

Create a file with all namespaces:

Define namespaces
mkdir -p log-config
kubectl get ns --no-headers |grep -v kube-system|\
grep -v cattle|\
grep -v dynatrace|\
grep -v ceg-it|\
grep -v ingress-nginx|awk '{print $1}' > namespaces.txt
# review namespaces.txt

Create K8s YAML files to configure logs on all namespaces:

Create flow resources
while read LOG_NS
do
  cat <<EOF>> log-config/"configuration-"$LOG_NS".yaml"
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: all-namespace-logs
  namespace: $LOG_NS
spec:
  filters:
  - dedot:
      de_dot_nested: true
      de_dot_separator: '-'
  globalOutputRefs:
  - my-cluster-output
EOF
done < namespaces.txt
# review content of folder log-config

Apply configuration:

kubectl apply -f log-config
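
You can then check that the logging operator has picked up the new resources (column output may vary with the operator version):

# Every selected namespace should now have a Flow referencing the ClusterOutput
kubectl get flows.logging.banzaicloud.io -A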

Migration from ClusterFlow/ClusterOutput to Flow/Output

Create a file with all namespaces:

mkdir -p log-config
kubectl get ns --no-headers |grep -v kube-system|\
grep -v cattle|\
grep -v dynatrace|\
grep -v ceg-it|\
grep -v ingress-nginx|awk '{print $1}' > namespaces.txt
# review namespaces.txt

Create K8s YAML files:

while read LOG_NS
do
  kubectl -n cattle-logging-system get secret {{ ELK credentials }} -o json \
 | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' \
 | kubectl apply -n $LOG_NS -f -
  cat <<EOF>> log-config/"configuration-"$LOG_NS".yaml"
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: all-ns
  namespace: $LOG_NS
spec:
  elasticsearch:
    buffer:
      chunk_limit_size: 256m
      compress: gzip
      flush_at_shutdown: true
      flush_interval: 60s
      flush_mode: interval
      flush_thread_count: 2
      queue_limit_length: 96
      retry_forever: true
      type: file
    flatten_hashes: true
    host: my-cluster.es.cegedim.cloud
    include_tag_key: true
    log_es_400_reason: true
    logstash_format: true
    logstash_prefix: {{ prefix }}
    password:
      valueFrom:
        secretKeyRef:
          key: password
          name: {{ ELK credentials }}
    port: 443
    reconnect_on_error: true
    reload_connections: false
    reload_on_failure: true
    request_timeout: 30s
    scheme: https
    ssl_verify: false
    suppress_type_name: true
    user: {{ username }}
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: all-namespace-logs
  namespace: $LOG_NS
spec:
  filters:
  - dedot:
      de_dot_nested: true
      de_dot_separator: '-'
  globalOutputRefs: []
  localOutputRefs:
  - all-ns
EOF
done < namespaces.txt
# review content of folder log-config

Apply configuration:

kubectl apply -f log-config
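
As before, you can check that each namespace now has its own Output and Flow (column output may vary with the operator version):

# Each selected namespace should now have a dedicated Output and a Flow referencing it
kubectl get outputs.logging.banzaicloud.io,flows.logging.banzaicloud.io -A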

Debug if log loss still persists

No big buffer should build up if everything goes well. Let's check that:

kubectl -n cattle-logging-system get po -l app.kubernetes.io/name=fluentd -o name|awk '{print $1}'|xargs -I {} sh -c "kubectl -n cattle-logging-system exec {} -c fluentd -- du -hs /buffers"

Let's check the last 5 lines of fluentd's log:

kubectl -n cattle-logging-system get po -l app.kubernetes.io/name=fluentd -o name|awk '{print $1}'|xargs -I {} sh -c "kubectl -n cattle-logging-system exec {} -c fluentd -- tail -5  /fluentd/log/out"

Have a deep look into /fluentd/log/out inside the fluentd pod; most of the time the following will help:

kubectl -n cattle-logging-system get po -l app.kubernetes.io/component=fluentd --no-headers|awk '{print $1}'|xargs -I {} sh -c "kubectl -n cattle-logging-system exec -it  {} -c fluentd -- cat /fluentd/log/out|grep 'Rejected'"
2023-01-20 10:53:38 +0000 [warn]: #0 send an error event to @ERROR: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'failed to parse field [stack_trace] of type [text] in document with id 'L7fQzoUBIs35W4-JCcJQ'. Preview of field's value: '{source=    at reconcileChildFibers (webpack-internal:///./node_modules/react-dom/cjs/react-dom.development.js:14348:23)}''" location=nil tag="kubernetes.var.log.containers.log-service-deployment-57747558f8-52jgx_my-application_log-service-6c560e016f7241b3231b96ed358390d8a2170b175833a695705c952183dcda5e.log" time=2023-01-20 10:52:33.431659917 +0000

It is easy to identify the pod causing the issue:

kubectl get po -A |grep log-service-deployment-57747558f8
                  my-application                         log-service-deployment-57747558f8-52jgx                          1/1     Running                 0          4d20h

Please understand that the error is not in Kubernetes: it is the container that produces inconsistent logs in JSON format. OpenSearch then rejects the logs that were sent. Banzai will retry and, sooner or later, an overflow will occur.

Sources:

https://discuss.elastic.co/t/elasticsearch-rejecting-records-because-of-parsing-error/180040

https://www.elastic.co/guide/en/elasticsearch/reference/7.0/ignore-malformed.html#json-object-limits

One of the short-term solutions below can be picked:

  • Remove the pod from the Flow (pod exclusion, see the sketch after these lists) or disable the entire Flow of the related namespace

  • Clean up the related index on the ES server

Long-term solutions:

  • Adapt the application to produce more consistent logs

  • See if it is possible to configure ES to gently ignore malformed fields rather than reject the whole package sent by Banzai
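
For the pod exclusion option mentioned above, a Flow can exclude the offending pods by label before selecting everything else; a minimal sketch (namespace, labels and output name are placeholders):

kubectl apply -f - <<EOF
# Sketch: exclude the faulty pods by label, keep forwarding everything else
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: all-namespace-logs
  namespace: my-application
spec:
  match:
  - exclude:
      labels:
        app: log-service          # label of the pod(s) producing inconsistent logs
  - select: {}
  filters:
  - dedot:
      de_dot_nested: true
      de_dot_separator: '-'
  localOutputRefs:
  - all-ns
EOF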
