K8s - Get started
Last updated
cegedim.cloud uses Rancher as its Kubernetes management platform.
Rancher handles ITCare SSO authentication: the login and password are the same as for ITCare.
Depending on your cluster location, Rancher is reachable at the following URLs:
EB (Boulogne-Billancourt)
ET (Toulouse-Labège)
In ITCare, you can find your cluster URL on the cluster detail page:
Rancher will ask for authentication at first login: simply click on "Login with Keycloak"
Then you will be redirected to the standard login process :
Once logged in, you should see a screen listing all the clusters you have access to:
If the UI gets stuck on "Loading" after logging in, please try to connect to:
If, on first login, you don't see your cluster in the cluster list, try logging out and logging in again.
You can manage your UI preferences (dark theme, number of rows per table, ...) by setting up your user preferences. Please refer to the full documentation:
In order to connect to the cluster using the CLI, you have two options:
by regular remote kubectl
using Rancher's online kubectl
Both are available from the "cluster" page in Rancher. There are two ways of getting there:
Once on the cluster homepage you can download the "Kubeconfig File":
Or just copy the content of "Kubeconfig File":
This configuration can be merged with other kubectl configurations.
The authentication can be shared with any cluster managed by the same Rancher instance.
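If you already have a kubeconfig, the downloaded file can sit alongside it instead of replacing it. A minimal sketch (the file paths below are assumptions, adjust to your environment):

```shell
# Point kubectl at both files at once; contexts from each become visible.
export KUBECONFIG="$HOME/.kube/config:$HOME/Downloads/my-cluster.yaml"
kubectl config get-contexts

# Optionally write a single merged file:
kubectl config view --flatten > "$HOME/.kube/config.merged"
```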
Once on the cluster home page you can use the web cli by clicking on the below icon :
This should launch a web shell like this one :
Token management UI is accessible right beneath the user avatar :
There are two scopes :
no scope (global): used to interact with the global Rancher API
cluster-scoped: token dedicated to accessing a specific cluster
Tokens can have different lifecycles:
a token can have an unlimited lifespan; it will follow the lifecycle of the account attached to it
a specific lifetime
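Tokens can also be created through the Rancher API. A hedged sketch, assuming Rancher's v3 token endpoint (the bearer token, cluster ID and description below are placeholders; ttl is expressed in milliseconds):

```shell
# Create a cluster-scoped token valid for 30 days (30*24*3600*1000 ms).
curl --request POST \
  --url https://k8s-eb.cegedim.cloud/v3/tokens \
  --header 'authorization: Bearer token-tttt:token-of-doom' \
  --header 'content-type: application/json' \
  --data '{
    "type": "token",
    "description": "ci-deploy",
    "clusterId": "c-6t7f4",
    "ttl": 2592000000
  }'
```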
You can use ITCare to add nodes to or remove nodes from your cluster.
Rancher manages namespaces via Projects, a concept that exists only in Kubernetes clusters managed by Rancher.
A Project is not a native Kubernetes resource.
By default, a Kubernetes cluster is provisioned with 2 projects:
System: contains core components' namespaces, such as kube-system, etc.
Default: contains the "default" namespace
Users are free to create more Projects if needed.
At the Project level, Rancher offers built-in automation such as access rights granting and network isolation.
Users are strongly encouraged to classify namespaces into Projects.
Switch to project view
Create a new namespace from project view
Insert a unique name, fill in other fields if needed, and click on "Create"
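Namespaces can also be attached to a project from the CLI. A sketch, assuming the project ID shown on the project's "View in API" page (the namespace name is a placeholder): Rancher links a namespace to a project via the field.cattle.io/projectId annotation.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  annotations:
    field.cattle.io/projectId: c-6t7f4:p-d43l6   # <cluster ID>:<project ID>
```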
cegedim.cloud recommends and officially supports access rights management via AD groups.
Only AD groups starting with G_EMEA_* and G_K8_* are known by Rancher.
By default, when a cluster is created:
Standard user role is given to the group G_K8_<CLUSTER_NAME>_USERS, which contains the power users of the related Cloud
Admin role is given to the group G_K8_<CLUSTER_NAME>_ADMINS, which is empty by default and can be populated with competent & certified users via an ITCare ticket to the AD support team.
For instance, if user user1@cegedim.com needs standard user access to cluster test-preprod, they must request that user1@cegedim.com be added to the AD group named G_K8_TEST_PREPROD_USERS.
When users create a new Project, as default owners they are free to bind any role to any AD group within the scope of this project.
If the Rancher predefined roles cannot fulfill your needs, please contact the admins of your cluster to configure a custom RoleBinding or ClusterRoleBinding.
Project Level Rights Management
In this part we will assume that rights are given to a group, not to a user. There are two ways to manage rights on a project: the UI or the API.
One of the highest roles you can assign is "Project Admin".
Edit a project that you own or on which the project creator has granted you sufficient rights.
Select the group and the role in the form.
Using the API is pretty straightforward. You will first need some parameters:
To get the project ID, you can use the API explorer: simply use the "View in API" button.
curl --request GET \
--url https://k8s-eb.cegedim.cloud/v3/projects \
--header 'authorization: Bearer token-tttt:token-of-doom'
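The response is JSON. The snippet below sketches how to extract the project ID from it with standard tools only (the response shape, a "data" array of objects carrying an "id", is an assumption based on Rancher's v3 API conventions; prefer jq in real scripts):

```shell
# Simulated /v3/projects response, for illustration only.
cat > /tmp/projects.json <<'EOF'
{"data":[{"id":"c-6t7f4:p-d43l6","name":"my-project","type":"project"}]}
EOF

# Grab the first "id" value.
PROJECT_ID=$(grep -o '"id":"[^"]*"' /tmp/projects.json | head -n1 | cut -d'"' -f4)
echo "$PROJECT_ID"   # prints: c-6t7f4:p-d43l6
```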
Getting Role ID
To get the role ID: you might not be allowed to list the roles through the UI, but you can get it through this API request:
curl --request GET \
--url 'https://rancher-staging.cegedim.cloud/v3/roleTemplates?name=Project%20Admin' \
--header 'authorization: Bearer token-tttt:token-of-doom'
Give credentials
Using your API token you can make a single POST request to create the role binding:
curl --request POST \
--url https://k8s-eb.cegedim.cloud/v3/projectRoleTemplateBindings \
--header 'authorization: Bearer token-tttt:token-of-doom' \
--header 'content-type: application/json' \
--data '{
"projectId": "c-6t7f4:p-d43l6",
"namespaceId":"",
"groupPrincipalId":"keycloakoidc_group_group://G_EMEA_DUPER_GROUP",
"roleTemplateId": "project-owner"
}'
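To confirm the binding was created, a GET on the same endpoint can be filtered by project (a sketch; the filtering query parameter follows Rancher's v3 API conventions):

```shell
curl --request GET \
  --url 'https://k8s-eb.cegedim.cloud/v3/projectRoleTemplateBindings?projectId=c-6t7f4:p-d43l6' \
  --header 'authorization: Bearer token-tttt:token-of-doom'
```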
Kubernetes resource API versions can be deprecated and even removed when upgrading the Kubernetes version.
Things can break if you have resources whose apiVersion is removed in the new Kubernetes version.
To avoid this risk, one solution is to use the "kubent" tool to check compatibility.
Kubent detects deprecated objects in a Kubernetes cluster. You should migrate/modify the detected resources before upgrading the Kubernetes version.
To install kubent:
sh -c "$(curl -sSL 'https://git.io/install-kubent')"
To detect deprecated objects that will be removed in a newer Kubernetes version:
kubent [--context my-cluster]
An example of the output:
...
__________________________________________________________________________________________
>>> Deprecated APIs removed in 1.22 <<<
------------------------------------------------------------------------------------------
KIND NAMESPACE NAME API_VERSION REPLACE_WITH (SINCE)
Ingress <undefined> toto networking.k8s.io/v1beta1 networking.k8s.io/v1 (1.19.0)
...
In this example, if your cluster is planned to be upgraded to Kubernetes version 1.22, you should migrate your Ingress resource named "toto" from API version networking.k8s.io/v1beta1 to networking.k8s.io/v1 before the upgrade.
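As a sketch, the "toto" Ingress above rewritten for networking.k8s.io/v1 could look like this (host, path and service details are assumptions for illustration; in v1, pathType becomes mandatory and serviceName/servicePort move under backend.service):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: toto
spec:
  rules:
    - host: toto.example.com
      http:
        paths:
          - path: /
            pathType: Prefix          # mandatory in v1
            backend:
              service:                # replaces serviceName / servicePort
                name: toto
                port:
                  number: 80
```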
This migration might imply modifying some extra fields of the resources. Please refer to the official documentation:
Kubent might fail to retrieve some information (e.g. the namespace of an Ingress); feel free to file an issue with the maintainers: https://github.com/doitintl/kube-no-trouble/issues
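kubent can also gate a CI pipeline before an upgrade. A sketch, with flags taken from kubent's README (--exit-error makes the command exit non-zero when deprecated APIs are found):

```shell
kubent --context my-cluster --target-version 1.22.0 --exit-error --output json
```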
In this example, we will configure log forwarding from a Kubernetes cluster to an OpenSearch cluster.
The OpenSearch cluster in this example is my-cluster.es.cegedim.cloud
The ClusterOutput name is my-cluster-output
In Rancher, under Logging > ClusterOutput and Logging > Output, edit YAML configuration and change this:
port: 443
log_es_400_reason: true # useful to debug when no log is received in OpenSearch / ELK
suppress_type_name: true
# Add this if the target is an OpenSearch cluster
default_elasticsearch_version: "7"
verify_es_version_at_startup: false
ClusterFlow/ClusterOutput come with a lot of trouble when sending logs to an OpenSearch / ELK cluster: conflicts on the expected value kind for the same key (e.g. a value changing from "long" to "string" will be rejected by the OpenSearch / ELK cluster).
This can happen if you have Dynatrace activated.
Here are full examples of the ClusterOutput/Output spec for Elasticsearch and OpenSearch:
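As an illustration, a ClusterOutput spec targeting OpenSearch could look like the sketch below (user and secret names are assumptions; the OpenSearch-specific settings are the ones shown above):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: my-cluster-output
  namespace: cattle-logging-system
spec:
  elasticsearch:
    host: my-cluster.es.cegedim.cloud
    port: 443
    scheme: https
    ssl_verify: false
    logstash_format: true
    log_es_400_reason: true
    suppress_type_name: true
    # OpenSearch targets only:
    default_elasticsearch_version: "7"
    verify_es_version_at_startup: false
    user: elastic-user                  # assumption
    password:
      valueFrom:
        secretKeyRef:
          name: elk-credentials         # assumption
          key: password
```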
There are 2 options:
Migration to Flow/ClusterOutput: pushes all namespaces' logs to the same OpenSearch index
Migration to Flow/Output: pushes each namespace's logs to a dedicated OpenSearch index
The recommendation is to migrate to "Flow/Output", possibly even with a dedicated OpenSearch index for very important applications.
Create a file with all namespaces:
mkdir -p log-config
kubectl get ns --no-headers |grep -v kube-system|\
grep -v cattle|\
grep -v dynatrace|\
grep -v ceg-it|\
grep -v ingress-nginx|awk '{print $1}' > namespaces.txt
# review namespaces.txt
Create K8s YAML files to configure logs on all namespaces:
while read LOG_NS
do
cat <<EOF >> log-config/"configuration-"$LOG_NS".yaml"
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: all-namespace-logs
  namespace: $LOG_NS
spec:
  filters:
    - dedot:
        de_dot_nested: true
        de_dot_separator: '-'
  globalOutputRefs:
    - my-cluster-output
EOF
done < namespaces.txt
# review content of folder log-config
Apply configuration:
kubectl apply -f log-config
Create a file with all namespaces:
mkdir -p log-config
kubectl get ns --no-headers |grep -v kube-system|\
grep -v cattle|\
grep -v dynatrace|\
grep -v ceg-it|\
grep -v ingress-nginx|awk '{print $1}' > namespaces.txt
# review namespaces.txt
Create K8s YAML files:
while read LOG_NS
do
kubectl -n cattle-logging-system get secret {{ ELK credentials }} -o json \
| jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' \
| kubectl apply -n $LOG_NS -f -
cat <<EOF >> log-config/"configuration-"$LOG_NS".yaml"
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: all-ns
  namespace: $LOG_NS
spec:
  elasticsearch:
    buffer:
      chunk_limit_size: 256m
      compress: gzip
      flush_at_shutdown: true
      flush_interval: 60s
      flush_mode: interval
      flush_thread_count: 2
      queue_limit_length: 96
      retry_forever: true
      type: file
    flatten_hashes: true
    host: my-cluster.es.cegedim.cloud
    include_tag_key: true
    log_es_400_reason: true
    logstash_format: true
    logstash_prefix: {{ prefix }}
    password:
      valueFrom:
        secretKeyRef:
          key: password
          name: {{ ELK credentials }}
    port: 443
    reconnect_on_error: true
    reload_connections: false
    reload_on_failure: true
    request_timeout: 30s
    scheme: https
    ssl_verify: false
    suppress_type_name: true
    user: {{ username }}
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: all-namespace-logs
  namespace: $LOG_NS
spec:
  filters:
    - dedot:
        de_dot_nested: true
        de_dot_separator: '-'
  globalOutputRefs: []
  localOutputRefs:
    - all-ns
EOF
done < namespaces.txt
# review content of folder log-config
Apply configuration:
kubectl apply -f log-config
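Optionally, confirm the logging operator accepted the new resources; in recent operator versions, its CRDs expose status columns (ACTIVE / PROBLEMS) in kubectl output:

```shell
kubectl get clusteroutputs,outputs,flows --all-namespaces
```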
If everything goes well, no large buffers should accumulate. Let's check:
kubectl -n cattle-logging-system get po -l app.kubernetes.io/name=fluentd -o name|awk '{print $1}'|xargs -I {} sh -c "kubectl -n cattle-logging-system exec {} -c fluentd -- du -hs /buffers"
Let's check the last 5 lines of fluentd's log:
kubectl -n cattle-logging-system get po -l app.kubernetes.io/name=fluentd -o name|awk '{print $1}'|xargs -I {} sh -c "kubectl -n cattle-logging-system exec {} -c fluentd -- tail -5 /fluentd/log/out"
For a deeper look, inspect /fluentd/log/out inside the fluentd pod; most of the time, the following will help:
kubectl -n cattle-logging-system get po -l app.kubernetes.io/component=fluentd --no-headers|awk '{print $1}'|xargs -I {} sh -c "kubectl -n cattle-logging-system exec -it {} -c fluentd -- cat /fluentd/log/out|grep 'Rejected'"
2023-01-20 10:53:38 +0000 [warn]: #0 send an error event to @ERROR: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'failed to parse field [stack_trace] of type [text] in document with id 'L7fQzoUBIs35W4-JCcJQ'. Preview of field's value: '{source= at reconcileChildFibers (webpack-internal:///./node_modules/react-dom/cjs/react-dom.development.js:14348:23)}''" location=nil tag="kubernetes.var.log.containers.log-service-deployment-57747558f8-52jgx_my-application_log-service-6c560e016f7241b3231b96ed358390d8a2170b175833a695705c952183dcda5e.log" time=2023-01-20 10:52:33.431659917 +0000
It is then easy to identify the pod that causes the issue:
kubectl get po -A |grep log-service-deployment-57747558f8
my-application log-service-deployment-57747558f8-52jgx 1/1 Running 0 4d20h
Please understand that the error is not in Kubernetes: the container produces inconsistent JSON-formatted logs, which OpenSearch then rejects. Banzai will retry and, sooner or later, buffer overflow will occur.
Sources:
https://discuss.elastic.co/t/elasticsearch-rejecting-records-because-of-parsing-error/180040
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/ignore-malformed.html#json-object-limits
One of the short-term solutions below can be picked:
Remove the pod from the Flow (pod exclusion) or disable the entire Flow of the related namespace
Clean up the related index on the ES server
Long-term solutions:
Adapt the application to produce more consistent logs
See if it is possible to configure ES to gently ignore, rather than reject, the whole package sent by Banzai
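The pod-exclusion route can be sketched on the Flow of the affected namespace: the banzaicloud Flow spec supports a match section, evaluated before filters (the label below is an assumption about the offending deployment):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: all-namespace-logs
  namespace: my-application
spec:
  match:
    - exclude:
        labels:
          app: log-service    # assumed label of the rejected pod
  filters:
    - dedot:
        de_dot_nested: true
        de_dot_separator: '-'
  localOutputRefs:
    - all-ns
```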