Troubleshooting
This guide helps you diagnose and resolve common issues encountered when working with cegedim.cloud Kubernetes clusters.
Rancher UI Issues
Rancher UI Stuck on "Loading"
Symptoms: After logging in, the Rancher UI displays "Loading" indefinitely.
Solution:
Try accessing the direct dashboard URL:
For ET region: https://rancher-et.cegedim.cloud/dashboard/home
For EB production: https://rancher-eb.cegedim.cloud/dashboard/home
For EB non-production: https://rancher-eb-qa.cegedim.cloud/dashboard/home
If the issue persists, log out and log back in
Try using a different browser or incognito/private mode
Clear your browser cache and cookies for the Rancher domain
Cluster Not Visible After First Login
Symptoms: After your first login to Rancher, your cluster doesn't appear in the cluster list.
Solution:
Log out of Rancher completely
Log back in
Your cluster should now appear in the list
If the cluster still doesn't appear, verify your access rights through ITCare or contact your administrator
Cannot Access Rancher (Connection Refused or Timeout)
Symptoms: Unable to reach rancher-et.cegedim.cloud or rancher-eb.cegedim.cloud.
Solution:
Check network access: Some Rancher instances are only accessible from the server network
rancher-et.cegedim.cloud - Requires server network access (connect through bastion)
rancher-eb.cegedim.cloud (production) - Requires server network access (connect through bastion)
rancher-eb-qa.cegedim.cloud (non-production) - Accessible from standard network
Verify Rancher status: Check if a Rancher upgrade is in progress (typically 15-30 minutes)
kubectl Access Issues
kubectl Commands Fail with "Connection Refused"
Symptoms: kubectl commands return connection errors or timeouts.
Possible Causes and Solutions:
1. Rancher Proxy Issue
If your kubeconfig uses the Rancher URL, Rancher might be down or upgrading
Wait for Rancher to become available again
Consider using direct cluster access if available
2. Invalid or Expired Credentials
Download a fresh kubeconfig from the Rancher UI
Verify your token hasn't expired (check token lifecycle in Rancher)
3. Network Connectivity
Test connectivity:
curl -v https://<rancher-url>
Verify you're on the correct network (bastion for ET/EB production)
Check firewall rules and proxy settings
kubectl Context Not Switching
Symptoms: kubectl commands affect the wrong cluster.
Solution:
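A minimal sketch of how to inspect and switch the active context, using standard `kubectl config` subcommands. The helper names (`list_contexts`, `switch_context`, `run_in_context`) are hypothetical wrappers introduced here for illustration; the context names come from your own kubeconfig.

```shell
# List every context in the active kubeconfig; the current one is marked with '*'.
list_contexts() {
  kubectl config get-contexts
}

# Switch to the named context, then print the active context to confirm the change.
switch_context() {
  kubectl config use-context "$1" && kubectl config current-context
}

# Run a single command against an explicit context without changing the default.
run_in_context() {
  context="$1"
  shift
  kubectl --context "$context" "$@"
}
```

For example, `switch_context my-cluster` makes `my-cluster` the default, while `run_in_context my-cluster get nodes` targets it for one command only. Passing the context explicitly is a reasonable habit in scripts, since it avoids depending on whichever context happens to be active.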
Cluster Access and Authentication
"Forbidden" Errors When Running kubectl Commands
Symptoms: Commands return "Error from server (Forbidden): is forbidden".
Solution:
Verify your access rights in Rancher: Check the "Manage Rights" page for your Project/Cluster permissions
Use SelfSubjectAccessReview: Check your permissions for specific resources with kubectl auth can-i, which submits a SelfSubjectAccessReview to the API server
Check Project/Namespace permissions: Ensure you have the correct role in the Project
Verify AD group membership: Confirm you're in the correct G_K8_* groups
Check token scope: Ensure you're using a cluster-scoped token for kubectl operations
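The permission check described above can be sketched with `kubectl auth can-i`. The helper names `can_i` and `list_my_permissions` are hypothetical, and the namespace and resource names are placeholders. Note that the `--list` form is answered through a SelfSubjectRulesReview rather than a SelfSubjectAccessReview.

```shell
# Ask the API server whether the current user may perform VERB on RESOURCE
# in NAMESPACE. kubectl prints "yes" or "no".
# Arguments: VERB RESOURCE NAMESPACE
can_i() {
  kubectl auth can-i "$1" "$2" --namespace "$3"
}

# List the actions the current user is allowed to perform in NAMESPACE.
# Arguments: NAMESPACE
list_my_permissions() {
  kubectl auth can-i --list --namespace "$1"
}
```

For example, `can_i create deployments my-namespace` reports whether you can create Deployments in `my-namespace`.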
Cannot Create Resources in Namespace
Symptoms: Permission denied when creating pods, deployments, etc.
Solution:
Verify the namespace belongs to a Project you have access to
If the namespace was created via kubectl (not Rancher UI), it may be in the "Default" project with restricted access
Contact your Project admin to move the namespace to the correct Project or grant permissions
Workload Issues
Pods Stuck in "Pending" State
Symptoms: Pods remain in "Pending" status and don't start.
Diagnosis:
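The original diagnostic commands are not shown here; a plausible sketch, assuming standard kubectl, is to describe the pod and read the namespace events. The helper names `describe_pod` and `recent_events` are hypothetical.

```shell
# Show the pod's status, conditions and events. Scheduler messages such as
# "Insufficient cpu" appear in the Events section at the end of the output.
# Arguments: NAMESPACE POD
describe_pod() {
  kubectl describe pod "$2" --namespace "$1"
}

# List the namespace's events sorted by timestamp, so the latest appear last.
# Arguments: NAMESPACE
recent_events() {
  kubectl get events --namespace "$1" --sort-by=.lastTimestamp
}
```

Call them as `describe_pod my-namespace my-pod` and `recent_events my-namespace`, then match the reported message against the causes listed below.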
Common Causes and Solutions:
1. Insufficient Resources
Message: "Insufficient cpu" or "Insufficient memory"
Solution: Request more nodes through ITCare or reduce resource requests
2. Persistent Volume Issues
Message: "persistentvolumeclaim not found" or "no persistent volumes available"
Solution: Verify PVC exists and storage class is correct
3. Node Selector/Affinity Mismatch
Message: "No nodes are available that match all of the following predicates"
Solution: Review nodeSelector and affinity rules
4. Image Pull Errors
Message: "Failed to pull image" or "ImagePullBackOff"
Solution: See the "Image Pull Issues (ImagePullBackOff)" section below
Image Pull Issues (ImagePullBackOff)
Symptoms: Pods fail with "ImagePullBackOff" or "ErrImagePull" status.
Diagnosis:
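One way to narrow down an image pull failure, sketched with standard kubectl options, is to print the image each container is trying to pull together with its waiting reason, then read the events for that pod. The helper names `container_image_status` and `pod_events` are hypothetical.

```shell
# Print one line per container: name, image reference and waiting reason
# (for example ImagePullBackOff or ErrImagePull).
# Arguments: NAMESPACE POD
container_image_status() {
  kubectl get pod "$2" --namespace "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.image}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}

# Show only the events that concern the given pod; the pull error returned
# by the registry is usually quoted in the event message.
# Arguments: NAMESPACE POD
pod_events() {
  kubectl get events --namespace "$1" --field-selector "involvedObject.name=$2"
}
```

Seeing the exact image reference makes typos in the name, tag or registry URL easier to spot.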
Common Causes and Solutions:
1. Private Registry Authentication
Create or verify image pull secret exists
Ensure secret is referenced in pod spec or service account
2. Image Name Typo
Verify image name and tag are correct
Check registry URL is properly formatted
3. Network Connectivity to Registry
Verify cluster can reach external registry
Check if network policies block registry access
Request network opening through ITCare if needed
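For the private registry case above, a minimal sketch of creating an image pull secret and referencing it from a service account. The helper names, the secret name, the registry address and the `REGISTRY_PASSWORD` environment variable are all assumptions made for illustration; substitute the values for your own registry.

```shell
# Create a docker-registry Secret holding the registry credentials.
# The password is read from the REGISTRY_PASSWORD environment variable.
# Arguments: NAMESPACE SECRET_NAME REGISTRY_SERVER USERNAME
create_pull_secret() {
  kubectl create secret docker-registry "$2" --namespace "$1" \
    --docker-server="$3" \
    --docker-username="$4" \
    --docker-password="$REGISTRY_PASSWORD"
}

# Reference the Secret from a ServiceAccount so that pods running under
# that account can use it when pulling images.
# Arguments: NAMESPACE SECRET_NAME SERVICE_ACCOUNT
attach_pull_secret() {
  kubectl patch serviceaccount "$3" --namespace "$1" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$2\"}]}"
}
```

For example, `create_pull_secret my-namespace regcred registry.example.com alice` followed by `attach_pull_secret my-namespace regcred default`. Alternatively, list the secret under `imagePullSecrets` in the pod spec itself.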
Ingress Not Routing Traffic
Symptoms: Cannot access application through ingress URL.
Diagnosis:
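A possible set of diagnostic commands, assuming standard kubectl: list the ingresses, inspect the one that fails, and confirm that the backing service has endpoints. The helper names are hypothetical.

```shell
# List the ingresses in a namespace with their class, hosts and address.
# Arguments: NAMESPACE
ingress_overview() {
  kubectl get ingress --namespace "$1"
}

# Show the rules, backends and events of one ingress.
# Arguments: NAMESPACE INGRESS
describe_ingress() {
  kubectl describe ingress "$2" --namespace "$1"
}

# Show the pod addresses behind a service. An empty ENDPOINTS column means
# the service selector matches no ready pods.
# Arguments: NAMESPACE SERVICE
service_endpoints() {
  kubectl get endpoints "$2" --namespace "$1"
}
```

Run `describe_ingress my-namespace my-ingress` first; the backend section shows which service and port the ingress points to, which you can then pass to `service_endpoints`.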
Common Causes and Solutions:
1. Incorrect Ingress Class
For Nginx (default): No class annotation needed, or use kubernetes.io/ingress.class: "nginx"
For Nginx external: Use kubernetes.io/ingress.class: "nginx-ext"
For Traefik: Use the appropriate Traefik ingress class
For Istio: Use Istio Gateway configuration
2. Service Not Found or Misconfigured
Verify service name and port match ingress backend
Check that service has endpoints (pods selected)
3. Certificate Issues
Default: The *.yourclustername.ccs.cegedim.cloud certificate is pre-configured
Custom domains: Request certificate configuration through ITCare
Persistent Storage Issues
PVC Stuck in "Pending" State
Symptoms: PersistentVolumeClaim remains "Pending" and pods cannot start.
Diagnosis:
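A sketch of how the claim might be inspected with standard kubectl. The helper names `describe_pvc` and `pvc_storage_class` are hypothetical.

```shell
# Show the claim's status and events; provisioning errors are reported in
# the Events section.
# Arguments: NAMESPACE PVC
describe_pvc() {
  kubectl describe pvc "$2" --namespace "$1"
}

# Print the storage class the claim requests, to compare against the
# classes that exist in the cluster.
# Arguments: NAMESPACE PVC
pvc_storage_class() {
  kubectl get pvc "$2" --namespace "$1" -o jsonpath='{.spec.storageClassName}'
}
```

If `pvc_storage_class my-namespace my-claim` prints a name that is absent from the cluster's storage classes, the first cause below applies.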
Common Causes and Solutions:
1. Storage Class Not Found
Verify storage class name in PVC
List available storage classes:
kubectl get storageclass
Use Ceph-based storage classes provided by cegedim.cloud
2. Storage Quota Exceeded
Check if storage quota is available
Request additional storage through ITCare
3. Ceph CSI Not Available
Verify Ceph CSI is enabled for your cluster
Contact support if Ceph CSI is not provisioned
Network Policy Issues
Pods Cannot Communicate Between Namespaces
Symptoms: Pods in different namespaces cannot reach each other.
Understanding Rancher Project Network Isolation:
Pods in namespaces within the same Rancher Project can communicate by default
Pods in namespaces in different Rancher Projects cannot communicate unless explicitly allowed
Solution:
Option 1: Move namespaces to the same Rancher Project (if appropriate)
Option 2: Create a NetworkPolicy to explicitly allow cross-project communication
Note: Pods in the "System" project can communicate with all other projects
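Option 2 could look like the sketch below, which generates a NetworkPolicy allowing all pods in one namespace to receive traffic from all pods in another. This is an illustration under assumptions, not the platform's prescribed policy: the function name is hypothetical, the namespace names are placeholders, and the selector relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set on every namespace. Verify the result on your cluster, since enforcement depends on the CNI in use.

```shell
# Print a NetworkPolicy manifest that lets every pod in TARGET_NS accept
# ingress traffic from every pod in SOURCE_NS.
# Arguments: TARGET_NS SOURCE_NS
allow_from_namespace_manifest() {
  target_ns="$1"
  source_ns="$2"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-${source_ns}
  namespace: ${target_ns}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ${source_ns}
EOF
}
```

To apply it, pipe the output to kubectl, for example `allow_from_namespace_manifest backend frontend | kubectl apply -f -`. Narrowing `podSelector` or adding `ports` restricts the rule to the workloads that actually need it.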
Pods Cannot Access External Services
Symptoms: Pods cannot reach internet or external services.
Understanding Network Restrictions:
By default, pods can only reach services within the same VLAN
Internet access requires proxy configuration or network opening
Solution:
For internet access: Configure HTTP proxy in your pods or request network opening through ITCare
For specific external services: Request network opening between VLANs through ITCare
For external databases/APIs: Verify network policies and firewall rules
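For the proxy option, one common approach is to set the conventional proxy environment variables on the workload. The sketch below uses `kubectl set env`; the function name, the proxy URL and the `NO_PROXY` list are assumptions, so use the proxy address provided for your environment. It only helps applications that honor these variables.

```shell
# Set HTTP_PROXY, HTTPS_PROXY and NO_PROXY on every container of a
# Deployment. Changing the pod template triggers a rolling restart.
# Arguments: NAMESPACE DEPLOYMENT PROXY_URL
set_proxy_env() {
  kubectl set env "deployment/$2" --namespace "$1" \
    HTTP_PROXY="$3" HTTPS_PROXY="$3" \
    NO_PROXY="localhost,127.0.0.1,.svc,.cluster.local"
}
```

For example, `set_proxy_env my-namespace my-app http://proxy.example.internal:3128`, where the proxy URL is a placeholder. Keeping in-cluster suffixes in `NO_PROXY` prevents service-to-service traffic from being sent through the proxy.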
Logging and Monitoring Issues
Logs Not Appearing in OpenSearch/ELK
Symptoms: Application logs are not visible in your log aggregation platform.
Diagnosis:
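A possible diagnostic sketch, assuming the cluster's logging stack is the logging operator whose Flow and Output resources live in the `logging.banzaicloud.io` API group (an assumption based on the resource names used in this guide). The helper names are hypothetical, and the fluentd namespace and pod name are left as parameters because they depend on the installation; you may also need additional rights to read them.

```shell
# List the Flow and Output resources defined in an application namespace.
# Arguments: NAMESPACE
list_logging_resources() {
  kubectl get flows.logging.banzaicloud.io,outputs.logging.banzaicloud.io --namespace "$1"
}

# Search the logs of a fluentd pod for rejected records (case-insensitive).
# Arguments: FLUENTD_NAMESPACE FLUENTD_POD
find_rejected_logs() {
  kubectl logs "$2" --namespace "$1" | grep -i "rejected"
}
```

If `list_logging_resources my-namespace` returns nothing, the first cause below applies; matches from `find_rejected_logs` point to the second.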
Common Causes and Solutions:
1. Flow/Output Not Configured
Verify Flow and Output/ClusterOutput resources exist for your namespace
Check configuration matches your OpenSearch cluster
2. Conflicting Log Fields
OpenSearch/ELK rejects logs with field type conflicts
Check fluentd logs for "Rejected" messages
See detailed logging configuration in the "Get Started" guide
3. Application Producing Malformed JSON
Application logs must be properly formatted
Consider excluding problematic pods from logging Flow
Migration Issues
Application Fails After Migration to RKE2
Symptoms: Application worked on RKE but fails on RKE2.
Common Causes and Solutions:
1. Deprecated API Versions
Run the kubent tool before migration to detect deprecated APIs
Update manifests to use current API versions
2. CNI Differences
If migrating to Cilium from Canal, network policies might behave differently
Review and test network policies after migration
3. Missing ConfigMaps or Secrets
Verify all ConfigMaps and Secrets were migrated
Check namespaces and names match exactly
4. External Integration Issues
Update CI/CD pipelines with new cluster kubeconfig
Reconfigure connections to Vault, databases, and other PaaS services
Update monitoring and alerting integrations
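For the missing ConfigMaps or Secrets case (cause 3 above), a sketch of comparing object names between the old and new clusters. It assumes both clusters are available as contexts in the same kubeconfig; the function names and context names are hypothetical. Only names are compared, not contents.

```shell
# Print the sorted names of all ConfigMaps and Secrets in NAMESPACE as seen
# through CONTEXT.
# Arguments: CONTEXT NAMESPACE
list_config_objects() {
  kubectl --context "$1" get configmaps,secrets --namespace "$2" -o name | sort
}

# Show objects that exist in only one of the two clusters. Lines starting
# with '<' exist only in the old cluster, lines starting with '>' only in
# the new one. Returns diff's exit status (0 when the lists are identical).
# Arguments: OLD_CONTEXT NEW_CONTEXT NAMESPACE
compare_config_objects() {
  old_list="$(mktemp)"
  new_list="$(mktemp)"
  list_config_objects "$1" "$3" > "$old_list"
  list_config_objects "$2" "$3" > "$new_list"
  diff "$old_list" "$new_list"
  status=$?
  rm -f "$old_list" "$new_list"
  return "$status"
}
```

For example, `compare_config_objects rke-old rke2-new my-namespace`. Some differences are expected, since automatically generated objects can have different names on each cluster.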
Getting Help
If you cannot resolve the issue using this guide:
Self-Service Resources
Check the Kubernetes official documentation
Review Rancher documentation
Consult other sections of this documentation (Features, Get Started, etc.)
Contact Support
For non-urgent issues: Submit a ticket through ITCare
For production incidents: Contact the 24x7 support team (if you have the 24x7 monitoring option)
For migration assistance: Submit a ticket with subject "RKE to RKE2 Migration Request"
Information to Provide When Requesting Support
To help support diagnose your issue quickly, please provide:
Cluster Information:
Cluster name
Region (ET, EB)
Kubernetes version
Problem Description:
What you were trying to do
What happened vs. what you expected
When the issue started
Any recent changes (deployments, upgrades, configuration changes)
Relevant Details:
Namespace and resource names affected
Error messages from kubectl or Rancher UI
Output of relevant kubectl describe commands
Screenshots of Rancher UI errors (if applicable)
Troubleshooting Already Performed:
Steps you've already tried
Results of those attempts
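Much of the information listed above can be gathered in one pass. The sketch below writes the cluster version, pod list and recent events for a namespace to a file; the function name and report layout are hypothetical, and it assumes standard kubectl. Review the file for sensitive data before attaching it to a ticket.

```shell
# Collect basic diagnostics for NAMESPACE into OUTPUT_FILE.
# Arguments: NAMESPACE OUTPUT_FILE
collect_support_info() {
  ns="$1"
  out="$2"
  {
    echo "== kubectl version =="
    kubectl version
    echo "== pods =="
    kubectl get pods --namespace "$ns" -o wide
    echo "== events =="
    kubectl get events --namespace "$ns" --sort-by=.lastTimestamp
  } > "$out" 2>&1
}
```

For example, `collect_support_info my-namespace support-report.txt`. Append the output of `kubectl describe` for the affected resources as needed.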
Most issues can be resolved quickly with proper diagnostics. Gather the relevant information before submitting a support ticket; it helps the support team assist you faster.