Upgrade Instructions
Follow these step-by-step instructions to upgrade Kloudfuse using Helm, including the upgrade command and version-specific notes for all supported releases.
Helm upgrade command
-
Before performing an upgrade, validate that the upgrade won’t revert any customization on your cluster.
-
To check which Kloudfuse version you have, run the following command:
helm list
-
Run the upgrade command.
helm upgrade --install kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
  -n kfuse \
  --version <VERSION> \ (1)
  -f custom-values.yaml
| 1 | Replace <VERSION> with a valid Kloudfuse release value; use the most recent one. |
Upgrade to 4.1.0
Pre-Upgrade Steps
- Set events-query-service.config.rbacdb.user (Externally-Managed Postgres Only)
-
If your configDB is hosted on an externally-managed Postgres instance (AWS RDS, GCP Cloud SQL, Azure Database for PostgreSQL) where the application user is not the default postgres, you must add an explicit events-query-service.config.rbacdb.user override to your custom_values.yaml. The events-query-service chart currently ships with a hardcoded rbacdb.user: postgres default that masks the global.configDB.username fallback, so without this override events-query-service connects to rbacdb as postgres and fails with no pg_hba.conf entry for host …, user "postgres", database "rbacdb". Skip this step if global.configDB.username is postgres (the default).
events-query-service:
  config:
    rbacdb:
      user: <your-configdb-app-username> # match global.configDB.username
This step will not be required once events-query-service v0.1.0-<TBD> ships the chart-side fix; the override can be removed in a follow-up release.
Post-Upgrade Steps
- Migrate Ingestion Auth Keys to the UI (Optional, Recommended)
-
Starting in 4.1.0, ingestion API keys and their optional additional labels can be managed from the Kloudfuse UI. Open Admin > Settings, then click Configure on the Auth key labels card. The legacy YAML configuration (kfuse-auth-ingest secret + ingester.config.authKeyAdditionalLabels) continues to work but is deprecated. YAML-sourced entries appear in the UI as read-only.
After upgrading, recreate each YAML-managed entry through the UI (reusing the existing token to avoid agent reconfiguration), then delete the legacy YAML configuration:
kubectl delete secret kfuse-auth-ingest -n <namespace> (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
Remove the ingester.config.authKeyAdditionalLabels block from custom_values.yaml and re-apply Helm values, then restart config-mgmt-service:
kubectl rollout restart deploy/config-mgmt-service -n <namespace> (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
For full migration steps and a placeholder helper script, see Migrate YAML-managed Auth Keys to the UI.
4.0.1
Pre-Upgrade Steps
- TLS Certificate Key Type Check (Required for Envoy Gateway deployments)
-
Release 4.0.1 upgrades the Envoy proxy to v1.36 with FIPS-compliant cryptography. The FIPS proxy rejects RSA private keys for TLS certificates delivered via SDS. Check all TLS secrets and recreate any RSA certificates as ECDSA.
If your deployment uses AWS ACM (tls.awsAcmEnabled: true), TLS is terminated at the NLB and Envoy never loads the listener certificate. You only need to check the internal envoy certificates (steps 1-2), not the listener certificate.
-
Scan all TLS secrets for RSA keys:
for secret in $(kubectl get secrets -n <namespace> --field-selector type=kubernetes.io/tls -o jsonpath='{.items[*].metadata.name}'); do
  KEY_TYPE=$(kubectl get secret "$secret" -n <namespace> -o jsonpath='{.data.tls\.key}' | base64 -d | openssl pkey -noout -text 2>/dev/null | head -1)
  if echo "$KEY_TYPE" | grep -q "2048\|4096\|3072"; then
    echo "RSA: $secret - needs recreation"
  else
    echo "ECDSA: $secret - OK"
  fi
done (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
-
Delete internal envoy certificates (certgen certs):
These are regenerated automatically by the certgen hook during the helm upgrade. The 4.0.1 certgen generates ECDSA P-256 keys.
kubectl delete secret envoy envoy-gateway envoy-rate-limit -n <namespace> (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. Some of these secrets may not exist depending on your configuration; errors for missing secrets can be ignored. |
-
Proceed with the helm upgrade to 4.0.1.
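The scan above classifies a key by the bit length on the first line of the openssl output. As a hedged alternative, the sketch below classifies the full openssl pkey -noout -text output by its field names instead: RSA keys print a modulus block, and named-curve EC keys print an ASN1 OID line. The function name classify_key_text is hypothetical and is not part of Kloudfuse.

```shell
# classify_key_text: reads `openssl pkey -noout -text` output on stdin and
# prints RSA, ECDSA, or UNKNOWN. Hypothetical helper, not shipped with Kloudfuse.
classify_key_text() {
  awk '
    /^modulus:/  { kind = "RSA" }     # RSA private keys print a modulus block
    /^ASN1 OID:/ { kind = "ECDSA" }   # named-curve EC keys print the curve OID
    END { print (kind ? kind : "UNKNOWN") }
  '
}
```

Possible usage, assuming the same secret layout as the scan: kubectl get secret <tls-secret-name> -n <namespace> -o jsonpath='{.data.tls\.key}' | base64 -d | openssl pkey -noout -text 2>/dev/null | classify_key_text. An UNKNOWN result means the key could not be parsed and should be inspected by hand.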
Post-Upgrade Steps
- TLS Listener Certificate Reissue (Required for cert-manager deployments)
-
After the upgrade, the Gateway resource has ECDSA annotations (cert-manager.io/private-key-algorithm: ECDSA). If your listener TLS certificate was RSA (identified in the pre-upgrade check), it must be deleted and reissued as ECDSA.
Skip this section if you use AWS ACM (tls.awsAcmEnabled: true) or if your listener certificate was already ECDSA.
-
Verify the Gateway has ECDSA annotations:
kubectl get gateway kfuse -n <namespace> -o yaml | grep -i 'private-key' (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
You should see:
cert-manager.io/private-key-algorithm: ECDSA
cert-manager.io/private-key-size: "256"
If these annotations are not present, add them manually:
kubectl annotate gateway kfuse -n <namespace> \
  cert-manager.io/private-key-algorithm=ECDSA \
  cert-manager.io/private-key-size="256" --overwrite (2)
| 2 | Replace <namespace> with the namespace of your Kloudfuse deployment. If you have an internal gateway (kfuse-internal), repeat for that gateway as well. |
-
Delete the listener TLS certificate:
kubectl delete secret <tls-secret-name> -n <namespace> (1)
| 1 | Replace <tls-secret-name> with your TLS secret name (default is letsencrypt-ingress-tls). Check your tls.secretName value if you use a custom name. |
cert-manager will automatically reissue the certificate as ECDSA within 30-60 seconds.
-
Verify the new certificate is ECDSA:
kubectl get secret <tls-secret-name> -n <namespace> \
  -o jsonpath='{.data.tls\.key}' | base64 -d | \
  openssl pkey -noout -text 2>/dev/null | head -1 (1)
| 1 | Replace <tls-secret-name> and <namespace>. |
Should show Private-Key: (256 bit) (ECDSA P-256).
-
Restart the Envoy proxy to load the new certificate:
kubectl rollout restart deploy/envoy -n <namespace> (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. If you have an internal envoy (envoy-internal), restart that as well. |
-
Verify HTTPS is working:
curl -sk https://<your-hostname> -o /dev/null -w "%{http_code}"
Expected response: 401 (redirects to login).
If you use a BYOC (Bring Your Own Certificate) RSA cert, cert-manager cannot reissue it. You must obtain an ECDSA P-256 certificate from your certificate authority and update the Kubernetes TLS secret, or switch to AWS ACM for TLS termination.
-
Why is this needed?
The Envoy proxy FIPS build uses SafeLogic CryptoComply for cryptographic operations. When a TLS certificate is delivered to the proxy via SDS (Secret Discovery Service), the private key is imported into the FIPS cryptographic module. SafeLogic’s implementation runs a Pairwise Consistency Test (PCT) on RSA key import, which fails. ECDSA keys are not affected.
Internal certificates used for communication between Envoy components (certgen/xDS certs) are loaded from files and work with RSA, but are recreated as ECDSA for consistency.
| TLS Setup | Envoy loads private key via SDS? | Action needed? |
|---|---|---|
| AWS ACM (NLB terminates TLS) | No | Pre-upgrade only (certgen certs) |
| cert-manager (Envoy terminates TLS) | Yes | Pre-upgrade + Post-upgrade (all certs) |
| BYOC (Envoy terminates TLS) | Yes | Pre-upgrade + reissue ECDSA cert from CA |
4.0.0
Pre-Upgrade Steps
kubectl delete deployment -n <namespace> hydration-service (1)
kubectl delete cm -n <namespace> hydration-service (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
- Envoy Gateway Migration (Optional)
-
Starting in 4.0.0, Kloudfuse supports Envoy Gateway as an alternative to NGINX Ingress. Existing deployments can optionally migrate using a zero-downtime 3-step migration process. See Configure Envoy Ingress for details.
- AWS CloudWatch Namespaces (Breaking Change)
-
Starting in 4.0.0, the awsNamespaces list in Helm values defaults to an empty list. Customers using AWS CloudWatch metrics scraping must explicitly list the namespaces they want to scrape. Before upgrading, add the desired namespaces under ingester.config.awsNamespaces in your Helm values override:
ingester:
  config:
    awsNamespaces:
      - "AWS/EC2"
      - "AWS/RDS"
      - "AWS/S3"
      - "AWS/Lambda"
      # Add other namespaces as needed
For the full list of supported namespaces, see AWS Services.
Two new namespaces are now supported: AWS/Bedrock and AWS/QBusiness. If you enable these, ensure your AWS IAM scraper role policy includes the required permissions: bedrock:ListFoundationModels, bedrock:ListTagsForResource, qbusiness:ListApplications, qbusiness:GetApplication, qbusiness:ListTagsForResource.
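As a rough aid when reviewing the scraper role policy, the sketch below maps the namespaces you plan to enable to the extra IAM actions named in the 4.0.0 notes. The function name required_iam_actions is hypothetical; only the two new namespaces are covered, and the permissions for other namespaces are not modeled here.

```shell
# required_iam_actions: prints, one per line, the extra IAM actions needed for
# the namespaces passed as arguments. Hypothetical helper; the action names
# are the ones listed in the 4.0.0 upgrade notes.
required_iam_actions() {
  for ns in "$@"; do
    case $ns in
      AWS/Bedrock)
        printf '%s\n' bedrock:ListFoundationModels bedrock:ListTagsForResource ;;
      AWS/QBusiness)
        printf '%s\n' qbusiness:ListApplications qbusiness:GetApplication \
          qbusiness:ListTagsForResource ;;
    esac
  done
}
```

For example, required_iam_actions AWS/EC2 AWS/Bedrock prints only the two bedrock actions, because AWS/EC2 needs nothing beyond the existing policy in this sketch.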
3.5.3
Pre-Upgrade Steps
- Metrics Transformer GOMEMLIMIT
-
Multi-resolution rollup increases memory usage in the metrics transformer. Review and increase the GOMEMLIMIT setting for the metrics transformer if needed, especially for high-cardinality deployments.
- Kafka Rollup Topic
-
Metrics rollup is enabled by default in 3.5.3. Before upgrading, ensure that kf_metrics_rollup_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_metrics_topic.
- RUM Vitals Kafka Topic (Conditional)
-
If RUM is enabled in your deployment, ensure that kf_rum_vitals_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_rum_views_topic.
- Legacy Rollup Interval (Conditional)
-
If you were using a non-default rollup interval before 3.5.3 (for example, 600s instead of the default 300s), set the legacy interval so that pre-3.5.3 rollup data is attributed to the correct resolution:
global:
  metrics:
    legacyRollupIntervalSecs: 600
Post-Upgrade Steps
- Multi-Rollup Resolution
-
After the upgrade, wait for the setup-pinot job to complete, then restart the Pinot STS (StatefulSet) to pick up the new rollup schema and table configuration. Existing metrics data is automatically compatible with multi-rollup resolution.
- Pod Security Configuration (Optional)
-
Review your Helm values if you use custom security configurations. All services now run as non-root users by default and support configurable service accounts and security context. Existing deployments without custom security settings are unaffected.
- Scheduled Views
-
Scheduled views have been redesigned. You must recreate your scheduled views after upgrading. Data continuity from the previous implementation cannot be guaranteed.
3.5.2
Post-Upgrade Steps
- Container Image Signature Verification (Optional)
-
Starting with 3.5.2, all Kloudfuse container images and Helm charts are signed. You can optionally verify image signatures before deployment.
- Folder Permissions
-
If you use folder-based organization for dashboards and alerts, review folder permissions after upgrade to ensure appropriate access levels are configured.
- Logs Parser Restart
-
After upgrading to 3.5.2, you must restart the logs-parser to ensure proper functionality.
The following commands use kfuse as the default namespace. Replace with your actual namespace if different.
-
Scale down the logs-parser:
kubectl scale sts logs-parser -n kfuse --replicas=0
-
Verify that all logs-parser pods are terminated before proceeding:
kubectl get pods -n kfuse -l app=logs-parser
-
Scale up the logs-parser to match the numNodes value configured in your custom values YAML:
kubectl scale sts logs-parser -n kfuse --replicas=<numNodes>
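The verification step between scale-down and scale-up can be automated with a small polling loop. The sketch below is one possible approach, not part of Kloudfuse: wait_for_no_output is a hypothetical helper that reruns a command until it prints nothing on stdout or the attempt budget runs out.

```shell
# wait_for_no_output ATTEMPTS INTERVAL CMD...: reruns CMD until its stdout is
# empty. Returns 0 on success, 1 if ATTEMPTS runs were all non-empty.
# Hypothetical helper for illustration.
wait_for_no_output() {
  attempts=$1
  interval=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # stderr is discarded so kubectl's "No resources found" notice is ignored.
    if [ -z "$("$@" 2>/dev/null)" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

Possible usage, assuming the kfuse namespace: wait_for_no_output 30 10 kubectl get pods -n kfuse -l app=logs-parser -o name. This polls every 10 seconds for up to 30 attempts and returns non-zero if pods are still listed at the end.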
3.4.4 - p1
This guide covers upgrading to version 3.4.4-p1. This guide can be used for upgrading from 3.4.3 to 3.4.4-p1 and for upgrading from 3.4.4 to 3.4.4-p1; the only difference is in Phase 1 configuration.
Important Notes
- Indentation Matters
-
The kafka-kraft section must be at the same indentation level as the kafka section (root level), NOT under the global section.
- Disk Size
-
Always copy the persistence disk size from your existing kafka broker to the kafka-kraft broker configuration.
- Version Consistency
-
Use version 3.4.4-p1 for all helm upgrade commands throughout the process.
- Scripts
-
The following scripts referenced in this guide are available at https://github.com/kloudfuse-ext/customer/tree/main/scripts:
-
pause_consumption.sh - Pauses Pinot consumption on all tables
-
resume_consumption.sh - Resumes Pinot consumption on all tables
-
get_consuming_segments_info.sh - Gets current status of consuming segments
Pre-Upgrade Steps
Update kfuse-vector Configuration
The kfuse-vector component has been renamed to kfuse-archival-vector. If your values.yaml contains a kfuse-vector section, you must rename it before upgrading to 3.4.4:
# Old configuration (3.4.3 and earlier)
kfuse-vector:
  <your-configuration>
# New configuration (3.4.4 and later)
kfuse-archival-vector:
  <your-configuration>
Phase 1: Deploy Both Legacy Kafka and Kafka-Kraft
Deploy both legacy kafka and kafka-kraft services, but continue using legacy kafka for all operations.
-
Update custom_values.yaml
Add the three legacy flags under the global.kafka section:
global:
  kafka:
    deployLegacy: true
    useLegacy: true
    ingesterUseLegacy: true
-
Configure Kafka Services
Add the kafka-kraft section at the same indentation level as the kafka section (NOT under global).
Ensure that the existing kafka.broker disk size is copied to kafka-kraft.broker. For example, if your existing kafka has persistence of 200Gi, copy it to the kafka-kraft section. If you are using a custom storageClass for kafka-broker instead of kfuse-ssd, please include it in the kafka-kraft.broker section and create a section for kafka-kraft.controller with storageClass.
- If upgrading from 3.4.3
-
kafka:
  broker:
    persistence:
      size: 200Gi
      storageClass: <storage-class-name> # Optional
kafka-kraft:
  broker:
    persistence:
      size: 200Gi
      storageClass: <storage-class-name> # Optional
  # Optional
  controller:
    persistence:
      storageClass: <storage-class-name>
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Wait for Deployment
Wait for kafka-kraft-broker and kafka-kraft-controller pods to be up and running, and for the kafka topic creator job to finish.
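Because a misplaced kafka-kraft section is easy to miss, a quick pre-flight check before running the helm upgrade may help. The sketch below relies on the fact that a root-level YAML key starts at column 0; kraft_at_root is a hypothetical helper name, and the check does not validate the rest of the file.

```shell
# kraft_at_root FILE: succeeds when FILE has a kafka-kraft key at column 0,
# that is, at the YAML root and not nested under global. Hypothetical helper.
kraft_at_root() {
  grep -q '^kafka-kraft:' "$1"
}
```

Possible usage: kraft_at_root custom_values.yaml || echo "kafka-kraft is missing or indented under another section".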
Phase 2: Switch Ingester to Kafka-Kraft
Switch the ingester to use the new kafka-kraft by removing ingesterUseLegacy from custom_values.yaml.
-
Update custom_values.yaml
Remove only the ingesterUseLegacy flag from the global.kafka section. The kafka and kafka-kraft sections remain unchanged:
global:
  kafka:
    deployLegacy: true
    useLegacy: true
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Check Kafka Consumer Lag
Check the Kafka consumer lag on kafka-broker-0 by running the commands below. The output lists multiple topics with multiple columns; once every value in the LAG column is zero, move on to the next step.
kubectl exec -ti -n kfuse kafka-broker-0 -- bash
unset JMX_PORT
/opt/bitnami/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server :9092 --describe --all-groups
-
Pause Pinot Consumption
Pause Pinot consumption by first port-forwarding to pinot-controller-0 and then running the pause_consumption.sh script:
kubectl port-forward -n kfuse pinot-controller-0 9000:9000
Then run the pause_consumption.sh script.
-
Wait for Segment Sealing
Run get_consuming_segments_info.sh (the pinot-controller must be port-forwarded) to get the current status. The segments must be sealed before you continue with the upgrade; they are sealed when the _segmentToConsumingInfoMap element is an empty map ({}) for every table, as shown below.
Example output when segments are sealed:
~/get_consuming_segments_info.sh
Fetching realtime tables...
Found tables:
kf_events_REALTIME
kf_logs_REALTIME
kf_logs_views_REALTIME
kf_metrics_REALTIME
kf_metrics_rollup_REALTIME
kf_rum_actions_REALTIME
kf_rum_errors_REALTIME
kf_rum_longtasks_REALTIME
kf_rum_resources_REALTIME
kf_rum_views_REALTIME
kf_traces_REALTIME
kf_traces_errors_REALTIME
Getting consuming segments info for: kf_events (from kf_events_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
Getting consuming segments info for: kf_logs (from kf_logs_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
Getting consuming segments info for: kf_logs_views (from kf_logs_views_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
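The two gating checks in Phase 2 (consumer lag at zero, then all segments sealed) are easy to misread in long output. The sketch below shows one way to evaluate them from captured text. Both function names are hypothetical. It assumes the standard kafka-consumer-groups header, in which TOPIC and PARTITION are the second and third columns, and the JSON shape shown in the example output for this phase.

```shell
# Hypothetical helpers for the Phase 2 checks; neither ships with Kloudfuse.

# all_lag_zero: reads `kafka-consumer-groups.sh --describe --all-groups`
# output on stdin. Prints each partition with pending lag and exits 1 if any
# is found; exits 0 otherwise. Non-numeric LAG values such as "-" are skipped.
all_lag_zero() {
  awk '
    $1 == "GROUP" {
      # Header row (repeated per group): locate the LAG column.
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "LAG") col = i
      next
    }
    col && NF >= col && $col ~ /^[0-9]+$/ && $col + 0 > 0 {
      printf "lag %s: topic %s partition %s\n", $col, $2, $3
      pending = 1
    }
    END { exit (pending ? 1 : 0) }
  '
}

# segments_sealed: reads get_consuming_segments_info.sh output on stdin.
# Exits 0 only if at least one status line was seen and every
# _segmentToConsumingInfoMap is the empty map {}.
segments_sealed() {
  awk '
    /_segmentToConsumingInfoMap/ {
      seen++
      if ($0 !~ /"_segmentToConsumingInfoMap"[ \t]*:[ \t]*\{[ \t]*\}/) unsealed++
    }
    END { exit !(seen > 0 && unsealed == 0) }
  '
}
```

Possible usage: save the output of each command to a file and run all_lag_zero < lag.txt or segments_sealed < segments.txt, proceeding only when the function returns 0. Empty input to segments_sealed is treated as not sealed, so a failed script run does not pass the check by accident.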
Phase 3: Switch All Services to Kafka-Kraft
Switch all other services to use kafka-kraft.
-
Update custom_values.yaml
Remove the global.kafka and kafka sections from custom_values.yaml. Only the kafka-kraft section is needed. The default helm configuration for 3.4.4-p1 already uses the new kafka for all services.
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Re-enable Pinot Consumption
Once the setup-pinot job has completed, re-enable pinot consumption on all tables by running resume_consumption.sh.
Post-Upgrade Steps
Let the New Kafka-Kraft Bake for 24 Hours
After a successful migration and a waiting period of 24 hours, delete the legacy kafka-broker and kafka-zookeeper PVCs:
kubectl get pvc -n kfuse | grep kafka-zookeeper
# Repeat the delete for every kafka-zookeeper PVC listed above
kubectl delete pvc -n kfuse data-kafka-zookeeper-0
kubectl get pvc -n kfuse | grep kafka-broker
# Repeat the delete for every kafka-broker PVC listed above
kubectl delete pvc -n kfuse data-kafka-broker-0
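When there are several broker and zookeeper replicas, listing the legacy PVC names in one pass can reduce the risk of touching a kafka-kraft volume. The sketch below filters a list of PVC names on stdin. The function name legacy_kafka_pvcs is hypothetical, and it assumes the data-kafka-broker-N and data-kafka-zookeeper-N naming shown in the commands above.

```shell
# legacy_kafka_pvcs: reads PVC names on stdin (one per line) and prints only
# the legacy kafka-broker and kafka-zookeeper claims. Names such as
# data-kafka-kraft-broker-0 are not matched. Hypothetical helper.
legacy_kafka_pvcs() {
  grep -E 'kafka-(broker|zookeeper)-[0-9]+$'
}
```

Possible usage: kubectl get pvc -n kfuse -o name | legacy_kafka_pvcs. Review the printed list before passing any of the names to kubectl delete -n kfuse.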
3.4.3
Pre-Upgrade Steps
If you plan to use GCP Stackdriver metrics and enrichment features, create a GCP service account secret before upgrading.
-
Follow the instructions at GCP Metrics Credentials to create a service account with the required permissions.
-
Create the secret in your Kubernetes cluster:
kubectl create secret generic kfuse-gcp-credentials \
  --from-file=key.json=<path-to-service-account-json>
-
Configure the secret name in your values.yaml:
global:
  gcpConfig:
    secretName: "kfuse-gcp-credentials"
3.4.2
Pre-Upgrade Steps
-
Starting with 3.4.2, the AZ service is enabled by default. To ensure a successful upgrade, configure the cloudStorage section in your values.yaml file.
-
You can define storage either:
-
At the service level (pinot.deepStore or az-service.cloudStore)
-
At the global cloudStorage section
Service-level settings always take precedence. If both are present, the upgrade continues to work as is. We recommend consolidating into the global cloudStorage section for consistency across services.
-
Configure the storage backend. Supported types are s3, gcs, and azure:
global:
  cloudStorage:
    # Supported types: s3, gcs, azure
    type: s3
    useSecret: true
    secretName: cloud-storage-secret
    # S3-specific
    s3:
      region: <specify region>
      bucket: <specify bucket>
    # GCS-specific
    gcs:
      bucket: <specify bucket>
    # Azure-specific
    azure:
      container: <specify container>
-
If you use secrets for authentication, create them outside of Kloudfuse using kubectl:
-
S3 – secret must include accessKey and secretKey:
kubectl create secret generic cloud-storage-secret \
  --from-literal=accessKey=<accessKey> \
  --from-literal=secretKey='<secretKey>'
-
GCS – secret must include the JSON credentials file (saved as secretKey):
kubectl create secret generic cloud-storage-secret \
  --from-file=./secretKey
-
Azure – secret must include the storage account connectionString:
kubectl create secret generic cloud-storage-secret \
  --from-literal=connectionString=<connectionString>
-
If Pinot was previously configured with deepStore, migrate it:
-
Remove the cloud storage configuration from the pinot deepStore section.
-
Replace dataDir with prefix in the service section.
-
The bucket name goes to the global config; everything after the bucket path becomes the prefix.
Example: If dataDir was:
s3://kfuse-bucket/pisco/controller/data
Set:
global:
  cloudStorage:
    type: s3
    s3:
      bucket: kfuse-bucket
pinot:
  deepStore:
    enabled: true
    prefix: pisco/controller/data
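The dataDir split described above (bucket name to the global config, the remainder to prefix) can be sketched with plain parameter expansion. The function name split_data_dir is hypothetical and is shown only to illustrate the rule.

```shell
# split_data_dir URI: prints the bucket and prefix parts of a deep store URI
# such as s3://kfuse-bucket/pisco/controller/data. Hypothetical helper.
split_data_dir() {
  uri=$1
  rest=${uri#*://}          # drop the scheme (s3://, gs://, ...)
  bucket=${rest%%/*}        # everything up to the first slash
  case $rest in
    */*) prefix=${rest#*/} ;;   # everything after the first slash
    *)   prefix= ;;             # no path component
  esac
  printf 'bucket=%s\nprefix=%s\n' "$bucket" "$prefix"
}
```

For the example in this section, split_data_dir s3://kfuse-bucket/pisco/controller/data prints bucket=kfuse-bucket followed by prefix=pisco/controller/data, which match the values placed under global.cloudStorage.s3.bucket and pinot.deepStore.prefix.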
3.4.0 - p2
There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.4.0-p2.
3.4.0 - p1
Pre-Upgrade Steps
This release changes the default disk size of the Pinot minion PVC. To successfully upgrade to this version, delete the Pinot minion StatefulSet and its PVC by running:
kubectl delete sts -n <namespace> pinot-minion (1)
kubectl delete pvc -l app.kubernetes.io/instance=kfuse,component=minion -n <namespace> (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
3.4.0
Pre-Upgrade and Post-Upgrade Steps
Perform the following check before and after upgrading to ensure the admin user configuration is correct:
-
Verify the admin user configuration in the alerts database:
kubectl exec -it kfuse-configdb-0 -- bin/bash
psql -U postgres -d alertsdb
select * from public.user where login='admin';
select * from public.user where email='admin@localhost';
Both queries should return the same row with id = 1. If they return different IDs, fix it using the following operations:
UPDATE public.user SET id=1 where email='admin@localhost';
DELETE from public.user where id=<ID from the output of the first command>;
Then restart Grafana:
kubectl rollout restart deployment kfuse-grafana
3.3.0
There are no specific post-upgrade steps for this release.
Pre-Upgrade Steps
-
If your organization runs Kloudfuse on a shared cluster, or if it has the az-service enabled (it has taints and labels), update the following configuration in the values.yaml file before upgrading.
config-mgmt-service:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: ng_label
                operator: In
                values:
                  - az1
  tolerations:
    - key: "ng_taint"
      operator: "Equal"
      value: "az1"
      effect: "NoSchedule"
-
The configuration for label tracking is now part of the global section. If your organization tracks labels, move their definition to the global section.