Upgrade Instructions
Helm upgrade command
-
Before performing an upgrade, validate that the upgrade won’t revert any customization on your cluster.
-
To check which Kloudfuse version you have, run the following command:
helm list
-
Run the upgrade command:
helm upgrade --install kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
  -n kfuse \
  --version <VERSION> \ (1)
  -f custom-values.yaml
| 1 | Replace <VERSION> with a valid Kloudfuse release value; use the most recent one. |
4.0.0
Pre-Upgrade Steps
kubectl delete deployment -n <namespace> hydration-service (1)
kubectl delete cm -n <namespace> hydration-service (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
- Envoy Gateway Migration (Optional)
-
Starting in 4.0.0, Kloudfuse supports Envoy Gateway as an alternative to NGINX Ingress. Existing deployments can optionally migrate using a zero-downtime 3-step migration process. See Configure Envoy Ingress for details.
3.5.3
Pre-Upgrade Steps
- Metrics Transformer GOMEMLIMIT
-
Multi-resolution rollup increases memory usage in the metrics transformer. Review and increase the
GOMEMLIMIT setting for the metrics transformer if needed, especially for high-cardinality deployments.
- Kafka Rollup Topic
-
Metrics rollup is enabled by default in 3.5.3. Before upgrading, ensure that
kf_metrics_rollup_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_metrics_topic.
- RUM Vitals Kafka Topic (Conditional)
-
If RUM is enabled in your deployment, ensure that
kf_rum_vitals_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_rum_views_topic.
- Legacy Rollup Interval (Conditional)
-
If you were using a non-default rollup interval before 3.5.3 (for example, 600s instead of the default 300s), set the legacy interval so that pre-3.5.3 rollup data is attributed to the correct resolution:
global:
  metrics:
    legacyRollupIntervalSecs: 600
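To mirror the partition count of the existing kf_metrics_topic (per the Kafka Rollup Topic step above), the count can be read from a broker's describe output. A minimal sketch; the helper name is illustrative and the describe output format is assumed from recent Kafka versions:

```shell
# Hypothetical helper: pull the PartitionCount field out of
# kafka-topics.sh --describe output.
partition_count() {
  sed -n 's/.*PartitionCount: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example with sample describe output (format assumed):
printf 'Topic: kf_metrics_topic\tTopicId: abc\tPartitionCount: 8\tReplicationFactor: 3\n' \
  | partition_count
```

On a live broker, the describe output could be produced with something like `kubectl exec -ti -n kfuse kafka-broker-0 -- bash -c 'unset JMX_PORT; /opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server :9092 --describe --topic kf_metrics_topic'` (image paths as used elsewhere in this guide).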
Post-Upgrade Steps
- Multi-Rollup Resolution
-
After the upgrade, wait for the setup-pinot job to complete, then restart the Pinot STS (StatefulSet) to pick up the new rollup schema and table configuration. Existing metrics data is automatically compatible with multi-rollup resolution.
- Pod Security Configuration (Optional)
-
Review your Helm values if you use custom security configurations. All services now run as non-root users by default and support configurable service accounts and security context. Existing deployments without custom security settings are unaffected.
- Scheduled Views
-
Scheduled views have been redesigned. You must recreate your scheduled views after upgrading. Data continuity from the previous implementation cannot be guaranteed.
3.5.2
Post-Upgrade Steps
- Container Image Signature Verification (Optional)
-
Starting with 3.5.2, all Kloudfuse container images and Helm charts are signed. You can optionally verify image signatures before deployment.
- Folder Permissions
-
If you use folder-based organization for dashboards and alerts, review folder permissions after upgrade to ensure appropriate access levels are configured.
- Logs Parser Restart
-
After upgrading to 3.5.2, you must restart the logs-parser to ensure proper functionality.
The following commands use kfuse as the default namespace. Replace it with your actual namespace if different.
-
Scale down the logs-parser:
kubectl scale sts logs-parser -n kfuse --replicas=0
-
Verify that all logs-parser pods are terminated before proceeding:
kubectl get pods -n kfuse -l app=logs-parser
-
Scale up the logs-parser to match the
numNodes value configured in your custom values YAML:
kubectl scale sts logs-parser -n kfuse --replicas=<numNodes>
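The termination check in step 2 above can be automated. A minimal sketch; the helper name and approach are illustrative:

```shell
# Hypothetical helper: succeeds only when the piped pod listing is empty.
pods_terminated() { [ -z "$(cat)" ]; }

# Example with an empty listing, which is what
# `kubectl get pods -n kfuse -l app=logs-parser -o name` prints
# once every logs-parser pod is gone:
printf '' | pods_terminated && echo "safe to scale up"
```

In a live cluster you could poll with, for example, `until kubectl get pods -n kfuse -l app=logs-parser -o name | pods_terminated; do sleep 5; done` before scaling back up.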
-
3.4.4 - p1
This guide covers upgrading to version 3.4.4-p1 from either 3.4.3 or 3.4.4; the only difference is in the Phase 1 configuration.
Important Notes
- Indentation Matters
-
The
kafka-kraft section must be at the same indentation level as the kafka section (root level), NOT under the global section.
- Disk Size
-
Always copy the persistence disk size from your existing kafka broker to the kafka-kraft broker configuration.
- Version Consistency
-
Use version
3.4.4-p1 for all helm upgrade commands throughout the process.
- Scripts
-
The following scripts referenced in this guide are available at https://github.com/kloudfuse-ext/customer/tree/main/scripts:
-
pause_consumption.sh - Pauses Pinot consumption on all tables
-
resume_consumption.sh - Resumes Pinot consumption on all tables
-
get_consuming_segments_info.sh - Gets the current status of consuming segments
-
Pre-Upgrade Steps
Update kfuse-vector Configuration
The kfuse-vector component has been renamed to kfuse-archival-vector. If your values.yaml contains a kfuse-vector section, you must rename it before upgrading to 3.4.4:
# Old configuration (3.4.3 and earlier)
kfuse-vector:
<your-configuration>
# New configuration (3.4.4 and later)
kfuse-archival-vector:
<your-configuration>
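The rename can be scripted with a simple in-place substitution. A minimal sketch, using a throwaway demo file; back up your real values file first, and note that this only handles a top-level `kfuse-vector:` key:

```shell
# Create a demo values file with the old key (illustrative content).
printf 'kfuse-vector:\n  foo: bar\n' > /tmp/values-demo.yaml

# Rename the top-level key in place, keeping a .bak backup.
sed -i.bak 's/^kfuse-vector:/kfuse-archival-vector:/' /tmp/values-demo.yaml

cat /tmp/values-demo.yaml
```

For real files, run the same `sed` against your values.yaml and diff it against the `.bak` copy before upgrading.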
Phase 1: Deploy Both Legacy Kafka and Kafka-Kraft
Deploy both legacy kafka and kafka-kraft services, but continue using legacy kafka for all operations.
-
Update custom_values.yaml
Add the three legacy flags under the
global.kafka section:
global:
  kafka:
    deployLegacy: true
    useLegacy: true
    ingesterUseLegacy: true
-
Configure Kafka Services
Add the
kafka-kraft section at the same indentation level as the kafka section (NOT under global). Ensure that the existing kafka.broker disk size is copied to kafka-kraft.broker. For example, if your existing kafka has persistence of 200Gi, copy it to the kafka-kraft section. If you use a custom storageClass for kafka-broker instead of kfuse-ssd, include it in the kafka-kraft.broker section and create a section for kafka-kraft.controller with storageClass.
- If upgrading from 3.4.3
-
kafka:
  broker:
    persistence:
      size: 200Gi
      storageClass: <storage-class-name> # Optional
kafka-kraft:
  broker:
    persistence:
      size: 200Gi
      storageClass: <storage-class-name> # Optional
  controller: # Optional
    persistence:
      storageClass: <storage-class-name>
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Wait for Deployment
Wait for
kafka-kraft-broker and kafka-kraft-controller pods to be up and running, and for the kafka topic creator job to finish.
Phase 2: Switch Ingester to Kafka-Kraft
Switch the ingester to use the new kafka-kraft by removing ingesterUseLegacy from custom_values.yaml.
-
Update custom_values.yaml
Remove only the
ingesterUseLegacy flag from the global.kafka section. The kafka and kafka-kraft sections remain unchanged:
global:
  kafka:
    deployLegacy: true
    useLegacy: true
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Check Kafka Consumer Lag
Check kafka consumer lag on
kafka-broker-0 by running the following commands. The output lists multiple topics with multiple columns; once the values in the LAG column are all zero, move on to the next step.
kubectl exec -ti -n kfuse kafka-broker-0 -- bash
unset JMX_PORT
/opt/bitnami/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server :9092 --describe --all-groups
-
Pause Pinot Consumption
Pause Pinot consumption by first port-forwarding to
pinot-controller-0 and then running the pause_consumption.sh script:
kubectl port-forward -n kfuse pinot-controller-0 9000:9000
Then run the pause_consumption.sh script.
-
Wait for Segment Sealing
Run
get_consuming_segments_info.sh (pinot-controller needs to be port-forwarded) to get the current status. To continue with the upgrade, the segments must be sealed, which is the case when the _segmentToConsumingInfoMap element is an empty map {}, as shown below.
Example output when segments are sealed:
~/get_consuming_segments_info.sh
Fetching realtime tables...
Found tables: kf_events_REALTIME kf_logs_REALTIME kf_logs_views_REALTIME kf_metrics_REALTIME kf_metrics_rollup_REALTIME kf_rum_actions_REALTIME kf_rum_errors_REALTIME kf_rum_longtasks_REALTIME kf_rum_resources_REALTIME kf_rum_views_REALTIME kf_traces_REALTIME kf_traces_errors_REALTIME
Getting consuming segments info for: kf_events (from kf_events_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
Getting consuming segments info for: kf_logs (from kf_logs_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
Getting consuming segments info for: kf_logs_views (from kf_logs_views_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
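The LAG check in the consumer-lag step above can be scripted. A minimal sketch; the helper name is illustrative, and the column layout of kafka-consumer-groups.sh output (LAG in column 6) is assumed, so verify the column position in your Kafka version:

```shell
# Hypothetical helper: exit 0 only if every data row has LAG (assumed
# to be column 6) equal to 0. The header row (NR == 1) is skipped.
lag_is_zero() {
  awk 'NR > 1 && $6 + 0 > 0 { bad = 1 } END { exit bad }'
}

# Example with sample kafka-consumer-groups.sh-style output (layout assumed):
printf '%s\n' \
  'GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG' \
  'g1 kf_metrics_topic 0 100 100 0' \
  'g1 kf_traces_topic 0 250 250 0' | lag_is_zero && echo "consumers caught up"
```

On a live broker, pipe the output of the kafka-consumer-groups.sh command shown above through this helper instead of eyeballing the table.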
Phase 3: Switch All Services to Kafka-Kraft
Switch all other services to use kafka-kraft.
-
Update custom_values.yaml
Remove the
global.kafka and kafka sections from custom_values.yaml. Only the kafka-kraft section is needed. The default helm configuration for 3.4.4-p1 already uses the new kafka for all services.
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Re-enable Pinot Consumption
Once the setup-pinot job has completed, re-enable pinot consumption on all tables by running
resume_consumption.sh.
Post-Upgrade Steps
Let the New Kafka-Kraft Bake for 24 Hours
After a successful migration and a waiting period of 24 hours, delete the legacy kafka-broker and kafka-zookeeper PVCs:
kubectl get pvc -n kfuse | grep kafka-zookeeper
# Add the pvc names for all kafka-zookeeper
kubectl delete pvc data-kafka-zookeeper-0
kubectl get pvc -n kfuse | grep kafka-broker
# Add the pvc names for all kafka-broker instances
kubectl delete pvc data-kafka-broker-0
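When several replicas exist, the legacy PVC names can be filtered in one pass. A minimal sketch; the filter function is illustrative, and it deliberately excludes the new kafka-kraft PVCs so they are never deleted by accident:

```shell
# Hypothetical filter: keep only legacy Kafka PVC names, never the
# new kafka-kraft ones (which do not match the anchored pattern).
filter_legacy_kafka_pvcs() {
  grep -E '^data-kafka-(zookeeper|broker)-[0-9]+$'
}

# Example: the kafka-kraft PVC is correctly left out.
printf '%s\n' data-kafka-broker-0 data-kafka-zookeeper-0 data-kafka-kraft-broker-0 \
  | filter_legacy_kafka_pvcs
```

In a live cluster you could feed it plain names, for example `kubectl get pvc -n kfuse --no-headers -o custom-columns=:metadata.name | filter_legacy_kafka_pvcs`, review the list, and only then pass it to `kubectl delete pvc -n kfuse`.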
3.4.3
Pre-Upgrade Steps
If you plan to use GCP Stackdriver metrics and enrichment features, create a GCP service account secret before upgrading.
-
Follow the instructions at GCP Metrics Credentials to create a service account with the required permissions.
-
Create the secret in your Kubernetes cluster:
kubectl create secret generic kfuse-sd-secret \
  --from-file=key.json=<path-to-service-account-json>
-
Configure the secret name in your
values.yaml:
global:
  gcpConfig:
    secretName: "kfuse-sd-secret"
3.4.2
Pre-Upgrade Steps
-
Starting with 3.4.2, the AZ service is enabled by default. To ensure a successful upgrade, configure the
cloudStorage section in your values.yaml file.
-
You can define storage either:
-
At the service level (
pinot.deepStore or az-service.cloudStore)
-
At the global
cloudStorage section.
Service-level settings always take precedence. If both are present, the upgrade continues to work as is. We recommend consolidating into the global cloudStorage section for consistency across services.
-
-
Configure the storage backend. Supported types are s3, gcs, and azure:
global:
  cloudStorage:
    # Supported types: s3, gcs, azure
    type: s3
    useSecret: true
    secretName: cloud-storage-secret
    # S3-specific
    s3:
      region: <specify region>
      bucket: <specify bucket>
    # GCS-specific
    gcs:
      bucket: <specify bucket>
    # Azure-specific
    azure:
      container: <specify container>
-
If you use secrets for authentication, create them outside of Kloudfuse using
kubectl:
-
S3 – secret must include
accessKey and secretKey:
kubectl create secret generic cloud-storage-secret \
  --from-literal=accessKey=<accessKey> \
  --from-literal=secretKey='<secretKey>'
-
GCS – secret must include the JSON credentials file (saved as
secretKey):
kubectl create secret generic cloud-storage-secret \
  --from-file=./secretKey
-
Azure – secret must include the storage account
connectionString:
kubectl create secret generic cloud-storage-secret \
  --from-literal=connectionString=<connectionString>
-
-
If Pinot was previously configured with
deepStore, migrate it:
-
Remove the cloud storage configuration from the pinot deepStore section.
-
Replace
dataDir with prefix in the service section.
-
The bucket name goes to the global config; everything after the bucket path becomes the
prefix.
Example: If dataDir was:
-
s3://kfuse-bucket/pisco/controller/data
Set:
global:
  cloudStorage:
    type: s3
    s3:
      bucket: kfuse-bucket
pinot:
  deepStore:
    enabled: true
    prefix: pisco/controller/data
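Splitting a dataDir URI into the bucket and prefix pieces can be sketched with plain shell parameter expansion; the variable names are illustrative:

```shell
# Split an example deepStore dataDir URI into bucket and prefix.
datadir="s3://kfuse-bucket/pisco/controller/data"

no_scheme="${datadir#*://}"   # kfuse-bucket/pisco/controller/data
bucket="${no_scheme%%/*}"     # kfuse-bucket          -> global cloudStorage bucket
prefix="${no_scheme#*/}"      # pisco/controller/data -> pinot.deepStore.prefix

echo "bucket=$bucket prefix=$prefix"
```

The same expansions work for gs:// and other schemes, since the split only keys off `://` and the first `/`.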
3.4.0 - p2
There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.4.0-p2.
3.4.0 - p1
Pre-Upgrade Steps
We changed the default disk size of the Pinot minion PVC. To successfully upgrade to this version, delete the Pinot minion StatefulSet and its PVC by running:
kubectl delete sts -n <namespace> pinot-minion (1)
kubectl delete pvc -l app.kubernetes.io/instance=kfuse -l component=minion -n <namespace> (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
3.4.0
Pre-Upgrade and Post-Upgrade Steps
Perform the following check before and after upgrading to ensure the admin user configuration is correct:
-
Verify the admin user configuration in the alerts database:
kubectl exec -it kfuse-configdb-0 -- bin/bash
psql -U postgres -d alertsdb
select * from public.user where login='admin';
select * from public.user where email='admin@localhost';
Both queries should return the same row with
id = 1. If they return different IDs, fix it using the following operations:
UPDATE public.user SET id=1 WHERE email='admin@localhost';
DELETE FROM public.user WHERE id=<ID from the output of the first command>;
Then restart Grafana:
kubectl rollout restart deployment kfuse-grafana
3.3.0
There are no specific post-upgrade steps for this release.
Pre-Upgrade Steps
-
If your organization runs Kloudfuse on a shared cluster, or if it has the az-service enabled (it has taints and labels), update the following configuration in the
values.yaml file before upgrading.
config-mgmt-service:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: ng_label
            operator: In
            values:
            - az1
  tolerations:
  - key: "ng_taint"
    operator: "Equal"
    value: "az1"
    effect: "NoSchedule"
-
The configuration for label tracking is now part of the global section. If your organization tracks labels, move their definition to the
global section.