Upgrade Instructions
Helm upgrade command
-
Before performing an upgrade, validate that the upgrade won’t revert any customization on your cluster.
-
To check which Kloudfuse version you have, run the following command:
helm list
-
Run the upgrade command:
helm upgrade --install kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
  -n kfuse \
  --version <VERSION> \ (1)
  -f custom-values.yaml
| 1 | Replace <VERSION> with a valid Kloudfuse release value; use the most recent one. |
4.0.0
Pre-Upgrade Steps
Delete the existing hydration-service deployment and its ConfigMap before upgrading:
kubectl delete deployment -n <namespace> hydration-service (1)
kubectl delete cm -n <namespace> hydration-service (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
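Optionally, confirm that both resources are gone before running the upgrade; a minimal check using the same placeholder namespace:
# Both commands should report NotFound once the resources are deleted.
kubectl get deployment -n <namespace> hydration-service
kubectl get cm -n <namespace> hydration-service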
- Envoy Gateway Migration (Optional)
-
Starting in 4.0.0, Kloudfuse supports Envoy Gateway as an alternative to NGINX Ingress. Existing deployments can optionally migrate using a zero-downtime 3-step migration process. See Configure Envoy Ingress for details.
3.5.3
Pre-Upgrade Steps
- Metrics Transformer GOMEMLIMIT
-
Multi-resolution rollup increases memory usage in the metrics transformer. Review and increase the GOMEMLIMIT setting for the metrics transformer if needed, especially for high-cardinality deployments.
- Kafka Rollup Topic
-
Metrics rollup is enabled by default in 3.5.3. Before upgrading, ensure that kf_metrics_rollup_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_metrics_topic. See the sketch after this list.
- RUM Vitals Kafka Topic (Conditional)
-
If RUM is enabled in your deployment, ensure that kf_rum_vitals_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_rum_views_topic. This topic is included in the same sketch after this list.
- Legacy Rollup Interval (Conditional)
-
If you were using a non-default rollup interval before 3.5.3 (for example, 600s instead of the default 300s), set the legacy interval so that pre-3.5.3 rollup data is attributed to the correct resolution:
global:
  metrics:
    legacyRollupIntervalSecs: 600
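For the Kafka Rollup Topic and RUM Vitals Kafka Topic items above, a minimal sketch of the global.kafkaTopics entries; the partition count of 3 is an assumption, so match it to your existing kf_metrics_topic and kf_rum_views_topic, and adjust the replication factor to your cluster:
global:
  kafkaTopics:
    - name: kf_metrics_rollup_topic
      partitions: 3
      replicationFactor: 1
    # Only needed when RUM is enabled:
    - name: kf_rum_vitals_topic
      partitions: 3
      replicationFactor: 1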
Post-Upgrade Steps
- Multi-Rollup Resolution
-
After the upgrade, wait for the setup-pinot job to complete, then restart the Pinot StatefulSet (STS) to pick up the new rollup schema and table configuration. Existing metrics data is automatically compatible with multi-rollup resolution. See the restart sketch after this list.
- Pod Security Configuration (Optional)
-
Review your Helm values if you use custom security configurations. All services now run as non-root users by default and support configurable service accounts and security context. Existing deployments without custom security settings are unaffected.
- Scheduled Views
-
Scheduled views have been redesigned. You must recreate your scheduled views after upgrading. Data continuity from the previous implementation cannot be guaranteed.
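For the Multi-Rollup Resolution item above, a hedged sketch of the restart step, assuming the kfuse namespace; the setup-pinot job name is taken from this guide, and the exact Pinot StatefulSet names vary by deployment, so list them first:
# Wait for the setup-pinot job to finish.
kubectl wait --for=condition=complete job/setup-pinot -n kfuse --timeout=30m
# List the Pinot StatefulSets and restart the ones present in your cluster.
kubectl get sts -n kfuse | grep pinot
kubectl rollout restart sts <pinot-statefulset-name> -n kfuse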
3.5.2
Post-Upgrade Steps
- Container Image Signature Verification (Optional)
-
Starting with 3.5.2, all Kloudfuse container images and Helm charts are signed. You can optionally verify image signatures before deployment.
- Folder Permissions
-
If you use folder-based organization for dashboards and alerts, review folder permissions after upgrade to ensure appropriate access levels are configured.
- Logs Parser Restart
-
After upgrading to 3.5.2, you must restart the logs-parser to ensure proper functionality.
The following commands use kfuse as the default namespace. Replace with your actual namespace if different.
-
Scale down the logs-parser:
kubectl scale sts logs-parser -n kfuse --replicas=0
-
Verify that all logs-parser pods are terminated before proceeding:
kubectl get pods -n kfuse -l app=logs-parser
-
Scale up the logs-parser to match the numNodes value configured in your custom values YAML:
kubectl scale sts logs-parser -n kfuse --replicas=<numNodes>
-
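Optionally, confirm that all replicas come back up after scaling; a minimal check, assuming the kfuse namespace:
# All <numNodes> logs-parser pods should reach the Running state.
kubectl get pods -n kfuse -l app=logs-parser -w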
3.4.4 - p1
This guide covers upgrading to version 3.4.4-p1. It applies to upgrades from either 3.4.3 or 3.4.4; the only difference is in the Phase 1 configuration.
Important Notes
- Indentation Matters
-
The kafka-kraft section must be at the same indentation level as the kafka section (root level), NOT under the global section.
- Disk Size
-
Always copy the persistence disk size from your existing kafka broker to the kafka-kraft broker configuration.
- Version Consistency
-
Use version 3.4.4-p1 for all helm upgrade commands throughout the process.
- Scripts
-
The following scripts referenced in this guide are available at https://github.com/kloudfuse-ext/customer/tree/main/scripts:
-
pause_consumption.sh - Pauses Pinot consumption on all tables
-
resume_consumption.sh - Resumes Pinot consumption on all tables
-
get_consuming_segments_info.sh - Gets the current status of consuming segments
-
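One way to fetch these scripts locally is to clone the repository and make them executable; the scripts/ path is assumed from the URL above:
git clone https://github.com/kloudfuse-ext/customer.git
chmod +x customer/scripts/*.sh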
Pre-Upgrade Steps
Update kfuse-vector Configuration
The kfuse-vector component has been renamed to kfuse-archival-vector. If your values.yaml contains a kfuse-vector section, you must rename it before upgrading to 3.4.4:
# Old configuration (3.4.3 and earlier)
kfuse-vector:
<your-configuration>
# New configuration (3.4.4 and later)
kfuse-archival-vector:
<your-configuration>
Phase 1: Deploy Both Legacy Kafka and Kafka-Kraft
Deploy both legacy kafka and kafka-kraft services, but continue using legacy kafka for all operations.
-
Update custom_values.yaml
Add the three legacy flags under the global.kafka section:
global:
  kafka:
    deployLegacy: true
    useLegacy: true
    ingesterUseLegacy: true
-
Configure Kafka Services
Add the kafka-kraft section at the same indentation level as the kafka section (NOT under global). Ensure that the existing kafka.broker disk size is copied to kafka-kraft.broker. For example, if your existing kafka has persistence of 200Gi, copy it to the kafka-kraft section. If you are using a custom storageClass for kafka-broker instead of kfuse-ssd, include it in the kafka-kraft.broker section and create a section for kafka-kraft.controller with storageClass.
- If upgrading from 3.4.3
-
kafka:
  broker:
    persistence:
      size: 200Gi
      storageClass: <storage-class-name> # Optional
kafka-kraft:
  broker:
    persistence:
      size: 200Gi
      storageClass: <storage-class-name> # Optional
  # Optional
  controller:
    persistence:
      storageClass: <storage-class-name>
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Wait for Deployment
Wait for kafka-kraft-broker and kafka-kraft-controller pods to be up and running, and for the kafka topic creator job to finish. A quick check is sketched below.
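A minimal check, assuming the kfuse namespace; the exact name of the topic creator job may differ in your deployment:
# kafka-kraft-broker and kafka-kraft-controller pods should be Running and Ready.
kubectl get pods -n kfuse | grep kafka-kraft
# The Kafka topic creator job should show completion (for example, COMPLETIONS 1/1).
kubectl get jobs -n kfuse | grep -i topic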
Phase 2: Switch Ingester to Kafka-Kraft
Switch the ingester to use the new kafka-kraft by removing ingesterUseLegacy from custom_values.yaml.
-
Update custom_values.yaml
Remove only the ingesterUseLegacy flag from the global.kafka section. The kafka and kafka-kraft sections remain unchanged:
global:
  kafka:
    deployLegacy: true
    useLegacy: true
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Check Kafka Consumer Lag
Check kafka consumer lag on kafka-broker-0 by running the code snippet below. The output shows multiple topics with multiple columns; once the values in the lag column are all zero, move on to the next step.
kubectl exec -ti -n kfuse kafka-broker-0 -- bash
unset JMX_PORT
/opt/bitnami/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server :9092 --describe --all-groups
-
Pause Pinot Consumption
Pause Pinot consumption by first port-forwarding to pinot-controller-0 and then running the pause_consumption.sh script:
kubectl port-forward -n kfuse pinot-controller-0 9000:9000
Then run the pause_consumption.sh script.
-
Wait for Segment Sealing
Run get_consuming_segments_info.sh (the pinot-controller must be port-forwarded) to get the current status. To continue with the upgrade, the segments must be sealed, which you can verify when the _segmentToConsumingInfoMap element contains no entries ({}), as shown below.
Example output when segments are sealed:
~/get_consuming_segments_info.sh
Fetching realtime tables...
Found tables: kf_events_REALTIME kf_logs_REALTIME kf_logs_views_REALTIME kf_metrics_REALTIME kf_metrics_rollup_REALTIME kf_rum_actions_REALTIME kf_rum_errors_REALTIME kf_rum_longtasks_REALTIME kf_rum_resources_REALTIME kf_rum_views_REALTIME kf_traces_REALTIME kf_traces_errors_REALTIME
Getting consuming segments info for: kf_events (from kf_events_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
Getting consuming segments info for: kf_logs (from kf_logs_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
Getting consuming segments info for: kf_logs_views (from kf_logs_views_REALTIME)
{"serversFailingToRespond":0, "serversUnparsableRespond":0, "_segmentToConsumingInfoMap":{}}
Phase 3: Switch All Services to Kafka-Kraft
Switch all other services to use kafka-kraft.
-
Update custom_values.yaml
Remove the global.kafka and kafka sections from custom_values.yaml. Only the kafka-kraft section is needed. The default helm configuration for 3.4.4-p1 already uses the new kafka for all services.
-
Run Helm Upgrade
helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
-
Re-enable Pinot Consumption
Once the setup-pinot job has completed, re-enable pinot consumption on all tables by running resume_consumption.sh; see the sketch below.
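A hedged sketch of this step, assuming the kfuse namespace and that the script still needs the pinot-controller port-forward from Phase 2:
# Confirm the setup-pinot job has completed.
kubectl get jobs -n kfuse | grep setup-pinot
# Re-establish the port-forward if it is no longer running, then resume consumption.
kubectl port-forward -n kfuse pinot-controller-0 9000:9000 &
./resume_consumption.sh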
Post-Upgrade Steps
Let the New Kafka-Kraft Bake for 24hrs
After successful migration and a waiting period of 24hrs, the legacy kafka-broker and kafka-zookeeper PVCs should be deleted:
kubectl get pvc -n kfuse | grep kafka-zookeeper
# Delete each kafka-zookeeper PVC listed above, for example:
kubectl delete pvc data-kafka-zookeeper-0
kubectl get pvc -n kfuse | grep kafka-broker
# Delete each kafka-broker PVC listed above, for example:
kubectl delete pvc data-kafka-broker-0
3.4.3
Pre-Upgrade Steps
If you plan to use GCP Stackdriver metrics and enrichment features, create a GCP service account secret before upgrading.
-
Follow the instructions at GCP Metrics Credentials to create a service account with the required permissions.
-
Create the secret in your Kubernetes cluster:
kubectl create secret generic kfuse-gcp-credentials \
  --from-file=key.json=<path-to-service-account-json>
-
Configure the secret name in your values.yaml:
global:
  gcpConfig:
    secretName: "kfuse-gcp-credentials"
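Optionally, confirm that the secret exists and carries the key.json entry before upgrading; a minimal check, assuming the kfuse namespace:
# The Data section should list a key.json entry.
kubectl describe secret kfuse-gcp-credentials -n kfuse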
3.4.2
Pre-Upgrade Steps
-
Starting with 3.4.2, the AZ service is enabled by default. To ensure a successful upgrade, configure the cloudStorage section in your values.yaml file.
-
You can define storage either:
-
At the service level (pinot.deepStore or az-service.cloudStore)
-
At the global cloudStorage section
Service-level settings always take precedence. If both are present, the upgrade continues to work as is. We recommend consolidating into the global cloudStorage section for consistency across services.
-
-
Configure the storage backend. Supported types are s3, gcs, and azure:
global:
cloudStorage:
# Supported types: s3, gcs, azure
type: s3
useSecret: true
secretName: cloud-storage-secret
# S3-specific
s3:
region: <specify region>
bucket: <specify bucket>
# GCS-specific
gcs:
bucket: <specify bucket>
# Azure-specific
azure:
container: <specify container>
-
If you use secrets for authentication, create them outside of Kloudfuse using kubectl:
-
S3 – secret must include accessKey and secretKey:
kubectl create secret generic cloud-storage-secret \
  --from-literal=accessKey=<accessKey> \
  --from-literal=secretKey='<secretKey>'
-
GCS – secret must include the JSON credentials file (saved as secretKey):
kubectl create secret generic cloud-storage-secret \
  --from-file=./secretKey
-
Azure – secret must include the storage account connectionString:
kubectl create secret generic cloud-storage-secret \
  --from-literal=connectionString=<connectionString>
-
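Optionally, verify that the secret contains the keys required for your storage type; a minimal check, assuming the kfuse namespace:
# Expect accessKey/secretKey (S3), secretKey (GCS), or connectionString (Azure); values are base64-encoded.
kubectl get secret cloud-storage-secret -n kfuse -o jsonpath='{.data}'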
-
If Pinot was previously configured with deepStore, migrate it:
-
Remove the cloud storage configuration from the pinot deepStore section.
-
Replace dataDir with prefix in the service section.
-
The bucket name goes to the global config; everything after the bucket name becomes the prefix.
Example: If dataDir was:
-
s3://kfuse-bucket/pisco/controller/data
Set:
global:
cloudStorage:
type: s3
s3:
bucket: kfuse-bucket
pinot:
deepStore:
enabled: true
prefix: pisco/controller/data
3.4.0 - p2
There are no specific pre-upgrade or post-upgrade steps for upgrading to release 3.4.0-p2.
3.4.0 - p1
Pre-Upgrade Steps
We changed the default disk size of the Pinot minion PVC. To successfully upgrade to this version, delete the Pinot minion StatefulSet and its PVC by running:
kubectl delete sts -n <namespace> pinot-minion (1)
kubectl delete pvc -l app.kubernetes.io/instance=kfuse -l component=minion -n <namespace> (1)
| 1 | Replace <namespace> with the namespace of your Kloudfuse deployment. |
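Before running the helm upgrade, you can confirm that the StatefulSet and its PVC are gone; replace <namespace> as above:
# Neither command should return any pinot-minion resources.
kubectl get sts -n <namespace> | grep pinot-minion
kubectl get pvc -l app.kubernetes.io/instance=kfuse -l component=minion -n <namespace>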
3.4.0
Pre-Upgrade and Post-Upgrade Steps
Perform the following check before and after upgrading to ensure the admin user configuration is correct:
-
Verify the admin user configuration in the alerts database:
kubectl exec -it kfuse-configdb-0 -- /bin/bash
psql -U postgres -d alertsdb
select * from public.user where login='admin';
select * from public.user where email='admin@localhost';
Both queries should return the same row with id = 1. If they return different IDs, fix it using the following operations:
UPDATE public.user SET id=1 where email='admin@localhost';
DELETE from public.user where id=<ID from the output of the first command>;
Then restart Grafana:
kubectl rollout restart deployment kfuse-grafana
3.3.0
There are no specific post-upgrade steps for this release.
Pre-Upgrade Steps
-
If your organization runs Kloudfuse on a shared cluster, or if it has the az-service enabled (it has taints and labels), update the following configuration in the values.yaml file before upgrading.
config-mgmt-service:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: ng_label
                operator: In
                values:
                  - az1
  tolerations:
    - key: "ng_taint"
      operator: "Equal"
      value: "az1"
      effect: "NoSchedule"
-
The configuration for label tracking is now part of the global section. If your organization tracks labels, move their definition to the global section.
3.2.3
There are no specific post-upgrade steps for this release.
Pre-Upgrade Steps
- Scheduled Views
-
To support the new feature, Scheduled Views, ensure that your global.kafkaTopics section in the custom-values.yaml file contains the following code:
- name: kf_logs_views_topic
  partitions: 1
  replicationFactor: 1
- RUM Applications
-
In this release, we added support for applications in RUM. See Add and Manage Applications.
To successfully migrate existing RUM applications to the Kloudfuse platform, follow these steps during the Kloudfuse Kubernetes install. Alternatively, contact Support for assistance.
-
Connect to the configdb pod:
kubectl exec -it kfuse-configdb-0 -- /bin/bash
PGPASSWORD=$(env | grep -i PASSWORD | cut -d'=' -f2) psql -U postgres
-
Connect to the rumdb database:
\c rumdb
-
Insert the applications manually into the db from the config yaml:
insert into applications (id, name, type, collect_client_ip, client_token) values
  ('app1_id', 'app1_name', 'app1_type', true/false, 'app1_auth_token'),
  ('app2_id', 'app2_name', 'app2_type', true/false, 'app2_auth_token');
-
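After the inserts, a quick sanity check from the same psql session connected to rumdb confirms the applications landed in the table:
select id, name, type from applications;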