Upgrade Instructions

Helm upgrade command

  1. Before performing an upgrade, validate that the upgrade won’t revert any customization on your cluster.

  2. To check which Kloudfuse version you have, run the following command:

    helm list
  3. Run the upgrade command.

    helm upgrade --install kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
      -n kfuse \
      --version <VERSION> \ (1)
      -f custom-values.yaml
    1 Replace <VERSION> with a valid Kloudfuse release value; use the most recent one.
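Before upgrading, it can help to capture the currently deployed chart version programmatically rather than reading it off the table. A hedged sketch: the field positions assume Helm's default table output (the UPDATED column contains spaces, so fields are counted from the end), and the release name kfuse matches the install command above.

```shell
# Extract the CHART column (second-to-last field) for the kfuse release.
current_chart() {
  awk 'NR > 1 && $1 == "kfuse" { print $(NF-1) }'
}

# Usage (against a live cluster):
#   helm list -n kfuse | current_chart
```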

Version Specific Instructions

4.0.0

Pre-Upgrade Steps

Hydration Service Deployment change (Required)

K8s deployment type for hydration-service has changed from Deployment to StatefulSet. Before upgrading to 4.0, delete the existing Deployment and ConfigMap.

kubectl delete deployment -n <namespace> hydration-service (1)
kubectl delete cm -n <namespace> hydration-service (1)
1 Replace <namespace> with the namespace of your Kloudfuse deployment.
Envoy Gateway Migration (Optional)

Starting in 4.0.0, Kloudfuse supports Envoy Gateway as an alternative to NGINX Ingress. Existing deployments can optionally migrate using a zero-downtime 3-step migration process. See Configure Envoy Ingress for details.

Post-Upgrade Steps

Pinot STS Rollout Restart

After the upgrade, wait for the setup-pinot job to complete, then perform a rollout restart of the Pinot STS (StatefulSet) to pick up the updated schema and table configuration.

kubectl rollout restart statefulset/pinot-server-offline
kubectl rollout restart statefulset/pinot-server-realtime
kubectl rollout restart statefulset/pinot-broker
kubectl rollout restart statefulset/pinot-controller
kubectl rollout restart statefulset/pinot-minion

3.5.3-p1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.3-p1.

3.5.3

Pre-Upgrade Steps

Metrics Transformer GOMEMLIMIT

Multi-resolution rollup increases memory usage in the metrics transformer. Review and increase the GOMEMLIMIT setting for the metrics transformer if needed, especially for high-cardinality deployments.

Kafka Rollup Topic

Metrics rollup is enabled by default in 3.5.3. Before upgrading, ensure that kf_metrics_rollup_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_metrics_topic.

RUM Vitals Kafka Topic (Conditional)

If RUM is enabled in your deployment, ensure that kf_rum_vitals_topic is included in your Kafka topics list in the Helm values. Specify the same number of partitions as the existing kf_rum_views_topic.
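Both topic additions follow the existing global.kafkaTopics pattern. A hedged sketch of the entries; the partition counts below are placeholders, so mirror the counts of your existing kf_metrics_topic and kf_rum_views_topic respectively:

```yaml
global:
  kafkaTopics:
    # ... existing topics ...
    - name: kf_metrics_rollup_topic
      partitions: 3            # placeholder: match kf_metrics_topic
      replicationFactor: 1
    - name: kf_rum_vitals_topic  # only if RUM is enabled
      partitions: 1            # placeholder: match kf_rum_views_topic
      replicationFactor: 1
```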

Legacy Rollup Interval (Conditional)

If you were using a non-default rollup interval before 3.5.3 (for example, 600s instead of the default 300s), set the legacy interval so that pre-3.5.3 rollup data is attributed to the correct resolution:

global:
  metrics:
    legacyRollupIntervalSecs: 600

Post-Upgrade Steps

Multi-Rollup Resolution

After the upgrade, wait for the setup-pinot job to complete, then restart the Pinot STS (StatefulSet) to pick up the new rollup schema and table configuration. Existing metrics data is automatically compatible with multi-rollup resolution.

Pod Security Configuration (Optional)

Review your Helm values if you use custom security configurations. All services now run as non-root users by default and support configurable service accounts and security context. Existing deployments without custom security settings are unaffected.

Scheduled Views

Scheduled views have been redesigned. You must recreate your scheduled views after upgrading. Data continuity from the previous implementation cannot be guaranteed.

3.5.2-p2

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.2-p2.

3.5.2-p1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.2-p1.

3.5.2

Pre-Upgrade Steps

There are no specific pre-upgrade steps for upgrading to Release 3.5.2.

Post-Upgrade Steps

Container Image Signature Verification (Optional)

Starting with 3.5.2, all Kloudfuse container images and Helm charts are signed. You can optionally verify image signatures before deployment.

Folder Permissions

If you use folder-based organization for dashboards and alerts, review folder permissions after upgrade to ensure appropriate access levels are configured.

Logs Parser Restart

After upgrading to 3.5.2, you must restart the logs-parser to ensure proper functionality.

The following commands use kfuse as the default namespace. Replace with your actual namespace if different.
  1. Scale down the logs-parser:

    kubectl scale sts logs-parser -n kfuse --replicas=0
  2. Verify that all logs-parser pods are terminated before proceeding:

    kubectl get pods -n kfuse -l app=logs-parser
  3. Scale up the logs-parser to match the numNodes value configured in your custom values YAML:

    kubectl scale sts logs-parser -n kfuse --replicas=<numNodes>
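The verification in step 2 can be scripted so the scale-up only proceeds once every pod is gone. A hedged helper that parses the `kubectl get pods` output (the label selector is the one used above):

```shell
# Succeeds when the `kubectl get pods` output contains no pod rows.
# (With no matches, kubectl prints "No resources found" on stderr and
# nothing on stdout, so the filter sees empty input.)
no_pods_left() {
  awk 'NR > 1 && NF > 0 { n++ } END { exit (n > 0) }'
}

# Usage (against a live cluster):
#   until kubectl get pods -n kfuse -l app=logs-parser | no_pods_left; do sleep 5; done
```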

3.5.1-p2

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.1-p2.

3.5.1-p1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.1-p1.

3.5.1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.1.

3.5.0

Pre-Upgrade Steps

To support improved management of RUM session recordings, ensure that your global.kafkaTopics section in the custom-values.yaml file contains the following additional topic:

        - name: kf_rum_expired_sessions_topic
          partitions: 1
          replicationFactor: 1

3.4.4-p1

This guide covers upgrading to version 3.4.4-p1, whether from 3.4.3 or from 3.4.4; the only difference between the two paths is the Phase 1 configuration.

Important Notes

Indentation Matters

The kafka-kraft section must be at the same indentation level as the kafka section (root level), NOT under the global section.

Disk Size

Always copy the persistence disk size from your existing kafka broker to the kafka-kraft broker configuration.

Version Consistency

Use version 3.4.4-p1 for all helm upgrade commands throughout the process.

Scripts

The following scripts referenced in this guide are available at https://github.com/kloudfuse-ext/customer/tree/main/scripts:

  • pause_consumption.sh - Pauses Pinot consumption on all tables

  • resume_consumption.sh - Resumes Pinot consumption on all tables

  • get_consuming_segments_info.sh - Gets current status of consuming segments

Pre-Upgrade Steps

Update kfuse-vector Configuration

The kfuse-vector component has been renamed to kfuse-archival-vector. If your values.yaml contains a kfuse-vector section, you must rename it before upgrading to 3.4.4:

# Old configuration (3.4.3 and earlier)
kfuse-vector:
  <your-configuration>

# New configuration (3.4.4 and later)
kfuse-archival-vector:
  <your-configuration>
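Because this is a single top-level key change, it can be applied mechanically. A hedged sketch using sed, demonstrated on a scratch file; run the same substitution against your real values file after reviewing the result:

```shell
# Demo on a scratch file; only lines beginning exactly with
# "kfuse-vector:" (the top-level key) are rewritten.
printf 'kfuse-vector:\n  resources:\n    limits:\n      memory: 2Gi\n' > /tmp/values-demo.yaml
sed -i.bak 's/^kfuse-vector:/kfuse-archival-vector:/' /tmp/values-demo.yaml
head -1 /tmp/values-demo.yaml   # prints: kfuse-archival-vector:
```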

Phase 1: Deploy Both Legacy Kafka and Kafka-Kraft

Deploy both legacy kafka and kafka-kraft services, but continue using legacy kafka for all operations.

  1. Update custom_values.yaml

    Add the three legacy flags under the global.kafka section:

    global:
      kafka:
        deployLegacy: true
        useLegacy: true
        ingesterUseLegacy: true
  2. Configure Kafka Services

    Add the kafka-kraft section at the same indentation level as the kafka section (NOT under global).

    Ensure that the existing kafka.broker disk size is copied to kafka-kraft.broker. For example, if your existing kafka has 200Gi of persistence, use 200Gi in the kafka-kraft section as well. If you use a custom storageClass for kafka-broker instead of kfuse-ssd, include it in the kafka-kraft.broker section and add a kafka-kraft.controller section with that storageClass.
    If upgrading from 3.4.3
    kafka:
      broker:
        persistence:
          size: 200Gi
          storageClass: <storage-class-name> #Optional
    
    kafka-kraft:
      broker:
        persistence:
          size: 200Gi
          storageClass: <storage-class-name> #Optional
    
      # Optional: only when using a custom storageClass
      controller:
        persistence:
          storageClass: <storage-class-name>
  3. Run Helm Upgrade

    helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
  4. Wait for Deployment

    Wait for kafka-kraft-broker and kafka-kraft-controller pods to be up and running, and for the kafka topic creator job to finish.

Phase 2: Switch Ingester to Kafka-Kraft

Switch the ingester to use the new kafka-kraft by removing ingesterUseLegacy from custom_values.yaml.

  1. Update custom_values.yaml

    Remove only the ingesterUseLegacy flag from the global.kafka section. The kafka and kafka-kraft sections remain unchanged:

    global:
      kafka:
        deployLegacy: true
        useLegacy: true
    yaml
  2. Run Helm Upgrade

    helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
  3. Check Kafka Consumer Lag

    Check the kafka consumer lag on kafka-broker-0 by running the following commands. The output lists multiple topics with several columns; once every value in the LAG column is zero, move on to the next step.

    kubectl exec -ti -n kfuse kafka-broker-0 -- bash
    unset JMX_PORT
    /opt/bitnami/kafka/bin/kafka-consumer-groups.sh \
      --bootstrap-server :9092 --describe --all-groups
  4. Pause Pinot Consumption

    First, port-forward to pinot-controller-0:

    kubectl port-forward -n kfuse pinot-controller-0 9000:9000

    Then run the pause_consumption.sh script.

  5. Wait for Segment Sealing

    Run get_consuming_segments_info.sh (pinot-controller must still be port-forwarded) to get the current status. To continue with the upgrade, the segments must be sealed; this is the case when the _segmentToConsumingInfoMap for every table is the empty map {}, as shown below.

    Example output when segments are sealed:

    ~/get_consuming_segments_info.sh
    Fetching realtime tables...
    Found tables:
    kf_events_REALTIME
    kf_logs_REALTIME
    kf_logs_views_REALTIME
    kf_metrics_REALTIME
    kf_metrics_rollup_REALTIME
    kf_rum_actions_REALTIME
    kf_rum_errors_REALTIME
    kf_rum_longtasks_REALTIME
    kf_rum_resources_REALTIME
    kf_rum_views_REALTIME
    kf_traces_REALTIME
    kf_traces_errors_REALTIME
    
    Getting consuming segments info for: kf_events
      (from kf_events_REALTIME)
    {"serversFailingToRespond":0,
     "serversUnparsableRespond":0,
     "_segmentToConsumingInfoMap":{}}
    
    Getting consuming segments info for: kf_logs
      (from kf_logs_REALTIME)
    {"serversFailingToRespond":0,
     "serversUnparsableRespond":0,
     "_segmentToConsumingInfoMap":{}}
    
    Getting consuming segments info for: kf_logs_views
      (from kf_logs_views_REALTIME)
    {"serversFailingToRespond":0,
     "serversUnparsableRespond":0,
     "_segmentToConsumingInfoMap":{}}

Phase 3: Switch All Services to Kafka-Kraft

Switch all other services to use kafka-kraft.

  1. Update custom_values.yaml

    Remove the global.kafka and kafka sections from custom_values.yaml. Only the kafka-kraft section is needed. The default helm configuration for 3.4.4-p1 already uses the new kafka for all services.

  2. Run Helm Upgrade

    helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
  3. Re-enable Pinot Consumption

    Once the setup-pinot job has completed, re-enable pinot consumption on all tables by running resume_consumption.sh.

Post-Upgrade Steps

Let the New Kafka-Kraft Bake for 24 Hours

After a successful migration and a 24-hour waiting period, delete the legacy kafka-broker and kafka-zookeeper PVCs:

kubectl get pvc -n kfuse | grep kafka-zookeeper
# Repeat the delete for each kafka-zookeeper PVC name from the output above
kubectl delete pvc data-kafka-zookeeper-0

kubectl get pvc -n kfuse | grep kafka-broker
# Repeat the delete for each kafka-broker PVC name from the output above
kubectl delete pvc data-kafka-broker-0
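Copying PVC names by hand is error-prone, so a name filter can drive the deletion instead. A hedged sketch; the name pattern assumes the default data-kafka-broker-N / data-kafka-zookeeper-N PVC naming, and you should review the matched names before piping them into kubectl delete:

```shell
# Keep only legacy kafka broker/zookeeper PVC names.
legacy_kafka_pvcs() {
  grep -E 'data-kafka-(broker|zookeeper)-[0-9]+'
}

# Usage (against a live cluster, after the 24-hour bake):
#   kubectl get pvc -n kfuse -o name | legacy_kafka_pvcs                       # review first
#   kubectl get pvc -n kfuse -o name | legacy_kafka_pvcs | xargs kubectl delete -n kfuse
```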

3.4.3

Pre-Upgrade Steps

If you plan to use GCP Stackdriver metrics and enrichment features, create a GCP service account secret before upgrading.

  1. Follow the instructions at GCP Metrics Credentials to create a service account with the required permissions.

  2. Create the secret in your Kubernetes cluster:

    kubectl create secret generic kfuse-sd-secret \
      --from-file=key.json=<path-to-service-account-json>
  3. Configure the secret name in your values.yaml:

    global:
      gcpConfig:
        secretName: "kfuse-sd-secret"

Post-Upgrade Steps

After upgrading to 3.4.3, perform a rolling restart of all Pinot components to ensure proper initialization:

kubectl rollout restart statefulset -l app=pinot

Verify all Pinot pods are running:

kubectl get pods -l app=pinot

3.4.2-p1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.4.2-p1.

3.4.2

Pre-Upgrade Steps

  1. Starting with 3.4.2, the AZ service is enabled by default. To ensure a successful upgrade, configure the cloudStorage section in your values.yaml file.

  2. You can define storage either:

    • At the service level (pinot.deepStore or az-service.cloudStore)

    • At the global cloudStorage section

      Service-level settings always take precedence. If both are present, the upgrade continues to work as is. We recommend consolidating into the global cloudStorage section for consistency across services.

  3. Configure the storage backend. Supported types are s3, gcs, and azure:

global:
  cloudStorage:
    # Supported types: s3, gcs, azure
    type: s3
    useSecret: true
    secretName: cloud-storage-secret

    # S3-specific
    s3:
      region: <specify region>
      bucket: <specify bucket>

    # GCS-specific
    gcs:
      bucket: <specify bucket>

    # Azure-specific
    azure:
      container: <specify container>
  4. If you use secrets for authentication, create them outside of Kloudfuse using kubectl:

    • S3 – secret must include accessKey and secretKey:

      kubectl create secret generic cloud-storage-secret \
        --from-literal=accessKey=<accessKey> \
        --from-literal=secretKey='<secretKey>'
    • GCS – secret must include the JSON credentials file (saved as secretKey):

      kubectl create secret generic cloud-storage-secret \
        --from-file=./secretKey
    • Azure – secret must include the storage account connectionString:

      kubectl create secret generic cloud-storage-secret \
        --from-literal=connectionString=<connectionString>
  5. If Pinot was previously configured with deepStore, migrate it:

    • Remove the cloud storage configuration from the pinot.deepStore section.

    • Replace dataDir with prefix in the service section.

    • The bucket name goes into the global config; everything after the bucket name becomes the prefix.

      Example: If dataDir was:

s3://kfuse-bucket/pisco/controller/data

Set:

global:
  cloudStorage:
    type: s3
    s3:
      bucket: kfuse-bucket

pinot:
  deepStore:
    enabled: true
    prefix: pisco/controller/data

Post-Upgrade Steps

No additional steps are required after the upgrade.

3.4.1

Pre-Upgrade Steps

  • Update the Pinot configuration in your deployment YAML to use jvmMemory.

Post-Upgrade Steps

  • Restart Pinot to apply any configuration changes:

kubectl rollout restart sts -n kfuse pinot-server-realtime pinot-controller pinot-broker

The default namespace is kfuse. If your deployment uses a different namespace, replace kfuse with the appropriate namespace.

3.4.0-p2

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.4.0-p2.

3.4.0-p1

Pre-Upgrade Steps

The default disk size of the Pinot minion PVC changed. To successfully upgrade to this version, delete the Pinot minion StatefulSet and its PVC by running:

kubectl delete sts -n <namespace> pinot-minion (1)
kubectl delete pvc -l app.kubernetes.io/instance=kfuse -l component=minion -n <namespace> (1)
1 Replace <namespace> with the namespace of your Kloudfuse deployment.

Post-Upgrade Steps

After completing the upgrade, run the following command:

kubectl rollout restart deployment -n <namespace> kfuse-grafana (1)
1 Replace <namespace> with the namespace of your Kloudfuse deployment.

3.4.0

Pre-Upgrade and Post-Upgrade Steps

Perform the following check before and after upgrading to ensure the admin user configuration is correct:

  1. Verify the admin user configuration in the alerts database:

    kubectl exec -it kfuse-configdb-0 -- /bin/bash
    psql -U postgres -d alertsdb
    
    select * from public.user where login='admin';
    select * from public.user where email='admin@localhost';

    Both queries should return the same row with id = 1. If they return different IDs, fix it using the following operations:

    UPDATE public.user SET id=1 where email='admin@localhost';
    DELETE from public.user where id=<ID from the output of the first command>;

    Then restart Grafana:

    kubectl rollout restart deployment kfuse-grafana

3.3.6

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.6.

3.3.5

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.5.

3.3.4

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.4.

3.3.3

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.3.

3.3.2

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.2.

3.3.1

There are no specific post-upgrade steps for this release.

Pre-Upgrade Steps

If your Kloudfuse configuration has RBAC enabled, you must also enable Audit Logs. Set both feature flags, RBACEnabled and EnableAuditLogs, to true in your yaml configuration file.

Set RBAC and Audit Logs feature flags
global:
...
  RBACEnabled: true
  EnableAuditLogs: true
...

3.3.0

There are no specific post-upgrade steps for this release.

Pre-Upgrade Steps

  1. If your organization runs Kloudfuse on a shared cluster, or has the az-service enabled (that is, the nodes carry taints and labels), update the following configuration in the values.yaml file before upgrading.

    config-mgmt-service:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: ng_label
                operator: In
                values:
                - az1
      tolerations:
      - key: "ng_taint"
        operator: "Equal"
        value: "az1"
        effect: "NoSchedule"
  2. The configuration for label tracking is now part of the global section. If your organization tracks labels, move their definitions there.