Upgrade Kloudfuse

Upgrade command

  1. Before performing an upgrade, validate that the upgrade won’t revert any customization on your cluster. See Upgrade validation

  2. To check which Kloudfuse version you have, run the following command:

    helm list
  3. Run the upgrade command.

    helm upgrade --install kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
      -n kfuse \
      --version <VERSION> \
      -f custom-values.yaml
    <VERSION>: a valid Kloudfuse release value; use the most recent one.
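For example, before running the upgrade you can confirm the deployed release and inspect the target chart's metadata; the kfuse namespace and the OCI chart reference below are taken from the commands above.

# Currently deployed release and chart version
helm list -n kfuse

# Metadata of the target chart version (fails if the version does not exist in the registry)
helm show chart oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse --version <VERSION>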

Upgrading to Latest Kloudfuse Releases

3.5.2-p1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.2-p1.

3.5.2

Pre-upgrade Steps

There are no specific pre-upgrade steps for upgrading to Release 3.5.2.

Post-upgrade Steps

Container Image Signature Verification (Optional)

Starting with 3.5.2, all Kloudfuse container images and Helm charts are signed. You can optionally verify image signatures before deployment.
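For illustration only, a verification step with Sigstore's cosign CLI might look like the following; whether Kloudfuse publishes a verification key (and where), and the exact image reference, are assumptions — consult the 3.5.2 release notes for the supported verification method.

# Hypothetical sketch: verify an image signature against a published public key
cosign verify --key kloudfuse.pub <kloudfuse-image>:<tag>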

Folder Permissions

If you use folder-based organization for dashboards and alerts, review folder permissions after upgrade to ensure appropriate access levels are configured.

Logs Parser Restart

After upgrading to 3.5.2, you must restart the logs-parser to ensure proper functionality.

The following commands use kfuse as the default namespace. Replace with your actual namespace if different.
  1. Scale down the logs-parser:

    kubectl scale sts logs-parser -n kfuse --replicas=0
  2. Verify that all logs-parser pods are terminated before proceeding:

    kubectl get pods -n kfuse -l app=logs-parser
  3. Scale up the logs-parser to match the numNodes value configured in your custom values YAML:

    kubectl scale sts logs-parser -n kfuse --replicas=<numNodes>
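Optionally, you can confirm that the scaled-up logs-parser pods become ready; this is a generic check, not a required step.

# Wait for the logs-parser StatefulSet rollout to complete
kubectl rollout status sts logs-parser -n kfuse

# Confirm the expected number of pods are Running
kubectl get pods -n kfuse -l app=logs-parser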

3.5.1-p2

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.1-p2.

3.5.1-p1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.1-p1.

3.5.1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.5.1.

3.5.0

Pre-upgrade Steps

To support improved management of RUM session recordings, ensure that your global.kafkaTopics section in the custom-values.yaml file contains the following additional topic:

        - name: kf_rum_expired_sessions_topic
          partitions: 1
          replicationFactor: 1
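After the upgrade, you can optionally confirm that the topic exists on the broker. This is a sketch: the broker pod name and the Bitnami script path follow the conventions used elsewhere in this guide and may differ in your deployment.

kubectl exec -ti -n kfuse <kafka-broker-pod> -- bash -c \
  'unset JMX_PORT; /opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server :9092 --describe --topic kf_rum_expired_sessions_topic'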

3.4.4-p1

This guide covers upgrading to version 3.4.4-p1 from either 3.4.3 or 3.4.4; the only difference between the two paths is in the Phase 1 configuration.

Important Notes

Indentation Matters

The kafka-kraft section must be at the same indentation level as the kafka section (root level), NOT under the global section.

Disk Size

Always copy the persistence disk size from your existing kafka broker to the kafka-kraft broker configuration.
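To read the existing broker disk size directly from the cluster, you can inspect the kafka broker PVC; the PVC name below follows the data-kafka-broker-0 naming used in the post-upgrade cleanup and may differ in your deployment.

# Print the requested storage size of the existing kafka broker PVC
kubectl get pvc data-kafka-broker-0 -n kfuse -o jsonpath='{.spec.resources.requests.storage}'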

Version Consistency

Use version 3.4.4-p1 for all helm upgrade commands throughout the process.

Scripts

The following scripts referenced in this guide are available at https://github.com/kloudfuse-ext/customer/tree/main/scripts:

  • pause_consumption.sh - Pauses Pinot consumption on all tables

  • resume_consumption.sh - Resumes Pinot consumption on all tables

  • get_consuming_segments_info.sh - Gets current status of consuming segments
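One way to fetch the scripts locally, assuming the repository is public and uses the layout shown above:

git clone https://github.com/kloudfuse-ext/customer.git
cd customer/scripts
chmod +x pause_consumption.sh resume_consumption.sh get_consuming_segments_info.sh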

Pre-upgrade Steps

Update kfuse-vector Configuration

The kfuse-vector component has been renamed to kfuse-archival-vector. If your values.yaml contains a kfuse-vector section, you must rename it before upgrading to 3.4.4:

# Old configuration (3.4.3 and earlier)
kfuse-vector:
  <your-configuration>

# New configuration (3.4.4 and later)
kfuse-archival-vector:
  <your-configuration>

Phase 1: Deploy Both Legacy Kafka and Kafka-Kraft

Deploy both legacy kafka and kafka-kraft services, but continue using legacy kafka for all operations.

  1. Update custom_values.yaml

    Add the three legacy flags under the global.kafka section:

    global:
      kafka:
        deployLegacy: true
        useLegacy: true
        ingesterUseLegacy: true
  2. Configure Kafka Services

    Add the kafka-kraft section at the same indentation level as the kafka section (NOT under global).

    Ensure that the existing kafka.broker disk size is copied to kafka-kraft.broker. For example, if your existing kafka has 200Gi of persistence, copy it to the kafka-kraft section. If you use a custom storageClass for the kafka broker instead of kfuse-ssd, include it in the kafka-kraft.broker section and add a kafka-kraft.controller section with the same storageClass.
    If upgrading from 3.4.3:

    kafka:
      broker:
        persistence:
          size: 200Gi
          storageClass: <storage-class-name> # Optional

    kafka-kraft:
      broker:
        persistence:
          size: 200Gi
          storageClass: <storage-class-name> # Optional
      controller: # Optional
        persistence:
          storageClass: <storage-class-name>
  3. Run Helm Upgrade

    helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
  4. Wait for Deployment

    Wait for kafka-kraft-broker and kafka-kraft-controller pods to be up and running, and for the kafka topic creator job to finish.
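    A simple way to watch for this state (generic kubectl commands; the pod and job name prefixes follow the naming in this step and may vary):

    # Watch the kafka-kraft broker and controller pods until they are Running and Ready
    kubectl get pods -n kfuse | grep kafka-kraft

    # Confirm the kafka topic creator job has completed
    kubectl get jobs -n kfuse | grep -i topic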

Phase 2: Switch Ingester to Kafka-Kraft

Switch the ingester to use the new kafka-kraft by removing ingesterUseLegacy from custom_values.yaml.

  1. Update custom_values.yaml

    Remove only the ingesterUseLegacy flag from the global.kafka section. The kafka and kafka-kraft sections remain unchanged:

    global:
      kafka:
        deployLegacy: true
        useLegacy: true
  2. Run Helm Upgrade

    helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
  3. Check Kafka Consumer Lag

    Check the Kafka consumer lag on kafka-broker-0 by running the following commands. The output lists multiple topics and columns; when the values in the LAG column are all zero, move on to the next step.

    kubectl exec -ti -n kfuse kafka-broker-0 -- bash
    unset JMX_PORT
    /opt/bitnami/kafka/bin/kafka-consumer-groups.sh \
      --bootstrap-server :9092 --describe --all-groups
  4. Pause Pinot Consumption

    Pause Pinot consumption by first port-forwarding to pinot-controller-0 and then running the pause_consumption.sh script:

    kubectl port-forward -n kfuse pinot-controller-0 9000:9000

    Then run the pause_consumption.sh script.

  5. Wait for Segment Sealing

    Run get_consuming_segments_info.sh (the pinot-controller port-forward must still be active) to get the current status. Before continuing with the upgrade, all consuming segments must be sealed; this is the case when _segmentToConsumingInfoMap is empty ({}) for every table, as shown below.

    Example output when segments are sealed:

    ~/get_consuming_segments_info.sh
    Fetching realtime tables...
    Found tables:
    kf_events_REALTIME
    kf_logs_REALTIME
    kf_logs_views_REALTIME
    kf_metrics_REALTIME
    kf_metrics_rollup_REALTIME
    kf_rum_actions_REALTIME
    kf_rum_errors_REALTIME
    kf_rum_longtasks_REALTIME
    kf_rum_resources_REALTIME
    kf_rum_views_REALTIME
    kf_traces_REALTIME
    kf_traces_errors_REALTIME
    
    Getting consuming segments info for: kf_events
      (from kf_events_REALTIME)
    {"serversFailingToRespond":0,
     "serversUnparsableRespond":0,
     "_segmentToConsumingInfoMap":{}}
    
    Getting consuming segments info for: kf_logs
      (from kf_logs_REALTIME)
    {"serversFailingToRespond":0,
     "serversUnparsableRespond":0,
     "_segmentToConsumingInfoMap":{}}
    
    Getting consuming segments info for: kf_logs_views
      (from kf_logs_views_REALTIME)
    {"serversFailingToRespond":0,
     "serversUnparsableRespond":0,
     "_segmentToConsumingInfoMap":{}}

Phase 3: Switch All Services to Kafka-Kraft

Switch all other services to use kafka-kraft.

  1. Update custom_values.yaml

    Remove the global.kafka and kafka sections from custom_values.yaml. Only the kafka-kraft section is needed. The default helm configuration for 3.4.4-p1 already uses the new kafka for all services.

  2. Run Helm Upgrade

    helm upgrade -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse -f custom_values.yaml --version 3.4.4-p1
  3. Re-enable Pinot Consumption

    Once the setup-pinot job has completed, re-enable pinot consumption on all tables by running resume_consumption.sh.

Post-upgrade Steps

Let the New Kafka-Kraft Bake for 24hrs

After a successful migration and a 24-hour waiting period, delete the legacy kafka-broker and kafka-zookeeper PVCs:

kubectl get pvc -n kfuse | grep kafka-zookeeper
# Delete each kafka-zookeeper PVC returned above, for example:
kubectl delete pvc -n kfuse data-kafka-zookeeper-0

kubectl get pvc -n kfuse | grep kafka-broker
# Delete each kafka-broker PVC returned above, for example:
kubectl delete pvc -n kfuse data-kafka-broker-0

3.4.3

Pre-upgrade Steps

If you plan to use GCP Stackdriver metrics and enrichment features, create a GCP service account secret before upgrading.

  1. Follow the instructions at GCP Metrics Credentials to create a service account with the required permissions.

  2. Create the secret in your Kubernetes cluster:

    kubectl create secret generic kfuse-sd-secret \
      --from-file=key.json=<path-to-service-account-json>
  3. Configure the secret name in your values.yaml:

    global:
      gcpConfig:
        secretName: "kfuse-sd-secret"

Post-upgrade Steps

After upgrading to 3.4.3, perform a rolling restart of all Pinot components to ensure proper initialization:

kubectl rollout restart statefulset -n kfuse -l app=pinot

Verify that all Pinot pods are running:

kubectl get pods -n kfuse -l app=pinot

3.4.2-p1

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.4.2-p1.

3.4.2

Pre-upgrade Steps

  1. Starting with 3.4.2, the AZ service is enabled by default. To ensure a successful upgrade, configure the cloudStorage section in your values.yaml file.

  2. You can define storage either:

    • At the service level (pinot.deepStore or az-service.cloudStore)

    • At the global cloudStorage section

      Service-level settings always take precedence. If both are present, the upgrade continues to work as is. We recommend consolidating into the global cloudStorage section for consistency across services.

  3. Configure the storage backend. Supported types are s3, gcs, and azure:

global:
  cloudStorage:
    # Supported types: s3, gcs, azure
    type: s3
    useSecret: true
    secretName: cloud-storage-secret

    # S3-specific
    s3:
      region: <specify region>
      bucket: <specify bucket>

    # GCS-specific
    gcs:
      bucket: <specify bucket>

    # Azure-specific
    azure:
      container: <specify container>
  4. If you use secrets for authentication, create them outside of Kloudfuse using kubectl:

    • S3 – secret must include accessKey and secretKey:

      kubectl create secret generic cloud-storage-secret \
        --from-literal=accessKey=<accessKey> \
        --from-literal=secretKey='<secretKey>'
    • GCS – secret must include the JSON credentials file (saved as secretKey):

      kubectl create secret generic cloud-storage-secret \
        --from-file=./secretKey
    • Azure – secret must include the storage account connectionString:

      kubectl create secret generic cloud-storage-secret \
        --from-literal=connectionString=<connectionString>
  5. If Pinot was previously configured with deepStore, migrate it:

    • Remove the cloud storage configuration from the pinot deepStore section.

    • Replace dataDir with prefix in the service section.

    • The bucket name goes into the global config; everything after the bucket name becomes the prefix.

      Example: If dataDir was:

s3://kfuse-bucket/pisco/controller/data

Set:

global:
  cloudStorage:
    type: s3
    s3:
      bucket: kfuse-bucket

pinot:
  deepStore:
    enabled: true
    prefix: pisco/controller/data

Post-upgrade Steps

No additional steps are required after the upgrade.

3.4.1

Pre-upgrade

  • Update the Pinot configuration in your deployment YAML to use jvmMemory.
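The exact keys under jvmMemory depend on the chart defaults for this release; one way to discover them is to inspect the chart's default values (the grep pattern is only a convenience and assumes the key name appears verbatim):

helm show values oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse --version 3.4.1 \
  | grep -n -A 6 'jvmMemory'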

Post-upgrade

  • Restart Pinot to apply any configuration changes:

kubectl rollout restart sts -n kfuse pinot-server-realtime pinot-controller pinot-broker

The default namespace is kfuse. If your deployment uses a different namespace, replace kfuse with the appropriate namespace.

3.4.0-p2

There are no specific pre-upgrade or post-upgrade steps for upgrading to Release 3.4.0-p2.

3.4.0-p1

Pre-upgrade

We changed the default disk size of the Pinot minion PVC. To successfully upgrade to this version, run the following commands, replacing <namespace> with the namespace of your Kloudfuse deployment:

kubectl delete sts -n <namespace> pinot-minion  # delete the Pinot minion StatefulSet
kubectl delete pvc -l app.kubernetes.io/instance=kfuse -l component=minion -n <namespace>  # delete the minion PVCs

Post-upgrade

After completing the upgrade, run the following command:

kubectl rollout restart deployment -n <namespace> kfuse-grafana

Replace <namespace> with the namespace of your Kloudfuse deployment.

3.4.0

Pre-upgrade and Post-upgrade Steps

Perform the following check before and after upgrading to ensure the admin user configuration is correct:

  1. Verify the admin user configuration in the alerts database:

    kubectl exec -it kfuse-configdb-0 -- /bin/bash
    PGPASSWORD=$POSTGRES_PASSWORD psql -U postgres -d alertsdb
    
    select * from public.user where login='admin';
    select * from public.user where email='admin@localhost';

    Both queries should return the same row with id = 1. If they return different IDs, fix it using the following operations:

    UPDATE public.user SET id=1 where email='admin@localhost';
    DELETE from public.user where id=<ID from the output of the first command>;

    Then restart Grafana:

    kubectl rollout restart deployment kfuse-grafana

3.3.6

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.6.

3.3.5

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.5.

3.3.4

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.4.

3.3.3

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.3.

3.3.2

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.3.2.

3.3.1

There are no specific post-upgrade steps for this release.

Pre-upgrade Steps

If your Kloudfuse configuration has RBAC enabled, you must also enable Audit Logs. Set both feature flags, RBACEnabled and EnableAuditLogs, to true in your yaml configuration file.

Set the RBAC and Audit Logs feature flags:

global:
...
  RBACEnabled: true
  EnableAuditLogs: true
...

3.3.0

There are no specific post-upgrade steps for this release.

Pre-upgrade Steps

  1. If your organization runs Kloudfuse on a shared cluster, or has the az-service enabled (which relies on node taints and labels), update the following configuration in the values.yaml file before upgrading.

    config-mgmt-service:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: ng_label
                operator: In
                values:
                - az1
      tolerations:
      - key: "ng_taint"
        operator: "Equal"
        value: "az1"
        effect: "NoSchedule"
  2. The configuration for label tracking is now part of the global section. If your organization tracks labels, move their definition to the global section.

3.2.5

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.2.5.

3.2.4

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.2.4.

3.2.3

There are no specific post-upgrade steps for this release.

Pre-upgrade

Scheduled Views

To support the new feature, Scheduled Views, ensure that your global.kafkaTopics section in the custom-values.yaml file contains the following code:

        - name: kf_logs_views_topic
          partitions: 1
          replicationFactor: 1
RUM Applications

In this release, we added support for applications in RUM. See Add and Manage Applications.

To successfully migrate existing RUM applications to the Kloudfuse platform, follow these steps during the Kloudfuse Kubernetes install. Alternatively, contact Kloudfuse Support for assistance.

  1. Connect to the configdb pod:

    kubectl exec -it kfuse-configdb-0 -- /bin/bash
    PGPASSWORD=$(env | grep -i PASSWORD | cut -d'=' -f2) psql -U postgres
  2. Connect to the rumdb database:

    \c rumdb
  3. Insert the applications manually into the db from the config yaml:

    insert into applications (id, name, type, collect_client_ip, client_token) values ('app1_id', 'app1_name', 'app1_type', <true|false>, 'app1_auth_token'), ('app2_id', 'app2_name', 'app2_type', <true|false>, 'app2_auth_token');

3.2.2

We changed the backup disk type of the kfuse-ssd storage class for AWS from io1 to gp3. Therefore, if you run Kloudfuse in AWS, you must make adjustments before upgrading to Release 3.2.2.

There are no specific post-upgrade steps for this release.

Pre-upgrade

Run the following command:

kubectl delete storageclass kfuse-ssd

On new installs, any PV that you create using kfuse-ssd on AWS has a gp3 type.

In existing installs, kfuse-ssd remains io1. You must manually change the backing EBS volumes to gp3 in the AWS console.
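If you prefer the AWS CLI to the console, the volume type can also be changed per volume. This is a sketch: the jsonpath field depends on how the PV was provisioned (in-tree vs. EBS CSI driver).

# Find the EBS volume ID backing a PV (CSI-provisioned PVs; in-tree PVs store it under .spec.awsElasticBlockStore.volumeID)
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'

# Change the backing volume type to gp3
aws ec2 modify-volume --volume-id <volume-id> --volume-type gp3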

3.2.1

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.2.1.

3.2.0

There are no specific pre-upgrade steps for this release.

Post-upgrade

After the upgrade, restart pinot services:

kubectl rollout restart sts pinot-broker pinot-controller pinot-server-realtime pinot-server-offline

This step takes care of the race condition related to raw index version change.

3.1.3

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 3.1.3.

3.1.2

There are no specific post-upgrade steps for this release.

Pre-Upgrade

Before upgrading to Release 3.1.2, run the following command:

kubectl delete deployments.apps catalog-service rulemanager advance-functions-service

3.1.0

Pre-Upgrade

This release fixes labels and label selectors so that some of our components can match the rest. Because of this change, you must run the following command before upgrading to Release 3.1.0.

kubectl delete deployments.apps catalog-service rulemanager advance-functions-service

Post-Upgrade

  1. Restart Pinot Services

    kubectl rollout restart sts pinot-broker pinot-controller pinot-server-realtime pinot-server-offline
  2. We moved the hydration-service (HS) from a Deployment to a StatefulSet. You must manually delete the pod associated with the old Deployment.

    kubectl delete pod hydration-service-<tag>

    The HS pod now runs under a new pod name. Use the following command to find it:

    kubectl get pods | grep hydration-service

2.7.4

Pre-Upgrade

For RBAC, before upgrading to Release 2.7.4 from Release 2.7.3, check for a blank user row: click the Admin tab and select User Management. A blank row has empty login and email fields and a random id. Delete that row directly in the UI.

Alternatively, complete these steps in the console:

  1. Run the kfuse-postgres.sh script to enter the configdb shell.

    #!/usr/bin/env bash
    
    # Optional parameters:
    # 1. pod name - default kfuse-configdb-0
    # 2. namespace - default kfuse
    # 3. database name - default configdb
    
    kubectl exec -it ${1:-kfuse-configdb-0} -n ${2:-kfuse} -- bash -c "PGPASSWORD=\$POSTGRES_PASSWORD psql -U postgres -d ${3:-configdb}"
  2. Delete users with null emails and logins.

    ./kfuse-postgres.sh kfuse-configdb-0 kfuse rbacdb
    
    rbacdb=# DELETE FROM users where email ISNULL and login ISNULL;
    DELETE 1

Post-Upgrade

Restart Pinot Services.

kubectl rollout restart sts pinot-server-offline

Then refresh the services in the APM store through the trace-query-service:

kubectl port-forward --namespace kfuse deployments.apps/trace-query-service 8080:8080
curl -X POST http://localhost:8080/v1/trace/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query { refreshServicesInApmStore(lookbackDays: 1) }"
  }'

2.7.3

Upgrade to Release 2.7.3:

Pre-upgrade

  1. In the custom-values.yaml file, set the value pinot.server.realtime.replicaCount to 0.

    Keep note of the original value of this field. You must set it to the original value later.

  2. Run helm upgrade as usual.

Post-upgrade

  1. Ensure that all pods and jobs are finished successfully.

  2. In the custom-values.yaml file, set the value pinot.server.realtime.replicaCount to its original value.

  3. Run helm upgrade again.

Alternatively, run the following command:

kubectl scale sts pinot-server-realtime --replicas=N

2.7.2

Pre-Upgrade

This release changes the RBAC implementation.

  1. You may see numeric IDs in the email field of the users. To populate Kloudfuse with correct emails, delete all users. Kloudfuse recreates individual users as they log in, with correct email values.

  2. Create new groups after completing this step. You can then assign users to groups, policies to users and groups, and so on.

Post-upgrade

  1. Connect to rbacdb.

    > ./kfuse-postgres.sh kfuse-configdb-0 kfuse rbacdb
  2. Make a note of each user_id with a null value that resulted from the RBAC migration.

    rbacdb=# select id from users where grafana_id IS NULL;
  3. Clean up empty users in the RBAC database.

    rbacdb=# delete from users where grafana_id IS NULL;
  4. For each user_id that you noted earlier, delete the user from the group.

    rbacdb=# delete from group_members where user_id='<user-id>';

2.7.1

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 2.7.1.

2.7.0

There are no specific post-upgrade steps for this release.

Pre-upgrade

This release includes package upgrades to remove service vulnerabilities.

  1. Before helm upgrade, run the kafka-upgrade.sh script. Expect some downtime between running the script and helm upgrade.

  2. Edit the custom_values.yaml file, and move the block under kafka into the kafka.broker section:

    kafka:
      broker:
        <<previous kafka block>>
  3. Add these topics to the kafkaTopics section to ensure record-replay.

      kafkaTopics:
        - name: kf_commands
          partitions: 1
          replicationFactor: 1
        - name: kf_recorder_data
          partitions: 1
          replicationFactor: 1
  4. Add a recorder section with the same affinity and toleration values as the ingester. If the ingester has no affinity or tolerations configured, don’t add the recorder section.

    recorder:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: ng_label
                operator: In
                values:
                - amrut
      tolerations:
      - key: "ng_taint"
        operator: "Equal"
        value: "amrut"
        effect: "NoSchedule"
  5. If you use AWS enrichment, the config format in the values changed. See AWS Services.

  6. Upgrade the stack; see Upgrade command.

2.6.8

There are no specific pre-upgrade or post-upgrade steps for upgrading to the Release 2.6.8.

2.6.7

Release 2.6.7 introduces Identity for Databases. It takes effect on newly-ingested APM-related data.

We increased timestamp granularity for APM/span data from millisecond to nanosecond, because it provides better accuracy for the Trace Flamegraph and Waterfall visuals.

Pre-upgrade Steps

SLO

We re-enabled SLO in this release, with enhanced features.

  1. Enable the kfuse-postgres.sh script.

  2. Drop the SLO DB.

    > ./kfuse-postgres.sh kfuse-configdb-0 kfuse slodb
    
    slodb=# drop table slodbs;
APM

You must convert older APM data to Kloudfuse 2.6.5 APM Service Identity format.

APM data ingested before Release 2.6.5 is incompatible and does not render properly in the APM UI. You can optionally convert the older data to the current format. The conversion process may take time, depending on the volume of data. When enabled, the conversion runs when the Pinot servers start and load the segments.

  1. To enable the conversion, ensure that the custom_values.yaml file has the following configuration:

    pinot:
      traces:
        serviceHashConversionEnabled: true
      traces_errors:
        serviceHashConversionEnabled: true
      metrics:
        serviceHashConversionEnabled: true
  2. Disable the KV Cardinality limit on the Pinot Metrics table.

    pinot:
      metrics:
        kvTotalCardinalityThreshold: 0
  3. Increase the heap allocation for the Pinot offline servers. Segment conversion requires memory; temporarily double the memory for the Pinot offline servers in the custom_values.yaml file.

    pinot:
      server:
        offline:
          jvmOpts: "<Adjust the Xmx and Xms settings here>"
  4. Reduce the helix threads to 10.

    kubectl port-forward -n kfuse pinot-controller-0 9000:9000
    curl -X POST "http://localhost:9000/cluster/configs" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"STATE_TRANSITION.maxThreads\": \"10\"}"
    # Verify using:
    curl -X GET "http://localhost:9000/cluster/configs"
  5. Run the standard upgrade command using the updated custom_values.yaml file. See Upgrade command.

Post-upgrade Steps

  1. The upgrade includes changes to Pinot table configuration.

    Restart Pinot servers to ensure that the configuration is updated.

    kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime
  2. It takes time to convert all Pinot segments. The table segments status in the Pinot controller UI console should reflect the loaded (converted) segments. Connect to the Pinot controller and monitor until all segments are in a good state; at that point the conversion is complete.

    # Create port-forward to the pinot controller
    kubectl port-forward -n kfuse pinot-controller-0 9000:9000
    # From the browser, go to localhost:9000
  3. After conversion finishes, revert the helix threads back to the default setting.

    kubectl port-forward -n kfuse pinot-controller-0 9000:9000
    curl -X DELETE "http://localhost:9000/cluster/configs/STATE_TRANSITION.maxThreads" -H "accept: application/json"
  4. Revert the cardinality threshold configuration and heap allocation of the Pinot server offline servers in the custom_values.yaml file.

  5. Run the upgrade again. See Upgrade command.

  6. In some special cases, you may have to force a re-conversion of segments: before the upgrade, delete the pinot-server-offline STS and PVCs, and then run the conversion steps. This forces older segments to be downloaded from the deep store.

    kubectl delete sts -n kfuse pinot-server-offline
    kubectl delete pvc -l component=server-offline -n kfuse

2.6.6

Pre-upgrade

Kloudfuse introduces a new kfuse-ssd-offline storage class. By default, it uses:

  • gp3 on AWS

  • pd-balanced on GCP

  • Standard_LRS on Azure

If your values.yaml already defines this class, skip this step.

Delete the existing offline pinot server stateful set and PVCs:

kubectl delete sts -n kfuse pinot-server-offline
kubectl delete pvc -l app.kubernetes.io/instance=kfuse -l component=server-offline -n kfuse

After the upgrade, Kloudfuse automatically creates PVCs using the updated storage class.
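To confirm that the recreated PVCs use the new class, a quick check (generic kubectl; the component label matches the delete command above):

kubectl get pvc -n kfuse -l component=server-offline \
  -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName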

2.5.3

Pre-upgrade

Set the PVC size for Zookeeper pods to 32Gi:

kafka:
  zookeeper:
    persistence:
      size: 32Gi

pinot:
  zookeeper:
    persistence:
      size: 32Gi

After updating, run resize_pvc.sh.
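To confirm the resize took effect, you can list the Zookeeper PVCs and their capacities (a generic check; exact PVC names depend on your release name):

kubectl get pvc -n kfuse | grep -i zookeeper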

Post-upgrade

Restart the following services:

kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime pinot-controller pinot-broker logs-parser logs-query-service
kubectl rollout restart deployment -n kfuse logs-transformer trace-transformer trace-query-service

2.5.0

Post-upgrade

This release includes changes to the pinot database. Restart the following services:

kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime pinot-controller pinot-broker logs-parser logs-query-service
kubectl rollout restart deployment -n kfuse logs-transformer

2.2.4

Post-upgrade

The pinot schema has changed. Restart all pinot server components:

kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime pinot-controller pinot-broker

2.2.3

Pre-upgrade

The default pinot zookeeper PVC size is now 32Gi. If your existing setup relied on the previous default and doesn’t define the size explicitly, pin it to 16Gi so the existing PVCs keep their current size:

pinot:
  zookeeper:
    persistence:
      size: 16Gi

2.1.0

Post-upgrade

Alert organization has changed. Manually delete old alert versions:

kubens kfuse
kubectl exec -it catalog-service-<pod-suffix> -- bash
python3 /catalog_service/catalog.py --remove_installed --list kloudfuse,kloudfuse_alerts,kubernetes_alerts --artifact_type alerts

2.0.1

Post-upgrade

Clean up legacy dashboards provisioned by Kloudfuse:

kubectl -n kfuse exec -it kfuse-configdb-0 -- bash -c "PGDATABASE=alertsdb PGPASSWORD=\$POSTGRES_PASSWORD psql -U postgres -c \"delete from dashboard_provisioning where name='hawkeye-outliers-resources';\""

1.3.4

Pre-upgrade

Kfuse services will go offline.

Migrate old storage class configurations:

./migrate_storage_class.sh

Then verify that PVCs now use the kfuse-ssd storage class:

kubectl get pvc -n kfuse

Also remove obsolete alerts from Grafana. Delete all alerts in the kloudfuse_alerts and kubernetes_alerts folders.

Post-upgrade

Remove legacy credentials from custom_values.yaml, and delete the kfuse-credentials secret if present:

config:
  AUTH_TYPE: "google"
  AUTH_COOKIE_MAX_AGE_IN_SECONDS: 259200
auth:
  existingAdminSecret: "kfuse-credentials"
  existingSecret: "kfuse-credentials"
yaml

Restart pinot servers to apply trace schema changes:

kubectl rollout restart sts -n kfuse pinot-server-realtime
kubectl rollout restart sts -n kfuse pinot-server-offline

1.3.2

Post-upgrade

Version 1.3.3 introduces pinot schema changes. Restart pinot servers:

kubectl rollout restart sts -n kfuse pinot-server-realtime
kubectl rollout restart sts -n kfuse pinot-server-offline

1.2.1

Pre-upgrade

To enable advanced monitoring (introduced in version 1.3):

  1. Install the Knight agent

  2. Configure agent settings as documented

Delete the pinot minion to support retention:

kubectl delete sts -n kfuse pinot-minion

Refresh alerts manually:

  1. Go to Alerts → Alert Rules

  2. Filter for "Kloudfuse" and "Kubernetes"

  3. Delete all matching alerts

1.1.1

Cloud configuration changes

Starting in version 1.2.0, the Helm chart no longer includes aws.yaml, gcp.yaml, or azure.yaml. You must now define cloud settings in custom_values.yaml.

You no longer need to pull the chart before installation. Run helm upgrade directly using the Kloudfuse registry.

Pre-upgrade

Version 1.1.0 introduced a breaking change in PostgreSQL setup. To preserve alerts, back up the database:

kubectl exec -n kfuse alerts-postgresql-0 --  bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dump -U postgres -F c alertsdb' > alertsdb.tar

Post-upgrade

Restore the backup:

kubectl cp -n kfuse alertsdb.tar kfuse-configdb-0:/tmp/alertsdb.tar
kubectl exec -n kfuse kfuse-configdb-0 --  bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_restore -U postgres -Fc --clean --if-exists -d alertsdb < /tmp/alertsdb.tar'

Delete old PVCs:

kubectl delete pvc -n kfuse data-alerts-postgresql-0
kubectl delete pvc -n kfuse data-beffe-postgresql-0
kubectl delete pvc -n kfuse data-fpdb-postgresql-0

1.0.4

Pre-upgrade

Delete old kfuse-ssd-* storage classes:

helm list
kubectl delete storageclass kfuse-ssd-aws kfuse-ssd-aws-gp3 kfuse-ssd-gcp

Then proceed with the standard upgrade steps.