Upgrade Kloudfuse
Upgrade command
- Before performing an upgrade, validate that the upgrade won’t revert any customization on your cluster. See Upgrade validation.
- To check which Kloudfuse version you have, run the following command:
helm list
- Run the upgrade command:
helm upgrade --install kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
  -n kfuse \
  --version <VERSION> \
  -f custom-values.yaml
version: Valid Kloudfuse release value; use the most recent one.
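For example, with the release installed in the kfuse namespace, the CHART column of the helm list output shows the installed Kloudfuse chart version (illustrative, truncated output; your values will differ):
helm list -n kfuse
# NAME    NAMESPACE   REVISION   STATUS     CHART
# kfuse   kfuse       3          deployed   kfuse-3.2.3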
Upgrading to Latest Kloudfuse Releases
3.3.0
There are no specific post-upgrade steps for this release.
Pre-upgrade Steps
- If your organization runs Kloudfuse on a shared cluster, or if it has the az-service enabled (it has taints and labels), update the following configuration in the values.yaml file before upgrading:
config-mgmt-service:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: ng_label
                operator: In
                values:
                  - az1
  tolerations:
    - key: "ng_taint"
      operator: "Equal"
      value: "az1"
      effect: "NoSchedule"
- The configuration for label tracking is now part of the global section. If your organization tracks labels, move their definition to the global section, as sketched below.
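A minimal sketch of the move, assuming your labels were previously listed outside the global section; the trackedLabels key name here is illustrative only, not a confirmed Kloudfuse values key. Keep whatever key your existing custom-values.yaml already uses, and only change where it lives:
global:
  # hypothetical key name; reuse the label-tracking key from your current file
  trackedLabels:
    - team
    - env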
3.2.3
There are no specific post-upgrade steps for this release.
Pre-upgrade
- Scheduled Views
- To support the new feature, Scheduled Views, ensure that your global.kafkaTopics section in the custom-values.yaml file contains the following code:
- name: kf_logs_views_topic
  partitions: 1
  replicationFactor: 1
- RUM Applications
- In this release, we added support for applications in RUM. See Add and Manage Applications.
To successfully migrate existing RUM applications to the Kloudfuse platform, follow these steps during the Kloudfuse Kubernetes install. Alternatively, contact Kloudfuse Support for assistance.
- Connect to the configdb pod:
kubectl exec -it kfuse-configdb-0 -- /bin/bash
PGPASSWORD=$(env | grep -i PASSWORD | cut -d'=' -f2) psql -U postgres
- Connect to the rumdb database:
\c rumdb
- Insert the applications manually into the db from the config yaml, replacing true/false with the appropriate value for each application:
insert into applications (id, name, type, collect_client_ip, client_token)
values ('app1_id', 'app1_name', 'app1_type', true/false, 'app1_auth_token'),
       ('app2_id', 'app2_name', 'app2_type', true/false, 'app2_auth_token');
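To confirm that the rows were inserted as expected, an optional check from the same psql session (not part of the documented steps):
rumdb=# select id, name, type, collect_client_ip from applications;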
3.2.2
We changed the backing disk type of the kfuse-ssd storage class for AWS from io1 to gp3. Therefore, if you run Kloudfuse in AWS, you must make adjustments before upgrading to Release 3.2.2.
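Before upgrading, it can help to confirm how the storage class and existing volumes are currently provisioned; a couple of standard kubectl checks, shown here as a suggestion rather than a required step:
kubectl get storageclass kfuse-ssd -o yaml | grep -i type
kubectl get pvc -n kfuse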
There are no specific post-upgrade steps for this release.
3.1.0
Pre-Upgrade
Because of the fix to the labels and labelSelector, which lets some of our components match the rest, you must run this command before upgrading to Release 3.1.0:
kubectl delete deployments.apps catalog-service rulemanager advance-functions-service
Post-Upgrade
- Restart Pinot services:
kubectl rollout restart sts pinot-broker pinot-controller pinot-server-realtime pinot-server-offline
- We moved hydration-service (HS) from a deployment to a statefulset. You must manually delete the pod associated with it:
kubectl delete pod hydration-service-<tag>
The HS pod now runs under a custom pod name. Use the following clause to fetch it:
kubectl get pods | grep hydration-service
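For example, both steps can be combined into a single command; a sketch that assumes the release runs in the kfuse namespace and that exactly one hydration-service pod exists:
kubectl delete pod -n kfuse $(kubectl get pods -n kfuse | grep hydration-service | awk '{print $1}')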
2.7.4
Pre-Upgrade
For RBAC, before upgrading to Release 2.7.4 from Release 2.7.3, check for a blank user row; click the Admin tab, and select User Management. The login and email fields are empty, and the record has a random id. Delete that row directly in the UI.
Alternatively, complete these steps in the console:
- Run the kfuse-postgres.sh script to enter the configdb shell:
#!/usr/bin/env bash
# Optional parameters:
# 1. pod name - default kfuse-configdb-0
# 2. namespace - default kfuse
# 3. database name - default configdb
kubectl exec -it ${1:-kfuse-configdb-0} -n ${2:-kfuse} -- bash -c "PGPASSWORD=\$POSTGRES_PASSWORD psql -U postgres -d ${3:-configdb}"
- Delete users with null emails and logins:
./kfuse-postgres.sh kfuse-configdb-0 kfuse rbacdb
rbacdb=# DELETE FROM users where email ISNULL and login ISNULL;
DELETE 1
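Optionally, you can preview the rows that the DELETE statement removes by running a SELECT with the same predicate in the rbacdb session (not part of the documented steps):
rbacdb=# SELECT id, login, email FROM users WHERE email ISNULL AND login ISNULL;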
Post-Upgrade
Restart Pinot Services.
kubectl rollout restart sts pinot-server-offline
Then refresh the services in the APM store:
kubectl port-forward --namespace kfuse deployments.apps/trace-query-service 8080:8080
curl -X POST http://localhost:8080/v1/trace/query \
-H "Content-Type: application/json" \
-d '{
"query": "query { refreshServicesInApmStore(lookbackDays: 1) }"
}'
2.7.3
Upgrade to Release 2.7.3 using the standard upgrade command. See Upgrade command.
2.7.2
Pre-Upgrade
This release changes the RBAC implementation.
- You may see numeric IDs in the email field of the users. To populate Kloudfuse with correct emails, delete all users. Kloudfuse recreates individual users as they log in, with correct email values.
- Create new groups after completing this step. You can then assign users to groups, policies to users and groups, and so on.
Post-upgrade
- Connect to rbacdb:
./kfuse-postgres.sh kfuse-configdb-0 kfuse rbacdb
- Make a note of each user_id with a null value that resulted from the RBAC migration:
rbacdb=# select id from users where grafana_id IS NULL;
- Clean up empty users in the RBAC database:
rbacdb=# delete from users where grafana_id IS NULL;
- For each user_id that you noted earlier, delete the user from the group:
rbacdb=# delete from group_members where user_id='<user-id>';
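To verify the cleanup, you can re-run the earlier queries and confirm that they return no rows (an optional check):
rbacdb=# select id from users where grafana_id IS NULL;
rbacdb=# select * from group_members where user_id='<user-id>';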
2.7.0
There are no specific post-upgrade steps for this release.
Pre-upgrade
Package upgrades to remove service vulnerabilities.
- Before helm upgrade, run the kafka-upgrade.sh script. Expect some downtime between running the script and helm upgrade.
- Edit the custom_values.yaml file, and move the block under kafka to the kafka-broker section:
kafka:
  broker:
    <<previous kafka block>>
- Add these topics to the kafkaTopics section to ensure record-replay:
kafkaTopics:
  - name: kf_commands
    partitions: 1
    replicationFactor: 1
  - name: kf_recorder_data
    partitions: 1
    replicationFactor: 1
- Add a recorder section with the same affinity and toleration values as the ingester. If empty, don’t add the recorder section:
recorder:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: ng_label
                operator: In
                values:
                  - amrut
  tolerations:
    - key: "ng_taint"
      operator: "Equal"
      value: "amrut"
      effect: "NoSchedule"
- If you use AWS enrichment, the config format in the values changed. See AWS Services.
- Upgrade the stack; see Upgrade command.
2.6.7
Release 2.6.7 introduces Identity for Databases. It takes effect on newly-ingested APM-related data.
We increased timestamp granularity for APM/span data from millisecond to nanosecond, because it provides better accuracy for the Trace Flamegraph and Waterfall visuals.
Pre-upgrade Steps
- SLO
- We re-enabled SLO in this release, with enhanced features.
- Set up the kfuse-postgres.sh script (shown in the Release 2.7.4 pre-upgrade steps).
- Drop the SLO DB:
./kfuse-postgres.sh kfuse-configdb-0 kfuse slodb
slodb=# drop table slodbs;
- APM
- You must convert older APM data to the Kloudfuse 2.6.5 APM Service Identity format.
APM data ingested before Release 2.6.5 is incompatible, and does not render properly in the APM UI page. You have the option to convert the older data to the current format. The conversion process may take time, depending on the volume of data. When enabled, the conversion runs when the Pinot servers start and load the segments.
- To enable the conversion, ensure that the custom_values.yaml file has the following configuration:
pinot:
  traces:
    serviceHashConversionEnabled: true
  traces_errors:
    serviceHashConversionEnabled: true
  metrics:
    serviceHashConversionEnabled: true
- Disable the KV Cardinality limit on the Pinot Metrics table:
pinot:
  metrics:
    kvTotalCardinalityThreshold: 0
- Increase the heap allocation for Pinot Server Offline servers. Segment conversion requires memory. Temporarily double the memory for the Pinot server offline in the custom_values.yaml file:
pinot:
  server:
    offline:
      jvmOpts: "<Adjust the Xmx and Xms settings here>"
- Reduce the helix threads to 10:
kubectl port-forward -n kfuse pinot-controller-0 9000:9000
curl -X POST "http://localhost:9000/cluster/configs" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"STATE_TRANSITION.maxThreads\": \"10\"}"
# Verify using:
curl -X GET "http://localhost:9000/cluster/configs"
- Run the standard upgrade command using the updated custom_values.yaml file. See Upgrade command.
Post-upgrade Steps
- The upgrade includes changes to Pinot table configuration. Restart Pinot servers to ensure that the configuration is updated:
kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime
- It takes time to convert all Pinot segments. The table segments status in the Pinot controller UI console should reflect the loaded (converted) segments. Connect to the Pinot controller to monitor when all segments are in good state; this is when the conversion is complete.
# Create port-forward to the pinot controller
kubectl port-forward -n kfuse pinot-controller-0 9000:9000
# From the browser, go to localhost:9000
- After conversion finishes, revert the helix threads back to the default setting:
kubectl port-forward -n kfuse pinot-controller-0 9000:9000
curl -X DELETE "http://localhost:9000/cluster/configs/STATE_TRANSITION.maxThreads" -H "accept: application/json"
- Revert the cardinality threshold configuration and heap allocation of the Pinot server offline servers in the custom_values.yaml file.
- Run the upgrade again. See Upgrade command.
- In some special cases, you may have to force a re-conversion of segments: before the upgrade, delete the pinot-server-offline STS and PVC, and then run the conversion steps. This forces older segments to download from the deep store.
kubectl delete sts -n kfuse pinot-server-offline
kubectl delete pvc -l component=server-offline -n kfuse
2.6.6
Pre-upgrade
Kloudfuse introduces a new kfuse-ssd-offline storage class. By default, it uses:
- gp3 on AWS
- pd-balanced on GCP
- Standard_LRS on Azure
If your values.yaml already defines this class, skip this step.
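A quick way to check whether the class already exists in your cluster (an optional check, assuming the default class name):
kubectl get storageclass kfuse-ssd-offline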
Delete the existing offline pinot server stateful set and PVCs:
kubectl delete sts -n kfuse pinot-server-offline
kubectl delete pvc -l app.kubernetes.io/instance=kfuse -l component=server-offline -n kfuse
After the upgrade, Kloudfuse automatically creates PVCs using the updated storage class.
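To confirm that the recreated PVCs picked up the new class, an optional check:
kubectl get pvc -n kfuse -l component=server-offline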
2.5.3
1.3.4
Pre-upgrade
Kfuse services will go offline.
Migrate old storage class configurations:
./migrate_storage_class.sh
Then verify that PVCs now use the kfuse-ssd storage class:
kubectl get pvc -n kfuse
Also remove obsolete alerts from Grafana. Delete all alerts in the kloudfuse_alerts and kubernetes_alerts folders.
Post-upgrade
Remove legacy credentials from custom_values.yaml, and delete the kfuse-credentials secret if present:
config:
AUTH_TYPE: "google"
AUTH_COOKIE_MAX_AGE_IN_SECONDS: 259200
auth:
existingAdminSecret: "kfuse-credentials"
existingSecret: "kfuse-credentials"
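To delete the secret, a standard kubectl command works; this sketch assumes the kfuse namespace:
kubectl delete secret -n kfuse kfuse-credentials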
Restart pinot servers to apply trace schema changes:
kubectl rollout restart sts -n kfuse pinot-server-realtime
kubectl rollout restart sts -n kfuse pinot-server-offline
1.2.1
Pre-upgrade
To enable advanced monitoring (introduced in version 1.3):
- Install the Knight agent
- Configure agent settings as documented
Delete the pinot minion to support retention:
kubectl delete sts -n kfuse pinot-minion
Refresh alerts manually:
- Go to Alerts → Alert Rules
- Filter for "Kloudfuse" and "Kubernetes"
- Delete all matching alerts
1.1.1
Cloud configuration changes
Starting in version 1.2.0, the Helm chart no longer includes aws.yaml, gcp.yaml, or azure.yaml.
You must now define cloud settings in custom_values.yaml.
You no longer need to pull the chart before installation. Run helm upgrade directly using the Kloudfuse registry.
Pre-upgrade
Version 1.1.0 introduced a breaking change in PostgreSQL setup. To preserve alerts, back up the database:
kubectl exec -n kfuse alerts-postgresql-0 -- bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dump -U postgres -F c alertsdb' > alertsdb.tar
Post-upgrade
Restore the backup:
kubectl cp -n kfuse alertsdb.tar kfuse-configdb-0:/tmp/alertsdb.tar
kubectl exec -n kfuse kfuse-configdb-0 -- bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_restore -U postgres -Fc --clean --if-exists -d alertsdb < /tmp/alertsdb.tar'
Delete old PVCs:
kubectl delete pvc -n kfuse data-alerts-postgresql-0
kubectl delete pvc -n kfuse data-beffe-postgresql-0
kubectl delete pvc -n kfuse data-fpdb-postgresql-0