Deploy Kloudfuse in a Multi-AZ Kubernetes Cluster
Deploy Kloudfuse across multiple availability zones (multi-AZ) to ensure high availability and fault tolerance. A multi-AZ setup minimizes downtime and maintains observability workflows in the event of a zone failure. Each zone must have at least one pod to run the supported components.
Benefits of Multi-AZ Deployment
- Protects against zone-level failures
- Ensures service continuity
- Balances workload across zones
- Aligns with SRE best practices
Prerequisites
Before you begin:
- Kubernetes Cluster Infrastructure
  - This must be a fresh installation. Multi-AZ support is not compatible with upgrades from a single-zone deployment.
  - A Kubernetes cluster with nodes across three availability zones is required.
  - Each zone must have an equal number of nodes, ideally grouped into separate node pools per zone.
  - The total number of nodes must be a multiple of 6.
  - Nodes must not have any additional taints. Kloudfuse performs strict taint validation and will disregard tainted nodes.
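As a quick sanity check, the node-count rules above can be validated with a short shell snippet (the per-zone counts here are hypothetical placeholders, not values read from your cluster):

```shell
# Hypothetical node counts for the three zones; replace with your own.
zone_a=2; zone_b=2; zone_c=2
total=$((zone_a + zone_b + zone_c))

# The cluster must have an equal number of nodes per zone,
# and the total must be a multiple of 6.
if [ "$zone_a" -eq "$zone_b" ] && [ "$zone_b" -eq "$zone_c" ] && [ $((total % 6)) -eq 0 ]; then
  echo "layout ok: $total nodes"     # prints "layout ok: 6 nodes" for 2/2/2
else
  echo "layout invalid"
fi
```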
- Cloud Managed Postgres
  - Create Postgres instances from your cloud provider:
    - AWS RDS
    - GCP Cloud SQL
    - Azure Database for PostgreSQL
  - PostgreSQL must run in all three availability zones.
  - Tested version: Postgres 14.11.
  - Create a Kubernetes secret named kfuse-pg-credentials that contains the base64-encoded Postgres password:

    ```shell
    kubectl create secret generic kfuse-pg-credentials \
      --from-literal=postgres=<base64-encoded-password>
    ```
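The secret expects the base64-encoded form of the password. One way to produce that value, sketched here with a placeholder password:

```shell
# "mypassword" is a placeholder; substitute your real Postgres password.
# printf avoids the trailing newline that echo would add before encoding.
encoded=$(printf '%s' "mypassword" | base64)
echo "$encoded"   # bXlwYXNzd29yZA==
```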
- AWS NLB and Kfuse DNS mapping (on AWS)
  - Use an AWS Network Load Balancer (NLB) for Kloudfuse DNS mapping. Because Elastic IPs are AZ-specific, the DNS record for the Kloudfuse endpoint must be a CNAME pointing to the NLB's DNS name, so that traffic continues to route during a zone failure.
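For illustration, the resulting DNS record is a CNAME along these lines (both hostnames below are hypothetical):

```
; hypothetical names, for illustration only
kfuse.example.com.  300  IN  CNAME  my-kfuse-nlb-0123456789.elb.us-east-1.amazonaws.com.
```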
-
Step 1: Configure Helm Values
In your custom_values.yaml, configure the following fields:
```yaml
global:
  cloudProvider: <aws | gcp | azure>
  numNodes: <Total number of nodes across all zones>
  multiAzDeployment:
    enabled: true
    configDB:
      host: <Postgres host for configDB>
    orchestratorDB:
      host: <Postgres host for orchestratorDB>
  installKfusePgCredentials: false
```
This configuration ensures Kloudfuse uses external Postgres and skips deploying its own credentials secret.
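As a concrete sketch, a filled-in version for a hypothetical 6-node AWS cluster using RDS might look like the following. The hostnames are placeholders, and the exact field nesting should be checked against your chart version:

```yaml
global:
  cloudProvider: aws
  numNodes: 6                 # 2 nodes in each of 3 zones
  multiAzDeployment:
    enabled: true
    configDB:
      host: kfuse-configdb.abc123.us-east-1.rds.amazonaws.com      # placeholder
    orchestratorDB:
      host: kfuse-orchdb.abc123.us-east-1.rds.amazonaws.com        # placeholder
  installKfusePgCredentials: false
```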
Step 2: Disable Embedded PostgreSQL
To use cloud-managed Postgres, disable the internal PostgreSQL services:
```yaml
ingester:
  postgresql:
    enabled: false
kfuse-configdb:
  enabled: false
```
Step 3: Automatic Scaling and Anti-Affinity Rules
When multiAzDeployment.enabled is set to true, Kloudfuse automatically:
- Adjusts replicaCount for services based on global.numNodes
- Applies pod anti-affinity rules to distribute replicas across availability zones
You do not need to manually set replicaCount for most services.
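Kloudfuse generates these rules itself, but for reference, a zone-spreading anti-affinity rule in Kubernetes generally looks like the following. The label values are illustrative, not Kloudfuse's exact generated spec:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: example-service                  # illustrative label
        topologyKey: topology.kubernetes.io/zone  # spread replicas across zones
```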
Service Behavior in Multi-AZ Mode
Services that auto-scale based on numNodes
- kafka
- ingester
- logs-transformer
- metrics-transformer
- trace-transformer
- pinot
- query-service
- advance-functions-service
- events-query-service
- trace-query-service
- logs-query-service
- llm-query-service
- llm-evaluation-service
- rum-query-service
- zapper
- kfuse-vector
- kfuse-observability-agent
Note: Logs Parser uses the Kafka topic partition count, rather than numNodes, to determine its scaling.
Services that default to 3 replicas (1 pod per zone)
- ingress-nginx
- kafka zookeeper
- pinot zookeeper
- redis
- kfuse-profiling-server (requires cloud storage)
- ui
- beffe
Services that always use 1 replica
These components remain single-instance. On zone failure, Kubernetes reschedules the pod to a healthy node in another zone.
- grafana
- hydration-service
- recorder
- az-service
- config-mgmt-service
- rule-manager
- user-mgmt-service
- kfuse-auth
- kfuse-saml
- kfuse-cloud-exporter (scraper and exporters)
- kfuse-profiler
Notes
- Kafka topic partition replication factor and Pinot segment replication factor are automatically managed.
- Most services calculate replica count automatically. Do not override unless required.
- Set global.numNodes to influence replica scaling.
- Ensure your cloud-managed PostgreSQL is accessible from all AZs.