IOPS Tuning for PVCs on Azure
Kloudfuse relies on Kubernetes Persistent Volume Claims (PVCs) for several stateful components. On Azure, the backing storage is Azure Managed Disks. The disk type and IOPS configuration you choose directly control the throughput available to each component. Under-provisioned disks are a common cause of ingestion lag, Pinot segment replication delays, and Kafka consumer lag.
The Pinot deep store (Azure Blob Storage / ADLS) is separate from managed disk PVCs. This page covers local disk-backed PVCs used by Pinot servers, Kafka brokers, ZooKeeper, and PostgreSQL.
IOPS-Sensitive Components
| Component | Workload Pattern | Recommendation |
|---|---|---|
| Pinot (server, controller) | High random read; sequential segment writes | |
| Kafka | Sequential write-heavy; sequential read for replay | |
| PostgreSQL ( | Random read/write, low volume | |
| ZooKeeper (Pinot, Kafka) | Small random read/write; latency-sensitive | |
Azure Managed Disk Types
| Disk Type | Max IOPS | Max Throughput | Use Case |
|---|---|---|---|
| Standard HDD | Up to 2,000 | Up to 500 MB/s | Dev/test only. Not suitable for production Pinot or Kafka. |
| Standard SSD | Up to 6,000 | Up to 750 MB/s | Light workloads. Use |
| Premium SSD | Up to 20,000 | Up to 900 MB/s | Default recommendation for Kafka, PostgreSQL, and ZooKeeper. |
| Premium SSD v2 | Up to 80,000 (provisioned) | Up to 1,200 MB/s (provisioned) | Recommended for Pinot servers; decouples IOPS and throughput from disk size. |
| Ultra Disk | Up to 400,000 (provisioned) | Up to 10,000 MB/s (provisioned) | Maximum performance for large multi-AZ Pinot deployments; highest cost. |
Premium SSD v2 and Ultra Disk allow IOPS and throughput to be provisioned independently of disk size, similar to AWS gp3 and io2. Premium SSD (v1) IOPS scale with the disk tier (P-series size).
Prerequisites
The Azure Disk CSI driver must be installed in your AKS cluster. It is required to provision Premium SSD v2 and Ultra Disk volumes and to use the skuName, diskIOPSReadWrite, and diskMBpsReadWrite StorageClass parameters.
    # Verify the Azure Disk CSI driver is enabled on your AKS cluster
    az aks show \
      --name <cluster-name> \
      --resource-group <resource-group> \
      --query "storageProfile.diskCSIDriver.enabled"
If the CSI driver is not enabled:
    az aks update \
      --name <cluster-name> \
      --resource-group <resource-group> \
      --enable-disk-driver
See the Azure Disk CSI driver documentation for details.
Define StorageClasses
Create StorageClass manifests and reference them in your Helm values.
Premium SSD — Default
Suitable for Kafka, PostgreSQL, and ZooKeeper. IOPS and throughput scale with the P-series tier (disk size).
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: kfuse-premium-ssd
    provisioner: disk.csi.azure.com
    parameters:
      skuName: Premium_LRS
      cachingMode: ReadOnly
      kind: Managed
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    reclaimPolicy: Retain
Premium SSD v2 — Provisioned IOPS
For Pinot servers and ZooKeeper that need consistent IOPS independent of disk size.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: kfuse-premium-ssd-v2
    provisioner: disk.csi.azure.com
    parameters:
      skuName: PremiumV2_LRS
      # Premium SSD v2 does not support host caching, so set it to None explicitly
      cachingMode: None
      diskIOPSReadWrite: "6000"
      diskMBpsReadWrite: "300"
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    reclaimPolicy: Retain
Ultra Disk — Maximum Performance
For high-concurrency Pinot deployments where consistent low latency is required at scale.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: kfuse-ultra-disk
    provisioner: disk.csi.azure.com
    parameters:
      skuName: UltraSSD_LRS
      # Ultra Disk does not support host caching, so set it to None explicitly
      cachingMode: None
      diskIOPSReadWrite: "16000"
      diskMBpsReadWrite: "500"
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    reclaimPolicy: Retain
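After the manifests have been applied with `kubectl apply -f`, a quick check can confirm that each class exists with the intended SKU and binding mode. The following is a minimal sketch that assumes `kubectl` is pointed at the AKS cluster; the helper name `check_kfuse_storageclasses` is hypothetical and not part of Kloudfuse.

```shell
# Print name, SKU, and binding mode for each StorageClass defined above.
# check_kfuse_storageclasses is an illustrative helper name (an assumption).
check_kfuse_storageclasses() {
  for sc_name in kfuse-premium-ssd kfuse-premium-ssd-v2 kfuse-ultra-disk; do
    kubectl get storageclass "$sc_name" \
      -o jsonpath='{.metadata.name}{"\t"}{.parameters.skuName}{"\t"}{.volumeBindingMode}{"\n"}' \
      || echo "missing: $sc_name"
  done
}
```

Calling `check_kfuse_storageclasses` prints one line per class, or `missing: <name>` for any class that has not been created.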
Configure Helm Values
Apply the StorageClasses to each component in your custom_values.yaml:
    pinot:
      server:
        persistence:
          storageClass: kfuse-premium-ssd-v2
          size: 500Gi
      controller:
        persistence:
          storageClass: kfuse-premium-ssd-v2
          size: 100Gi
      zookeeper:
        persistence:
          storageClass: kfuse-premium-ssd-v2
          size: 20Gi
    kafka:
      persistence:
        storageClass: kfuse-premium-ssd
        size: 200Gi
      zookeeper:
        persistence:
          storageClass: kfuse-premium-ssd
          size: 20Gi
    kfuse-configdb:
      primary:
        persistence:
          storageClass: kfuse-premium-ssd
          size: 50Gi
Use volumeBindingMode: WaitForFirstConsumer in every StorageClass. This ensures Azure Managed Disks are created in the same Availability Zone as the pod, avoiding cross-AZ disk attachment failures.
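To confirm that each component picked up the intended StorageClass after the Helm values are applied, the PVCs can be listed with their class, size, and phase. This is a sketch only: the helper name `list_pvc_storageclasses` and the default namespace `kfuse` are assumptions, so pass the namespace your installation actually uses.

```shell
# List PVC name, StorageClass, requested size, and phase in one namespace.
# The default namespace "kfuse" is an assumption; pass your own as $1.
list_pvc_storageclasses() {
  pvc_namespace=${1:-kfuse}
  kubectl get pvc -n "$pvc_namespace" \
    -o custom-columns='NAME:.metadata.name,CLASS:.spec.storageClassName,SIZE:.spec.resources.requests.storage,STATUS:.status.phase'
}
```

With `WaitForFirstConsumer`, a PVC is expected to stay in `Pending` until a pod that mounts it has been scheduled, so a `Pending` phase right after install is not necessarily an error.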
IOPS Sizing Guidelines
For Premium SSD v2, IOPS are provisioned independently of disk size. The baseline is 3,000 IOPS and 125 MB/s; additional IOPS are provisioned in the StorageClass.
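Provisioned IOPS on Premium SSD v2 are independent of size only up to a ceiling that still depends on the disk size. The sketch below estimates that ceiling, assuming the rule Azure publishes at the time of writing (3,000 IOPS up to 6 GiB, then 500 more per additional GiB, capped at 80,000). Verify the rule against current Azure documentation before relying on it; the function name `pv2_max_iops` is hypothetical.

```shell
# Upper bound on provisioned IOPS for a Premium SSD v2 disk of a given size.
# Assumed rule (check current Azure docs): 3,000 IOPS up to 6 GiB,
# then +500 IOPS per additional GiB, capped at 80,000.
pv2_max_iops() {
  size_gib=$1
  if [ "$size_gib" -le 6 ]; then
    echo 3000
    return 0
  fi
  max_iops=$(( 3000 + 500 * (size_gib - 6) ))
  if [ "$max_iops" -gt 80000 ]; then
    max_iops=80000
  fi
  echo "$max_iops"
}

pv2_max_iops 20    # 20Gi ZooKeeper volume: prints 10000
pv2_max_iops 500   # 500Gi Pinot server volume: prints 80000
```

Under this assumption, the 6,000 IOPS set in the `kfuse-premium-ssd-v2` StorageClass fits within the ceiling for both the 20Gi and 500Gi volumes.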
For Premium SSD (v1), IOPS scale with the disk tier. Use the P-series size table to back-calculate the minimum disk size for your IOPS target:
| P-series Tier | Disk Size | Max IOPS | Max Throughput |
|---|---|---|---|
| P10 | 128 GiB | 500 | 100 MB/s |
| P20 | 512 GiB | 2,300 | 150 MB/s |
| P30 | 1 TiB | 5,000 | 200 MB/s |
| P40 | 2 TiB | 7,500 | 250 MB/s |
| P50 | 4 TiB | 7,500 | 250 MB/s |
| P60 | 8 TiB | 16,000 | 500 MB/s |
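The back-calculation from an IOPS target to a minimum Premium SSD (v1) tier can be sketched as a lookup over the table above. The function name `min_p_tier` is hypothetical, and it only knows the tiers listed on this page.

```shell
# Return the smallest P-series tier (from the table above) whose max IOPS
# meets the target. Rows are encoded as tier:size:max_iops.
min_p_tier() {
  target_iops=$1
  for row in P10:128GiB:500 P20:512GiB:2300 P30:1TiB:5000 \
             P40:2TiB:7500 P50:4TiB:7500 P60:8TiB:16000; do
    row_iops=${row##*:}      # text after the last colon
    tier_and_size=${row%:*}  # text before the last colon
    if [ "$row_iops" -ge "$target_iops" ]; then
      echo "${tier_and_size%%:*} ${tier_and_size#*:}"
      return 0
    fi
  done
  echo "no tier in the table reaches $target_iops IOPS" >&2
  return 1
}

min_p_tier 3000   # prints: P30 1TiB
```

A 3,000 IOPS target therefore needs at least a 1 TiB Premium SSD (v1) volume, which is one reason a provisioned-IOPS disk type can be more economical for small, IOPS-hungry volumes.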
Use these rules of thumb to right-size provisioned IOPS for Premium SSD v2 and Ultra Disk:
| Component | Starting IOPS | When to Increase |
|---|---|---|
| Pinot server | 3,000–6,000 | Segment replication lag, high query latency |
| Pinot controller | 3,000 | Rarely needs more |
| Kafka broker | 3,000 | Consumer lag, high producer throughput (>200 MB/s) |
| ZooKeeper | 3,000 | Watch event storms, leader election delays |
| PostgreSQL | 3,000 | Slow alert rule evaluation, slow config API |
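StorageClass parameters take effect when a volume is provisioned, so raising IOPS on a disk that already exists generally means updating the managed disk itself. The sketch below shows one way to do that for a Premium SSD v2 or Ultra Disk volume: it resolves the disk behind a PVC and calls `az disk update`. The helper name `raise_pvc_disk_iops` and the example PVC name are hypothetical. Azure also limits how often disk performance can be changed, so check the current limits first.

```shell
# Raise provisioned IOPS/throughput on the managed disk behind a PVC.
# Usage: raise_pvc_disk_iops <namespace> <pvc-name> <iops> <mbps>
# raise_pvc_disk_iops is an illustrative helper name (an assumption).
raise_pvc_disk_iops() {
  pvc_namespace=$1
  pvc_name=$2
  new_iops=$3
  new_mbps=$4
  # The bound PersistentVolume, then the Azure disk resource ID it points at
  pv_name=$(kubectl get pvc "$pvc_name" -n "$pvc_namespace" \
    -o jsonpath='{.spec.volumeName}') || return 1
  disk_id=$(kubectl get pv "$pv_name" \
    -o jsonpath='{.spec.csi.volumeHandle}') || return 1
  az disk update --ids "$disk_id" \
    --disk-iops-read-write "$new_iops" \
    --disk-mbps-read-write "$new_mbps"
}
```

For example, `raise_pvc_disk_iops kfuse <pinot-server-pvc> 8000 400` would raise one Pinot server disk to 8,000 IOPS and 400 MB/s. Volumes created later still use the values in the StorageClass.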
Monitoring Disk Performance
Monitor Azure Managed Disk performance from the Azure Portal under Disks, or query Azure Monitor with the following metrics:
- Disk Read Operations/Sec / Disk Write Operations/Sec: IOPS consumed
- Disk Read Bytes/Sec / Disk Write Bytes/Sec: throughput consumed
- Disk Read Latency / Disk Write Latency: per-operation latency
Sustained disk read or write latency above 5 ms, or IOPS consistently at the provisioned limit, indicates that the volume is approaching saturation and that its provisioned IOPS should be increased.
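The saturation rule above can be expressed as a small check over values read from Azure Monitor. This is a sketch under stated assumptions: the function name `disk_saturated` is hypothetical, and treating "at the provisioned limit" as 95% or more of provisioned IOPS is an illustrative threshold, not a Kloudfuse or Azure recommendation.

```shell
# Classify a disk from sustained latency (ms), consumed IOPS, and provisioned IOPS.
# The 5 ms latency threshold comes from the text above; the 95% utilization
# cutoff is an assumption chosen for illustration.
disk_saturated() {
  awk -v latency_ms="$1" -v used_iops="$2" -v provisioned_iops="$3" 'BEGIN {
    if (latency_ms > 5 || used_iops >= 0.95 * provisioned_iops) {
      print "saturated"
    } else {
      print "ok"
    }
  }'
}

disk_saturated 6.2 1000 6000   # prints: saturated (latency above 5 ms)
disk_saturated 2 3000 6000     # prints: ok
```

Feed it averages over a sustained window rather than single samples, since short bursts can briefly exceed either threshold.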
You can also inspect Pinot server lag and segment replication metrics from the Pinot control plane dashboard.