IOPS Tuning for PVCs on AWS

Kloudfuse relies on Kubernetes Persistent Volume Claims (PVCs) for several stateful components. On AWS, the backing storage is Amazon EBS. The volume type and IOPS configuration you choose directly control the throughput available to each component. Under-provisioned EBS volumes are a common cause of ingestion lag, Pinot segment replication delays, and Kafka consumer lag.

The Pinot deep store (S3) is separate from EBS PVCs. This page covers local EBS-backed PVCs used by Pinot servers, Kafka brokers, ZooKeeper, and PostgreSQL.

IOPS-Sensitive Components

| Component | Workload Pattern | Recommendation |
| --- | --- | --- |
| Pinot (server, controller) | High random read; sequential segment writes | gp3 with provisioned IOPS, or io2 for high-concurrency clusters |
| Kafka | Sequential write-heavy; sequential read for replay | gp3 with increased throughput |
| PostgreSQL (configdb, orchestratordb) | Random read/write, low volume | gp3 at default settings; increase IOPS for busy clusters |
| ZooKeeper (Pinot, Kafka) | Small random read/write; latency-sensitive | gp3 or io2 |

EBS Volume Types

| Volume Type | Max IOPS | Max Throughput | Use Case |
| --- | --- | --- | --- |
| gp2 | 3 IOPS/GB, burst to 3,000 | 250 MB/s | Legacy. Migrate to gp3 (same cost, better baseline performance). |
| gp3 | 3,000 baseline; up to 16,000 provisioned | 125 MB/s baseline; up to 1,000 MB/s provisioned | Default recommendation for all Kloudfuse components. |
| io1 | Up to 64,000 (Nitro instances) | Up to 1,000 MB/s | High-performance Pinot clusters. IOPS provisioned independently of size. |
| io2 / io2 Block Express | Up to 256,000 (Block Express) | Up to 4,000 MB/s | Maximum performance; recommended for large multi-AZ Pinot deployments. |

gp3 is the recommended starting point for all components. Provision additional IOPS and throughput only if metrics show saturation. io1/io2 carry a higher per-IOPS cost.

Prerequisites

The EBS CSI driver must be installed in your EKS cluster. It is required to provision gp3 and io2 volumes and to use the iops and throughput StorageClass parameters.

# Verify the EBS CSI driver add-on is active
aws eks describe-addon \
  --cluster-name <cluster-name> \
  --addon-name aws-ebs-csi-driver \
  --query 'addon.status' --output text

If the add-on is not installed:

aws eks create-addon \
  --cluster-name <cluster-name> \
  --addon-name aws-ebs-csi-driver \
  --service-account-role-arn <ebs-csi-irsa-role-arn>

See the AWS EBS CSI driver documentation for IRSA role setup.

Define StorageClasses

Create StorageClass manifests and reference them in your Helm values.

gp3 — Default

Provides 3,000 IOPS and 125 MB/s out of the box at no extra cost over gp2.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kfuse-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

gp3 — High IOPS

For Pinot servers and ZooKeeper nodes that need more than the 3,000 IOPS baseline.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kfuse-gp3-high-iops
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"
  throughput: "500"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

io2 — Provisioned IOPS

For high-concurrency Pinot deployments or multi-AZ clusters where consistent low latency is required.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kfuse-io2
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "16000"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

io2 volumes incur a per-IOPS charge. Set iops to the minimum required for your workload. Over-provisioning IOPS on io2 significantly increases cost.
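To see why over-provisioning io2 is expensive, it helps to put a rough number on the per-IOPS charge. A minimal sketch, assuming an illustrative flat rate of $0.065 per provisioned IOPS-month (the first AWS pricing tier; real io2 pricing is tiered by IOPS count and varies by region, so check current AWS pricing):

```python
# Rough monthly IOPS charge for the kfuse-io2 example class above.
# IO2_IOPS_RATE is an illustrative assumption, not a quoted price:
# actual io2 pricing is tiered and region-dependent.
IO2_IOPS_RATE = 0.065  # USD per provisioned IOPS-month (assumed flat)

def io2_iops_monthly_cost(iops: int) -> float:
    """Approximate monthly charge for provisioned io2 IOPS."""
    return iops * IO2_IOPS_RATE

# 16,000 IOPS, as provisioned in the kfuse-io2 StorageClass:
print(io2_iops_monthly_cost(16000))
```

Even under this simplified rate, the example class costs on the order of a thousand dollars per month in IOPS charges alone, which is why the iops value should be the minimum your workload needs.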

Configure Helm Values

Apply the StorageClasses to each component in your custom_values.yaml:

pinot:
  server:
    persistence:
      storageClass: kfuse-gp3-high-iops
      size: 500Gi
  controller:
    persistence:
      storageClass: kfuse-gp3-high-iops
      size: 100Gi
  zookeeper:
    persistence:
      storageClass: kfuse-gp3-high-iops
      size: 20Gi

kafka:
  persistence:
    storageClass: kfuse-gp3
    size: 200Gi
  zookeeper:
    persistence:
      storageClass: kfuse-gp3
      size: 20Gi

kfuse-configdb:
  primary:
    persistence:
      storageClass: kfuse-gp3
      size: 50Gi
Use volumeBindingMode: WaitForFirstConsumer in every StorageClass. This ensures EBS volumes are created in the same Availability Zone as the pod, avoiding cross-AZ EBS attachment failures.

IOPS Sizing Guidelines

For gp3, the baseline of 3,000 IOPS and 125 MB/s is free. Additional IOPS above 3,000 are charged at ~$0.005 per IOPS-month, and throughput above 125 MB/s at ~$0.04 per MB/s-month.
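These rates make the marginal cost of a gp3 volume easy to estimate. A minimal sketch using the approximate figures quoted above (treat them as region-dependent estimates, not quoted prices):

```python
# Marginal monthly cost of gp3 provisioning above the free baseline.
# Rates are the approximate figures quoted above; actual prices
# vary by region.
GP3_BASELINE_IOPS = 3000
GP3_BASELINE_THROUGHPUT = 125  # MB/s
IOPS_RATE = 0.005       # USD per provisioned IOPS-month above baseline
THROUGHPUT_RATE = 0.04  # USD per provisioned MB/s-month above baseline

def gp3_extra_monthly_cost(iops: int, throughput_mbps: int) -> float:
    """Cost of IOPS and throughput provisioned beyond the gp3 baseline."""
    extra_iops = max(0, iops - GP3_BASELINE_IOPS)
    extra_tp = max(0, throughput_mbps - GP3_BASELINE_THROUGHPUT)
    return extra_iops * IOPS_RATE + extra_tp * THROUGHPUT_RATE

# The kfuse-gp3-high-iops class above (6,000 IOPS, 500 MB/s):
print(gp3_extra_monthly_cost(6000, 500))
```

For the kfuse-gp3-high-iops class, the 3,000 extra IOPS and 375 extra MB/s each add roughly $15/month per volume at these rates, a small premium compared to io2.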

Use these rules of thumb to size gp3 provisioned IOPS:

| Component | Starting IOPS | When to Increase |
| --- | --- | --- |
| Pinot server | 3,000–6,000 | Segment replication lag, high query latency |
| Pinot controller | 3,000 | Rarely needs more |
| Kafka broker | 3,000 | Consumer lag, high producer throughput (>200 MB/s) |
| ZooKeeper | 3,000 | Watch event storms, leader election delays |
| PostgreSQL | 3,000 | Slow alert rule evaluation, slow config API |

For io2, IOPS are provisioned explicitly, up to 500 IOPS per GiB of volume size. A 500 GiB io2 volume can therefore reach the 64,000 IOPS per-volume cap on Nitro instances; io2 Block Express raises both the ratio (1,000 IOPS per GiB) and the per-volume cap (256,000 IOPS).

Migrating from gp2 to gp3

If your cluster was installed with the default gp2 StorageClass, migrate existing PVCs to gp3 to gain the higher baseline without additional cost.

# Modify an existing EBS volume in-place (no downtime required)
aws ec2 modify-volume \
  --volume-id <vol-id> \
  --volume-type gp3 \
  --iops 3000 \
  --throughput 125
# Monitor modification progress
aws ec2 describe-volumes-modifications \
  --volume-ids <vol-id> \
  --query 'VolumesModifications[0].ModificationState'

New PVCs created after updating the default StorageClass will automatically use gp3.
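To have PVCs that omit storageClassName pick up gp3 automatically, mark the gp3 StorageClass as the cluster default via the standard is-default-class annotation (and clear the same annotation on the old gp2 class first, since only one default is allowed). A sketch, reusing the kfuse-gp3 class defined above:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kfuse-gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain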

Monitoring EBS Performance

Monitor EBS volume performance from the AWS Console under EC2 → Volumes, or use CloudWatch metrics:

  • VolumeReadOps / VolumeWriteOps — IOPS consumed

  • VolumeReadBytes / VolumeWriteBytes — throughput consumed

  • VolumeThroughputPercentage — percentage of provisioned throughput in use

  • VolumeConsumedReadWriteOps — for io1/io2, consumed vs provisioned IOPS

A sustained VolumeThroughputPercentage above 80% indicates the volume is approaching its limit and IOPS or throughput should be increased.
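This 80% rule is easy to encode in an automation script once the metric samples are in hand. A minimal sketch with hypothetical sample values (fetching real VolumeThroughputPercentage datapoints from CloudWatch is left out; the function and threshold choices here are illustrative, not a Kloudfuse API):

```python
# Flag an EBS volume as near-saturated when consumed throughput
# stays above 80% of provisioned for most of the sampled window.
# The datapoints below are hypothetical; in practice they would
# come from the CloudWatch VolumeThroughputPercentage metric.

def near_saturation(percentages: list[float],
                    threshold: float = 80.0,
                    min_fraction: float = 0.8) -> bool:
    """True if at least min_fraction of samples exceed threshold percent."""
    if not percentages:
        return False
    hot = sum(1 for p in percentages if p > threshold)
    return hot / len(percentages) >= min_fraction

# Hypothetical 5-minute samples over a 30-minute window:
print(near_saturation([85, 91, 88, 79, 95, 90]))  # → True (5/6 above 80%)
```

Requiring a sustained fraction of hot samples, rather than a single spike, avoids reprovisioning IOPS in response to transient bursts that gp3's baseline already absorbs.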

You can also inspect Pinot server lag and segment replication metrics from the Pinot control plane dashboard.