Pinot Object Storage

The Apache Pinot deep store is the persistent storage layer for segment files. It serves as a backup and recovery mechanism and is not part of the real-time query path. It holds compressed segments, which servers download from cloud storage (S3, ADLS, GCS) or HDFS to recover data, replace failed nodes, and scale the cluster.

Ensure that the deep store is set up in the same region that hosts your Kubernetes cluster.

Add the configuration described on this page to the custom_values.yaml file used for the Helm installation of Kloudfuse.

Kloudfuse supports two options for object store configuration in GCP: a service account key or Google Cloud Workload Identity.

Use a Service Account Key

Prerequisites

  1. Download the service account key from the GCP console.

  2. Create a Kubernetes secret with the GCP credentials to allow access to the GCS bucket. Name the key file secretKey before you run the command; kubectl uses the file name as the key inside the secret:

    kubectl create secret generic cloud-storage-secret --from-file=./secretKey -n kfuse
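Before moving on, it can help to confirm that the key file is in place and that the secret holds a key named secretKey. The following is a minimal sketch, assuming the namespace and secret name from the command above. The helper names check_key_file and verify_secret are hypothetical and are not part of Kloudfuse.

```shell
# Names taken from the step above; adjust them if yours differ.
NAMESPACE="kfuse"
SECRET_NAME="cloud-storage-secret"
KEY_FILE="./secretKey"

# Hypothetical helper: succeeds only if the key file exists and is non-empty.
check_key_file() {
  [ -s "$1" ]
}

# Hypothetical helper: shows the key names and sizes stored in the secret
# without printing the credential itself. Requires kubectl access to the cluster.
verify_secret() {
  kubectl describe secret "$SECRET_NAME" -n "$NAMESPACE"
}
```

Run `check_key_file "$KEY_FILE"` before creating the secret and `verify_secret` afterwards. The Data section of the output should list secretKey.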

Helm values

Add the following values in the custom_values.yaml file. Replace the GCS details as required.

GCP Configuration with Service Account Key
global:
  cloudStorage:
    type: gcs
    useSecret: true
    secretName: cloud-storage-secret
    gcs:
      bucket: "REPLACE BUCKET HERE"

pinot:
  deepStore: # (1)
    enabled: true
    prefix: "kfuse/controller/data" # (2)
(1) deepStore: Enables or disables storage of Pinot segments in the deep store.
(2) prefix: Folder prefix inside the specified bucket where deep store data is written.

Use Google Cloud Workload Identity

Prerequisites

  1. Follow the steps in the Google Cloud guide "Authenticate to Google Cloud APIs from GKE workloads" to create a service account and associate it with the GKE cluster.

  2. While following that guide, use these values for its placeholders:

    ROLE_NAME

    roles/storage.admin

    Alternatively, create a custom role with the following permissions:

    • storage.buckets.get

    • storage.objects.create

    • storage.objects.delete

    • storage.objects.get

    • storage.objects.getIamPolicy

    • storage.objects.list

    • storage.objects.update

    NAMESPACE

    kfuse. Create this namespace if it does not exist yet.

    If Kloudfuse is deployed in a different namespace, use that namespace instead.

    KSA_NAME

    default
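The values above can be plugged into the commands from the Google Cloud guide roughly as follows. This is a sketch, not the definitive procedure: the guide itself is authoritative and its steps may differ from what is shown here. PROJECT_ID, IAM_SA_NAME, and CUSTOM_ROLE_ID are hypothetical placeholders, the helper function names are made up for illustration, and granting the role at project level (rather than on the bucket) is an assumption.

```shell
# Hypothetical placeholders: replace with your own project and IAM service account.
PROJECT_ID="my-project"
IAM_SA_NAME="kfuse-deepstore"
CUSTOM_ROLE_ID="kfuseDeepStore"

# Values from the list above.
NAMESPACE="kfuse"
KSA_NAME="default"
ROLE_NAME="roles/storage.admin"
CUSTOM_PERMISSIONS="storage.buckets.get,storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.getIamPolicy,storage.objects.list,storage.objects.update"

IAM_SA_EMAIL="${IAM_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
WI_MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Create the namespace only if it does not exist yet.
ensure_namespace() {
  kubectl get namespace "$NAMESPACE" >/dev/null 2>&1 ||
    kubectl create namespace "$NAMESPACE"
}

# Optional: create a custom role with the listed permissions and use it
# instead of roles/storage.admin.
create_custom_role() {
  gcloud iam roles create "$CUSTOM_ROLE_ID" \
    --project "$PROJECT_ID" \
    --title "Kfuse deep store" \
    --permissions "$CUSTOM_PERMISSIONS"
  ROLE_NAME="projects/${PROJECT_ID}/roles/${CUSTOM_ROLE_ID}"
}

bind_workload_identity() {
  # Create the IAM service account (skip this if it already exists).
  gcloud iam service-accounts create "$IAM_SA_NAME" --project "$PROJECT_ID"
  # Grant the storage role to the IAM service account (project level, an assumption).
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${IAM_SA_EMAIL}" \
    --role "$ROLE_NAME"
  # Allow the Kubernetes service account to impersonate the IAM service account.
  gcloud iam service-accounts add-iam-policy-binding "$IAM_SA_EMAIL" \
    --role roles/iam.workloadIdentityUser \
    --member "$WI_MEMBER"
  # Point the Kubernetes service account at the IAM service account.
  kubectl annotate serviceaccount "$KSA_NAME" \
    --namespace "$NAMESPACE" \
    "iam.gke.io/gcp-service-account=${IAM_SA_EMAIL}"
}
```

A typical run would call `ensure_namespace`, optionally `create_custom_role`, and then `bind_workload_identity`. Nothing is executed until those functions are called, so the variables can be reviewed first.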

Helm values

Add the following values in the custom_values.yaml file. Replace the GCS details as required.

GCP Configuration with Google Cloud Workload Identity
global:
  cloudStorage:
    type: gcs
    useSecret: false
    gcs:
      bucket: "REPLACE BUCKET HERE"

pinot:
  deepStore: # (1)
    enabled: true
    prefix: "kfuse/controller/data" # (2)
(1) deepStore: Enables or disables storage of Pinot segments in the deep store.
(2) prefix: Folder prefix inside the specified bucket where deep store data is written.
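Once Kloudfuse is deployed with either configuration and Pinot has written segments, objects should appear under the configured prefix. The sketch below builds the bucket path from the Helm values and lists it. BUCKET is a hypothetical name, list_deep_store is an illustrative helper, and the layout beneath the prefix is not specified here.

```shell
# Hypothetical bucket name; the prefix matches the Helm values above.
BUCKET="my-kfuse-bucket"
PREFIX="kfuse/controller/data"
DEEP_STORE_URI="gs://${BUCKET}/${PREFIX}/"

# Illustrative helper: lists objects under the deep store prefix.
# Requires the gcloud CLI and read access to the bucket.
list_deep_store() {
  gcloud storage ls "$DEEP_STORE_URI"
}
```

Calling `list_deep_store` shortly after installation may return nothing; how soon segments show up depends on ingestion.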