Configure the Object Store for Pinot

Configure the Object Store for Pinot on GCP, AWS, or Azure by adding custom configuration to the custom_values.yaml file that you use with the Helm installation of Kloudfuse.

Ensure that the deep store is in the same region that hosts the compute instances of the Kloudfuse stack.

Add the configuration for your platform, described in the following sections, to the custom_values.yaml file.

GCP

Kloudfuse supports two options for Object Store configuration in GCP:

Option 1 Use a Service Account Key

Prerequisites

  1. Download the key for the GCP service account from the GCP console.

  2. Create a Kubernetes secret with the GCP credentials, which allows access to the GCS bucket. Be sure to name the key file secretKey.

    kubectl create secret generic cloud-storage-secret --from-file=./secretKey -n kfuse
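
With --from-file, kubectl uses the base name of the file as the key inside the secret, which is why the file must be named secretKey. The following is a minimal sketch of preparing the file before creating the secret. The helper name prepare_gcs_key and the download path in the usage comment are hypothetical and not part of Kloudfuse.

```shell
# Copy a downloaded service account key to ./secretKey so that
# `kubectl create secret ... --from-file=./secretKey` stores it under the key "secretKey".
# prepare_gcs_key is a hypothetical helper; the source path is an assumption.
prepare_gcs_key() {
  src="$1"
  if [ ! -f "$src" ]; then
    echo "key file not found: $src" >&2
    return 1
  fi
  # Service account key files are JSON documents with "type": "service_account".
  if ! grep -q '"service_account"' "$src"; then
    echo "warning: $src does not look like a service account key" >&2
  fi
  cp "$src" ./secretKey
  chmod 600 ./secretKey
}

# Usage (run from the directory where you create the secret):
#   prepare_gcs_key "$HOME/Downloads/my-project-key.json"
#   kubectl create secret generic cloud-storage-secret --from-file=./secretKey -n kfuse
```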

Helm Values

Add the following values in the custom_values.yaml file. Replace the GCS details as required.

GCP Configuration with Service Account Key
global:
  cloudStorage:
    type: gcs
    useSecret: true
    secretName: cloud-storage-secret
    gcs:
      bucket: "REPLACE BUCKET HERE"

pinot:
  deepStore: # (1)
    enabled: true
    prefix: "kfuse/controller/data" # (2)

(1) deepStore: Enable/disable storage of Pinot segments in deep store.
(2) prefix: folder prefix in the specified bucket for deep storage.

Option 2 Use Google Cloud Workload Identity

Prerequisites

  1. Follow the steps in Authenticate to Google Cloud APIs from GKE workloads to create and associate a service account with the GKE cluster.

  2. Use the following values:

    ROLE_NAME

    roles/storage.admin

    Alternatively, create a custom role with the following permissions:

    • storage.buckets.get

    • storage.objects.create

    • storage.objects.delete

    • storage.objects.get

    • storage.objects.getIamPolicy

    • storage.objects.list

    • storage.objects.update

    NAMESPACE

    kfuse. Create this namespace if it does not exist yet.

    Alternatively, use the namespace of the Kloudfuse deployment.

    KSA_NAME

    default
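
The values above can be assembled as in the following sketch, which builds the Workload Identity member string and the comma-separated permission list for the optional custom role. PROJECT_ID, GSA_NAME, and the role ID kfusePinotDeepStore are placeholder assumptions. The gcloud and kubectl commands are left as comments and reflect the service account linking flow from the GKE guide; check that guide for the flow that applies to your cluster.

```shell
# Placeholder values; PROJECT_ID and GSA_NAME are assumptions for illustration.
PROJECT_ID="my-project"
GSA_NAME="kfuse-pinot"
NAMESPACE="kfuse"
KSA_NAME="default"

# Permissions for the optional custom role, taken from the list above.
PERMISSIONS="storage.buckets.get,storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.getIamPolicy,storage.objects.list,storage.objects.update"

# Member string that links the Kubernetes service account to the Google service account.
MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"
GSA_EMAIL="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

echo "member:  ${MEMBER}"
echo "account: ${GSA_EMAIL}"

# Commands shown as comments so that nothing runs by accident:
#   gcloud iam roles create kfusePinotDeepStore --project="${PROJECT_ID}" \
#     --permissions="${PERMISSIONS}"
#   gcloud iam service-accounts add-iam-policy-binding "${GSA_EMAIL}" \
#     --role roles/iam.workloadIdentityUser --member "${MEMBER}"
#   kubectl annotate serviceaccount "${KSA_NAME}" --namespace "${NAMESPACE}" \
#     iam.gke.io/gcp-service-account="${GSA_EMAIL}"
```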

Helm values

Add the following helm values in the custom_values.yaml file. Replace the GCS details as required.

GCP Configuration with Google Cloud Workload Identity
global:
  cloudStorage:
    type: gcs
    useSecret: false
    gcs:
      bucket: "REPLACE BUCKET HERE"

pinot:
  deepStore: # (1)
    enabled: true
    prefix: "kfuse/controller/data" # (2)

(1) deepStore: Enable/disable storage of Pinot segments in deep store.
(2) prefix: folder prefix in the specified bucket for deep storage.

AWS

Pinot must have an IAM policy with read and write permissions to the S3 bucket for deep storage. Currently, Kloudfuse supports these options for consuming this policy:

Option 1 Use an IAM User Secret Access Key

Prerequisites

  1. Refer to the AWS document Create an IAM user in your AWS account for instructions on how to create an IAM user.

  2. Ensure that the user has the IAM policy for reading and writing in the S3 bucket for deep storage.

  3. After creating the IAM user, generate access key credentials.

    Note the values of the access key and secret key.

  4. Create a Kubernetes secret with AWS secret credentials, which allows access to the S3 bucket.

    kubectl create secret generic cloud-storage-secret --from-literal=accessKey='<accessKey>' --from-literal=secretKey='<secretKey>' -n kfuse
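
Step 2 refers to an IAM policy for reading and writing in the S3 bucket. The following sketch writes one possible policy document to a local file. The bucket name, the policy name, and the exact set of S3 actions are assumptions for illustration; this page does not list the actions that Kloudfuse requires, so adjust them to your security requirements.

```shell
BUCKET="my-kfuse-deepstore"   # hypothetical bucket name

# Write a read/write policy for the deep storage bucket (assumed action list).
cat > pinot-deepstore-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

# Create the policy and attach it to the IAM user (placeholders in angle brackets):
#   aws iam create-policy --policy-name pinot-deepstore \
#     --policy-document file://pinot-deepstore-policy.json
#   aws iam attach-user-policy --user-name <iam-user> --policy-arn <policy-arn>
```

The same policy document can be attached to a node role or to a role assumed by a ServiceAccount, as in the other AWS options.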

Helm Values

  1. Add the following values to the custom_values.yaml file.

    AWS Configuration with IAM User Secret Access Key
    global:
      cloudStorage:
        type: s3
        useSecret: true
        secretName: cloud-storage-secret
        s3:
          region: "REPLACE BUCKET REGION HERE"
          bucket: "REPLACE BUCKET HERE"
    pinot:
      deepStore: # (1)
        enabled: true
        prefix: "kfuse/controller/data" # (2)

    (1) deepStore: Enable/disable storing of Pinot segments in deep store.
    (2) prefix: folder prefix in the specified bucket for deep storage.

Option 2 Attach the IAM policy to the NodeInstanceRole of the EKS Cluster Node Group

  1. Attach the IAM policy to the NodeInstanceRole of the nodes that run the Kloudfuse stack. In the EKS console, open the node group details page of the corresponding EKS cluster and, under Node IAM role ARN, access the NodeInstanceRole.

  2. Add the following values in the custom_values.yaml file. Replace the S3 details with your values.

  3. Set both createSecret and useSecret to false.

    AWS Configuration with Attached IAM policy on the NodeInstanceRole of the EKS Cluster Node Group
    global:
      cloudStorage:
        type: s3
        useSecret: false
        s3:
          region: "REPLACE BUCKET REGION HERE"
          bucket: "REPLACE BUCKET HERE"
    pinot:
      deepStore: # (1)
        enabled: true
        prefix: "kfuse/controller/data" # (2)

    (1) deepStore: Enable/disable storing of Pinot segments in deep store.
    (2) prefix: folder prefix in the specified bucket for deep storage.

Option 3 Use a Kubernetes ServiceAccount Resource with an IAM Role

  1. Ensure that the Kubernetes cluster has a ServiceAccount associated with an IAM role that has permissions to read from and write to S3.

    For information on how to create a ServiceAccount, see AWS documentation on Assign IAM roles to Kubernetes service accounts.

  2. Ensure that Pinot is configured to use the ServiceAccount, and that deepStore is configured properly in the custom_values.yaml file.

    Make sure that both useSecret and createSecret are false.

    AWS Configuration with a Kubernetes ServiceAccount Resource that Assumes an IAM Role
    global:
      cloudStorage:
        type: s3
        useSecret: false
        s3:
          region: "REPLACE BUCKET REGION HERE"
          bucket: "REPLACE BUCKET HERE"
    pinot:
      serviceAccountName: <REPLACE SERVICE ACCOUNT NAME HERE>
      deepStore: # (1)
        enabled: true
        prefix: "kfuse/controller/data" # (2)

    (1) deepStore: Enable/disable storing of Pinot segments in deep store.
    (2) prefix: folder prefix in the specified bucket for deep storage.
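
On EKS, a ServiceAccount is linked to an IAM role through the eks.amazonaws.com/role-arn annotation. The following sketch builds that annotation from placeholder values. The account ID, role name, and ServiceAccount name are assumptions, and the role also needs a trust policy for the cluster's OIDC provider, as described in the AWS documentation. The kubectl commands are left as comments.

```shell
# Placeholder values for illustration only.
ACCOUNT_ID="111122223333"
ROLE_NAME="kfuse-pinot-deepstore"
NAMESPACE="kfuse"
SERVICE_ACCOUNT="pinot-deepstore"

# IAM role ARN and the annotation that EKS reads from the ServiceAccount.
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/${ROLE_NAME}"
ANNOTATION="eks.amazonaws.com/role-arn=${ROLE_ARN}"

echo "annotation: ${ANNOTATION}"

# Create and annotate the ServiceAccount (skip the create step if it already exists):
#   kubectl create serviceaccount "${SERVICE_ACCOUNT}" -n "${NAMESPACE}"
#   kubectl annotate serviceaccount "${SERVICE_ACCOUNT}" -n "${NAMESPACE}" "${ANNOTATION}"
```

The ServiceAccount name used here is the value to set for pinot.serviceAccountName in the custom_values.yaml file.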

Azure

Prerequisites

  1. The storage account must be enabled with Azure Data Lake Storage Gen 2.

    When creating a storage account, select the Enable hierarchical namespace option in the Advanced section.

  2. Find the connection string by navigating to Access Keys from the left pane of the storage account, in the Security + networking section.

  3. The container value in the Helm configuration is the name of the storage container. To create a container, navigate to Containers in the left pane of the storage account, under the Data storage section.

  4. Create a Kubernetes secret with Azure credentials, which allows access to the Azure Data Lake Storage container.

    kubectl create secret generic cloud-storage-secret --from-literal=connectionString='<connectionString>' -n kfuse
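
An Azure storage connection string contains semicolons, so it must be quoted when passed on a shell command line. The following sketch checks that a connection string carries the account name and key before the secret is created. The helper name check_connection_string is hypothetical, and the az and kubectl commands in the usage comment use placeholder names.

```shell
# Return success only if the connection string contains the expected fields.
# check_connection_string is a hypothetical helper, not part of Kloudfuse.
check_connection_string() {
  cs="$1"
  for field in AccountName= AccountKey=; do
    case "$cs" in
      *"$field"*) ;;
      *) echo "connection string is missing ${field%=}" >&2; return 1 ;;
    esac
  done
}

# Usage (placeholders in angle brackets):
#   CONNECTION_STRING=$(az storage account show-connection-string \
#     --name <storage-account> --resource-group <resource-group> \
#     --query connectionString --output tsv)
#   check_connection_string "$CONNECTION_STRING" && \
#     kubectl create secret generic cloud-storage-secret \
#       --from-literal=connectionString="$CONNECTION_STRING" -n kfuse
```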

Helm values

Add the following values in the custom_values.yaml file. Replace the Azure Data Lake details with your values.

Azure Configuration for Azure Data Lake Storage Gen 2 with Access Key
global:
  cloudStorage:
    type: azure
    useSecret: true
    secretName: cloud-storage-secret
    azure:
      container: "REPLACE CONTAINER NAME HERE"

pinot:
  deepStore: # (1)
    enabled: true
    prefix: "kfuse/controller/data" # (2)

(1) deepStore: Enable/disable storing of Pinot segments in deep store.
(2) prefix: folder prefix in the specified container for deep storage.