Configure the Object Store for Pinot

Configure the object store for Pinot on one of these platforms by adding custom configuration to the custom_values.yaml file that you use with the Helm installation of Kloudfuse.

Ensure that the deep store is in the same region that hosts the compute instances of the Kloudfuse stack.

GCP

Kloudfuse supports two options for Object Store configuration in GCP:

Option 1: Use a Service Account Key

Prerequisites

  1. Download the key for the GCP service account from the GCP console.

  2. Create a Kubernetes secret from the downloaded key, which allows access to the GCS bucket. Be sure to name the key file secretKey.

    kubectl create secret generic pinot-sd-secret --from-file=./secretKey -n kfuse
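
The same secret can also be declared as a manifest, which may be easier to review before applying. The following is a minimal sketch, assuming the secret name and namespace from the command above. The value is a placeholder for the contents of the downloaded key file, and the entry under stringData is named secretKey to match the file name that the command uses.

```yaml
# Sketch of a manifest equivalent to the kubectl command above.
apiVersion: v1
kind: Secret
metadata:
  name: pinot-sd-secret
  namespace: kfuse
type: Opaque
stringData:
  # The entry name must be secretKey.
  # Replace the value with the full contents of the downloaded key file.
  secretKey: |
    REPLACE WITH THE CONTENTS OF THE DOWNLOADED KEY FILE
```

If you use a manifest like this, apply it with kubectl apply -f and keep the file out of source control, because it contains the credentials in plain text.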

Helm values

Add the following values in the custom_values.yaml file. Replace the GCS details as required.

GCP Configuration with Service Account Key

    pinot:
      deepStore: # (1)
        enabled: true
        type: "gcs"
        useSecret: true
        createSecret: false
        secretName: "pinot-sd-secret"
        dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data" # (2)
        gcs:
          projectId: "REPLACE PROJECT ID HERE"

1 deepStore: Enables or disables storage of Pinot segments in the deep store.
2 dataDir: The bucket path for deep storage.

Option 2: Use Google Cloud Workload Identity

Prerequisites

  1. Follow the steps in Authenticate to Google Cloud APIs from GKE workloads to create and associate a service account with the GKE cluster.

  2. Use the following values:

    ROLE_NAME: roles/storage.admin

    Alternatively, create a custom role with the following permissions:

    • storage.buckets.get
    • storage.objects.create
    • storage.objects.delete
    • storage.objects.get
    • storage.objects.getIamPolicy
    • storage.objects.list
    • storage.objects.update

    NAMESPACE: kfuse. Create this namespace if it does not exist yet. Alternatively, use the namespace of the Kloudfuse deployment.

    KSA_NAME: default
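
For orientation, the values above map onto Kubernetes resources roughly as follows. This is a minimal sketch that assumes you follow the variant of the GKE procedure that links the Kubernetes ServiceAccount to an IAM service account through an annotation. GSA_NAME and PROJECT_ID are placeholders for your own IAM service account and project; they are not values from the Kloudfuse documentation.

```yaml
# NAMESPACE: create it if it does not exist yet.
apiVersion: v1
kind: Namespace
metadata:
  name: kfuse
---
# KSA_NAME: the default ServiceAccount in the kfuse namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: kfuse
  annotations:
    # Assumption: GSA_NAME is the IAM service account that holds ROLE_NAME on the bucket.
    iam.gke.io/gcp-service-account: GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```

The annotation alone does not grant access. The IAM side of the binding is still created by the steps in the Google Cloud procedure referenced in step 1.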

Helm values

Add the following Helm values to the custom_values.yaml file. Replace the GCS details as required.

GCP Configuration with Google Cloud Workload Identity

    pinot:
      deepStore: # (1)
        enabled: true
        type: "gcs"
        useSecret: false
        createSecret: false
        dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data" # (2)
        gcs:
          projectId: "REPLACE PROJECT ID HERE"

1 deepStore: Enables or disables storage of Pinot segments in the deep store.
2 dataDir: The bucket path for deep storage.

AWS

Pinot must have an IAM policy with read and write permissions to the S3 bucket for deep storage. Currently, Kloudfuse supports these options for consuming this policy:

Option 1: Use an IAM User Secret Access Key

  1. Refer to the AWS document Create an IAM user in your AWS account for instructions on how to create an IAM user.

  2. Ensure that the user has the IAM policy for reading and writing in the S3 bucket for deep storage.

  3. After creating the IAM user, generate access key credentials.

    Note the values of the access key and secret key.

  4. Add the following values to the custom_values.yaml file. Replace the S3 details with your values.

  5. Set createSecret and useSecret to true. The accessKey and secretKey are the credentials of the IAM user.

    AWS Configuration with IAM User Secret Access Key

      pinot:
        deepStore: # (1)
          enabled: true
          type: "s3"
          useSecret: true # (2)
          createSecret: true # (3)
          dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data" # (4)
          serverSideEncryption: "aws:kms" # (5)
          ssekmsKeyId: "" # (6)
          ssekmsEncryptionContext: "" # (7)
          s3: # (8)
            region: "YOUR REGION"
            accessKey: "YOUR AWS ACCESS KEY"
            secretKey: "YOUR AWS SECRET KEY"

    1 deepStore: Enables or disables storage of Pinot segments in the deep store.
    2 useSecret: Set to true to pass the credentials to Pinot through a secret. Set to false when you don't need to pass a secret, typically when access to the deep store is granted through node-level credentials.
    3 createSecret: Set to true; creates a secret with the provided credentials.
    4 dataDir: The bucket path for deep storage.
    5 serverSideEncryption: (Optional) The server-side encryption algorithm used when storing objects in Amazon S3, for example aws:kms.
    6 ssekmsKeyId: (Optional) Required only when serverSideEncryption is aws:kms. Specifies the AWS KMS key ID to use for object encryption.
    7 ssekmsEncryptionContext: (Optional) Specifies the AWS KMS Encryption Context to use for object encryption. The value is a base64-encoded UTF-8 string holding JSON with the encryption context key-value pairs.
    8 s3: The AWS region and the S3 credentials of the IAM user.
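
Step 2 above calls for an IAM policy with read and write access to the deep storage bucket. One way to define such a policy is as an AWS CloudFormation managed policy resource. The following is a sketch only: the resource name PinotDeepStorePolicy is hypothetical, and the list of actions is an assumption about what read and write access to a bucket typically requires, not a list confirmed by Kloudfuse.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  # Hypothetical resource name; rename to fit your stack.
  PinotDeepStorePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Bucket-level access: list the bucket contents.
          - Effect: Allow
            Action:
              - s3:ListBucket
              - s3:GetBucketLocation
            Resource: arn:aws:s3:::REPLACE-BUCKET-HERE
          # Object-level access: read, write, and delete segments.
          - Effect: Allow
            Action:
              - s3:GetObject
              - s3:PutObject
              - s3:DeleteObject
            Resource: arn:aws:s3:::REPLACE-BUCKET-HERE/*
```

If you set serverSideEncryption to aws:kms, the same principal also needs permission to use the KMS key, which this sketch does not cover.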

Option 2: Attach the IAM policy to the NodeInstanceRole of the EKS Cluster Node Group

  1. Attach the IAM policy to the NodeInstanceRole of the node that runs the Kloudfuse stack. In the EKS console, open the node group detail page of the corresponding EKS cluster, and access the NodeInstanceRole under Node IAM role ARN.

  2. Add the following values in the custom_values.yaml file. Replace the S3 details with your values.

  3. Set both createSecret and useSecret to false.

    AWS Configuration with Attached IAM Policy on the NodeInstanceRole of the EKS Cluster Node Group

      pinot:
        deepStore: # (1)
          enabled: true
          type: "s3"
          useSecret: false # (2)
          createSecret: false # (3)
          dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data" # (4)
          s3: # (5)
            region: "YOUR REGION"

    1 deepStore: Enables or disables storage of Pinot segments in the deep store.
    2 useSecret: Set to false when you don't need to pass a secret, typically when access to the deep store is granted through node-level credentials. Otherwise, set to true.
    3 createSecret: Set to false. If true, creates a secret with the provided credentials.
    4 dataDir: The bucket path for deep storage.
    5 s3: S3 settings. Only the region is required here.

Option 3: Use a Kubernetes ServiceAccount Resource that Assumes an IAM Role

  1. Ensure that the Kubernetes cluster has a ServiceAccount that is associated with an IAM role with permissions to read from and write to S3. For information on how to create a ServiceAccount, see the AWS documentation on Assign IAM roles to Kubernetes service accounts.

  2. Ensure that Pinot is configured to use the ServiceAccount, and that deepStore is configured properly in the custom_values.yaml file. Make sure that both useSecret and createSecret are false.

    AWS Configuration with a Kubernetes ServiceAccount Resource that Assumes an IAM Role

      pinot:
        serviceAccountName: <REPLACE SERVICE ACCOUNT NAME HERE>
        deepStore: # (1)
          enabled: true
          type: "s3"
          useSecret: false # (2)
          createSecret: false # (3)
          dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data" # (4)
          s3: # (5)
            region: "YOUR REGION"

    1 deepStore: Enables or disables storage of Pinot segments in the deep store.
    2 useSecret: Set to false when you don't need to pass a secret, typically when access to the deep store is granted without explicit credentials. Otherwise, set to true.
    3 createSecret: Set to false. If true, creates a secret with the provided credentials.
    4 dataDir: The bucket path for deep storage.
    5 s3: S3 settings. Only the region is required here.
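
The ServiceAccount from step 1 is usually tied to its IAM role through an annotation. A minimal sketch, assuming the kfuse namespace; the ServiceAccount name, ACCOUNT_ID, and ROLE_NAME are placeholders for your own values, and the name must match serviceAccountName in the Helm values above.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  # Must match pinot.serviceAccountName in custom_values.yaml.
  name: REPLACE-SERVICE-ACCOUNT-NAME-HERE
  # Assumption: Kloudfuse is deployed in the kfuse namespace.
  namespace: kfuse
  annotations:
    # IAM role that grants read and write access to the deep storage bucket.
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME
```

The annotation takes effect only if the cluster has an IAM OIDC provider and the trust policy of the role allows this ServiceAccount to assume it, as described in the AWS documentation referenced in step 1.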

Azure

Prerequisites

  1. The storage account must be enabled with Azure Data Lake Storage Gen 2.

    When creating a storage account, select the Enable hierarchical namespace option in the Advanced section.

  2. Find the access key by navigating to Access Keys in the left pane of the storage account, in the Security + networking section.

  3. The fileSystemName refers to the container name. You can create a container by navigating to Containers in the left pane of the storage account, under the Data storage section.

Helm values

Add the following values in the custom_values.yaml file. Replace the Azure Data Lake details with your values.

Azure Configuration for Azure Data Lake Storage Gen 2 with Access Key

    pinot:
      deepStore: # (1)
        enabled: true
        type: "adl2"
        dataDir: "adl2://[REPLACE CONTAINER NAME HERE]/kfuse/controller/data" # (2)
        adl2: # (3)
          accountName: "YOUR AZURE STORAGE ACCOUNT NAME"
          accessKey: "STORAGE ACCOUNT ACCESS KEY"
          fileSystemName: "STORAGE ACCOUNT CONTAINER NAME"

1 deepStore: Enables or disables storage of Pinot segments in the deep store.
2 dataDir: The container path for deep storage.
3 adl2: The Azure Data Lake Storage account name, access key, and container name.