AWS CloudWatch Metrics Integration

Configure AWS Kinesis Firehose

Use separate Firehose delivery streams for logs and metrics.

In the Kinesis Firehose console of the AWS account that emits the metrics, create a new delivery stream.

Specify the following attribute values:

Source

Direct PUT

Destination

HTTP Endpoint

Destination settings

Provide the external-facing endpoint of the Kloudfuse cluster in the following URL format:

https://<external facing endpoint of Kfuse cluster>/ingester/kinesis/metrics

Access token key

Provide when required

Content encoding

GZIP

  1. Provide an existing S3 bucket, or create a new one for storing Kinesis records as a backup.

    Backing up only failed data should be sufficient.

  2. Change the name of the stream, as necessary.
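The console settings above correspond to fields of the Firehose CreateDeliveryStream API. The following is a minimal sketch, not a Kloudfuse-provided script: it builds the request in plain Python and, assuming boto3 is installed and AWS credentials are configured, submits it. The function names, the endpoint display name, and the role and bucket ARNs are hypothetical placeholders. The API also needs an IAM role for Firehose, which the console normally creates for you.

```python
from typing import Optional


def kfuse_metrics_url(cluster_host: str) -> str:
    """Return the metrics ingestion URL for a Kloudfuse cluster hostname."""
    return f"https://{cluster_host}/ingester/kinesis/metrics"


def delivery_stream_request(
    stream_name: str,
    cluster_host: str,
    firehose_role_arn: str,
    backup_bucket_arn: str,
    access_key: Optional[str] = None,
) -> dict:
    """Build keyword arguments for the Firehose CreateDeliveryStream call."""
    endpoint = {
        "Name": "kloudfuse-metrics",  # hypothetical display name
        "Url": kfuse_metrics_url(cluster_host),
    }
    if access_key:  # "Access token key": provide only when required
        endpoint["AccessKey"] = access_key
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # Source: Direct PUT
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": endpoint,
            "RequestConfiguration": {"ContentEncoding": "GZIP"},
            "RoleARN": firehose_role_arn,
            "S3BackupMode": "FailedDataOnly",  # back up only failed data
            "S3Configuration": {
                "RoleARN": firehose_role_arn,
                "BucketARN": backup_bucket_arn,
            },
        },
    }


def create_delivery_stream(request: dict) -> None:
    """Submit the request; needs boto3 and AWS credentials (assumption)."""
    import boto3  # imported lazily so the builders work without boto3

    boto3.client("firehose").create_delivery_stream(**request)
```

Keeping the request builder separate from the API call makes it possible to review the exact settings before anything is created in AWS.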

Configure AWS CloudWatch Metrics Stream

In the CloudWatch console of the AWS account that emits the metrics, open the Metrics section in the left navigation, select Streams, and create a new metric stream.

  1. Select the metric namespaces to send to the stream; the default is all metrics.

  2. In the configuration section, select an existing Firehose owned by your account, and then select the Kinesis Firehose that you created earlier.

  3. Under Change Output Format, make sure to select JSON for the output format.

  4. Change the name of the stream if necessary.
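For scripted setups, the same metric stream can be described as a CloudWatch PutMetricStream request. This is a hedged sketch under the assumption that boto3 is available; the function names and ARNs are hypothetical. Unlike the console flow, the API call also requires an IAM role that CloudWatch uses to write to the Firehose.

```python
from typing import Iterable


def metric_stream_request(
    stream_name: str,
    firehose_arn: str,
    stream_role_arn: str,
    namespaces: Iterable[str] = (),
) -> dict:
    """Build keyword arguments for the CloudWatch PutMetricStream call."""
    request = {
        "Name": stream_name,
        "FirehoseArn": firehose_arn,
        "RoleArn": stream_role_arn,
        "OutputFormat": "json",  # the JSON output format from step 3
    }
    namespaces = list(namespaces)
    if namespaces:
        # With no filters, the stream carries all namespaces (the default).
        request["IncludeFilters"] = [{"Namespace": ns} for ns in namespaces]
    return request


def create_metric_stream(request: dict) -> None:
    """Submit the request; needs boto3 and AWS credentials (assumption)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_stream(**request)
```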

Enable AutoScaling Group Metrics

Perform these steps in the account that emits the metrics.

  1. Open the Amazon EC2 console.

  2. Choose Auto Scaling Groups from the navigation pane.

  3. Enable the checkbox next to your Auto Scaling group.

    A split pane opens up at the bottom of the page.

  4. On the Monitoring tab, under Auto Scaling at the top of the page, select the Enable checkbox for Auto Scaling group metrics collection.
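The console steps above have an API equivalent, EnableMetricsCollection. The sketch below builds that request and, assuming boto3 and credentials are available, applies it. The function names and the group name are hypothetical.

```python
from typing import Iterable


def enable_group_metrics_request(
    group_name: str, metrics: Iterable[str] = ()
) -> dict:
    """Build keyword arguments for the EnableMetricsCollection call."""
    request = {
        "AutoScalingGroupName": group_name,
        "Granularity": "1Minute",  # the only granularity the API accepts
    }
    metrics = list(metrics)
    if metrics:
        # With no Metrics list, all group metrics are enabled.
        request["Metrics"] = metrics
    return request


def enable_group_metrics(request: dict) -> None:
    """Submit the request; needs boto3 and AWS credentials (assumption)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("autoscaling").enable_metrics_collection(**request)
```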

Enable Collection of Request Metrics in S3

In the account that emits the metrics, follow the instructions in the AWS documentation, Creating a CloudWatch metrics configuration for all the objects in your bucket.
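As an alternative to the console, a request-metrics configuration that covers every object in a bucket can be created with the S3 PutBucketMetricsConfiguration API. This is a sketch under the assumption that boto3 is installed; the function names and the bucket name are hypothetical, and EntireBucket is just a conventional configuration ID.

```python
def bucket_metrics_request(bucket: str, config_id: str = "EntireBucket") -> dict:
    """Build keyword arguments for the PutBucketMetricsConfiguration call."""
    return {
        "Bucket": bucket,
        "Id": config_id,
        # No Filter element: the configuration applies to all objects.
        "MetricsConfiguration": {"Id": config_id},
    }


def enable_request_metrics(request: dict) -> None:
    """Submit the request; needs boto3 and AWS credentials (assumption)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("s3").put_bucket_metrics_configuration(**request)
```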

Enable Enrichment of AWS Metrics

The metrics that AWS CloudWatch sends to the Kinesis Firehose include minimal labels. By scraping AWS, Kloudfuse can attach additional labels to the ingested metrics, including the user-defined custom tags that you set in the AWS console.

To enable this enrichment of AWS metrics, follow these steps:

  1. Add the following YAML to the global section of the custom-values.yaml file:

    global:
      enrichmentEnabled:
        - aws

  2. Create an IAM scraper role in the AWS account where the services that emit the metrics run.

    Attach the following policy so that Kloudfuse can scrape the additional labels from AWS. See the AWS documentation, Define custom IAM permissions with customer managed policies.

    Create a scraper role with custom policies
    			"Action": [
    				"acm:ListCertificates",
    				"acm:ListTagsForCertificate",
    				"apigateway:GET",
    				"athena:ListWorkGroups",
    				"athena:ListTagsForResource",
    				"autoscaling:DescribeAutoScalingGroups",
    				"cloudwatch:ListMetrics",
    				"cloudwatch:GetMetricStatistics",
    				"dynamodb:ListTables",
    				"dynamodb:DescribeTable",
    				"dynamodb:ListTagsOfResource",
    				"ec2:DescribeInstances",
    				"ec2:DescribeInstanceStatus",
    				"ec2:DescribeSecurityGroups",
    				"ec2:DescribeNatGateways",
    				"ec2:DescribeVolumes",
    				"ecs:ListClusters",
    				"ecs:ListContainerInstances",
    				"ecs:ListServices",
    				"ecs:DescribeContainerInstances",
    				"ecs:DescribeServices",
    				"ecs:ListTagsForResource",
    				"elasticache:DescribeCacheClusters",
    				"elasticache:ListTagsForResource",
    				"elasticfilesystem:DescribeFileSystems",
    				"elasticfilesystem:DescribeBackupPolicy",
    				"elasticloadbalancing:DescribeTags",
    				"elasticloadbalancing:DescribeLoadBalancers",
    				"es:ListDomainNames",
    				"es:DescribeDomains",
    				"es:ListTags",
    				"events:ListRules",
    				"events:ListTagsForResource",
    				"events:ListEventBuses",
    				"firehose:DescribeDeliveryStream",
    				"firehose:ListDeliveryStreams",
    				"firehose:ListTagsForDeliveryStream",
    				"fsx:DescribeFileSystems",
    				"fsx:ListTagsForResource",
    				"glue:ListJobs",
    				"glue:GetTags",
    				"kafka:ListTagsForResource",
    				"kafka:ListClustersV2",
    				"kinesis:ListStreams",
    				"kinesis:ListTagsForStream",
    				"kinesis:DescribeStream",
    				"lambda:GetPolicy",
    				"lambda:List*",
    				"lambda:ListTags",
    				"logs:DescribeLogGroups",
    				"logs:ListTagsForResource",
    				"logs:ListTagsLogGroup",
    				"mq:ListBrokers",
    				"mq:DescribeBroker",
    				"rds:DescribeDBInstances",
    				"rds:ListTagsForResource",
    				"rds:DescribeEvents",
    				"redshift:DescribeClusters",
    				"redshift:DescribeTags",
    				"route53:ListHealthChecks",
    				"route53:ListTagsForResource",
    				"s3:ListAllMyBuckets",
    				"s3:GetBucketTagging",
    				"sns:ListTagsForResource",
    				"sns:ListTopics",
    				"sqs:ListQueues",
    				"sqs:ListQueueTags",
    				"wafv2:ListWebACLs",
    				"wafv2:ListRuleGroups",
    				"wafv2:ListTagsForResource"
    			]

  3. Modify the trust relationship of the scraper role to add the IAM role of the node group on which Kloudfuse runs (Node IAM Role ARN) as the Principal.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::ACCOUNT-NUMBER:role/eksctl-XXXXX-nodegroup-ng-XXXXXX-NodeInstanceRole-XXXXXXXXXX"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
  4. Ensure that the permissions map to the node pool that hosts the EKS cluster for Kloudfuse.

  5. Enable Kloudfuse to consume the new role through one of two approaches: AWS credentials or Role ARNs.

    AWS credentials

    Add your AWS credentials as a secret, and use the secret in the ingester config.

    1. Retrieve your AWS credentials; see Configure tool authentication with AWS.

    2. In the Kloudfuse namespace, create a Kubernetes secret named aws-access-key, with the keys accessKey and secretKey.

      kubectl create secret generic aws-access-key --from-literal=accessKey=<AWS_ACCESS_KEY_ID> --from-literal=secretKey=<AWS_SECRET_ACCESS_KEY>
    3. Specify the secretName in the custom-values.yaml file.

      ingester:
        config:
          awsScraper:
            secretName: aws-access-key

    4. By default, Kloudfuse attempts to scrape from all regions and all AWS namespaces. To restrict the scope, add the following configuration to the custom-values.yaml file:

      ingester:
        config:
          awsScraper:
            secretName: aws-access-key
            namespaces:
              - <add namespace>
            regions:
              - <add region>

    Role ARNs

    Add Role ARNs in the ingester config. This option enables you to scrape multiple AWS accounts.

    1. Add the scraper Role ARNs that you created with the new permissions to the awsRoleArns list in your custom-values.yaml file.

      ingester:
        config:
          awsRoleArns:
            - role: <ADD ROLE ARN HERE>

    2. By default, Kloudfuse attempts to scrape from all regions and all AWS namespaces. To restrict the scope, add the following configuration to the custom-values.yaml file:

      ingester:
        config:
          awsRoleArns:
            - role: <ADD ROLE ARN HERE>
              namespaces:
                - <add namespace>
              regions:
                - <add region>

  6. To allow the node group on which the Kloudfuse platform runs to assume the scraper role, add the following permissions policy to the node-group IAM role (Node IAM Role ARN).

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": "<REPLACE SCRAPER ROLE ARN HERE>"
            }
        ]
    }

  7. Run a helm upgrade to apply the changes.

    helm upgrade --create-namespace --install kfuse . -f <custom_values.yaml>
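The IAM documents in the steps above can be generated rather than edited by hand, which avoids quoting mistakes in the ARNs. The following is an illustrative sketch using only the Python standard library. All function names and ARNs are hypothetical. Because the procedure shows only the Action list of the scraper policy, the wrapper used here (Effect Allow on Resource "*") is an assumption, not a statement of the policy Kloudfuse requires.

```python
import json
from typing import Iterable


def policy(statement: dict) -> dict:
    """Wrap a single statement in an IAM policy document."""
    return {"Version": "2012-10-17", "Statement": [statement]}


def scraper_permissions_policy(actions: Iterable[str]) -> dict:
    """Permissions policy for the scraper role (step 2).

    Assumption: Allow on all resources; only the Action list is documented.
    """
    return policy({"Effect": "Allow", "Action": list(actions), "Resource": "*"})


def scraper_trust_policy(node_role_arn: str) -> dict:
    """Trust policy letting the node-group role assume the scraper role (step 3)."""
    return policy(
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {"AWS": node_role_arn},
            "Action": "sts:AssumeRole",
        }
    )


def node_assume_role_policy(scraper_role_arn: str) -> dict:
    """Permissions policy added to the node-group role (step 6)."""
    return policy(
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": scraper_role_arn,
        }
    )


if __name__ == "__main__":
    # Hypothetical ARNs; replace with the values from your account.
    node_role = "arn:aws:iam::111122223333:role/example-node-instance-role"
    scraper_role = "arn:aws:iam::111122223333:role/example-kfuse-scraper"
    print(json.dumps(scraper_trust_policy(node_role), indent=4))
    print(json.dumps(node_assume_role_policy(scraper_role), indent=4))
```

Printing the documents with json.dumps gives text that can be pasted into the IAM console or saved to a file for review.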

Reduce Cost of Metrics Ingestion

AWS CloudWatch metrics ingestion can be a high-cost operation. The main cost driver is the CW:MetricStreamUsage usage type in AWS billing, especially its MetricUpdate operation.

To reduce the cost of operating CloudWatch metrics ingestion, consider these factors:

Volume of Ingested Metrics

Control this by sending only the necessary Namespaces and metrics to the stream.

In other words, avoid selecting All Namespaces and All Metrics when configuring ingestion.

Some namespaces are especially costly to stream. These include AWS NLB and AWS Lambda, because they combine a high volume of metrics with multiple dimensions.
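One way to narrow the stream is through the IncludeFilters parameter of the CloudWatch PutMetricStream API, which accepts a namespace and, optionally, a list of metric names for that namespace. The sketch below builds such a filter list from a plain mapping. The function name is hypothetical, and the namespaces and metrics chosen are arbitrary examples, not a recommendation.

```python
from typing import Dict, Iterable, List


def include_filters(selection: Dict[str, Iterable[str]]) -> List[dict]:
    """Turn {namespace: metric names} into PutMetricStream IncludeFilters.

    An empty metric list means "all metrics in that namespace".
    """
    filters = []
    for namespace, metric_names in selection.items():
        entry = {"Namespace": namespace}
        names = list(metric_names)
        if names:
            entry["MetricNames"] = names
        filters.append(entry)
    return filters


# Example selection: all EC2 metrics, but only two Lambda metrics.
EXAMPLE_SELECTION = {
    "AWS/EC2": [],
    "AWS/Lambda": ["Invocations", "Errors"],
}
```

The resulting list is passed as the IncludeFilters argument when the metric stream is created or updated.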

Data Retention

Our research indicates that you should modify the retention period of the CloudWatch metrics data by changing the retention setting of the log group for the Firehose stream.

Sampling Frequency

The frequency of data sampling by CloudWatch is controlled internally by the AWS CloudWatch implementation.