AWS CloudWatch Metrics Integration

Table of Contents

Configure AWS Kinesis Firehose
Configure AWS CloudWatch Metrics Stream
Enable AutoScaling Group Metrics
Enable Collection of Request Metrics in S3
Enable Enrichment of AWS Metrics
AWS Namespace Enrichment
Reduce Cost of Metrics Ingestion

Configure AWS Kinesis Firehose

Use different Firehose accounts for logs and metrics.

Create a new delivery stream in the account that emits the metrics, in the Kinesis Firehose AWS console.

Specify the following attribute values:

Source: Direct PUT

Destination: HTTP Endpoint

Destination settings

Provide the external-facing endpoint of the Kloudfuse cluster in the following URL format:

https://<external facing endpoint of Kfuse cluster>/ingester/kinesis/metrics

Access token key: Provide when required

Content encoding: GZIP

Provide an existing S3 bucket, or create a new one for storing Kinesis records as a backup.

Backing up only failed data should be sufficient.
Change the name of the stream, as necessary.
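
The endpoint URL format above can be sketched as a small helper. This is a minimal illustration, not part of Kloudfuse; the function name and the host `kfuse.example.com` are hypothetical.

```python
from urllib.parse import urlsplit


def kinesis_metrics_endpoint(host: str) -> str:
    """Build the Firehose HTTP endpoint URL for a Kloudfuse cluster host."""
    host = host.strip()
    # Accept either a bare host name or a full URL; keep only the host part.
    if "://" in host:
        host = urlsplit(host).netloc
    host = host.rstrip("/")
    if not host:
        raise ValueError("host must not be empty")
    return f"https://{host}/ingester/kinesis/metrics"


# Hypothetical host name, for illustration only.
print(kinesis_metrics_endpoint("kfuse.example.com"))
```

Paste the resulting URL into the HTTP endpoint destination settings of the delivery stream.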

Configure AWS CloudWatch Metrics Stream

In the account that emits the metrics, in the CloudWatch AWS console, navigate to the Metrics section on the left side of the console, select Streams, and create a new metric stream.

Select the metric namespaces to send to the stream; the default is all metrics.
In the configuration section, select an existing Firehose owned by your account, and then select the Kinesis Firehose you created earlier.
Under Change Output Format, make sure to select JSON for the output format.
Change the name of the stream if necessary.
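
To illustrate what the JSON output format carries, the sketch below parses one GZIP-compressed, newline-delimited record shaped like the CloudWatch metric stream JSON format. The sample values are made up, and the sketch does not model the Firehose HTTP request envelope; it only shows that each record carries a namespace, a metric name, dimensions, and summary statistics.

```python
import gzip
import json

# Sample record in the CloudWatch metric stream JSON output format.
# All values are illustrative, not taken from a real account.
SAMPLE = (
    '{"metric_stream_name":"kfuse-metrics","account_id":"123456789012",'
    '"region":"us-west-2","namespace":"AWS/EC2","metric_name":"CPUUtilization",'
    '"dimensions":{"InstanceId":"i-0123456789abcdef0"},"timestamp":1700000000000,'
    '"value":{"max":12.0,"min":3.0,"sum":30.0,"count":4.0},"unit":"Percent"}\n'
)


def parse_records(payload: bytes) -> list:
    """Decompress a GZIP payload and parse newline-delimited JSON records."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_records(gzip.compress(SAMPLE.encode("utf-8")))
for record in records:
    # The average is derived from the sum and count statistics.
    average = record["value"]["sum"] / record["value"]["count"]
    print(record["namespace"], record["metric_name"], record["dimensions"], average)
```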

Enable AutoScaling Group Metrics

Perform these steps in the account that emits the metrics.

Open the Amazon EC2 console.
Choose Auto Scaling Groups from the navigation pane.
Enable the checkbox next to your Auto Scaling group.

A split pane opens up at the bottom of the page.
On the Monitoring tab, select the Auto Scaling group metrics collection, and enable the checkbox located under Auto Scaling, at the top of the page.

Enable Collection of Request Metrics in S3

In the account that emits the metrics, follow the instructions in AWS documentation for Creating a CloudWatch metrics configuration for all the objects in your bucket.

Enable Enrichment of AWS Metrics

The metrics sent by AWS CloudWatch to the Kinesis Firehose include minimal labels. Kloudfuse enables you to attach more labels and user-defined custom tags to the ingested metrics, from within the AWS console, by scraping AWS.

To enable this enrichment of AWS metrics, follow these steps:

Modify the global section of the custom-values.yaml file:
```
global:
  enrichmentEnabled:
    - aws
```

Create an IAM scraper role in the AWS account where the services that emit the metrics run.

Attach the following policy so that Kloudfuse can scrape the additional labels from AWS. See the AWS documentation, Define custom IAM permissions with customer managed policies.

Create a scraper role with custom policies

			"Action": [
				"acm:ListCertificates",
				"acm:ListTagsForCertificate",
				"apigateway:GET",
				"athena:ListWorkGroups",
				"athena:ListTagsForResource",
				"autoscaling:DescribeAutoScalingGroups",
				"bedrock:ListFoundationModels",
				"bedrock:ListTagsForResource",
				"cloudwatch:ListMetrics",
				"cloudwatch:GetMetricStatistics",
				"dynamodb:ListTables",
				"dynamodb:DescribeTable",
				"dynamodb:ListTagsOfResource",
				"ec2:DescribeInstances",
				"ec2:DescribeInstanceStatus",
				"ec2:DescribeSecurityGroups",
				"ec2:DescribeNatGateways",
				"ec2:DescribeVolumes",
				"ecs:ListClusters",
				"ecs:ListContainerInstances",
				"ecs:ListServices",
				"ecs:DescribeContainerInstances",
				"ecs:DescribeServices",
				"ecs:ListTagsForResource",
				"elasticache:DescribeCacheClusters",
				"elasticache:DescribeServerlessCaches",
				"elasticache:ListTagsForResource",
				"elasticfilesystem:DescribeFileSystems",
				"elasticfilesystem:DescribeBackupPolicy",
				"elasticloadbalancing:DescribeTags",
				"elasticloadbalancing:DescribeLoadBalancers",
				"es:ListDomainNames",
				"es:DescribeDomains",
				"es:ListTags",
				"events:ListRules",
				"events:ListTagsForResource",
				"events:ListEventBuses",
				"firehose:DescribeDeliveryStream",
				"firehose:ListDeliveryStreams",
				"firehose:ListTagsForDeliveryStream",
				"fsx:DescribeFileSystems",
				"fsx:ListTagsForResource",
				"glue:ListJobs",
				"glue:GetTags",
				"kafka:ListTagsForResource",
				"kafka:ListClustersV2",
				"kinesis:ListStreams",
				"kinesis:ListTagsForStream",
				"kinesis:DescribeStream",
				"lambda:GetPolicy",
				"lambda:List*",
				"lambda:ListTags",
				"logs:DescribeLogGroups",
				"logs:ListTagsForResource",
				"logs:ListTagsLogGroup",
				"mq:ListBrokers",
				"mq:DescribeBroker",
				"mediaconvert:ListQueues",
				"mediaconvert:ListTagsForResource",
				"qbusiness:ListApplications",
				"qbusiness:GetApplication",
				"qbusiness:ListTagsForResource",
				"rds:DescribeDBInstances",
				"rds:DescribeDBClusters",
				"rds:ListTagsForResource",
				"rds:DescribeEvents",
				"redshift:DescribeClusters",
				"redshift:DescribeTags",
				"route53:ListHealthChecks",
				"route53:ListTagsForResource",
				"s3:ListAllMyBuckets",
				"s3:GetBucketTagging",
				"ses:ListConfigurationSets",
				"ses:GetConfigurationSet",
				"ses:ListTagsForResource",
				"sns:ListTagsForResource",
				"sns:ListTopics",
				"sqs:ListQueues",
				"sqs:ListQueueTags",
				"states:ListStateMachines",
				"states:ListActivities",
				"states:ListTagsForResource",
				"timestream:ListDatabases",
				"timestream:ListTables",
				"timestream:DescribeDatabase",
				"timestream:DescribeTable",
				"timestream:ListTagsForResource",
				"wafv2:ListWebACLs",
				"wafv2:ListRuleGroups",
				"wafv2:ListTagsForResource",
				"cloudfront:ListDistributions",
				"cloudfront:GetDistribution",
				"cloudfront:ListTagsForResource"
			]
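
The listing above shows only the Action element. A complete policy document also needs Version, Effect, and Resource elements; the sketch below wraps an action list in one possible way. The `Sid` value and the `"Resource": "*"` choice are assumptions, since the list above does not specify them.

```python
import json


def build_scraper_policy(actions):
    """Wrap IAM action names in a single-statement policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KloudfuseScraperReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # de-duplicate, stable order
                "Resource": "*",  # assumption: not specified in the list above
            }
        ],
    }


# Abbreviated for illustration; use the complete Action list from above.
actions = ["cloudwatch:ListMetrics", "ec2:DescribeInstances", "s3:GetBucketTagging"]
policy = build_scraper_policy(actions)
print(json.dumps(policy, indent=2))
```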


Modify the trust relationship of the scraper role to add the IAM role of the node group on which Kloudfuse runs (Node IAM Role ARN) as the Principal.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::ACCOUNT-NUMBER:role/eksctl-XXXXX-nodegroup-ng-XXXXXX-NodeInstanceRole-XXXXXXXXXX"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}


Ensure that the permissions map to the node pool that hosts the EKS cluster for Kloudfuse.

Specify the AWS namespaces to scrape. Starting in 4.0.0, the awsNamespaces list defaults to an empty list, so you must explicitly enumerate the namespaces you want to scrape.

Namespace values use the format AWS/<ServiceName> as defined by AWS. For example, a typical production deployment monitoring compute, database, storage, and serverless workloads would include:

ingester:
  config:
    awsNamespaces:
      - "AWS/EC2"           # EC2 instances
      - "AWS/AutoScaling"   # Auto Scaling groups
      - "AWS/EBS"           # EBS volumes
      - "AWS/RDS"           # RDS databases
      - "AWS/ElastiCache"   # ElastiCache (Redis and Memcached)
      - "AWS/Lambda"        # Lambda functions
      - "AWS/ApplicationELB" # Application Load Balancers
      - "AWS/NetworkELB"    # Network Load Balancers
      - "AWS/S3"            # S3 buckets
      - "AWS/SQS"           # SQS queues
      - "AWS/ECS"           # ECS clusters and services


For the full list of supported namespaces, see AWS Services.
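
Because the list must now be spelled out by hand, a quick format check can catch typos before a helm upgrade. The sketch below is an illustration only: the pattern is an assumption derived from the AWS/<ServiceName> examples in this document, and it does not check whether Kloudfuse actually supports a given namespace.

```python
import re

# "AWS/" followed by an alphanumeric service name.
# The allowed character set is an assumption based on the examples above.
NAMESPACE_RE = re.compile(r"AWS/[A-Za-z0-9]+")


def invalid_namespaces(namespaces):
    """Return the entries that do not look like AWS/<ServiceName>."""
    return [ns for ns in namespaces if not NAMESPACE_RE.fullmatch(ns)]


# Two deliberately malformed entries: wrong case and a missing prefix.
configured = ["AWS/EC2", "AWS/RDS", "aws/lambda", "S3", "AWS/ApplicationELB"]
print(invalid_namespaces(configured))
```

Any entry the check reports is worth comparing against the namespace table before you deploy.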

Enable Kloudfuse to consume the new role. There are two approaches, both described below: through AWS credentials or through Role ARNs.

AWS credentials

Add your AWS credentials as a secret, and use the secret in the ingester config.

Retrieve your AWS credentials; see Configure tool authentication with AWS.

In the Kloudfuse namespace, create a Kubernetes secret named aws-access-key, with keys accessKey and secretKey.

kubectl create secret generic aws-access-key \
  --from-literal=accessKey=<AWS_ACCESS_KEY_ID> \
  --from-literal=secretKey=<AWS_SECRET_ACCESS_KEY>

Specify the secretName in the custom-values.yaml file.

ingester:
  config:
    awsScraper:
      secretName: aws-access-key


To restrict scraping to specific namespaces or regions, add the following to custom-values.yaml:

ingester:
  config:
    awsScraper:
      secretName: aws-access-key
      namespaces:
        - <add namespace>
      regions:
        - <add region>


Role ARNs

Add Role ARNs in the ingester config. This option enables you to scrape multiple AWS accounts.

Add the scraper Role ARNs that you created with the new permissions to the awsRoleArns list in your custom-values.yaml file.
```
ingester:
  config:
    awsRoleArns:
      - role: <ADD ROLE ARN HERE>
```

To restrict scraping to specific namespaces or regions, add the following to custom-values.yaml:

ingester:
  config:
    awsRoleArns:
      - role: <ADD ROLE ARN HERE>
        namespaces:
          - <add namespace>
        regions:
          - <add region>


For Global Services like AWS CloudFront, a scraper role in the us-east-1 region is required. Ensure that you create the scraper role in us-east-1 and also configure a Firehose delivery stream and CloudWatch Metric stream in the us-east-1 region for global services.

ingester:
  config:
    awsRoleArns:
      - role: <ADD US-EAST-1 ROLE ARN HERE>
        regions:
          - us-east-1


Modify the IAM role of the node group where the Kloudfuse platform runs: add the following permissions policy to the node-group role (Node IAM Role ARN) so that it can assume the scraper role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "<REPLACE SCRAPER ROLE ARN HERE>"
        }
    ]
}
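
The Role ARN approach depends on two matching documents: the trust relationship on the scraper role, which names the node role as Principal, and the permissions policy on the node role, which names the scraper role as Resource. The sketch below generates both so that the pairing is explicit. The function names and ARNs are hypothetical.

```python
import json


def trust_policy(node_role_arn):
    """Trust relationship on the scraper role: who may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": node_role_arn},
            "Action": "sts:AssumeRole",
        }],
    }


def assume_role_policy(scraper_role_arn):
    """Permissions policy on the node role: which role it may assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": scraper_role_arn,
        }],
    }


# Hypothetical ARNs, for illustration only.
node_role = "arn:aws:iam::111111111111:role/example-node-instance-role"
scraper_role = "arn:aws:iam::222222222222:role/example-kfuse-scraper"

print(json.dumps(trust_policy(node_role), indent=2))
print(json.dumps(assume_role_policy(scraper_role), indent=2))
```

Generating the JSON with `json.dumps` also guarantees that the ARN values are quoted strings, which IAM requires.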


Run a helm upgrade to apply the changes.

helm upgrade --create-namespace --install kfuse . -f <custom_values.yaml>

AWS Namespace Enrichment

Kloudfuse enriches metadata for metrics from the following AWS services. Each service includes all AWS tags plus the specific metadata fields listed below:

The awsNamespaces Value column lists the value to add under the awsNamespaces variable in the ingester section of your custom-values.yaml cluster setup file. Note that some service display names differ from the AWS CloudWatch namespace — for example, OpenSearch uses AWS/ES.

AWS Service | awsNamespaces Value | Enriched Metadata Fields

AutoScaling

AWS/AutoScaling

All AWS tags on the AutoScaling Group

Firehose

AWS/Firehose

All AWS tags on the Delivery Stream

RDS

AWS/RDS

For DB Instances:

allocatedstoragegb, availability_zone, backupretentionperioddays
dbinstancearn, dbinstanceidentifier, dbinstanceclass, dbiresourceid, dbname
engine, engineversion, multiaz, networktype, publicly_accessible
secondary_availability_zone, storagetype
host (DbiResourceId), hostname (Endpoint Address)
All AWS tags

For DB Clusters:

allocatedstoragegb, availability_zones, backupretentionperioddays
dbclusterarn, dbclusterresourceid, databasename
engine, enginemode, engineversion, global_write_forwarding_status
multiaz, networktype, storagetype
All AWS tags

EKS

AWS/EKS

arn, cluster_name, endpoint, platform_version, role_arn, status, kube_server_version
All AWS tags

EBS

AWS/EBS

availability_zone, multiattachenabled, outpostarn, size, snapshotid, state
throughput, volumeid, volume_type, volume_name
device (if attached)
All AWS tags

EC2

AWS/EC2

For Instances:

availability_zone, image_id, instance_id, instance_type, kernel
iam_profile (ARN), host (instance ID), autoscaling_group, service
All AWS tags

For NAT Gateways:

natgatewayid
All AWS tags

ELB

AWS/ELB

canonicalhostedzonename, canonicalhostedzonenameid, dnsname, loadbalancername, scheme, vpcid
host (CanonicalHostedZoneName), hostname (CanonicalHostedZoneName), name (LoadBalancerName)
All AWS tags

AmazonMQ

AWS/AmazonMQ

brokerarn, brokerid, brokername, brokerstate, deploymentmode
enginetype, engineversion, hostinstancetype, storagetype
All AWS tags

S3

AWS/S3

All AWS tags on the S3 Bucket

EFS

AWS/EFS

filesystemarn, name
aws_elasticfilesystem_default_backup (enabled/disabled)
All AWS tags

ELBv2

AWS/ApplicationELB, AWS/NetworkELB, AWS/GatewayELB

loadbalancerarn, name (LoadBalancerName), host (DNSName)
All AWS tags

ELBv2 covers Application (ALB), Network (NLB), and Gateway (GWLB) load balancers, each under its own CloudWatch namespace. Include only the namespaces for the load balancer types you run.

ACM

AWS/CertificateManager

All AWS tags on the Certificate

ElastiCache

AWS/ElastiCache

For ElastiCache Clusters:

cache_node_type, name (CacheClusterId), engine, engine_version
preferred_availability_zone, replication_group
All AWS tags

For Serverless Caches:

name (ServerlessCacheName), engine, status, create_time

CloudFront

AWS/CloudFront

All AWS tags on Distributions

Route53

AWS/Route53

All AWS tags on Health Checks

SNS

AWS/SNS

All AWS tags on Topics

Redshift

AWS/Redshift

All AWS tags on Clusters

OpenSearch

AWS/ES

elasticsearch_version (EngineVersion), name (DomainName), dedicated_master_enabled
instance_type, zone_awareness_enabled, ebs_enabled
All AWS tags

SQS

AWS/SQS

All AWS tags on Queues

Lambda

AWS/Lambda

function_arn, functionname, memory_size, runtime
architecture (first in list), storage_size (EphemeralStorage Size)
All AWS tags

DynamoDB

AWS/DynamoDB

All AWS tags on Tables

ApiGateway

AWS/ApiGateway

apiid (Id)
All AWS tags

ApiGatewayV2

AWS/ApiGateway

apiname (Name)
All AWS tags

Glue

AWS/Glue

All AWS tags on Jobs

Athena

AWS/Athena

All AWS tags on WorkGroups

ECS

AWS/ECS

For Clusters and Services:

All AWS tags

EventBridge

AWS/Events

For Rules and Event Buses:

All AWS tags

Kafka

AWS/Kafka

All AWS tags on Clusters

Kinesis

AWS/Kinesis

All AWS tags on Streams

Logs

AWS/Logs

All AWS tags on Log Groups

WAF

AWS/WAFV2

For Web ACLs and Rule Groups:

All AWS tags

FSx

AWS/FSx

generation (1 or 2, for ONTAP only), file_system_type
All AWS tags

Bedrock

AWS/Bedrock

For Foundation Models:

model_name, provider_name
All AWS tags

QBusiness

AWS/QBusiness

For Applications:

display_name
All AWS tags

MediaConvert

AWS/MediaConvert

For Queues:

name, status, type
All AWS tags on the Queue

States (Step Functions)

AWS/States

For State Machines:

name, type
All AWS tags on the State Machine

For Activities:

name
All AWS tags on the Activity

Timestream

AWS/Timestream

For Databases:

database_name, arn
All AWS tags on the Database

For Tables:

table_name, database_name, arn
All AWS tags on the Table

SES

AWS/SES

For Configuration Sets:

name (ConfigurationSetName)
tls_policy (DeliveryOptions.TlsPolicy)
sending_enabled (SendingOptions.SendingEnabled)
All AWS tags on the Configuration Set
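
Since several display names differ from their CloudWatch namespaces, a lookup can help when assembling the awsNamespaces list. The sketch below covers only a subset of the table above and renders the fragment as plain text. The helper name and its output layout are assumptions, not a Kloudfuse tool.

```python
# Subset of the table above: display name -> awsNamespaces value(s).
SERVICE_NAMESPACES = {
    "OpenSearch": ["AWS/ES"],
    "ACM": ["AWS/CertificateManager"],
    "EventBridge": ["AWS/Events"],
    "States (Step Functions)": ["AWS/States"],
    "WAF": ["AWS/WAFV2"],
    "ELBv2": ["AWS/ApplicationELB", "AWS/NetworkELB", "AWS/GatewayELB"],
    "Lambda": ["AWS/Lambda"],
}


def aws_namespaces_yaml(services):
    """Render an ingester.config.awsNamespaces fragment for the given services."""
    lines = ["ingester:", "  config:", "    awsNamespaces:"]
    seen = set()
    for service in services:
        for ns in SERVICE_NAMESPACES[service]:  # KeyError for unknown services
            if ns not in seen:
                seen.add(ns)
                lines.append(f'      - "{ns}"')
    return "\n".join(lines)


print(aws_namespaces_yaml(["OpenSearch", "ELBv2", "Lambda"]))
```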

Reduce Cost of Metrics Ingestion

AWS CloudWatch metrics ingestion can be a high-cost operation. The driving factor here is the AWS CW:MetricsStreamUsage attribute, especially the MetricsUpdate statistical aggregate.

To reduce the cost of operating CloudWatch metrics ingestion, consider these factors:

Volume of Ingested Metrics: Control this by sending only the necessary namespaces and metrics to the stream. In other words, avoid selecting All Namespaces and All Metrics when configuring ingestion.

Some namespaces are very costly when deriving metrics. These include AWS NLB and AWS Lambda, because they feature both a high volume of metrics and multiple dimensions.

Data Retention: Our research indicates that you should modify the retention period of the CloudWatch metrics data by changing the retention setting for the log group of the Firehose stream.

Sampling Frequency: The frequency of data sampling by CloudWatch is controlled internally by the AWS CloudWatch implementation.
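
To see why the volume of streamed metrics dominates the bill, a rough estimate helps. The sketch below assumes one update per metric per minute and a 730-hour month; the metric counts and the price per 1,000 updates are placeholders, so substitute the current rate for your region and your own metric counts.

```python
def monthly_metric_updates(metric_count, updates_per_hour=60, hours=730):
    """Estimate metric updates per month for a number of streamed metrics.

    Assumes one update per metric per minute and a 730-hour month.
    """
    return metric_count * updates_per_hour * hours


def monthly_cost(metric_count, price_per_1000_updates):
    """Estimate monthly streaming cost; the caller supplies the price."""
    return monthly_metric_updates(metric_count) / 1000 * price_per_1000_updates


# Placeholder price and metric counts, for illustration only.
PRICE_PER_1000_UPDATES = 0.003
for label, count in [("all namespaces", 50_000), ("selected namespaces", 5_000)]:
    updates = monthly_metric_updates(count)
    cost = monthly_cost(count, PRICE_PER_1000_UPDATES)
    print(f"{label}: {updates:,} updates, ${cost:,.2f}")
```

Under these assumptions the cost scales linearly with the number of streamed metrics, which is why restricting the selected namespaces is the most direct lever.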