Continuous Profiling with Kloudfuse Profiler Server

Continuous Profiling is a powerful addition to the Kloudfuse observability platform. While traditional monitoring methods (metrics, logs, and tracing) provide valuable insights, they leave gaps in understanding application performance at a granular level. Continuous Profiling closes these gaps by offering in-depth, line-level insights into the application’s code, exposing the precise details of resource utilization.

This low-overhead feature gathers profiles from production systems and stores them for subsequent analysis. It provides a comprehensive view of the application and its behavior in production, including CPU usage, memory allocation, and disk I/O, helping you verify that every line of code operates efficiently.

Benefits of Continuous Profiling

Granular Insights

Continuous Profiling offers a detailed view of application performance that goes beyond traditional observability tools, providing line-level insights into resource utilization.

In-Depth Code Analysis

With a comprehensive understanding of code performance and system interactions, developers can easily identify how specific code segments use resources, facilitating thorough analysis and optimization.

Optimization Opportunities

By pinpointing inefficient lines of code, Continuous Profiling helps address performance bottlenecks and improve resource utilization across all applications.

Effective Capacity Planning

The profiling data supports informed capacity planning and scaling efforts, ensuring that your application can meet growing demands while maintaining optimal performance.

Cost Reduction

By identifying spikes in CPU and memory usage, Continuous Profiling helps you optimize resource consumption and lower costs.

Configuration

Enable kfuse-profiling in the custom-values.yaml file.

global:
  kfuse-profiling:
    enabled: true

By default, Kloudfuse stores profiling data in a 50 GB PersistentVolumeClaim (PVC).
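
If the default volume is too small for your workload, you can override it in the same custom-values.yaml file. The key below is a hypothetical illustration; confirm the exact name against your Kloudfuse chart's values file before applying it.

global:
  kfuse-profiling:
    enabled: true
    # Hypothetical key name; verify against the chart's values file.
    pvcSize: 100Gi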

Long-Term Retention

To retain profiling data for longer, configure object storage. Depending on your storage provider, set one of the following options in the custom-values.yaml file: AWS S3 or a GCP bucket. Note that profiling data is stored in the Parquet format.
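
As an illustrative sketch only, an S3-backed configuration might look like the following; every key under kfuse-profiling here is an assumption modeled on common object-storage settings, so confirm the exact names in the Kloudfuse configuration reference before use.

global:
  kfuse-profiling:
    enabled: true
    # Hypothetical keys; confirm against the Kloudfuse configuration reference.
    storage:
      backend: s3
      s3:
        bucket_name: <your-bucket>
        region: <your-aws-region>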

Scrape the Profiling Data

Alloy queries the pprof endpoints of your Go application, collects the profiles, and forwards them to the Kfuse Profiler server.

Prerequisites

  1. Ensure your Go application exposes pprof endpoints. In pull mode, the Alloy collector periodically retrieves profiles from Go applications by querying their /debug/pprof/* endpoints; a minimal example appears after this list.

  2. Set up Go profiling in pull mode to generate profiles. See the Grafana documentation on Set up Go profiling in pull mode.

  3. To profile Java applications, see the Grafana documentation on Java.
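
The following is a minimal sketch of a Go service that satisfies these prerequisites. It exposes the standard pprof endpoints and, through the godeltaprof package from the Grafana pyroscope-go repository, the delta endpoints consumed by the godeltaprof_* profiles later in this guide; the port and sampling rates are arbitrary example values.

Expose pprof Endpoints in a Go Application
package main

import (
	"log"
	"net/http"
	"runtime"

	// Registers the standard /debug/pprof/* handlers on http.DefaultServeMux.
	_ "net/http/pprof"

	// Registers /debug/pprof/delta_heap, /debug/pprof/delta_block, and
	// /debug/pprof/delta_mutex for the godeltaprof_* profile types.
	_ "github.com/grafana/pyroscope-go/godeltaprof/http/pprof"
)

func main() {
	// Block and mutex profiles stay empty unless sampling is enabled.
	runtime.SetBlockProfileRate(5)
	runtime.SetMutexProfileFraction(5)

	// Serve the pprof endpoints; Alloy scrapes this port in pull mode.
	log.Fatal(http.ListenAndServe(":8080", nil))
}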

Configure Scraping

  1. Configure the Alloy scraper in a new file, alloy-values.yaml. Download a copy of the default values file from the Grafana Alloy Helm chart repository, and customize the alloy configMap section.

  2. Configure the pyroscope.write block in the Alloy Configuration file to define the endpoint where Alloy sends profiling data.

    Define Endpoint for Sending Profiling Data
    pyroscope.write "write_job_name" { (1)
        endpoint {
            url = "https://<KFUSE ENDPOINT/DNS NAME>/profile" (2)
        }
    }
    1 Change write_job_name to an appropriate name, such as kfuse_profiler_write.
    2 Set url to your Kloudfuse endpoint or DNS name, followed by the /profile path.
  3. Configure the pyroscope.scrape block in the Alloy Configuration file to define the scraping configuration for profiling data.

    Define the Scraping Configuration for Profiling Data
    pyroscope.scrape "scrape_job_name" { (1)
            targets    = concat(discovery.relabel.kubernetes_pods.output) (2)
            forward_to = [pyroscope.write.write_job_name.receiver] (3)
    
            profiling_config { (4)
                    profile.process_cpu { (5)
                            enabled = true
                    }
    
                    profile.godeltaprof_memory { (6)
                            enabled = true
                    }
    
                    profile.memory { // disable memory, use godeltaprof_memory instead
                            enabled = false
                    }
    
                    profile.godeltaprof_mutex  { (7)
                            enabled = true
                    }
    
                    profile.mutex { // disable mutex, use godeltaprof_mutex instead
                            enabled = false
                    }
    
                    profile.godeltaprof_block { (8)
                            enabled = true
                    }
    
                    profile.block { // disable block, use godeltaprof_block instead
                            enabled = false
                    }
    
                    profile.goroutine {
                            enabled = true (9)
                    }
            }
    }
    1 Change scrape_job_name to an appropriate name, like kfuse_profiler_scrape.
    2 Use discovery.relabel.kubernetes_pods.output as the target of the pyroscope.scrape block to discover Kubernetes targets. See the Grafana documentation on Discover Kubernetes targets to learn how to set up specific regex rules; a sample of the pod annotations these rules expect appears after this procedure.
    3 forward_to: Connects the scrape job to the write job.
    4 profiling_config: Enables or disables specific profiles.
    5 profile.process_cpu: Enables CPU profiling.
    6 profile.godeltaprof_memory: Enables delta memory profiling.
    7 profile.godeltaprof_mutex: Enables delta mutex profiling.
    8 profile.godeltaprof_block: Enables delta block profiling.
    9 profile.goroutine: Enables goroutine profiling.
  4. Configure the rest of the fields in the Alloy Configuration file.

    Alloy Configuration
    alloy:
      configMap:
        create: true (1)
        content: |- (2)
          logging { (3)
            level = "info"
            format = "logfmt"
          }
          discovery.kubernetes "pyroscope_kubernetes" {
            role = "pod"
          }

          discovery.relabel "kubernetes_pods" {
            targets = concat(discovery.kubernetes.pyroscope_kubernetes.targets)

            rule {
              action        = "drop"
              source_labels = ["__meta_kubernetes_pod_phase"]
              regex         = "Pending|Succeeded|Failed|Completed"
            }

            rule {
              action = "labelmap"
              regex  = "__meta_kubernetes_pod_label_(.+)"
            }

            rule {
              action        = "replace"
              source_labels = ["__meta_kubernetes_namespace"]
              target_label  = "kubernetes_namespace"
            }

            rule {
              action        = "replace"
              source_labels = ["__meta_kubernetes_pod_name"]
              target_label  = "kubernetes_pod_name"
            }

            rule {
              action        = "keep"
              source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_scrape"]
              regex         = "true"
            }

            rule {
              action        = "replace"
              source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_application_name"]
              target_label  = "service_name"
            }

            rule {
              action        = "replace"
              source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_spy_name"]
              target_label  = "__spy_name__"
            }

            rule {
              action        = "replace"
              source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_scheme"]
              regex         = "(https?)"
              target_label  = "__scheme__"
            }

            rule {
              action        = "replace"
              source_labels = ["__address__", "__meta_kubernetes_pod_annotation_pyroscope_io_port"]
              regex         = "(.+?)(?::\\d+)?;(\\d+)"
              replacement   = "$1:$2"
              target_label  = "__address__"
            }

            rule {
              action      = "labelmap"
              regex       = "__meta_kubernetes_pod_annotation_pyroscope_io_profile_(.+)"
              replacement = "__profile_$1"
            }
          }

          pyroscope.scrape "pyroscope_scrape" { (4)
            clustering {
              enabled = true
            }

            targets    = concat(discovery.relabel.kubernetes_pods.output)
            forward_to = [pyroscope.write.pyroscope_write.receiver]

            profiling_config {
              profile.memory {
                enabled = true
              }

              profile.process_cpu {
                enabled = true
              }

              profile.goroutine {
                enabled = true
              }

              profile.block {
                enabled = false
              }

              profile.mutex {
                enabled = false
              }

              profile.fgprof {
                enabled = false
              }
            }
          }

          pyroscope.write "pyroscope_write" {
            endpoint {
              url = "https://<KFUSE ENDPOINT/DNS NAME>/profile"
            }
          }
    
        name: null (5)
        key: null (6)
    
      clustering:
        enabled: false (7)
        name: "" (8)
        portName: http (9)
    
      stabilityLevel: "generally-available" (10)
      storagePath: /tmp/alloy (11)
      listenAddr: 0.0.0.0 (12)
      listenPort: 12345 (13)
      listenScheme: HTTP (14)
      uiPathPrefix: / (15)
      enableReporting: true (16)
      extraEnv: [] (17)
      envFrom: [] (18)
      extraArgs: [] (19)
      extraPorts: [] (20)
      # - name: "faro"
      #   port: 12347
      #   targetPort: 12347
      #   protocol: "TCP"
      #   appProtocol: "h2c"
    
      mounts:
        varlog: false (21)
        dockercontainers: false
        extra: [] (22)
    
      securityContext: {} (23)
    
      resources: {} (24)
    1 create: Creates a new ConfigMap for the config file when true.
    2 content: Content to assign to the new ConfigMap. This passes into tpl, allowing for templating from values.
    3 Start of the Alloy configuration content.
    4 pyroscope.scrape: Specifies the target sources for scraping profiling data. See the pyroscope.scrape block in step 3.
    5 name: Name of existing ConfigMap to use. Used when create is false.
    6 key: Key in ConfigMap, to get config.
    7 clustering enabled: Deploy Alloy in a cluster, to allow for load distribution.
    8 clustering name: Name for the Alloy cluster. Used for differentiating between clusters.
    9 portName: Name of the port for clustering, useful if running inside an Istio Mesh.
    10 stabilityLevel: Minimum stability level of components and behavior to enable. Must be one of experimental, public-preview, or generally-available.
    11 storagePath: Path to where Grafana Alloy stores data, such as Write-Ahead Log. By default, data is lost between reboots.
    12 listenAddr: Address for listening to traffic. Value of 0.0.0.0 exposes the UI to other containers.
    13 listenPort: Port for listening to traffic.
    14 listenScheme: Scheme for readiness probes. If you enable TLS in your configuration, set this to "HTTPS".
    15 uiPathPrefix: Base path that exposes the UI.
    16 enableReporting: Enables sending Grafana Labs anonymous usage stats to help improve Grafana Alloy.
    17 extraEnv: Extra environment variables to pass to the Alloy container.
    18 envFrom: Maps all keys on a ConfigMap or Secret as environment variables. See Kubernetes documentation for EnvFromSource v1 core.
    19 extraArgs: Extra arguments to pass to alloy run. See Grafana documentation for The run command.
    20 extraPorts: Extra ports to expose on the Alloy container.
    21 varlog: Mount /var/log from the host into the container to collect logs.
    22 extra: Extra volume mounts to add into the Grafana Alloy container. Does not affect the watch container.
    23 securityContext: Security context to apply to the Grafana Alloy container.
    24 resources: Resource requests and limits to apply to the Grafana Alloy container.
  5. Refer to the Grafana documentation on Deploy Grafana Alloy on Kubernetes to install Alloy in the namespace where you want to scrape data; sample installation commands appear after this procedure.

  6. Upgrade Alloy with the alloy-values.yaml file you set up, using the same namespace where you installed Alloy in the previous step.

    helm upgrade --namespace <namespace> alloy grafana/alloy -f <path/to/alloy-values.yaml>
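
If you have not installed Alloy yet, the initial installation referenced in step 5 typically uses the standard Grafana Helm chart. The release name alloy and the namespace are placeholders you can change.

Install Grafana Alloy
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install --namespace <namespace> alloy grafana/alloy --create-namespace -f <path/to/alloy-values.yaml>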
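
The relabel rules in step 4 keep only pods that opt in through pyroscope.io annotations. The following sketch shows the pod-template annotations a profiled workload might carry; the application name and port are illustrative values. Kubernetes maps the dots and dashes in annotation names to underscores in the __meta_kubernetes_pod_annotation_* labels that the rules reference.

Annotate Pods for Profiling
metadata:
  annotations:
    # Required; the keep rule drops pods without this annotation.
    pyroscope.io/scrape: "true"
    # Becomes the service_name label in the profiler.
    pyroscope.io/application-name: "my-go-service"
    # Port where the application serves /debug/pprof/*.
    pyroscope.io/port: "8080"
    # Optional; must be http or https if set.
    pyroscope.io/scheme: "http"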