Data Scrubbing
The Scrubbing tool in Kloudfuse provides a powerful way to permanently delete MELT (Metrics, Events, Logs, and Traces) data based on specific filters. This feature helps you manage storage costs, comply with data retention policies, and remove sensitive or unwanted data from your observability platform.
Data deletion through the Scrubbing tool is permanent and cannot be undone. Always review your filters carefully before confirming a scrubbing job.
Scrubbing works only on observability data that has been ingested and sealed in a Pinot segment. If real-time (not yet sealed) data is selected for scrubbing, jobs may remain incomplete or report an unpredictable status. To identify which data can be scrubbed, check the last sealed segments in the Pinot deep store location. Only data older than the last sealed segment's timestamp can be scrubbed.
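The sealed-segment rule above can be checked before a job is created. The following is a minimal sketch, not part of the Kloudfuse or Pinot API: the helper name `is_range_scrubbable` is hypothetical, the timestamps are made up, and it assumes you have already looked up the last sealed segment's timestamp in the deep store yourself.

```python
from datetime import datetime, timezone


def is_range_scrubbable(range_end: datetime, last_sealed_end: datetime) -> bool:
    """Return True if the whole range ends before the last sealed segment timestamp.

    Hypothetical helper: `last_sealed_end` is a value you look up manually
    in the Pinot deep store; nothing here queries Kloudfuse or Pinot.
    """
    return range_end < last_sealed_end


# Example values, made up for illustration.
last_sealed = datetime(2025, 9, 18, 12, 0, tzinfo=timezone.utc)
old_range_end = datetime(2025, 9, 18, 9, 0, tzinfo=timezone.utc)
recent_range_end = datetime(2025, 9, 18, 17, 0, tzinfo=timezone.utc)

print(is_range_scrubbable(old_range_end, last_sealed))     # True
print(is_range_scrubbable(recent_range_end, last_sealed))  # False
```

A range that ends after the last sealed timestamp includes real-time data, which is the case the note above warns about.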
Overview
The Scrubbing feature allows you to:
- Permanently delete telemetry data based on configurable filters
- Clean up Logs, APM, Metrics, and Events data selectively
- Apply label-based filters to target specific data
- Select time ranges for data deletion
- Monitor scrubbing job progress and history
- View detailed logs and statistics for each scrubbing operation
Accessing the Scrubbing Tool
The Scrubbing tool is accessible from the Admin section:
- Navigate to the Admin section in the left sidebar
- Click Scrubbing in the admin menu options
- The main dashboard displays all scrubbing jobs with their current status
Understanding the Scrubbing Jobs Dashboard
The Scrubbing Jobs dashboard provides a comprehensive view of all scrubbing operations:
Dashboard Columns
| Column | Description |
|---|---|
| Job Name | User-defined identifier for the scrubbing job |
| Filters | Label filters applied to the data |
| Stream | Type of data being scrubbed (Logs, APM, Metrics, or Events) |
| Status | Current job status (Incomplete, Completed, or Cancelled) |
| Progress | Visual progress bar and count of processed items |
| Time Range | Date and time range of data being scrubbed |
| Created | When the scrubbing job was initiated |
Creating a Scrubbing Job
To create a new scrubbing job:
- Click the Start scrubbing job button in the top-right corner
- The confirmation dialog opens with configuration options
Step 1: Select Stream Type
Choose the data stream to scrub:
- Logs: Application and system log data
- APM: Application Performance Monitoring traces and spans
- Metrics: Time-series metric data
- Events: System and application events
Step 2: Select Time Range
Configure the time period for data deletion:
- Use the time range dropdown (labeled "Last hour" by default) for quick time selections
- The interface shows "Select the time range for data that will be permanently deleted"
- Default selection is "Last hour"
Step 3: Configure Label Filters
Label filters determine which data will be deleted. The interface shows:
- Label Filters (applies to all [stream-type]): Define filters using label-value pairs
- Each filter uses two dropdown selectors in the form "Select…" = "Select…"
- The section label changes dynamically based on stream type (e.g., "applies to all logs")
- Examples from actual data:
  - kube_service = "opbeans-go"
  - action = "add_client"
  - app_shipping_zip_code = "95054"
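To make the label-value format concrete, here is a minimal sketch of how a set of label filters might select records. It assumes that multiple filters are combined with AND and compared by exact equality, which the text above does not state. The `matches` function and the sample records (including the value "remove_client") are hypothetical and do not reflect Kloudfuse internals.

```python
def matches(record_labels: dict[str, str], filters: dict[str, str]) -> bool:
    """Return True if the record carries every label-value pair in `filters`.

    Assumption: filters are combined with AND and compared for exact equality.
    """
    return all(record_labels.get(label) == value for label, value in filters.items())


# Hypothetical records; some label names and values echo the examples above.
records = [
    {"kube_service": "opbeans-go", "action": "add_client"},
    {"kube_service": "opbeans-go", "action": "remove_client"},
    {"kube_service": "cartservice", "action": "add_client"},
]

filters = {"kube_service": "opbeans-go", "action": "add_client"}
selected = [r for r in records if matches(r, filters)]
print(len(selected))  # 1
```

Under this assumption, every added filter narrows the selection, so a job with no filters would target all data of the chosen stream in the selected time range.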
Step 4: Preview Data
Before confirming the scrubbing job:
Logs Preview Section
- Shows sample logs matching your filter criteria with the text "Sample logs matching your filter criteria ([time-range])"
- Displays Log line count with exact numbers (e.g., "2,039,047")
- Includes detailed statistics breakdown:
  - Total: Overall count (e.g., "2.04M")
  - debug: Specific count (e.g., "918")
  - error: Specific count (e.g., "9.31K")
  - fatal: Specific count (e.g., "2")
  - info: Specific count (e.g., "1.99M")
  - notice: Specific count (e.g., "115")
  - trace: Specific count (e.g., "10.92K")
  - warn: Specific count (e.g., "30.98K")
- Time-based histogram showing data distribution with "Compare" option and "Same time yesterday"
- Paginated table view with detailed log entries showing:
  - Date: Timestamp (e.g., "2025-09-18 17:38:04")
  - Container Name: Service name (e.g., "imageprovider", "cartservice")
  - Host: Kubernetes node (e.g., "gke-moscatel-moscatel-np-us-west1-a-df65173f-9zjj")
  - Kube Namespace: Namespace (e.g., "otel-trace")
  - Kube Service: Service identifier (often shows "-")
  - Kube Cluster Name: Cluster name (e.g., "moscatel")
  - Message: Full log message content
  - Pod Name: Complete pod name (e.g., "my-otel-demo-imageprovider-597ff6cd84-vqkvb")
  - Source: Log source (e.g., "imageprovider", "cartservice")
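The preview shows counts in two forms: an exact value ("2,039,047") and abbreviated values ("2.04M", "9.31K"). The sketch below shows one way to relate the two when you read the statistics. The two-decimal rounding is inferred from the examples above, and `abbreviate` is a hypothetical helper, not the UI's actual formatter.

```python
def abbreviate(count: int) -> str:
    """Abbreviate a count with a K or M suffix, rounded to two decimals.

    Assumption: thresholds and rounding are inferred from the examples
    in the preview, not taken from the product.
    """
    if count >= 1_000_000:
        return f"{count / 1_000_000:.2f}M"
    if count >= 1_000:
        return f"{count / 1_000:.2f}K"
    return str(count)


exact_total = 2_039_047
print(f"{exact_total:,}")       # 2,039,047
print(abbreviate(exact_total))  # 2.04M
print(abbreviate(918))          # 918
```

Because the abbreviated values are rounded, the per-level counts will not necessarily add up exactly to the abbreviated total.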
Step 5: Review Table Data
The preview section includes a paginated table with navigation controls:
- Rows per page: Configurable (default appears to be 10)
- Page navigation: Numbered page controls (1, 2, 3, 4, 5) and "Go to next page"
- Additional action: "Open in Logs page" button for detailed log exploration
Monitoring Scrubbing Progress
The Scrubbing Jobs dashboard shows all jobs with detailed progress information:
Job Status Types
- Incomplete: Job is currently running (orange status, shows current progress like "0% 0 / 566")
- Completed: Job finished successfully (green status, shows "100%" with final counts)
- Cancelled: Job was stopped before completion (shows the progress reached at the time of cancellation, e.g., "0%")
Progress Display
Each job in the dashboard shows:
- Progress Bar: Visual green progress bar showing completion percentage
- Percentage: Numeric percentage (0% to 100%)
- Item Count: Fractional display showing "processed / total" items
- Real examples from the system:
  - Active job: "0% 0 / 566" (incomplete job in progress)
  - Small completed job: "100% 49 / 49" (quoteservice logs)
  - Medium completed job: "100% 2,877 / 2,877" (metrics with add_client action)
  - Large completed job: "100% 601,874 / 601,874" (otel-demo logs)
  - Very large cancelled job: "0% 0 / 14,401,108" (cancelled due to size)
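The progress strings above combine a percentage with a "processed / total" count. As a rough illustration of how the two relate, the sketch below rebuilds that string from the two numbers. The function `format_progress` is hypothetical, and rounding the percentage down to a whole number is an assumption; the source only shows 0% and 100% cases.

```python
def format_progress(processed: int, total: int) -> str:
    """Build a progress string such as '100% 2,877 / 2,877'.

    Assumption: the percentage is a whole number, rounded down.
    """
    percent = 0 if total == 0 else processed * 100 // total
    return f"{percent}% {processed:,} / {total:,}"


print(format_progress(0, 566))              # 0% 0 / 566
print(format_progress(49, 49))              # 100% 49 / 49
print(format_progress(601_874, 601_874))    # 100% 601,874 / 601,874
```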
Best Practices
Before Creating a Scrubbing Job
- Verify Filters: Double-check label filters to ensure correct data selection
- Review Time Range: Confirm the date range matches your intention
- Check Preview: Always review the preview data before confirming
- Document Purpose: Use descriptive job names for audit trails
Common Use Cases
Compliance and Data Retention
- Remove data older than retention policies require
- Delete sensitive information from specific services
- Clean up test or development data from production
Troubleshooting
Limitations
- Maximum time range per job may be limited
- Large scrubbing operations may impact system performance
- Some system-critical data may be protected from deletion
- Concurrent scrubbing job limits may apply