User Guide to Data Scrubbing

Overview

The Scrubbing feature allows you to:

  • Permanently delete telemetry data based on configurable filters

  • Clean up Logs, APM, Metrics, and Events data selectively

  • Apply label-based filters to target specific data

  • Select time ranges for data deletion

  • Monitor scrubbing job progress and history

  • View detailed logs and statistics for each scrubbing operation

Accessing the Scrubbing Tool

The Scrubbing tool is accessible from the Admin section:

  1. Navigate to the Admin section in the left sidebar

  2. Click on Scrubbing from the admin menu options

  3. The main dashboard displays all scrubbing jobs with their current status

Figure: Scrubbing Jobs list showing job history

Understanding the Scrubbing Jobs Dashboard

The Scrubbing Jobs dashboard provides a comprehensive view of all scrubbing operations:

Dashboard Columns

Column        Description
----------    ------------------------------------------------------------------
Job Name      User-defined identifier for the scrubbing job
Filters       Label filters applied to the data (e.g., kube_service = "app_name")
Stream        Type of data being scrubbed (Logs, APM, Metrics, or Events)
Status        Current job status (Incomplete, Completed, or Cancelled)
Progress      Visual progress bar and count of processed items
Time Range    Date and time range of data being scrubbed
Created       When the scrubbing job was initiated

Job Status Types

  • Completed: Job finished successfully (shown in green)

  • Incomplete: Job is currently running (shown in orange)

  • Cancelled: Job was stopped before completion

Creating a Scrubbing Job

To create a new scrubbing job:

  1. Click the Start scrubbing job button in the top-right corner

  2. The confirmation dialog opens with configuration options

Step 1: Select Stream Type

Choose the data stream to scrub:

  • Logs: Application and system log data

  • APM: Application Performance Monitoring traces and spans

  • Metrics: Time-series metric data

  • Events: System and application events

Step 2: Select Time Range

Configure the time period for data deletion:

  • Use the Last hour dropdown for quick time selections

  • The interface shows "Select the time range for data that will be permanently deleted"

  • Default selection is "Last hour"

Step 3: Configure Label Filters

Label filters determine which data will be deleted. The interface shows:

  • Label Filters (applies to all [stream-type]): Define filters using label-value pairs

  • Two dropdown selectors in the form "Select…" = "Select…": the label on the left and its value on the right

  • The label changes dynamically based on stream type (e.g., "applies to all logs")

  • Examples from actual data: kube_service = "opbeans-go", action = "add_client", app_shipping_zip_code = "95054"
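To make the effect of label filters concrete, the following Python sketch models how label-value pairs could select records. It is illustrative only: the record layout, the `matches_filters` and `select_records` helpers, and the assumption that multiple filters must all match (logical AND) are not taken from the product, so confirm the actual selection in the preview before deleting anything.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def matches_filters(record: Record, filters: Dict[str, str]) -> bool:
    """Return True when every label in `filters` equals the record's value.

    Assumption: filters are combined with AND; the product may behave differently.
    """
    return all(record.get(label) == value for label, value in filters.items())


def select_records(records: Iterable[Record], filters: Dict[str, str]) -> List[Record]:
    """Return the records that the given label filters would target."""
    return [r for r in records if matches_filters(r, filters)]


# Hypothetical sample records using label names from the examples above.
records = [
    {"kube_service": "opbeans-go", "action": "add_client"},
    {"kube_service": "opbeans-go", "action": "remove_client"},
    {"kube_service": "cartservice", "action": "add_client"},
]

selected = select_records(records, {"kube_service": "opbeans-go"})
print(len(selected))  # 2
```

In this sketch an empty filter set matches every record, which is one reason to prefer specific filters over broad ones.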

Step 4: Preview Data

Before confirming the scrubbing job, review the preview of the data that matches your stream, time range, and label filters.

Figure: Confirm scrubbing job dialog showing stream selection

Logs Preview Section

  • Shows sample logs matching your filter criteria with the text "Sample logs matching your filter criteria ([time-range])"

  • Displays Log line count with exact numbers (e.g., "2,039,047")

  • Includes detailed statistics breakdown:

    • Total: Overall count (e.g., "2.04M")

    • debug: Specific count (e.g., "918")

    • error: Specific count (e.g., "9.31K")

    • fatal: Specific count (e.g., "2")

    • info: Specific count (e.g., "1.99M")

    • notice: Specific count (e.g., "115")

    • trace: Specific count (e.g., "10.92K")

    • warn: Specific count (e.g., "30.98K")

  • Time-based histogram showing data distribution with "Compare" option and "Same time yesterday"

  • Paginated table view with detailed log entries showing:

    • Date: Timestamp (e.g., "2025-09-18 17:38:04")

    • Container Name: Service name (e.g., "imageprovider", "cartservice")

    • Host: Kubernetes node (e.g., "gke-moscatel-moscatel-np-us-west1-a-df65173f-9zjj")

    • Kube Namespace: Namespace (e.g., "otel-trace")

    • Kube Service: Service identifier (often shows "-")

    • Kube Cluster Name: Cluster name (e.g., "moscatel")

    • Message: Full log message content

    • Pod Name: Complete pod name (e.g., "my-otel-demo-imageprovider-597ff6cd84-vqkvb")

    • Source: Log source (e.g., "imageprovider", "cartservice")
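The preview mixes exact counts ("2,039,047") with abbreviated ones ("2.04M", "9.31K"). As a rough cross-check, the sketch below converts the abbreviated per-level values back to numbers and compares their sum with the exact log line count. The `parse_count` helper and its K/M suffix handling are assumptions made for illustration, not part of the product. Because abbreviated values are rounded, the sum only approximates the exact count.

```python
from decimal import Decimal

# Assumed suffix multipliers for abbreviated counts shown in the preview.
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text: str) -> int:
    """Convert a display count such as "2,039,047", "9.31K", or "2.04M" to an integer."""
    cleaned = text.strip().replace(",", "")
    multiplier = 1
    if cleaned and cleaned[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    # Decimal avoids floating-point rounding errors such as 9.31 * 1000.
    return int(Decimal(cleaned) * multiplier)


# Per-level values copied from the example statistics above.
levels = {
    "debug": "918",
    "error": "9.31K",
    "fatal": "2",
    "info": "1.99M",
    "notice": "115",
    "trace": "10.92K",
    "warn": "30.98K",
}

level_sum = sum(parse_count(value) for value in levels.values())
exact_total = parse_count("2,039,047")
print(level_sum)                     # 2042245
print(abs(level_sum - exact_total))  # 3198
```

A difference of a few thousand lines on a total of about two million is consistent with rounding in the abbreviated values; a much larger gap would be worth investigating before confirming the job.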

Preview Table Navigation

The preview section includes a paginated table with navigation controls:

  • Rows per page: Configurable (default appears to be 10)

  • Page navigation: Numbered page controls (1, 2, 3, 4, 5) and "Go to next page"

  • Additional action: "Open in Logs page" button for detailed log exploration

Step 5: Confirm Scrubbing

  1. Review all settings in the confirmation dialog

  2. Verify the log line count and data breakdown statistics

  3. Check the sample data in the preview table

  4. Click Confirm scrubbing (red button) to start the job

  5. Or click Cancel to abort without making changes

Monitoring Scrubbing Progress

The Scrubbing Jobs dashboard shows all jobs with detailed progress information:

Job Status Types

  • Incomplete: Job is currently running (orange status, shows current progress like "0% 0 / 566")

  • Completed: Job finished successfully (green status, shows "100%" with final counts)

  • Cancelled: Job was stopped before completion (shows the progress reached at the time of cancellation, e.g., "0%")

Progress Display

Each job in the dashboard shows:

  • Progress Bar: Visual green progress bar showing completion percentage

  • Percentage: Numeric percentage (0% to 100%)

  • Item Count: Fractional display showing "processed / total" items

  • Real Examples from System:

    • Active job: "0% 0 / 566" (incomplete job in progress)

    • Small completed job: "100% 49 / 49" (quoteservice logs)

    • Medium completed job: "100% 2,877 / 2,877" (metrics with add_client action)

    • Large completed job: "100% 601,874 / 601,874" (otel-demo logs)

    • Very large cancelled job: "0% 0 / 14,401,108" (cancelled due to size)
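If you copy progress values out of the dashboard, for example into a report, a small parser can turn the "percent processed / total" text into numbers. The sketch below is an assumption-based illustration: the `Progress` type, the `parse_progress` helper, and its regular expression are hypothetical and only match the display format shown in the examples above.

```python
import re
from typing import NamedTuple


class Progress(NamedTuple):
    percent: int
    processed: int
    total: int


# Matches display strings such as "100% 2,877 / 2,877".
_PROGRESS_RE = re.compile(r"^\s*(\d{1,3})%\s+([\d,]+)\s*/\s*([\d,]+)\s*$")


def parse_progress(text: str) -> Progress:
    """Parse a dashboard progress string into percent, processed, and total."""
    match = _PROGRESS_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized progress format: {text!r}")
    percent, processed, total = match.groups()
    return Progress(
        int(percent),
        int(processed.replace(",", "")),
        int(total.replace(",", "")),
    )


print(parse_progress("100% 601,874 / 601,874"))
print(parse_progress("0% 0 / 14,401,108").total)  # 14401108
```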

Time Range Display

Jobs show their configured time ranges in the following format:

  • "YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS"

  • Examples: "2025-09-15 15:57:28 to 2025-09-15 16:02:28" or "2025-08-26 00:00:00 to 2025-08-26 23:59:59"
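The time range text can be parsed with Python's standard `datetime` module, for example to confirm how long a window a job covers. This is a minimal sketch: the `parse_time_range` helper is hypothetical, and it treats the timestamps as naive values because the format shown here carries no time zone.

```python
from datetime import datetime, timedelta
from typing import Tuple

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_time_range(text: str) -> Tuple[datetime, datetime]:
    """Split "START to END" and parse both timestamps."""
    start_text, end_text = text.split(" to ")
    start = datetime.strptime(start_text.strip(), TIMESTAMP_FORMAT)
    end = datetime.strptime(end_text.strip(), TIMESTAMP_FORMAT)
    if end < start:
        raise ValueError("time range ends before it starts")
    return start, end


start, end = parse_time_range("2025-09-15 15:57:28 to 2025-09-15 16:02:28")
print(end - start)  # 0:05:00
```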

Best Practices

Before Creating a Scrubbing Job

  1. Verify Filters: Double-check label filters to ensure correct data selection

  2. Review Time Range: Confirm the date range matches your intention

  3. Check Preview: Always review the preview data before confirming

  4. Document Purpose: Use descriptive job names for audit trails

During Scrubbing Operations

  1. Start Small: Test with narrow filters and short time ranges first

  2. Monitor Performance: Watch system impact during large scrubbing operations

  3. Schedule Wisely: Run large jobs during off-peak hours

  4. Keep Records: Document scrubbing operations for compliance

Safety Considerations

  • All scrubbing operations are logged for audit purposes

  • Completed jobs cannot be reversed

  • Consider exporting critical data before deletion

  • Use specific filters to avoid unintended data loss

Common Use Cases

Compliance and Data Retention

  • Remove data that has exceeded the retention period your policies require

  • Delete sensitive information from specific services

  • Clean up test or development data from production

Performance Optimization

  • Remove high-volume debug logs after troubleshooting

  • Clean up verbose trace data from specific time periods

  • Delete metrics from decommissioned services

Storage Management

  • Free up space by removing obsolete log data

  • Delete temporary debugging information

  • Clean up data from failed or test deployments

Troubleshooting

Job Stays in Incomplete Status

  • Check system resources and performance

  • Verify network connectivity

  • Review job filters for conflicts

No Data in Preview

  • Verify label filters are correct

  • Check time range selection

  • Ensure data exists for the specified criteria

Job Cancelled Unexpectedly

  • Check user permissions

  • Review system logs for errors

  • Verify storage availability

Limitations

  • Maximum time range per job may be limited

  • Large scrubbing operations may impact system performance

  • Some system-critical data may be protected from deletion

  • Concurrent scrubbing job limits may apply