Common Setup Issues

This guide covers the most common problems encountered when instrumenting applications with OpenTelemetry and sending traces to Kloudfuse APM, with steps to diagnose and resolve each issue.

No Traces Appear in UI

Symptoms

  • APM Trace Explorer is empty after setup

  • Service does not appear in the service list

  • No spans appear even after sending requests

Diagnose

  1. Confirm the application is actually emitting spans — add a test span at startup and check application logs for exporter errors.

  2. Verify kf-agent is running and reachable from the application pod:

    kubectl get pods -n <namespace>
    kubectl exec -it <app-pod> -- nc -zv kf-agent 4317
  3. Check the OTLP endpoint format — gRPC uses http://kf-agent:4317 (no trailing slash, no path); HTTP/JSON uses http://kf-agent:4318.

  4. Confirm service.name is set — without it the SDK defaults to unknown_service:<process> and may not appear where you expect it.

  5. Confirm the sampler is not set to always_off or traceidratio with arg 0.
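The reachability check in step 2 can also be run without nc, which is often absent from slim container images. A stdlib Python sketch (the hostname kf-agent and ports 4317/4318 are the values from the steps above):

```python
# Stdlib equivalent of `nc -zv kf-agent 4317`: attempts a TCP connection to the
# OTLP endpoint and reports success or failure.
import socket

def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for port in (4317, 4318):
        print(f"kf-agent:{port} reachable: {can_reach('kf-agent', port)}")
```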

Fix

Enable debug logging for the SDK to see exporter errors in the application log output:

# Java agent
-Dotel.javaagent.debug=true

# Python
OTEL_LOG_LEVEL=debug opentelemetry-instrument python app.py

For Go applications, do not import OpenTelemetry internal packages. Instead, enable debug logging through your application’s normal logger and log any exporter or provider initialization errors using public OpenTelemetry APIs. Look for lines containing OTLP export failed, connection refused, or StatusCode. Fix the endpoint or firewall rules and retry.

Partial or Broken Traces

Symptoms

  • Trace timeline shows orphan spans with no parent

  • trace_id is the same but spans appear under separate traces

  • Parent-child relationships are missing between services

Causes

  • Trace context headers (traceparent, tracestate) are not forwarded on outbound HTTP or gRPC calls

  • An intermediate service strips or rewrites headers

  • Services use different propagator formats (e.g., B3 vs W3C TraceContext)

Fix

Ensure W3C TraceContext propagation is configured globally and that headers are forwarded on every outbound call.

Java — the agent propagates automatically for all supported HTTP clients. If using RestTemplate or WebClient manually, do not recreate the client inside a span — let the agent intercept it.

Python — inject headers on outbound requests using the propagator:

from opentelemetry.propagate import inject

headers = {}
inject(headers)   # adds traceparent / tracestate
response = requests.get("http://other-service/api", headers=headers)

Go — use otelhttp.NewTransport so the HTTP client injects headers automatically:

import "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"

client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
client.Do(req)

If services use different propagator formats, set a consistent list on all services:

OTEL_PROPAGATORS=tracecontext,baggage
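If you suspect an intermediate service is stripping or rewriting headers, the traceparent value itself is easy to inspect: it is four dash-separated hex fields defined by W3C TraceContext. A stdlib sketch of a validator you could drop into a debug endpoint:

```python
# Validates a W3C traceparent header: "{version}-{trace_id}-{span_id}-{flags}",
# all lowercase hex. A service that forwards this header unchanged preserves
# the parent-child link; all-zero IDs are explicitly invalid per the spec.
import re
from typing import Optional

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(value: str) -> Optional[dict]:
    m = TRACEPARENT_RE.match(value)
    if m is None:
        return None
    fields = m.groupdict()
    if fields["trace_id"] == "0" * 32 or fields["span_id"] == "0" * 16:
        return None  # all-zero trace or span id is invalid
    return fields
```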

Spans Are Dropped or Silently Lost

Symptoms

  • Traces appear incomplete under high load but are fine at low throughput

  • Application logs show BatchSpanProcessor queue is full or dropping span

Causes

The BatchSpanProcessor has a finite in-memory queue. When the application produces spans faster than the exporter can flush them, excess spans are dropped without error.
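The mechanism can be sketched as a bounded queue (a toy model for illustration, not the SDK's actual implementation):

```python
# Toy model of BatchSpanProcessor's bounded queue: spans that arrive while the
# queue is full are dropped silently, mirroring the real processor's behavior.
from collections import deque

class BoundedSpanQueue:
    def __init__(self, max_queue_size: int = 2048):
        self.queue = deque()
        self.max_queue_size = max_queue_size
        self.dropped = 0

    def on_end(self, span) -> None:
        if len(self.queue) >= self.max_queue_size:
            self.dropped += 1  # silently lost — no error is raised
            return
        self.queue.append(span)

    def export_batch(self, max_export_batch_size: int = 512) -> list:
        n = min(max_export_batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```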

Fix

Increase the queue and batch sizes using environment variables (applies to Java, Python, and Go):

OTEL_BSP_MAX_QUEUE_SIZE=8192        # default 2048
OTEL_BSP_MAX_EXPORT_BATCH_SIZE=1024 # default 512
OTEL_BSP_SCHEDULE_DELAY=2000        # flush interval in ms, default 5000
OTEL_EXPORTER_OTLP_COMPRESSION=gzip # reduce network overhead

If drops persist, consider reducing trace volume with sampling:

OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1   # sample 10% of root traces

See Performance Tuning for a full reference of BSP environment variables.

Sampling Issues

Symptoms

  • Only a small fraction of traces appear, even at low throughput

  • Trace Explorer shows gaps in timeline

  • parentbased_traceidratio produces inconsistent coverage

Causes

  • Sample rate set too low — OTEL_TRACES_SAMPLER_ARG=0.01 keeps only 1% of traces

  • parentbased_traceidratio inherits the parent’s decision: if an upstream service samples out a trace, all downstream spans for that trace are also dropped

  • traceidratio (without parentbased_) ignores the parent decision, which can produce inconsistent traces
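The reason the ratio-based decision can be made consistently across services is that it is a deterministic function of the trace_id. A simplified sketch (SDKs differ in exactly which bits of the id they compare, so treat this as illustrative only):

```python
# Illustrative only: a ratio sampler maps the trace_id to a number and compares
# it against ratio * 2^64. Same trace_id + same ratio => same decision on every
# service, which is why all services in a chain should share the same ratio.
def ratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    threshold = int(ratio * (1 << 64))
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)  # low 64 bits of the 128-bit id
    return low64 < threshold
```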

Fix

Use parentbased_traceidratio for consistent, end-to-end sampling, and set the same rate on all services in a call chain:

OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05   # 5% of new traces; inherited by downstream

Set OTEL_TRACES_SAMPLER_ARG=1.0 temporarily to capture 100% of traces during debugging.

API Key or Authentication Errors

Symptoms

  • kf-agent logs show 401 Unauthorized or 403 Forbidden

  • No data appears in the UI despite correct endpoint and instrumentation

Fix

  1. Confirm the API key is valid and scoped for trace ingestion in the Kloudfuse UI under Administration → API Keys.

  2. Pass the key as an OTLP header, not as a query parameter:

    OTEL_EXPORTER_OTLP_HEADERS=kf-api-key=<your-key>
  3. In Kubernetes, store the key in a Secret and reference it as an environment variable — do not embed it in a ConfigMap or Helm values file committed to source control:

    env:
      - name: OTEL_EXPORTER_OTLP_HEADERS
        valueFrom:
          secretKeyRef:
            name: kloudfuse-api-key
            key: value
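The header string follows the comma-separated key=value format shared by the OTEL_EXPORTER_OTLP_* header variables. A small sanity check of the value stored in the Secret (simplified sketch — real SDKs also handle percent-encoded values):

```python
# Simplified parser for OTEL_EXPORTER_OTLP_HEADERS ("k1=v1,k2=v2"); raising on
# malformed entries makes a bad Secret value fail fast instead of silently
# sending no auth header at all.
def parse_otlp_headers(raw: str) -> dict:
    headers = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed header entry: {pair!r}")
        headers[key.strip()] = value.strip()
    return headers
```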

Java Agent Not Attaching

Symptoms

  • No spans appear from a Java application despite having the agent JAR

  • Application starts without any opentelemetry-javaagent log lines

  • Caused by: java.lang.ClassNotFoundException for OTel classes

Causes

  • -javaagent: flag is not reaching the JVM — the flag must be passed to the JVM, not the application

  • In containers, the entrypoint uses exec java directly but JAVA_TOOL_OPTIONS is not set

  • Multiple -javaagent: flags conflict (e.g., a profiler and the OTel agent)

  • The JAR path contains spaces or is not accessible at the path specified

Fix

In containers, use JAVA_TOOL_OPTIONS — the JVM reads it before any command-line flags regardless of the entrypoint:

env:
  - name: JAVA_TOOL_OPTIONS
    value: "-javaagent:/agent/opentelemetry-javaagent.jar"

Verify the agent is attaching by looking for this line in the startup log:

[otel.javaagent] ... opentelemetry-javaagent ... attached

If the line is absent, the flag is not reaching the JVM. Check that JAVA_TOOL_OPTIONS is set in the correct container and that the JAR exists at the specified path:

kubectl exec -it <pod> -- ls -l /agent/opentelemetry-javaagent.jar

For Gradle or Maven wrapper scripts that spawn a sub-process, the -javaagent: flag set in JAVA_TOOL_OPTIONS is automatically inherited by child JVMs.

Java Agent 2.x: ClassNotFoundException with Manual API Usage

Symptoms

  • java.lang.NoClassDefFoundError: io/opentelemetry/api/GlobalOpenTelemetry at startup

  • The javaagent IS attached (its startup log lines appear) but main() crashes immediately

  • The javaagent log shows Loading instrumentation opentelemetry-api entries appearing after the exception

  • The error stack trace points to application code that calls GlobalOpenTelemetry.getTracer()

Causes

The OTel Java agent 2.x bridges manual API usage (calls to GlobalOpenTelemetry, Tracer, etc.) through lazily-loaded instrumentation modules. These modules intercept and transform io.opentelemetry.api.* classes at load time. However, because the modules are loaded lazily — triggered by the first class reference in application code — there is a race: main() can reference GlobalOpenTelemetry before the agent has installed the transformer for it, surfacing as the NoClassDefFoundError shown above.

Additionally, in javaagent 2.x the io.opentelemetry.api package is not injected directly into the bootstrap classloader. The agent relies on bytecode transformation to make its internal (shaded) SDK implementation visible to the application. If the transformation has not yet been registered when the class is first loaded, the class is not found by either the bootstrap or application classloader.

This only affects applications that call the OTel API directly in main() without an auto-instrumented framework (Spring, Servlet, gRPC, etc.) running first to trigger the lazy module load.

Fix

For manual span creation without an auto-instrumented framework, replace the javaagent with the OTel Java SDK initialized directly in code. Use AutoConfiguredOpenTelemetrySdk to read all OTEL_* environment variables automatically — the same variables the javaagent would have consumed.

Add the following dependencies (Maven):

<dependencies>
  <!-- Reads OTEL_* env vars and sets up the SDK as global -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk-extension-autoconfigure</artifactId>
    <version>1.40.0</version>
  </dependency>
  <!-- OTLP HTTP exporter -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
    <version>1.40.0</version>
  </dependency>
</dependencies>

Initialize the SDK at the start of main(), before creating any tracers:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public class MyService {
    public static void main(String[] args) throws Exception {
        // Initialize from OTEL_* env vars and register as global.
        // Reads: OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,
        //        OTEL_EXPORTER_OTLP_HEADERS, OTEL_EXPORTER_OTLP_PROTOCOL, etc.
        AutoConfiguredOpenTelemetrySdk.builder()
                .setResultAsGlobal()
                .build();

        Tracer tracer = GlobalOpenTelemetry.getTracer("my-service");

        Span span = tracer.spanBuilder("my-operation")
                .setSpanKind(SpanKind.SERVER)
                .startSpan();
        try (Scope scope = span.makeCurrent()) {
            // ... do work ...
        } finally {
            span.end();
        }
    }
}

The OTEL_* environment variables are identical to those used with the javaagent — no changes to deployment configuration are required.

The javaagent is still the recommended approach when an auto-instrumented framework (Spring Boot, Jakarta Servlet, gRPC) is present, because the framework triggers the lazy module load before any manual API calls are made. The direct SDK approach is appropriate for standalone batch jobs, background loops, and any application where no auto-instrumented entry point precedes manual span creation.

Python Fork Safety (Gunicorn / uWSGI)

Symptoms

  • Traces appear for the first few requests then stop completely

  • Application hangs or deadlocks after startup under Gunicorn or uWSGI

  • BrokenPipeError or ResourceWarning: unclosed socket in logs

Causes

The BatchSpanProcessor creates background threads and opens a gRPC connection before the server forks worker processes. When the OS forks a process, open file descriptors and threading state are copied into each worker. The forked gRPC connection is no longer valid, and the background flush thread does not exist in the child process.
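The thread-loss half of the problem is easy to demonstrate with the stdlib alone (POSIX only — os.fork is unavailable on Windows):

```python
# Shows that a background thread started before fork() does not exist in the
# child process: this is exactly what happens to BatchSpanProcessor's flush
# thread under a pre-forking server. POSIX only.
import os
import threading
import time

def child_thread_count() -> int:
    t = threading.Thread(target=time.sleep, args=(5,), daemon=True)
    t.start()  # parent now has the main thread plus a flush-like background thread
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: only the thread that called fork() survives
        os.write(w, str(threading.active_count()).encode())
        os._exit(0)
    os.waitpid(pid, 0)
    os.close(w)
    count = int(os.read(r, 16))
    os.close(r)
    return count

if __name__ == "__main__":
    print("threads in child after fork:", child_thread_count())
```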

Fix

Initialize the TracerProvider after the fork by using the server’s post-fork hook, not at module import time.

Gunicorn — use post_fork in the config file:

# gunicorn.conf.py
def post_fork(server, worker):
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)

uWSGI — use the @postfork decorator:

from uwsgidecorators import postfork

@postfork
def init_tracing():
    # same provider setup as above
    ...

The opentelemetry-instrument CLI initializes the provider at import time, before the server forks. With Gunicorn or uWSGI, prefer launching workers without the CLI and initializing the TracerProvider manually inside the post-fork hook as shown above.

Go: Spans Not Linked Across Goroutines

Symptoms

  • Spans created in goroutines appear as root spans instead of children

  • Service map shows the same service calling itself

  • trace_id differs between the parent handler and work done in goroutines

Causes

Go’s context.Context is not shared automatically between goroutines. If you launch a goroutine without passing the context, the goroutine’s spans have no parent.

Fix

Always pass ctx explicitly into every goroutine and function that creates spans:

// Wrong — goroutine has no trace context
go func() {
    _, span := tracer.Start(context.Background(), "background-job")
    defer span.End()
}()

// Correct — parent context is passed in
go func(ctx context.Context) {
    _, span := tracer.Start(ctx, "background-job")
    defer span.End()
}(ctx)

For work queues or goroutine pools, propagate the context through the work item struct rather than capturing it from the enclosing scope (which may have already finished by the time the goroutine runs).

Spans Not Visible in Trace Explorer and Missing from Service Map

Symptoms

  • Spans do not appear in the Trace Explorer but the service does appear under Services

  • Request throughput and latency metrics are absent for the service

  • Error rate is always zero even when requests fail

  • Service-level aggregations (p50/p95/p99) show no data

Causes

Kloudfuse derives service-level metrics (throughput, latency, error rate) from spans with SpanKind = SERVER. If every span from a service has SpanKind = INTERNAL — the default when no kind is specified — the spans are stored and searchable but are not counted as inbound requests for that service.

SpanKind = INTERNAL is the correct kind for in-process business logic that sits inside a request, not for the entry point of the request itself. Using it on the outermost handler span means the platform cannot distinguish a service boundary from an internal function call.

SpanKind Reference

  • SERVER (2) — The span represents an inbound request handled by this service — HTTP, gRPC, message consumer. Use this on the outermost handler span. Required for service map and request metrics.

  • CLIENT (3) — An outbound call to another service or external resource — HTTP client, database query, cache read.

  • PRODUCER (4) — Publishing a message to a queue or topic (Kafka, SQS, RabbitMQ).

  • CONSUMER (5) — Receiving and processing a message from a queue or topic.

  • INTERNAL (1) — In-process business logic with no cross-service boundary. This is the default when no kind is set. Do not use it for the entry-point span of a service.

Fix

Always set SpanKind = SERVER on the outermost span of any service that handles inbound requests.

Java — the javaagent sets the correct kind automatically for all supported HTTP and gRPC servers. For manual spans representing an inbound call, set the kind explicitly:

Span span = tracer.spanBuilder("process-request")
    .setSpanKind(SpanKind.SERVER)
    .startSpan();

Python — opentelemetry-instrument with opentelemetry-instrumentation-flask (or Django, FastAPI) sets SpanKind.SERVER automatically. For manual spans:

with tracer.start_as_current_span("process-request", kind=trace.SpanKind.SERVER) as span:
    ...

Go — use otelhttp.NewHandler to wrap HTTP handlers. It sets SpanKind = SERVER, emits the correct HTTP semantic convention attributes, and extracts incoming W3C trace context automatically:

import "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"

// otelhttp sets SpanKind = SERVER and attributes http.method, http.status_code,
// http.target, net.host.name, net.protocol.version automatically.
http.Handle("/api/orders", otelhttp.NewHandler(ordersHandler, "GET /api/orders"))

For manual Go spans, pass trace.WithSpanKind:

ctx, span := tracer.Start(ctx, "handle-request",
    trace.WithSpanKind(trace.SpanKindServer),
)
defer span.End()

Go: Semantic Convention Limitations in otelhttp v0.53.x

otelhttp v0.53.x (compatible with go.opentelemetry.io/otel v1.28.x) emits the older, now-deprecated HTTP attribute names:

Deprecated name (emitted by otelhttp ≤ v0.55) → stable replacement (otelhttp ≥ v0.56):

  • http.method → http.request.method

  • http.status_code → http.response.status_code

  • http.target → url.path

  • http.scheme → url.scheme

  • net.host.name → server.address

  • net.sock.peer.addr → network.peer.address

Both the old and new attribute names are accepted by Kloudfuse. If you need the stable names — for example to match a filter or dashboard that uses http.request.method — upgrade otelhttp to v0.56.0 or later and align all otel core modules to v1.31.0 or later at the same time.

The version alignment requirement is not limited to the three modules below — it applies to every go.opentelemetry.io/otel/ module in your go.mod. Common ones include otel/metric, otel/log, otel/bridge/, and any other otel/exporters/* you import. Running go mod tidy will catch any that are out of sync.

# Upgrade all otel core modules together, then the contrib package
go get go.opentelemetry.io/otel@v1.31.0
go get go.opentelemetry.io/otel/sdk@v1.31.0
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp@v1.31.0
# add any other go.opentelemetry.io/otel/* modules your project uses at the same version
go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.56.0
go mod tidy

otelhttp and the otel core modules must be upgraded together. Mismatched minor versions (e.g. otelhttp v0.56 with otel core v1.28) cause a build error because the contrib package imports the otel API interfaces directly and the versions must match exactly. Run go mod tidy after every version change to resolve the full dependency graph.

Wrong OTLP Protocol (gRPC vs HTTP)

Symptoms

  • Export fails with HTTP 415 Unsupported Media Type or HTTP 405 Method Not Allowed

  • gRPC exporter logs failed to connect or transport: error while dialing

  • Data arrives at agent but is malformed

Fix

kf-agent listens on two ports with different protocols:

  • 4317 — gRPC (OTLP/gRPC) — endpoint http://kf-agent:4317, no path

  • 4318 — HTTP/JSON (OTLP/HTTP) — endpoint http://kf-agent:4318, no path; the SDK appends /v1/traces automatically

Match the SDK exporter type to the port. Most SDK defaults use gRPC on port 4317. If you switch to HTTP, update both the endpoint and the exporter package:

# gRPC (default)
OTEL_EXPORTER_OTLP_ENDPOINT=http://kf-agent:4317

# HTTP
OTEL_EXPORTER_OTLP_ENDPOINT=http://kf-agent:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
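A small pre-flight check of the endpoint/protocol combination can catch the mismatch before deploying. A stdlib sketch, based on the port conventions described above:

```python
# Checks that an OTLP endpoint matches the expected shape for the chosen
# protocol: no path (the SDK appends /v1/traces for OTLP/HTTP), and the port
# matching the protocol (4317 = gRPC, 4318 = HTTP on kf-agent).
from urllib.parse import urlparse

def check_otlp_endpoint(endpoint: str, protocol: str) -> list:
    problems = []
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        problems.append("endpoint must start with http:// or https://")
    if parsed.path not in ("", "/"):
        problems.append("do not include a path; the SDK appends /v1/traces for OTLP/HTTP")
    if protocol == "grpc" and parsed.port == 4318:
        problems.append("port 4318 is the OTLP/HTTP port; gRPC uses 4317")
    if protocol == "http/protobuf" and parsed.port == 4317:
        problems.append("port 4317 is the OTLP/gRPC port; HTTP uses 4318")
    return problems
```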

SDK Version Conflicts

Symptoms

  • NoSuchMethodError or ClassCastException in Java at startup

  • Python shows ImportError: cannot import name 'TracerProvider' from 'opentelemetry.sdk.trace'

  • Go build fails with incompatible module versions

Fix

Java — always use the BOM to align versions. Do not mix opentelemetry-sdk and opentelemetry-api versions:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-bom</artifactId>
      <version>1.40.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

Python — pin all opentelemetry-* packages to the same version in requirements.txt. Run opentelemetry-bootstrap -a install after updating to reinstall the correct instrumentation packages for your installed libraries.

Go — run go mod tidy after updating any go.opentelemetry.io/* module. Check that the otel core module version matches the contrib package versions — they must be aligned to the same minor version series.

TLS / Connection Errors to kf-agent

Symptoms

  • Exporter logs show x509: certificate signed by unknown authority

  • failed to obtain credentials or transport: authentication handshake failed

  • Traces arrive in development but not in production (where TLS is enforced)

Fix

If kf-agent requires TLS, configure the exporter with the CA certificate. For Java:

-Dotel.exporter.otlp.certificate=/etc/ssl/certs/kf-agent-ca.pem

For Go:

import "google.golang.org/grpc/credentials"

creds, _ := credentials.NewClientTLSFromFile("/etc/ssl/certs/kf-agent-ca.pem", "")
exporter, _ := otlptracegrpc.New(ctx,
    otlptracegrpc.WithEndpoint("kf-agent:4317"),
    otlptracegrpc.WithTLSCredentials(creds),
)

For development or internal clusters where TLS is not required, use WithInsecure() (Go) or omit the certificate property (Java/Python). Never disable TLS verification in production — use the correct CA certificate instead.

High Cardinality Causing Ingestion Throttling

Symptoms

  • Trace ingestion rate drops or data is throttled in the Kloudfuse Usage Dashboard

  • Span attributes contain user IDs, UUIDs, or request payloads

  • Service map is slow to render with hundreds of unique operation names

Fix

Avoid using high-cardinality values as span names or attribute keys. Span names should identify the operation type, not the specific value:

Avoid Use instead

GET /users/a3f9b1c2-…​

GET /users/{id}

process order 98312

process-order (put the order ID in an attribute)

SELECT * FROM orders WHERE id=12345

SELECT orders (use db.statement attribute for the full query)

Use attributes for variable data and keep the span name templated:

Span span = tracer.spanBuilder("process-order").startSpan();
span.setAttribute("order.id", orderId);   // attribute, not span name
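Where route templating is not available from the framework, a small normalizer can derive a low-cardinality span name from a raw path. A hypothetical helper (not part of any SDK):

```python
# Hypothetical normalizer: collapses UUIDs and purely-numeric path segments
# into {id} so the span name identifies the operation type, not the specific
# value being processed.
import re

_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_NUM_SEGMENT = re.compile(r"(?<=/)\d+(?=/|$)")

def template_span_name(path: str) -> str:
    return _NUM_SEGMENT.sub("{id}", _UUID.sub("{id}", path))
```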

See Cardinality Management for a full list of anti-patterns and mitigations.

Unsupported Environment or Language

Symptoms

  • No OpenTelemetry SDK available for the runtime or language version

  • Manual span creation fails at import or compile time

Fix

  • Consult the OpenTelemetry Registry for community-maintained SDKs and instrumentation libraries.

  • For runtimes without a native SDK, use a sidecar or proxy approach: route traffic through Envoy with the OpenTelemetry extension, or emit spans using the OTLP HTTP/JSON wire format from any HTTP client.

  • For legacy JVM languages (Groovy, Scala, Kotlin), the OpenTelemetry Java agent instruments all JVM bytecode regardless of the source language.

Agent Deployment Failures

Symptoms

  • Helm chart fails to install or upgrade

  • kf-agent pods are in CrashLoopBackOff or Pending

  • Agent starts but immediately exits

Fix

  1. Check pod status and events:

    kubectl describe pod -n <namespace> -l app=kf-agent
    kubectl logs -n <namespace> -l app=kf-agent --previous
  2. Verify required Helm values are set: clusterName, apiKey, and imagePullPolicy.

  3. Confirm network policies allow inbound traffic on ports 4317 (gRPC) and 4318 (HTTP) from application pods.

  4. If pods are in Pending, check resource requests — the agent requires at least 256 MiB memory.

See Kubernetes Setup for full deployment prerequisites.

Multiple Services Show the Same Name

Symptoms

  • All traces appear under unknown_service or a single service name

  • Different applications are merged in the service map

Fix

Set a unique service.name per application. The most reliable way is via environment variable so it can be configured per deployment without code changes:

OTEL_SERVICE_NAME=payments-api
OTEL_RESOURCE_ATTRIBUTES=service.namespace=payments,deployment.environment.name=production

For Java with the agent:

-Dotel.service.name=payments-api
-Dotel.resource.attributes=service.namespace=payments,deployment.environment.name=production

Avoid setting service.name to a pod name or hostname — use a stable logical name that identifies the service role, not the instance.
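A quick heuristic for catching pod-name-shaped service names in CI (an illustrative assumption, not a Kloudfuse rule — pod hash alphabets vary, so expect occasional false positives):

```python
# Heuristic: Kubernetes pod names typically end in a ReplicaSet hash plus a
# 5-character pod suffix, e.g. "payments-api-7d9f8b6c5-xk2lp". Flagging such
# names catches accidental use of the pod name or hostname as service.name.
import re

_POD_SUFFIX = re.compile(r"-[0-9a-z]{8,10}-[0-9a-z]{5}$")

def looks_like_pod_name(name: str) -> bool:
    return bool(_POD_SUFFIX.search(name))
```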

Validating Your Setup

After instrumentation, confirm data is flowing correctly:

  1. Open Kloudfuse UI → APM → Trace Explorer

  2. Filter by service.name = <your-service>

  3. Send a test request to your application — a trace should appear within a few seconds

  4. Click a trace and verify: span names, attributes, parent-child relationships, and timing

  5. Check APM → Service Map to confirm dependencies between services are visible

If traces appear but attributes are missing, re-check that resource attributes (service.namespace, deployment.environment.name) are set and that the SDK version supports the semantic conventions you expect.
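For a dependency-free end-to-end check, you can also hand-build a minimal OTLP/HTTP JSON payload and POST it to port 4318: if the span shows up in the Trace Explorer, ingestion works independently of any SDK configuration. A sketch (field names follow the OTLP JSON mapping):

```python
# Builds a single-span OTLP/HTTP JSON payload. POST it with, for example:
#   curl -X POST http://kf-agent:4318/v1/traces \
#        -H 'Content-Type: application/json' -d @payload.json
import json
import secrets
import time

def build_test_payload(service_name: str) -> dict:
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "setup-validation"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex chars
                    "spanId": secrets.token_hex(8),    # 16 hex chars
                    "name": "setup-test-span",
                    "kind": 2,                         # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(now - 5_000_000),
                    "endTimeUnixNano": str(now),
                }],
            }],
        }],
    }

if __name__ == "__main__":
    print(json.dumps(build_test_payload("setup-check"), indent=2))
```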

Still Need Help?

Collect the following before contacting support:

  • SDK language and version (opentelemetry-javaagent-X.Y.Z.jar, opentelemetry-sdk==X.Y.Z, go.opentelemetry.io/otel vX.Y.Z)

  • Exporter debug log output (a few lines showing the failed export)

  • A sample trace_id from Kloudfuse UI that illustrates the problem

  • Your OTEL_* environment variables (redact OTEL_EXPORTER_OTLP_HEADERS)

  • kubectl describe output for the kf-agent pod if the agent is not reachable

References

The following OpenTelemetry documentation was used as the basis for the guidance on this page.

  • Specification and Concepts

  • Java

  • Python