12 July 2025 · 15 min read

OpenTelemetry: The Observability Standard

OpenTelemetry · Observability · Tracing · Metrics

Implementing distributed tracing and metrics with OpenTelemetry. Instrumentation patterns, collector configuration, and integration with observability backends.

OpenTelemetry (OTel) is the industry standard for collecting telemetry data—traces, metrics, and logs—from applications. It provides vendor-neutral instrumentation, enabling portability across observability backends like Jaeger, Prometheus, Datadog, and more.

OpenTelemetry Architecture

Core Components

OpenTelemetry Architecture:

Application
├── SDK
│   ├── TracerProvider
│   ├── MeterProvider
│   └── LoggerProvider
├── API (vendor-neutral)
└── Instrumentation Libraries

         │ OTLP (OpenTelemetry Protocol)
         ▼

OpenTelemetry Collector
├── Receivers (OTLP, Jaeger, Prometheus, etc.)
├── Processors (batch, memory_limiter, attributes)
└── Exporters (Jaeger, Prometheus, OTLP, etc.)

         │
         ▼

Observability Backends
├── Jaeger (traces)
├── Prometheus (metrics)
├── Elasticsearch (logs)
└── Commercial (Datadog, New Relic, etc.)
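To experiment with this pipeline locally, a minimal Docker Compose sandbox wires an application to the Collector and a Jaeger backend. This is an illustrative sketch: the image tags and the config file path are assumptions, so pin versions appropriate for your environment.

```yaml
# docker-compose.yaml — local sandbox (image tags are assumptions; pin your own)
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.91.0
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel/config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
  jaeger:
    image: jaegertracing/all-in-one:1.53
    ports:
      - "16686:16686" # Jaeger UI
```

Point your application's OTLP exporter at `http://localhost:4317` and traces appear in the Jaeger UI at `http://localhost:16686`.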

Signal Types

| Signal  | Purpose                      | Example                            |
|---------|------------------------------|------------------------------------|
| Traces  | Request flow across services | HTTP request → DB query → Response |
| Metrics | Numerical measurements       | Request count, latency percentiles |
| Logs    | Discrete events              | Error messages, audit events       |
| Baggage | Context propagation          | User ID, tenant ID                 |
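Baggage travels between services as a W3C `baggage` HTTP header of comma-separated key/value pairs. The sketch below illustrates that wire format only; the helper names are hypothetical, and in practice the SDK's baggage propagator handles this for you.

```typescript
// Illustrative sketch of the W3C `baggage` header format.
// These helpers are hypothetical — the OTel SDK does this internally.
type BaggageEntries = Record<string, string>;

// Serialize entries into a header value: "key1=val1,key2=val2"
export const serializeBaggage = (entries: BaggageEntries): string =>
  Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(',');

// Parse a `baggage` header back into key/value pairs
export const parseBaggage = (header: string): BaggageEntries =>
  Object.fromEntries(
    header
      .split(',')
      .map((pair) => pair.trim().split('='))
      .filter((parts) => parts.length === 2)
      .map(([key, value]) => [key, decodeURIComponent(value)])
  );
```

Because every downstream service receives these entries, keep baggage small and never put secrets in it.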

Application Instrumentation

Node.js Setup

```typescript
// tracing.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: process.env.SERVICE_NAME || 'api-service',
  [SemanticResourceAttributes.SERVICE_VERSION]: process.env.SERVICE_VERSION || '1.0.0',
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development',
});

const traceExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4317',
});

const metricExporter = new OTLPMetricExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4317',
});

const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 10000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false },
      '@opentelemetry/instrumentation-http': {
        ignoreIncomingRequestHook: (req) => {
          // Ignore health checks
          return req.url === '/health' || req.url === '/ready';
        },
      },
    }),
  ],
});

sdk.start();

process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('SDK shut down successfully'))
    .catch((error) => console.error('Error shutting down SDK', error))
    .finally(() => process.exit(0));
});
```
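The setup above records every trace, which can be costly at volume. The SDK also accepts a head-based sampler; the fragment below is a hedged configuration sketch using the samplers exported by `@opentelemetry/sdk-trace-base` (verify names against the SDK version you use).

```yaml
# Conceptually, in tracing.ts (TypeScript shown as a fragment):
#
#   import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
#
#   const sdk = new NodeSDK({
#     // Sample ~10% of root traces; child spans follow the parent's decision
#     sampler: new ParentBasedSampler({
#       root: new TraceIdRatioBasedSampler(0.1),
#     }),
#     // ...resource, exporters as above
#   });
```

Head sampling like this decides at span creation; the Collector's tail sampling (shown later) can instead keep whole traces based on errors or latency.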

Manual Instrumentation

```typescript
// manual-tracing.ts
import { trace, metrics, SpanKind, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service', '1.0.0');
const meter = metrics.getMeter('my-service', '1.0.0');

// Create custom metrics
const requestCounter = meter.createCounter('http_requests_total', {
  description: 'Total number of HTTP requests',
});

const requestDuration = meter.createHistogram('http_request_duration_ms', {
  description: 'HTTP request duration in milliseconds',
  unit: 'ms',
});

// Manual span creation
export const processOrder = async (orderId: string): Promise<Order> => {
  return tracer.startActiveSpan('processOrder', {
    kind: SpanKind.INTERNAL,
    attributes: { 'order.id': orderId },
  }, async (span) => {
    try {
      // Child span for database operation
      const order = await tracer.startActiveSpan('fetchOrder', async (childSpan) => {
        childSpan.setAttribute('db.system', 'postgresql');
        childSpan.setAttribute('db.operation', 'SELECT');
        try {
          const result = await database.findOrder(orderId);
          childSpan.setStatus({ code: SpanStatusCode.OK });
          return result;
        } catch (error) {
          childSpan.setStatus({
            code: SpanStatusCode.ERROR,
            message: (error as Error).message,
          });
          childSpan.recordException(error as Error);
          throw error;
        } finally {
          childSpan.end();
        }
      });

      // Process the order
      const processedOrder = await validateAndProcess(order);
      span.setAttribute('order.status', processedOrder.status);
      span.setStatus({ code: SpanStatusCode.OK });
      return processedOrder;
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message,
      });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
};

// Express middleware with metrics
export const metricsMiddleware = (req: Request, res: Response, next: NextFunction) => {
  const startTime = Date.now();

  res.on('finish', () => {
    const duration = Date.now() - startTime;
    const labels = {
      method: req.method,
      path: req.route?.path || req.path,
      status_code: res.statusCode.toString(),
    };
    requestCounter.add(1, labels);
    requestDuration.record(duration, labels);
  });

  next();
};
```

OpenTelemetry Collector

Collector Configuration

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200
  attributes:
    actions:
      - key: environment
        value: ${ENVIRONMENT}
        action: upsert
  resource:
    attributes:
      - key: k8s.cluster.name
        value: ${CLUSTER_NAME}
        action: upsert
  filter:
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

exporters:
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
    tls:
      insecure: true
  elasticsearch:
    endpoints: [http://elasticsearch:9200]
    logs_index: otel-logs
    traces_index: otel-traces
  debug:
    verbosity: detailed

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [health_check, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes, tail_sampling]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch, attributes]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes]
      exporters: [elasticsearch]
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
```

Kubernetes Deployment

```yaml
# otel-collector-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: observability
spec:
  replicas: 2
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.91.0
          args:
            - --config=/conf/otel-collector-config.yaml
          ports:
            - containerPort: 4317   # OTLP gRPC
            - containerPort: 4318   # OTLP HTTP
            - containerPort: 8888   # Metrics
            - containerPort: 13133  # Health check
          env:
            - name: ENVIRONMENT
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CLUSTER_NAME
              value: production-cluster
          volumeMounts:
            - name: config
              mountPath: /conf
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /
              port: 13133
          readinessProbe:
            httpGet:
              path: /
              port: 13133
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
    - name: otlp-http
      port: 4318
```

Context Propagation

Cross-Service Tracing

```typescript
// context-propagation.ts
import { context, propagation } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

// Set up propagator
propagation.setGlobalPropagator(new W3CTraceContextPropagator());

// Inject context into outgoing request
export const callService = async (url: string, data: any): Promise<any> => {
  const headers: Record<string, string> = {};

  // Inject current context into headers
  propagation.inject(context.active(), headers);

  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...headers, // traceparent, tracestate
    },
    body: JSON.stringify(data),
  });

  return response.json();
};

// Extract context from incoming request (middleware)
export const extractContext = (req: Request, res: Response, next: NextFunction) => {
  const extractedContext = propagation.extract(context.active(), req.headers);

  context.with(extractedContext, () => {
    next();
  });
};
```
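On the wire, the injected context is a `traceparent` header of the form `version-traceid-spanid-flags`. The parser below is an illustrative sketch of that W3C format, not the SDK's implementation; `W3CTraceContextPropagator` does this for you.

```typescript
// Illustrative parser for the W3C `traceparent` header
// (the SDK's W3CTraceContextPropagator handles this in practice).
interface TraceParent {
  version: string;
  traceId: string;  // 32 lowercase hex chars
  spanId: string;   // 16 lowercase hex chars
  sampled: boolean; // bit 0 of trace-flags
}

export const parseTraceParent = (header: string): TraceParent | null => {
  // Format: version "-" trace-id "-" parent-id "-" trace-flags
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // All-zero trace or span IDs are invalid per the W3C spec
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
};
```

Seeing a malformed or missing `traceparent` on an incoming request is a quick way to diagnose broken traces across service boundaries.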

Key Takeaways

  1. Vendor-neutral: OTel works with any observability backend
  2. Auto-instrumentation: Start quickly with automatic instrumentation
  3. Manual spans: Add custom spans for business-critical operations
  4. Collector deployment: Use the Collector for processing and routing
  5. Tail sampling: Sample intelligently based on trace characteristics
  6. Context propagation: Ensure trace context flows across service boundaries
  7. Resource attributes: Add metadata like service name, version, and environment
  8. Start with traces: Distributed tracing provides the most insight initially

OpenTelemetry provides a unified approach to observability. Invest in proper instrumentation to gain visibility into distributed systems.