
Meet your instructor, Arun Krishnan, who has 13+ years of experience in monitoring and observability. Gain unbiased, practical OpenTelemetry insights on performance, reliability, and industry best practices, independent of any vendor.
I connect the dots in OpenTelemetry by introducing concepts early and revisiting them later with plain explanations. You’ll meet the collector, OTLP, signals, meter providers, propagators, and exporters as they appear.
Explore the 11-section course outline, connecting each part to OTCA domains, while covering observability fundamentals, OpenTelemetry instrumentation, signals, SDK architecture, collector patterns, and standards.
Explore observability as the practice of inferring a software system’s internal state from telemetry signals—metrics, logs, and traces—and use it to control and maintain stability, availability, reliability, and performance.
Explore the three telemetry signals (metrics, logs, and traces) and how each reveals a software system's state, from measurements to request paths. Distinguish monitoring from observability.
Explore the differences between monitoring and observability, and learn how observability combines metrics, logs, and traces to reveal root causes, enable context-rich incident detection, and improve system reliability.
Explore the core components of an observability solution: data collection through APIs or SDKs, transmission (optionally through proxies), and a back-end server deployed on-prem, in the cloud, or in a hybrid setup.
Review observability as understanding a system’s internal state from telemetry signals: metrics, logs, and traces. Contrast monitoring with observability and note that OpenTelemetry handles collection and transmission of telemetry to a backend.
Set up your learning environment with two apps, Hello Telemetry and the OpenTelemetry Astronomy Shop demo, to explore observability concepts and instrumentation using Java and Python.
Explore the OpenTelemetry demo by running the Docker deployment, cloning the repo, and launching the 18-service microservices app with observability tools Prometheus, Jaeger, OpenSearch, and Grafana.
See how the OpenTelemetry demo operates across frontend and backend services. Track components such as Envoy, the Next.js frontend, product catalog, cart, checkout, currency, email service, shipping, and fraud detection.
Review the learning environment and the two apps, Hello Telemetry and the OpenTelemetry demo. Note architecture, library versus custom code, and the three instrumentation methods, including zero-code instrumentation.
OpenTelemetry foundations unify observability data collection with a single API, enabling generation, collection, and export of metrics, logs, and traces through instrumentation and OTLP with the collector.
Explore how OpenTelemetry APIs and SDKs enable telemetry data collection, distinguishing zero-code and code-based instrumentation, and learn when to import API or API-plus-SDK libraries.
Learn zero-code instrumentation with OpenTelemetry, where agents auto-instrument libraries or modules using bytecode manipulation, monkeypatching, or eBPF, focusing on edges like inbound and outbound connections and databases across six languages.
Implement zero-code instrumentation for the Hello Telemetry Java app using the OpenTelemetry Java agent, configure it via Docker and environment variables, and send traces to a collector for viewing in Jaeger.
Apply zero-code instrumentation to a Python service using OpenTelemetry distro, OTLP exporter, and monkeypatching; configure requirements, Dockerfile, bootstrap, and environment variables to emit traces to Jaeger.
Explore zero-code instrumentation for the Hello Telemetry app, collecting HTTP and JVM metrics, log records, and traces, and learn how code-based instrumentation captures the custom spans, metrics, and logs that OpenTelemetry's automatic instrumentation misses.
Explore zero-code instrumentation with OpenTelemetry by comparing automatic instrumentation in Java and Python services, and review how the OpenTelemetry agent attaches to the JVM and runs with Docker.
Learn to implement code-based instrumentation with OpenTelemetry APIs and SDKs to collect traces, metrics, and logs across languages, revealing time spent in custom code.
Recap OpenTelemetry’s origins and guiding principles, compare zero-code, code-based, and library-based instrumentation, and clarify API versus SDK roles; review hands-on demos and upcoming telemetry signals.
Explore the four OpenTelemetry telemetry signals—metrics, logs, traces, and baggage—and learn how instrumentation yields output measurements of system state, with code-based approaches kicking off with metrics.
Explore how OpenTelemetry captures metrics as point-in-time values, using a global meter provider to create meters and configure exporters, with instruments defined by name, kind, unit, and description and named according to semantic conventions.
Explore OpenTelemetry metrics, including counter, gauge, and histogram. Learn how counters accumulate, including asynchronous and up-down variants, how gauges reflect current values, and how histograms reveal distribution and performance.
Explore code-based instrumentation by removing zero-code agents from the Hello Telemetry app, adjusting Docker configurations, and preparing to create custom metrics for the Java and Python services.
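To make the difference between the instrument kinds concrete, here is a minimal toy model in plain Python. It is not the OpenTelemetry API; the class names, the `add` and `record` methods, and the bucket boundaries are assumptions chosen only to illustrate how each instrument treats a new value.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Monotonic sum: the value only ever goes up."""
    value: float = 0.0

    def add(self, amount: float) -> None:
        # This toy model rejects negative increments outright.
        if amount < 0:
            raise ValueError("a counter only accepts non-negative increments")
        self.value += amount


@dataclass
class UpDownCounter:
    """Non-monotonic sum: increments and decrements are both allowed."""
    value: float = 0.0

    def add(self, amount: float) -> None:
        self.value += amount


@dataclass
class Gauge:
    """Current value: each recording replaces the previous one."""
    value: float = 0.0

    def record(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Distribution: counts recordings per bucket (upper bounds inclusive)."""
    bounds: list
    counts: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # One bucket per bound, plus an overflow bucket for larger values.
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, value: float) -> None:
        self.counts[bisect_left(self.bounds, value)] += 1


# Hypothetical usage: request latencies in milliseconds.
latency = Histogram(bounds=[100, 250, 500])
for ms in (42, 100, 180, 900):
    latency.record(ms)
print(latency.counts)  # two values <= 100, one <= 250, none <= 500, one above
```

A counter suits totals such as requests served, an up-down counter suits values like queue depth, a gauge suits readings like temperature, and a histogram suits latency where the spread matters more than the sum.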
Add OpenTelemetry dependencies to the pom.xml and initialize the SDK with a periodic metric reader and OTLP gRPC exporter to count database requests.
Explore OpenTelemetry demo metrics with Prometheus and Grafana, viewing counter, up-down, gauge, and histogram data; filter by labels and visualize trends in dashboards.
Explore how traces are built from spans across app components, generated by zero-code and code-based instrumentation, and assembled into a transaction for waterfall views.
Learn the pieces of information in an OpenTelemetry span, including name, context with trace and span IDs, timing, attributes, events, links, status, and kind (client, server, internal, producer, consumer).
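The span fields listed above can be pictured as a small data structure. The sketch below is a hypothetical model in plain Python, not the SDK's span class; `start_span`, `parent_span_id`, and the field names are illustrative assumptions.

```python
import secrets
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SpanKind(Enum):
    INTERNAL = "internal"
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex characters, shared by every span in the trace
    span_id: str   # 16 hex characters, unique to this span


@dataclass
class Span:
    name: str
    context: SpanContext
    kind: SpanKind = SpanKind.INTERNAL
    parent_span_id: Optional[str] = None  # None marks a root span
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)  # (timestamp, name) pairs
    links: list = field(default_factory=list)   # contexts of related spans
    status: str = "UNSET"                       # UNSET, OK, or ERROR

    def add_event(self, name: str) -> None:
        self.events.append((time.time_ns(), name))

    def end(self) -> None:
        self.end_ns = time.time_ns()


def start_span(name: str, parent: Optional[Span] = None,
               kind: SpanKind = SpanKind.INTERNAL) -> Span:
    """Create a span; a child reuses its parent's trace ID."""
    trace_id = parent.context.trace_id if parent else secrets.token_hex(16)
    return Span(
        name=name,
        context=SpanContext(trace_id, secrets.token_hex(8)),
        kind=kind,
        parent_span_id=parent.context.span_id if parent else None,
    )


# Hypothetical usage: a server span with a client span for a database call.
root = start_span("GET /cart", kind=SpanKind.SERVER)
child = start_span("SELECT cart", parent=root, kind=SpanKind.CLIENT)
child.attributes["db.system"] = "mysql"
child.add_event("rows fetched")
child.end()
root.end()
```

The shared trace ID plus the parent span ID is what lets a backend such as Jaeger rebuild the waterfall view from individual spans.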
Instrument a Java service to generate and export traces using code-based spans, including database, compute, and sleep spans with the OTLP gRPC span exporter, and view results in Jaeger.
Learn to set up custom spans for the Python service using a tracer provider, batch span processor, and OTLP exporter, view traces in Jaeger, and prepare for context propagation.
Discover how context propagation links spans into a trace using span context and trace IDs. See how OpenTelemetry uses internal propagation and W3C trace context propagators to connect spans.
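As an illustration of what a W3C trace context propagator puts on the wire, the sketch below formats and parses a `traceparent` header using only the standard library. The helper names are hypothetical, and the parser is a simplification that accepts only the four-field form of the header.

```python
import re
import secrets
from typing import Optional

# version - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build the header value the calling service injects into a request."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Extract the remote context, or return None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid in the W3C specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Service A injects the header; service B extracts it and continues the trace.
outgoing = format_traceparent(secrets.token_hex(16), secrets.token_hex(8))
remote = parse_traceparent(outgoing)
child_span_id = secrets.token_hex(8)  # new span ID, same trace ID
print(remote["trace_id"], child_span_id)
```

If the receiving service never sees this header, or uses a propagator that expects a different format, it starts a new trace and the spans no longer line up.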
Analyze traces and spans from the OpenTelemetry demo, view them in Jaeger UI, inspect trace details and JSON, and explore Grafana dashboards to visualize service performance.
Explore how logs record system events, how OpenTelemetry collects logs from existing frameworks, and how it enhances them to correlate with traces for deeper observability.
Create Python service logs with OpenTelemetry, noting that the logs signal for Python is still in development, configure the OTLP log exporter and batch log processor, and verify entries in the collector logs.
Analyze OpenTelemetry log data from the demo, sent to OpenSearch and visualized in Grafana, exploring log records, span IDs, trace IDs, and service metrics.
Explore how zero-code and code-based instrumentation work together with the OpenTelemetry agent to auto-initialize the SDK, generating traces, metrics, and logs in Java and Python services.
OpenTelemetry baggage helps propagate data like user IDs across services as a key-value context carrier, enriching logs, traces, and metrics for faster troubleshooting.
Create baggage in Java, inject it into requests to Python, then extract it in Python to enrich metrics, traces, and logs visible in Jaeger.
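The inject and extract steps can be sketched with the W3C `baggage` header, which carries comma-separated `key=value` pairs. This is a simplified stand-in written in plain Python rather than the course's Java and Python code; the function names are hypothetical, and keys are assumed to be plain tokens that need no encoding.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict) -> str:
    """Serialize baggage entries into a header value (values percent-encoded)."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> dict:
    """Parse a baggage header value back into a dictionary."""
    entries = {}
    for member in header.split(","):
        key, separator, value = member.partition("=")
        if not separator:
            continue  # skip malformed members
        value = value.split(";", 1)[0]  # drop optional properties after ';'
        entries[key.strip()] = unquote(value.strip())
    return entries


# Sending side: attach baggage to the outgoing request headers.
headers = {"baggage": encode_baggage({"user.id": "42", "plan": "gold tier"})}

# Receiving side: read it back and copy it onto span or log attributes.
baggage = decode_baggage(headers["baggage"])
span_attributes = {f"baggage.{key}": value for key, value in baggage.items()}
print(headers["baggage"])
print(span_attributes)
```

Baggage travels with every downstream request, so it is worth keeping entries small and free of sensitive data.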
Explore the four core telemetry signals—metrics, traces, logs, and baggage—with profiling introduced as the emerging fifth, and learn how meters, instruments, and spans connect across services via context propagation.
Explore the OpenTelemetry architecture, separating the API contract from the SDK implementation, and see how a missing SDK yields a no-op and no telemetry.
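The no-op behavior can be pictured with a toy version of the API and SDK split. Every class and function name below is hypothetical and far smaller than the real libraries; the point is only that instrumented code calls the API, and telemetry appears only once an SDK provider is registered.

```python
# --- "API" side: safe to call even when no SDK is installed ---------------
class NoOpSpan:
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def set_attribute(self, key, value):
        pass  # silently discarded


class NoOpTracer:
    def start_span(self, name):
        return NoOpSpan()


class NoOpTracerProvider:
    def get_tracer(self, name):
        return NoOpTracer()


_provider = NoOpTracerProvider()  # default until an SDK registers itself


def set_tracer_provider(provider):
    global _provider
    _provider = provider


def get_tracer(name):
    return _provider.get_tracer(name)


# --- "SDK" side: the implementation that actually records ------------------
class RecordingSpan(NoOpSpan):
    def __init__(self, name, sink):
        self.name, self.sink, self.attributes = name, sink, {}

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def __exit__(self, *exc):
        self.sink.append(self.name)  # stand-in for handing off to an exporter
        return False


class SdkTracerProvider:
    def __init__(self):
        self.finished = []

    def get_tracer(self, name):
        provider = self

        class Tracer:
            def start_span(self, span_name):
                return RecordingSpan(span_name, provider.finished)

        return Tracer()


# --- Instrumented code: depends on the API only ----------------------------
def handle_request():
    with get_tracer("shop").start_span("handle_request") as span:
        span.set_attribute("http.route", "/cart")
        return "ok"


handle_request()                 # works, but produces no telemetry
sdk = SdkTracerProvider()
set_tracer_provider(sdk)
handle_request()                 # same code, now recorded
print(sdk.finished)
```

This is why a library can depend on the API alone: the application owner decides whether an SDK is present and how it is configured.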
Explore the OpenTelemetry SDK architecture with three providers (meter, tracer, logger), their plugin interfaces, and common exporters and samplers to customize metrics, traces, and logs.
Explore how the three independent OpenTelemetry providers export metrics via OTLP gRPC, traces to Zipkin, and logs to separate endpoints, all assembled into a global SDK.
Configure the OpenTelemetry SDK using code, environment variables, or declarative YAML files, with programmatic configuration defining service name and endpoints and taking precedence over declarative and env var defaults.
Pin exact OpenTelemetry API or SDK versions and upgrade components one at a time to keep production stable. Track development versus stable signals (alpha, beta, rc) and review releases quarterly.
Explore the OpenTelemetry API and SDK separation, no-op behavior, and composable providers for metrics, traces, and logs, with configurable setup and OTLP foundations.
Understand how OpenTelemetry exports telemetry data from instrumented applications to a backend using OTLP via the collector. Recognize OTLP as the protocol created by OpenTelemetry and preview collector topics.
Explore OTLP, the OpenTelemetry Protocol for encoding, transporting, and delivering telemetry data from apps to observability backends, via gRPC or HTTP/1.1, with collectors and intermediaries.
Recap the core standards of OpenTelemetry: OTLP enables efficient data transport (gRPC or HTTP/1.1) and semantic conventions standardize attributes, names, and meanings across signals.
Explore the OpenTelemetry collector as an intermediary between data sources and observability backends, deployable as an agent or gateway, enabling offload and multi-format ingestion via receivers, processors, and exporters in pipelines.
Install the OpenTelemetry collector by choosing core or contrib, then install on Windows or via Docker; learn how the config file defines functionality and ports 4317 and 4318.
Explore how the OpenTelemetry collector's configuration file organizes receivers, processors, exporters, and extensions, and how pipelines connect them for metrics, logs, and traces.
Explore how OpenTelemetry collector processors receive data from receivers, apply rules to add, modify, or rename fields, batch data for efficient transmission, and limit memory usage.
Exporters in the OpenTelemetry collector forward processed data to backends such as standard out, an OTLP endpoint like Jaeger, or Prometheus, and require configuration.
Explore how the service section enables components defined in receivers, processors, exporters, and extensions, and configure multi-component pipelines that route traces, metrics, and logs to multiple destinations.
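The relationship between the component sections and the service section can be sketched as a Python dictionary that mirrors the shape of a collector config file, plus two hypothetical helper functions that check it. The component names are common examples, their settings are left empty, and the helpers are an illustration rather than the collector's own validation.

```python
config = {
    "receivers": {"otlp": {}},
    "processors": {"memory_limiter": {}, "batch": {}},
    "exporters": {"debug": {}, "otlp": {}, "prometheus": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["debug", "otlp"],
            },
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["debug"],
            },
        }
    },
}

SECTIONS = ("receivers", "processors", "exporters")


def undefined_components(cfg: dict) -> list:
    """Components a pipeline references but no top-level section defines."""
    problems = []
    for pipeline, parts in cfg["service"]["pipelines"].items():
        for section in SECTIONS:
            for name in parts.get(section, []):
                if name not in cfg.get(section, {}):
                    problems.append(
                        f"{pipeline}: {section[:-1]} '{name}' is not defined"
                    )
    return problems


def unused_components(cfg: dict) -> list:
    """Components that are defined but never enabled by any pipeline."""
    used = {section: set() for section in SECTIONS}
    for parts in cfg["service"]["pipelines"].values():
        for section in SECTIONS:
            used[section].update(parts.get(section, []))
    return [
        f"{section[:-1]} '{name}'"
        for section in SECTIONS
        for name in cfg.get(section, {})
        if name not in used[section]
    ]


print(undefined_components(config))
print(unused_components(config))
```

In this example the traces pipeline fans out to two exporters, while the `prometheus` exporter is defined but stays inactive because no pipeline lists it.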
Set up monitoring for the MySQL database with the OpenTelemetry collector’s MySQL receiver, configure endpoint and credentials, and verify metrics in logs.
Explore OpenTelemetry collector deployment patterns, including agent, gateway, and the combined approach. Identify when to use each pattern and understand their tradeoffs for local versus centralized processing.
Scale telemetry collectors by vertical tuning—memory limiter, batch processor, and Kubernetes limits—and then expand horizontally behind a load balancer for stateless signals; for tail-based sampling, use a two-tier load-balancing exporter.
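Tail-based sampling needs every span of a trace to reach the same collector, which is why the first tier routes by trace ID. The sketch below shows that idea with a simple hash and modulo; the function names and collector hostnames are hypothetical, and a real deployment would likely want consistent hashing so that adding a collector moves fewer traces.

```python
import hashlib
from collections import defaultdict


def pick_collector(trace_id: str, collectors: list) -> str:
    """Map a trace ID to one second-tier collector, deterministically."""
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    return collectors[int.from_bytes(digest[:8], "big") % len(collectors)]


def route_spans(spans: list, collectors: list) -> dict:
    """Group (trace_id, span_name) pairs by the collector that should get them."""
    routed = defaultdict(list)
    for trace_id, span_name in spans:
        routed[pick_collector(trace_id, collectors)].append((trace_id, span_name))
    return dict(routed)


# Hypothetical second tier of sampling collectors.
tier_two = ["sampler-0:4317", "sampler-1:4317", "sampler-2:4317"]
spans = [
    ("4bf92f3577b34da6a3ce929d0e0e4736", "GET /cart"),
    ("0af7651916cd43dd8448eb211c80319c", "POST /checkout"),
    ("4bf92f3577b34da6a3ce929d0e0e4736", "SELECT cart"),
]
for collector, batch in route_spans(spans, tier_two).items():
    print(collector, batch)
```

Because the choice depends only on the trace ID, both spans of the first trace land on the same collector, which can then make a sampling decision with the whole trace in view.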
Compare SDK exporters and collector exporters in OpenTelemetry to decide when to use simple batching and head-based sampling versus advanced tail-based sampling, data transformation, and routing to multiple backends.
Review how the OpenTelemetry collector acts as an intermediary between data sources and observability backends, managing receivers, processors, exporters, and service pipelines, including deployment patterns and scaling strategies.
Learn to monitor and debug the telemetry pipeline, diagnosing context propagation failures, collector issues, data loss, and evolving schemas to keep OpenTelemetry observability healthy.
Identify context propagation failures by looking for fragmented traces where spans lack a shared trace ID, and fix mismatched propagators, header stripping, or missing traceparent header injection.
Master debugging tools for the OpenTelemetry collector, including the debug exporter, the validate command, internal telemetry, and zPages, and follow a left-to-right checklist across receivers, processors, and exporters to diagnose data flow.
Explore how OpenTelemetry handles errors and delivery failures, using retries, sending queues, persistent queues, and backpressure to prevent data loss.
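How a bounded sending queue, retries with backoff, and backpressure interact can be shown with a small toy model. The `SendingQueue` class, its parameters, and the use of `ConnectionError` as the retryable failure are all assumptions for illustration, not the collector's implementation.

```python
import time
from collections import deque


class SendingQueue:
    """Bounded in-memory queue with retry on delivery failure (toy model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.dropped = 0
        self._items = deque()

    def offer(self, batch) -> bool:
        """Enqueue a batch; False signals backpressure to the caller."""
        if len(self._items) >= self.capacity:
            self.dropped += 1
            return False
        self._items.append(batch)
        return True

    def drain(self, send, max_attempts: int = 4,
              initial_backoff: float = 0.01, sleep=time.sleep) -> int:
        """Send queued batches in order, retrying with exponential backoff."""
        delivered = 0
        while self._items:
            batch = self._items[0]
            backoff = initial_backoff
            for attempt in range(max_attempts):
                try:
                    send(batch)
                except ConnectionError:
                    if attempt == max_attempts - 1:
                        return delivered  # give up for now, batch stays queued
                    sleep(backoff)
                    backoff *= 2
                else:
                    self._items.popleft()
                    delivered += 1
                    break
        return delivered


# Hypothetical backend that fails twice before recovering.
attempts = {"count": 0}


def flaky_send(batch):
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ConnectionError("backend unavailable")


queue = SendingQueue(capacity=2)
queue.offer("batch-1")
queue.offer("batch-2")
accepted = queue.offer("batch-3")   # queue is full, so this is refused
print(accepted, queue.drain(flaky_send), queue.dropped)
```

An in-memory queue like this one loses its contents on restart, which is the gap a persistent queue is meant to close.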
Schema management uses a schema URL, schema files, and the schema translate processor to normalize telemetry across mixed semantic convention versions, enabling gradual instrumentation upgrades.
Recap: the observability pipeline is itself a system that needs care. Trace data flows from SDK to backend, and along the way you deal with propagation issues, collector debugging tools, error handling, backpressure, and schema evolution.
Transform telemetry into outcomes by applying SRE principles, including SLIs, SLOs, SLAs, and error budgets, to drive reliability and actionable telemetry for OTCA prep.
Trace the origins of site reliability engineering at Google in 2003 and how publishing external reliability targets shaped DevOps, SRE practices, and observability.
OpenTelemetry serves as the SLI data source, enabling latency and error-rate SLIs through semantic conventions and service attributes to drive reliable SLOs across environments.
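The arithmetic behind an error-rate SLI and its error budget is short enough to sketch directly. The function names and the request counts below are made-up illustrations; in practice the totals would come from request metrics emitted by instrumented services.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that succeeded (1.0 when there was no traffic)."""
    if total_requests == 0:
        return 1.0
    return (total_requests - failed_requests) / total_requests


def error_budget_remaining(slo: float, total_requests: int,
                           failed_requests: int) -> float:
    """Share of the error budget still unspent (negative once it is blown)."""
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1.0 - failed_requests / allowed_failures


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime an availability SLO tolerates over the window."""
    return (1.0 - slo) * window_days * 24 * 60


# Hypothetical month: one million requests, 250 failures, 99.9% SLO.
slo = 0.999
print(f"SLI: {availability_sli(1_000_000, 250):.5f}")
print(f"Budget remaining: {error_budget_remaining(slo, 1_000_000, 250):.0%}")
print(f"Allowed downtime: {allowed_downtime_minutes(slo):.1f} minutes")
```

With these numbers the service is allowed about 1,000 failed requests, so 250 failures leave roughly three quarters of the budget, and a 99.9% target over 30 days tolerates about 43.2 minutes of downtime.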
Recap site reliability engineering origins and how SLIs, SLOs, and error budgets turn telemetry into reliable, user-centric outcomes using OpenTelemetry as the data source.
Build real OpenTelemetry skills you can apply from day one - from observability fundamentals to Collector configuration, with OTCA prep included.
OpenTelemetry has become the industry standard for observability, and with the Linux Foundation's OTCA certification, there's now a formal way to prove your expertise. This course is built to help you do both: understand OpenTelemetry deeply and prepare for the certification exam with confidence.
Designed for developers, SREs, and operations engineers with little to no prior observability experience, this course takes you from the fundamentals all the way through to the architecture and configuration knowledge the OTCA exam expects.
What You'll Learn
Observability Foundations: What observability is, how it differs from monitoring, reliability metrics (MTTD, MTTR, MTBF), and SRE principles including SLIs, SLOs, and error budgets
OpenTelemetry Core Concepts: The API/SDK split, the three instrumentation approaches (zero-code, code-based, and libraries), and how they fit together
All Four Signals, Hands-On: Metrics, traces, logs, and baggage — implemented from scratch in both Java and Python using a purpose-built demo app
SDK Architecture & Composability: Providers, plugin interfaces, samplers, processors, exporters, and the three ways to configure the SDK
Semantic Conventions & OTLP: The standards that make OpenTelemetry data portable across tools and vendors
The OpenTelemetry Collector: Receivers, processors, exporters, extensions, pipelines, deployment patterns (Agent, Gateway, Kubernetes DaemonSet/Sidecar), and scaling strategies
Debugging & Maintenance: Context propagation failures, Collector pipeline debugging, error handling, data loss prevention, and schema management
Emerging Signals: A preview of Profiling and where OpenTelemetry is heading next
Hands-On From the Start
You'll work with two real applications throughout the course:
hello-telemetry: A simple Java + Python + MySQL app I built specifically for this course, so you can see every instrumentation concept in isolation, without distraction
OpenTelemetry Demo: The official community-maintained microservices demo, so you can see OpenTelemetry at production scale
Every concept is paired with hands-on exercises, and the code is available on GitHub so you can follow along at your own pace.
Who This Course Is For
Developers who want to instrument their applications properly and understand what happens under the hood
SREs and operations engineers responsible for observability pipelines and the OpenTelemetry Collector
Anyone preparing for the OpenTelemetry Certified Associate (OTCA) exam
Teams evaluating OpenTelemetry for adoption in their organisation
Ready to get started? Enrol today and take the first step towards mastering OpenTelemetry and earning your OTCA certification.