
Explore the enterprise value of Datadog observability, set up LLM observability, and instrument LLM applications with hands-on tracing of AI workflows, including quality, cost, security, compliance, and production patterns.
Understand why LLM observability is essential for enterprise production, addressing non-deterministic AI, cost, security, and performance with real-time traces and dashboards.
Explore Datadog LLM observability’s four core capabilities—tracing, evaluations, experiments, and cost monitoring—for debugging, quality checks, AB testing, and budget-aware production insights.
Sign up for the Datadog free trial, configure a Python LLM app, and load environment variables to run and trace a chat completion in production.
Explore span types and SDK integrations to observe LLM traces, including LLM, workflow, agent, tool retrieval, and embedding spans, and review duration, input tokens, and costs with the Python SDK.
Instrument direct llm api calls with custom spans and annotations to build multi-step pipelines and enterprise observability using llm obs decorators and metadata tags.
Instrument multi-step LLM workflows in production by tracing embedding, retrieval, context assembly, and generation, enabling nested spans and detailed latency, metadata, and error visibility.
Demonstrates a hands-on rag pipeline with full observability using Datadog LLM Observability, including ChromaDB vector store, cosine similarity, embeddings, retrieval, and LLM-generated responses.
Demonstrates LangChain RAG pipeline auto-instrumentation with zero-code tracing, using a vector store, embeddings, and a chat prompt template to query enterprise docs with Datadog LLM observability.
Trace and monitor non-deterministic AI agents in production with Datadog, visualizing decision paths, tool calls, and multi-agent orchestration through a hands-on customer support agent example.
Explore the orchestrator plus workers pattern in enterprise ai, with a pipeline of specialized agents (research, analysis, generation, validation) that plan, execute, and synthesize results, plus end-to-end tracing.
Explore common agent debugging scenarios like infinite loops, wrong tool selection, and latency spikes, and use state updates, termination conditions, step limits, prompt refinement, and observability to trace issues.
Create and manage data sets, run LLM experiments with evaluators, and compare results in Datadog LLM Observability to make educated deployment decisions.
Create a golden evaluation set for production support data sets in LLM experiments, covering easy baseline, hard, adversarial, and off topic categories, with metadata labels to filter experiment results.
Learn to run end-to-end LLM experiments in Datadog by pairing a dataset, a task, and evaluators including contains key info, semantic similarity, and safety checks, then compare results in dashboards.
Demonstrates A/B testing prompts to compare concise versus empathetic variants using a designed data set, evaluating empathy scores, actionability, and semantic similarity to pick the better prompt.
Configure and manage automated quality evaluations for LLM outputs using Datadog observability, building datasets from production traces and running experiments before deployment. Track metrics such as toxicity, topic relevancy, failure to answer, and completions with zero configuration, dashboards, and alerts across OpenAI, Azure OpenAI, and Google Cloud Vertex AI.
Create an evaluation in LM Observability by naming the eval, attaching an LM account, selecting GPT-4 mini, configuring a system prompt and input variables, and set pass criteria for monitoring.
Identify three cost optimization strategies—model selection, prompt optimization, and caching—and use Datalog cost monitoring to measure savings and set alerts for budget thresholds.
Adopt enterprise-ready patterns for production LLMs, including SOC 2, HIPAA, GDPR, PII reduction, and audit trails, with optional PII scrubbing via a sensitive data scanner before or after processing.
Set up and test a PII scrubber using regex to redact emails and credit cards, then scrub inputs before LLM annotation and verify redacted data appears in Datadog traces.
Explore the sensitive data scanner in organization settings, configure code and storage scanning, connect to GitHub, GitLab, and Azure DevOps, and manage scanning rules and groups for llm observability.
Learn hands-on how to configure a Datadog llm observability data scanner, build a custom pii redaction group, and verify redactions for ssn, passport, and emails in production dashboards.
Design secure, compliant llm apps with a pii scrubber and security monitor to detect prompt-injection patterns and scrub data before responses, using Datadog dashboards.
Explore a quick production deployment architecture for LLM observability in Datadog, where the application uses the DTrace SDK or agentless mode and enables APM correlation across traces, logs, and metrics.
Deploy to staging, run prompt experiments, and set alerts to validate your observable ai workflow with datadog llm observability, including tracing, evaluations, cost monitoring, and security patterns.
Are your LLM applications running blind in production?
You've deployed an AI agent, a RAG pipeline, or an LLM-powered chatbot.
But can you answer these questions?
How much did that runaway agent loop cost before someone noticed?
Why did hallucination rates spike last Tuesday?
Which step in your RAG pipeline is returning irrelevant documents?
How do you prove to compliance that you're protecting customer PII in LLM conversations?
If you can't answer these questions with data, you have a production problem.
Traditional APM tools see your LLM as a black box. They measure latency and error rates, but they can't show you token flows, prompt effectiveness, or quality degradation.
LLMs are fundamentally different—non-deterministic, multi-step, token-priced, and quality-sensitive.
You need LLM-native observability.
Introducing Datadog LLM Observability
This course is the definitive guide to Datadog's LLM Observability platform for enterprise teams.
If you're already using Datadog for APM, infrastructure, or security, this integrates directly into your existing stack—no new tools to learn, no separate dashboards to monitor.
What you'll build:
Throughout this course, you'll instrument a production-grade Customer Support AI Agent with:
Multi-turn conversation tracing
Tool integration (order lookup, refund processing)
Custom quality evaluations
Cost monitoring dashboard
PII scrubbing compliance
This isn't a toy example—it's the architecture real enterprise teams deploy.