
Explore Dynatrace OneAgent monitoring modes, including full stack, infrastructure, and discovery, and learn to select the appropriate mode for end-to-end monitoring or basic health checks.
Learn to automate Dynatrace agent deployment across multiple servers using Ansible, by setting up an Ansible server on Ubuntu, preparing two VMs, and applying a full agent install playbook.
Learn how to deploy nginx to two virtual machines with Ansible, using an inventory file, ping checks, and a playbook to automate multi-server setup from a single control machine.
https://github.com/Sumanth17-git/aiops_training.git
Learn to use Dynatrace to diagnose production problems and pinpoint high CPU usage patterns, including VM-hosted apps, system processes, file and network activity, and traffic spikes.
Learn to debug CPU spikes with Dynatrace using basic and quick approaches, inspect host and process metrics, and pinpoint the root cause in the com.bank.transaction code via method hotspots.
Set up Dynatrace OneAgent on a Kubernetes cluster with Helm: use the downloaded YAML to deploy the agent, then verify that the Dynatrace pods are running so you can monitor the cluster.
Explore full stack monitoring for complete Kubernetes observability with disk and network analysis, compare application and platform monitoring, and learn migration steps via simple YAML changes.
Learn how tagging in Dynatrace organizes services and resources with labels and rules, enabling quick search and analysis of end-to-end applications, illustrated by Paytm’s SBI and HDFC monitoring.
Are you ready to take your performance monitoring and troubleshooting skills to the next level in 2026? In this course, I’ll guide you step-by-step to master Dynatrace using real-world troubleshooting scenarios. Whether you’re an IT professional, developer, or SRE, this course provides practical, hands-on skills you can apply immediately in production environments.
Here’s what you’ll learn:
Detect and fix CPU, memory, database, network, and service performance issues efficiently
Create dashboards, alerts, and anomaly detection rules to prevent incidents before they escalate
Simulate user journeys, analyze real-user behavior, and perform deep end-to-end response time diagnostics
Leverage Davis AI for accurate root-cause analysis, problem correlation, and proactive risk detection
Automate Incident & Problem Management with Jira Integration to accelerate workflows
Build automated remediation pipelines to eliminate repetitive L1 support effort
Implement SLOs, SLIs & SLAs to improve system reliability and business outcomes
Manage cloud observability data using Grail Data Lake, buckets, and secure access control
Parse and enrich logs, and extract metrics through OpenPipeline for advanced analytics
Compare segmentation with Management Zones, and apply each to control visibility and enable team-specific access
Monitor Kubernetes, cloud workloads (AWS, Azure, GCP), databases, and microservices with full-stack insights
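To give a feel for the SLO and SLI topic listed above, here is a minimal Python sketch of the arithmetic behind an availability SLI and its error budget. It is not Dynatrace code and does not call any Dynatrace API; the function names and the sample request counts are hypothetical, chosen only for illustration.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that succeeded (the SLI), between 0.0 and 1.0."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    return (total_requests - failed_requests) / total_requests


def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent.

    1.0 means no budget has been used, 0.0 means it is exhausted,
    and a negative value means the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Number of failures the SLO tolerates over this request volume.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical sample: 10,000 requests, 25 failures, 99% availability target.
    total, failed, target = 10_000, 25, 0.99
    print(f"SLI: {availability_sli(total, failed):.4f}")
    print(f"Error budget remaining: {error_budget_remaining(target, total, failed):.2%}")
```

In practice the request counts would come from monitoring data rather than hard-coded values, and the SLO target would be agreed with the service owners.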
I’ll share real-world examples, guided labs, and actionable insights to help you troubleshoot, optimize, and automate applications and infrastructure with confidence. Whether you’re managing Kubernetes clusters, supporting mission-critical transactions, or improving user experience, you’ll gain the expertise that modern SRE teams demand.
Join me on this journey to become a Dynatrace Observability and SRE expert.
Let’s get started today!