AI Agent Security: App Security for Vibe-Coded Agents

Name: AI Agent Security: App Security for Vibe-Coded Agents
Rating: 4.7 (66 reviews)

Secure AI-generated apps and web-based AI agents against injection, auth flaws, secrets exposure, and insecure defaults.

Hot & New

New

Created byEden Marco

Last updated 7/2026

English

English [Auto],

What you'll learn

Identify the top security risks in AI agents and AI-generated applications, including prompt injection, auth flaws, insecure defaults, and data exposure
Exploit and fix real vulnerabilities in a web-based AI agent using hands-on attack, defense, and verification exercises
Apply secure coding patterns for input validation, authentication, authorization, secrets handling, and least privilege
Recognize security issues introduced by AI coding tools and review generated code with a stronger AppSec mindset
Reduce agent blast radius with tool restrictions, identity-aware controls, memory protection, and guardrails
Use practical security review habits, checklists, and testing approaches before shipping AI-assisted applications

Course content

8 sections • 39 lectures • 2h 11m total length

2026 Software Engineering5:53
Summary
In this tutorial, we discuss how software engineering has evolved from the manual workflows of 2024 to the fully automated, agentic workflows of 2026. We compare the entire development lifecycle across five key stages: code creation, branch management, pull requests, code reviews, and code merging.
Previously, we wrote code manually in IDEs and sequentially managed feature branches. Today, we rely on autonomous agents that construct code, spawn parallel worktrees, open pull requests, and review themselves using AI tools like Qodo and Greptile. Finally, we explore the critical drawbacks of this shift, emphasizing the loss of human accountability, feelings of ownership, and security risks when agents autonomously review and merge their own work. This sets the background for our upcoming series on securing AI-generated code.
Securing AI-Generated Code in a Multi-Agent World4:14
Summary
In this tutorial segment, we examine the widening gap between the volume of code we can generate and the volume of code we can secure over time. We break down this expansion into three distinct stages. In Stage One, ChatGPT introduced AI code generation, prompting developers to copy and paste code directly into their IDEs. In Stage Two, specialized coding agents like Cursor and Claude Code enabled massive parallelization, shifting development from IDEs to the terminal. In Stage Three, agentic coding platforms allowed non-technical roles like HR and Product Management to build their own internal tools. Because most builders are heavily incentivized by management to prioritize rapid shipping over testing and security, and often lack basic security awareness of concepts like RBAC, DDoS, and lateral movement, the gap has expanded dramatically. We close by introducing our ultimate goal: bridging this gap and learning how to build more secure software using coding agents.
Velocity vs. Safety: Navigating the Default Incentives of Agentic Dev1:24
Summary
In this tutorial segment, we discuss the classic trade-off between velocity and safety within software development. We look at how different organizations place themselves along this spectrum depending on their culture; for instance, AI-native startups heavily prioritize velocity, while mature enterprises lean towards safety. Coding agents by default skew heavily towards velocity because they lack out-of-the-box guardrails. If we want to build secure software using AI agents, we must introduce security guardrails and mechanisms into our development lifecycle. While adding these controls might slightly slow down the pace of shipping, the long-term confidence we gain in our code is much higher, embodying the principle of planning twice and cutting once
Securing Agentic Coding: The Harness and Code Artifact Attack Surfaces2:53
Summary
In this tutorial, we focus on the two primary attack surfaces involved in agentic coding. First, we examine 'The Agent Harness'—the environment where tools like Claude Code, Cursor, Gemini CLI, and Codex run. Since these agents act with our permissions to access our terminal, credentials, and API keys, a compromised agent completely exposes our local development and production environments. Second, we analyze 'The Code Artifacts'—the actual source code written by coding agents. This output often bypasses standard human reviews due to high velocity, silently introducing critical vulnerabilities like SQL injection, insecure authentication, logic bombs, or multi-tenant leaks. Finally, we explore dependency and supply-chain risks, showing how compromised packages can harm both our local runtime and our shipped production system, leading to a ticking time bomb of vulnerabilities.

Introduction3:53
Summary
In this tutorial segment, we explore the security risks of Model Context Protocol (MCP) and Agent Skills. We look at how executing unverified MCP servers or skills is conceptually equivalent to running a malicious .exe file on our local machines. However, while human users have spent a decade learning to distrust suspicious download sites, we easily drop our guard when dealing with sleek, professional-looking AI platforms or "official" repositories. We highlight that popularity, star counts, and clean designs are not security controls. Additionally, we analyze the psychological trap of oversharing sensitive data and raw API keys directly into AI prompt boxes. Finally, we establish safe practices, advising that we must avoid disclosing sensitive secrets and utilize secrets managers to retrieve credentials securely at runtime rather than hardcoding them.
Demo: Treat Agent Skills as RCE4:09
Tutorial Summary
In this demo, we explore how seemingly benign AI agent skills (or MCP servers) can introduce security vulnerabilities. We illustrate a scenario where we download a large skill bundle for Claude Code and issue a standard request to fix a broken development environment.
We walk through the execution of a diagnostic skill named "env-doctor". While looking safe at first glance, the underlying python script (healthcheck.py) scans the local system for sensitive environment files and exfiltrates credentials (such as AWS keys and database passwords) to a remote server.
We show that although modern tools like Claude Code can sometimes identify suspicious background activities post-operation, they might run commands automatically under certain configurations (such as auto-approve/YOLO modes) before detection occurs. We conclude that because manual review of hundreds of downloaded skills is impractical, we must treat external agent skills with the same security precautions as running remote code execution (RCE) on our local systems.
Solution: Establishing Security Boundaries for Claude Code and MCP Servers2:59
Tutorial Summary
In this video, we review key approaches to mitigate security threats from external AI agent skills and Model Context Protocol (MCP) servers.
Cloud-Based Isolation: Instead of executing Claude Code locally, we can run it inside virtual machines or sandboxed workspace environments in the cloud. This keeps any unintended crashes or credential exfiltration safely contained within isolated development environments.
Runtime Verification (SAST): We can inspect files, scripts, and skills at runtime using LLMs or static application security testing (SAST) tools. By leveraging Claude's native hook infrastructure, we can analyze the code of a skill and its scripts before execution to ensure no malicious behavior takes place.
Enterprise Policy Management: For organizational security, we can deploy centralized configurations using the managed-settings.json file to dictate which skills are permitted. We can enforce strict access controls—such as limiting developers to an approved list of vetted skills—and uniformly apply security hooks to all instances of Claude Code across our entire developer team.

Introduction to the Target: A High-Level Look at Our Pentesting Environment2:38
Tracing4:59
Tool Review3:05
Code Walkthrough and Architecture7:57
Summary
In this video, we review how a IT helpdesk LLM agent is architected and built using Python, FastAPI, and LangGraph.
We begin by mapping out the full data-flow path:
A user initiates a message from the Chat UI along with their JSON Web Token (JWT).
The FastAPI backend decodes and validates this token using Firebase Auth to build a secure UserContext structure that represents the verified identity of the user.
This user context is then used to securely invoke a LangGraph agent running inside a ReAct reasoning loop.
The agent relies on its tools to query GCP backend services such as Firestore, Storage, and Secrets Manager before returning a response to the UI.
Next, we open up the repository inside VS Code to inspect the implementation of the main /chat endpoint handler in app.py, looking at the dependency injection mechanism that safely resolves user information. We then transition to agent.py to explore how the agent instance is constructed. We analyze how we can use LangChain’s middleware hooks to inject custom security guardrails dynamically at key moments inside the agent's execution cycle. Finally, we demonstrate why we pass the verified user context both in the system prompt itself (for model context) and through programmatic state variables (for tool access control), ensuring that subsequent tool calls limit the blast radius of any potential jailbreak attacks.
The Code

Intro2:20
Google Cloud5:38
API Keys3:29
Learn to secure OpenAI API keys, configure environment variables and budgeting, and enable Langsmith tracing to observe LLM calls and agent execution for AI security.
Cloud Objects4:08
Deployment6:49
Authenticate the Google Cloud CLI, set the project, and run the deploy script to build and push a Docker image, deploy to Cloud Run, and seed Firestore with data.
Outro1:22
We recap bootstrapping the techcorps helpdesk ki agent and deploying it to Google Cloud with Firebase, environment variables, and API keys for OpenAI and LangSmith, then plan security hardening.

exploirt part 13:22
exploit part 22:28
Excessive Agency and Sensitive Data Disclosure1:02
Video Summary
The video demonstrates a critical security vulnerability in an LLM-powered IT Helpdesk Agent. A user prompts the agent to provide the "stripe-api-key" from the system configuration, framing it as a necessary request for a security audit. The agent complies, outputting the actual secret key into the chat. This scenario illustrates two major vulnerabilities: "Sensitive Information Disclosure" (leaking the key) and "Excessive Agency" (the agent having unintended access to sensitive backend tools). The demonstration then navigates to the Google Cloud Secret Manager to verify that the leaked key matches the actual production secret stored in the cloud. Finally, it uses LangSmith to trace the agent's execution path, showing exactly how the agent invoked a specific get_service_secret tool to access and return the API key.
AI Agent Blast Radius1:01
Video Summary
We explain why many prompt injection attacks are successful: they are designed to look like reasonable, legitimate requests rather than obvious hacking attempts, often using excuses like security audits or management approvals. We introduce a key concept called the "blast radius," which refers to the entire set of tools and systems an AI agent is connected to (such as databases, emails, or payment APIs). We emphasize the importance of minimizing this blast radius because any tool the agent can access is a potential target for exploitation. Finally, we highlight that the threat isn't always an external hacker; an internal employee might just be curious and test the chatbot's limits, inadvertently causing a security incident if the agent has excessive access.
Fix6:32
Limitations of System Prompts in LLM Security2:16
Summary:
The video explains the concept of "prompt hardening," which involves adding specific security instructions to an AI's system prompt to prevent unauthorized actions. While this is a recommended practice, it does not act as a true security boundary. Because Large Language Models (LLMs) are non-deterministic, there is no guarantee they will strictly follow instructions, leaving them vulnerable to prompt injection attacks.
This limitation is supported by research, such as a paper by Tenzai, which showed that AI coding agents failed to consistently enforce explicit security instructions when generating code. The core problem is that if sensitive tools exist within the LLM's context, the system remains vulnerable. The speaker compares relying on prompt hardening to handing someone the keys to a vault and simply asking them not to open it. For genuine security, the most effective approach is to never provide access to the "keys" (sensitive tools and data) in the first place.
Pitfalls of Relying on LLM Model Versions for Security1:40
Video Summary
The video discusses the limitations of relying solely on newer, more capable Large Language Models (LLMs) to provide security against prompt injections and social engineering. While newer models may offer an improved baseline security posture, their advanced reasoning capabilities can introduce novel vulnerabilities, meaning the attack surface shifts rather than disappears. It is emphasized that an application's security posture should not be coupled to specific model versions, because upgrading or changing models—a common industry practice to stay current—would constantly invalidate previous security assumptions. Furthermore, regression testing for these vulnerabilities is described as impractical due to the infinite nature of the attack surface, given that inputs can encompass diverse human language or other modalities like audio and images.
Balancing Agent Autonomy with Human-in-the-Loop Security2:18
Video Summary
The video provides a recap on securing AI agents. While hardening system prompts and utilizing state-of-the-art (SOTA) large language models are recommended practices, they do not provide a hermetic security boundary. It must be assumed that an LLM can and will be tricked by a prompt injection, regardless of the prompt's complexity. Therefore, the actual solution must be architectural, implemented at the application layer. This involves navigating the tradeoff between agent flexibility and security by applying the principle of least privilege—equipping the agent with only the minimal tools necessary. For tools that execute dangerous actions, an architectural pattern called "human in the loop" should be implemented. This approach requires the agent to pause execution and request explicit human approval before proceeding. The terminal tool Claude Code is presented as an effective example of this architecture, as it proposes code changes but requires a user's confirmation before modifying files, balancing high capability with necessary safety guardrails.

Sensitive Information Disclosure and Excessive Agency1:06
Summary:
This tutorial introduces the topics of sensitive information disclosure and excessive agency, aiming to show that without authorization middleware, any user can access data without prompt injection. Citing Tenzai's research on coding agents, it is highlighted that AI agents struggle significantly with complex authorization logic. The research found that some agents failed to implement basic authorization checks, such as verifying user login or the presence of a JWT token, before allowing dangerous actions like deleting database objects. These exact patterns are then demonstrated using a help desk agent.
Demonstrating Broken Access Control in AI Agents3:32
Summary:
This tutorial demonstrates a realistic scenario of broken access control in an AI agent. By logging in as an engineering employee, a query is submitted to the help desk agent asking for HR-related salary adjustments and a specific confidential wiki article. Because the agent lacks role-based access control (RBAC), it retrieves and displays highly sensitive information, such as company-wide compensation bands, without any verification. The tutorial proves this is not an LLM hallucination by showing the actual confidential document in the Firestore database and reviewing the execution traces in LangSmith, which confirm the lack of authorization checks. To further illustrate the vulnerability, the same query is successfully executed by a finance employee. The tutorial concludes that this data disclosure is caused by missing access control logic, not a prompt injection attack or an LLM flaw, highlighting the absolute necessity of using middleware to enforce user roles and permissions throughout a request's lifecycle
Enforcing RBAC in LLM Tool Calls5:37
Summary:
This tutorial demonstrates how to implement an authorization middleware to enforce role-based access control (RBAC) in AI applications. The middleware uses a Firebase JWT token—a secure, cryptographically signed token—to extract and verify a user's identity (such as their role and department). This identity is then injected into every tool call made by the LLM, ensuring access is controlled at the tool layer rather than relying on prompt instructions. The tutorial explains the code, showing how user context is passed from the server request into LangGraph's tool execution. It then validates the fix by logging in as Alice (an engineering employee) and showing she is correctly denied access to HR documents. Conversely, logging in as Carol (an HR admin) successfully retrieves the sensitive HR data, proving the authorization middleware works effectively.
RECAP: Key Takeaways: Securing AI Agents with RBAC1:29
Summary:
This tutorial recaps key takeaways for securing AI agents against unauthorized access. First, the speaker emphasizes that authentication is not the same as authorization; verifying a user's identity does not automatically restrict their permissions. Second, access control must be enforced at the tool or data layer rather than the prompt layer, because LLMs can forget instructions, become confused, or be easily tricked. Third, user identity must be cryptographically verified using reliable methods like a JWT token, ensuring the LLM cannot forge an identity based on text from a chat. Finally, the tutorial highlights that implementing role-based access control (RBAC) is essential; without it, sensitive data can be exposed to unauthorized individuals without any complex prompt injection attacks

Indirect Prompt Injection & The Confused Deputy Problem1:15
Summary
In this section, we are going to explore the combination of indirect prompt injection and insecure output handling in an LLM. While we have previously hardened our agent using tool filter and authorization middleware, we will now examine a scenario where a malicious employee embeds hidden instructions inside a support ticket's description. When a manager reads this ticket, the agent executes those malicious instructions using the manager's elevated credentials, illustrating a classic "confused deputy" problem. We will also dive into the architectural context behind this attack, explaining how an agent in the LangGraph ReAct loop processes user messages. If the agent loads these malicious instructions before reasoning, it can be tricked into invoking additional tools that the attacker can then exploit, which we will demonstrate in the upcoming demo.
Data Exfiltration via Indirect Prompt Injection5:36
Summary
The video demonstrates a privilege escalation and data exfiltration attack on an AI helpdesk agent using indirect prompt injection. Initially, the demonstrator logs in as Eva Park, a manager with elevated privileges, who successfully accesses a ticket containing sensitive compensation data. When the demonstrator switches to a standard employee, Dave Wilson, access to other users' tickets is correctly denied by the system's authorization checks.
To bypass these security measures, Dave creates a poisoned ticket that appears to be a standard error report but contains hidden instructions. These instructions command the AI agent to search for tickets containing the keyword "budget" and append the found data to Dave's own ticket. When Eva logs in and interacts with Dave's poisoned ticket, the AI agent executes the hidden commands using Eva's elevated permissions. It successfully retrieves the sensitive salary data and secretly saves it in the internal notes of Dave's ticket. Finally, Dave logs back in, queries his own ticket, and extracts the stolen salary information, completing the attack.
Analyzing an Indirect Prompt Injection with LangSmith3:06
Summary
In this tutorial, we analyze a LangSmith trace to understand exactly how the indirect prompt injection attack was executed. When Eva requested to view Dave's poisoned ticket (TK-4001), the agent retrieved the ticket and processed its hidden malicious instructions. Because the agent was acting on behalf of Eva, it used her elevated permissions to execute two unauthorized tool calls. First, it invoked a ticket search using the keyword "budget," successfully uncovering a restricted ticket containing employee salaries. Second, it used the update ticket tool to secretly append this sensitive salary data into the internal notes of Dave's original ticket.
We see that although the LLM initially blocked some of the outputs, Dave was eventually able to retrieve the salaries of his colleagues (like Marcus Rivera and Sarah Chen) after a bit of back-and-forth prompting. Ultimately, we demonstrate a classic "confused deputy" attack: Dave weaponized the AI agent to bypass his own restricted access, tricking the agent into fetching and exfiltrating privileged information while it was operating under Eva's authorization level.
The Human Element of Security Exploits1:43
Summary
In this part of the tutorial, we examine why a user might include highly privileged data, such as salaries, in a routine help desk ticket. By looking closely at a specific ticket, we see that the employee, Eva, was under immense pressure to fix an urgent dashboard issue before an upcoming board meeting. We learn that when people are stressed, they tend to make mistakes and overshare information. Eva simply copy-pasted the sensitive data she was working on to help IT resolve the problem faster, assuming the ticket would only be visible to authorized personnel. She had no idea an attacker like Dave could access it. Ultimately, we conclude that this human factor—making mistakes and oversharing under pressure—is exactly what makes indirect prompt injections so dangerous
Exploiting the Data Pipeline with Asynchronous Attacks0:47
Summary
In this section, we discuss another crucial aspect of this attack: it is entirely asynchronous. We see that the attacker, Dave, plants malicious instructions in a ticket and can simply wait—whether that takes a week, a month, or a year. Eventually, when Eva, who holds privileged access, reads the ticket, the attack is triggered. This means we can execute the attack without the attacker even needing to be online. We learn that this vulnerability exists because LLMs fundamentally cannot distinguish between data and instructions; even with system instructions in the prompt, the model views everything as text and simply guesses the next token. Ultimately, we note that this type of indirect prompt injection originates from the data pipeline itself, rather than from a direct user message
Fix: Tool Validation and Explicit Data Boundaries3:01
Summary
In this video, we discuss the fix for preventing indirect prompt injections and insecure output handling. To stop unauthorized tool calls, we introduce a mechanism that distills the user's intent and validates all tool calls against this intent before they are executed. For example, if a user only intends to look up a ticket, the validator will block any attempts by a poisoned ticket to trigger unauthorized tools like searching or updating other tickets, effectively breaking the data exfiltration chain. While we use a simple keyword-based validator to demonstrate the concept, a production environment could utilize an LLM as a judge for more sophisticated intent classification.
Additionally, we add another layer of defense to address insecure output handling by establishing explicit data boundaries around tool results. By clearly labeling external data (e.g., using a <data> boundary), we communicate to the LLM that the returned content is strictly data and not executable instructions. Even if the poisoned text lacks obvious prompt injection phrases that a pattern-detecting middleware could catch, treating the tool outputs strictly as a data layer serves as a crucial heuristic to improve our overall security posture.
Fix Hands On8:04
RECAP: Intent Matching and Boundary Creation2:05

Requirements

Basic familiarity with software development or web applications is helpful, but deep security expertise is not required
Python, APIs, or backend development will make the hands-on demos easier to follow
Security professionals can take the course without being full-time developers, as concepts are explained from both engineering and security perspectives
An interest in AI agents, AI-assisted development, application security, or secure system design is recommended

Description

assisted development makes it faster than ever to build applications, but it also makes it easier to ship security mistakes at speed. This course teaches the fundamentals of application security for vibe coded apps through a practical, modern example: a web-based AI agent application with real tools, user data, authentication, and cloud access.

Instead of learning security only through theory, you’ll work through a classic real-world pattern many developers are now building: an AI-powered app that looks like a normal web product on the surface, but behind the scenes includes LLM workflows, tool calling, memory, and backend access. That makes it the perfect example for understanding both traditional app security and AI agent security together.

In this hands-on course, you’ll learn:

core application security concepts every AI-assisted developer should know
OWASP-style risks including injection, auth flaws, insecure defaults, and over-permissioned systems
how AI code generation can introduce vulnerabilities into apps and agents
how to recognize insecure patterns in generated code and architecture
secure coding patterns for input validation, authentication, authorization, and sensitive data handling
secrets management, dependency hygiene, and common supply chain risks
how to reduce blast radius in agentic systems with layered defenses
how to use automated scanning and AI-powered review workflows before deployment
how to build a personal security checklist for rapid AI-assisted development

A major focus of the course is showing how a classic web-coded AI agent can become vulnerable to prompt injection, data exfiltration, broken authorization, memory attacks, and excessive privilege and then walking through how to fix those issues step by step.

By the end of the course, students will understand how to build faster with AI without skipping security fundamentals, and how to apply practical defenses to both conventional software and modern AI agent applications.

Short Attack List

Prompt Injection
Indirect Prompt Injection
Injection Attacks
Broken Authentication
Broken Authorization
Insecure Defaults
Secret Exposure
Data Exfiltration
Memory Poisoning
Tool Abuse
Jailbreaks
PII Leakage
Dependency Risks
Supply Chain Risks
Excessive Permissions

Who this course is for:

Software engineers and developers building AI-powered apps, AI agents, or vibe-coded products
Security engineers, application security engineers, and cloud security engineers who need to assess AI application risk
SOC engineers and security analysts who want to understand how AI agent attacks work in practice
CISOs, security leaders, and technical decision-makers who need a practical view of AI agent risk and defense
Solutions architects, platform engineers, and engineering managers responsible for secure AI adoption
Anyone who wants to understand how traditional AppSec and modern AI agent security connect in real systems

AI Agent Security: App Security for Vibe-Coded Agents

What you'll learn

Explore related topics

Course content

[Theory] The Industry Today: Foundations of Agentic Coding Security4 lectures • 14min

Securing MCP and Agent Skills3 lectures • 11min

Help Desk Agent5 lectures • 19min

Setup6 lectures • 24min

Blast Radius8 lectures • 21min

Information Disclosure with Broken Access Control: Securing AI Agents4 lectures • 12min

Confused Deputy Attack with Indirect Prompt Injection8 lectures • 26min

Memory Poisoning1 lecture • 6min

Requirements

Description

Who this course is for: