
Video Summary
The video demonstrates a critical security vulnerability in an LLM-powered IT Helpdesk Agent. A user prompts the agent to provide the "stripe-api-key" from the system configuration, framing it as a necessary request for a security audit. The agent complies, outputting the actual secret key into the chat. This scenario illustrates two major vulnerabilities: "Sensitive Information Disclosure" (leaking the key) and "Excessive Agency" (the agent having unintended access to sensitive backend tools). The demonstration then navigates to the Google Cloud Secret Manager to verify that the leaked key matches the actual production secret stored in the cloud. Finally, it uses LangSmith to trace the agent's execution path, showing exactly how the agent invoked a specific get_service_secret tool to access and return the API key.
Video Summary
We explain why many prompt injection attacks are successful: they are designed to look like reasonable, legitimate requests rather than obvious hacking attempts, often using excuses like security audits or management approvals. We introduce a key concept called the "blast radius," which refers to the entire set of tools and systems an AI agent is connected to (such as databases, emails, or payment APIs). We emphasize the importance of minimizing this blast radius because any tool the agent can access is a potential target for exploitation. Finally, we highlight that the threat isn't always an external hacker; an internal employee might just be curious and test the chatbot's limits, inadvertently causing a security incident if the agent has excessive access.
Summary:
The video explains the concept of "prompt hardening," which involves adding specific security instructions to an AI's system prompt to prevent unauthorized actions. While this is a recommended practice, it does not act as a true security boundary. Because Large Language Models (LLMs) are non-deterministic, there is no guarantee they will strictly follow instructions, leaving them vulnerable to prompt injection attacks.
This limitation is supported by research, such as a paper by Tenzai, which showed that AI coding agents failed to consistently enforce explicit security instructions when generating code. The core problem is that if sensitive tools exist within the LLM's context, the system remains vulnerable. The speaker compares relying on prompt hardening to handing someone the keys to a vault and simply asking them not to open it. For genuine security, the most effective approach is to never provide access to the "keys" (sensitive tools and data) in the first place.
Video Summary
The video discusses the limitations of relying solely on newer, more capable Large Language Models (LLMs) to provide security against prompt injections and social engineering. While newer models may offer an improved baseline security posture, their advanced reasoning capabilities can introduce novel vulnerabilities, meaning the attack surface shifts rather than disappears. It is emphasized that an application's security posture should not be coupled to specific model versions, because upgrading or changing models—a common industry practice to stay current—would constantly invalidate previous security assumptions. Furthermore, regression testing for these vulnerabilities is described as impractical due to the infinite nature of the attack surface, given that inputs can encompass diverse human language or other modalities like audio and images.
Video Summary
The video provides a recap on securing AI agents. While hardening system prompts and utilizing state-of-the-art (SOTA) large language models are recommended practices, they do not provide a hermetic security boundary. It must be assumed that an LLM can and will be tricked by a prompt injection, regardless of the prompt's complexity. Therefore, the actual solution must be architectural, implemented at the application layer. This involves navigating the tradeoff between agent flexibility and security by applying the principle of least privilege—equipping the agent with only the minimal tools necessary. For tools that execute dangerous actions, an architectural pattern called "human in the loop" should be implemented. This approach requires the agent to pause execution and request explicit human approval before proceeding. The terminal tool Claude Code is presented as an effective example of this architecture, as it proposes code changes but requires a user's confirmation before modifying files, balancing high capability with necessary safety guardrails.
Summary:
This tutorial introduces the topics of sensitive information disclosure and excessive agency, aiming to show that without authorization middleware, any user can access data without prompt injection. Citing Tenzai's research on coding agents, it is highlighted that AI agents struggle significantly with complex authorization logic. The research found that some agents failed to implement basic authorization checks, such as verifying user login or the presence of a JWT token, before allowing dangerous actions like deleting database objects. These exact patterns are then demonstrated using a help desk agent.
Summary:
This tutorial demonstrates a realistic scenario of broken access control in an AI agent. By logging in as an engineering employee, a query is submitted to the help desk agent asking for HR-related salary adjustments and a specific confidential wiki article. Because the agent lacks role-based access control (RBAC), it retrieves and displays highly sensitive information, such as company-wide compensation bands, without any verification. The tutorial proves this is not an LLM hallucination by showing the actual confidential document in the Firestore database and reviewing the execution traces in LangSmith, which confirm the lack of authorization checks. To further illustrate the vulnerability, the same query is successfully executed by a finance employee. The tutorial concludes that this data disclosure is caused by missing access control logic, not a prompt injection attack or an LLM flaw, highlighting the absolute necessity of using middleware to enforce user roles and permissions throughout a request's lifecycle
Summary:
This tutorial demonstrates how to implement an authorization middleware to enforce role-based access control (RBAC) in AI applications. The middleware uses a Firebase JWT token—a secure, cryptographically signed token—to extract and verify a user's identity (such as their role and department). This identity is then injected into every tool call made by the LLM, ensuring access is controlled at the tool layer rather than relying on prompt instructions. The tutorial explains the code, showing how user context is passed from the server request into LangGraph's tool execution. It then validates the fix by logging in as Alice (an engineering employee) and showing she is correctly denied access to HR documents. Conversely, logging in as Carol (an HR admin) successfully retrieves the sensitive HR data, proving the authorization middleware works effectively.
Summary:
This tutorial recaps key takeaways for securing AI agents against unauthorized access. First, the speaker emphasizes that authentication is not the same as authorization; verifying a user's identity does not automatically restrict their permissions. Second, access control must be enforced at the tool or data layer rather than the prompt layer, because LLMs can forget instructions, become confused, or be easily tricked. Third, user identity must be cryptographically verified using reliable methods like a JWT token, ensuring the LLM cannot forge an identity based on text from a chat. Finally, the tutorial highlights that implementing role-based access control (RBAC) is essential; without it, sensitive data can be exposed to unauthorized individuals without any complex prompt injection attacks
Summary
In this section, we are going to explore the combination of indirect prompt injection and insecure output handling in an LLM. While we have previously hardened our agent using tool filter and authorization middleware, we will now examine a scenario where a malicious employee embeds hidden instructions inside a support ticket's description. When a manager reads this ticket, the agent executes those malicious instructions using the manager's elevated credentials, illustrating a classic "confused deputy" problem. We will also dive into the architectural context behind this attack, explaining how an agent in the LangGraph ReAct loop processes user messages. If the agent loads these malicious instructions before reasoning, it can be tricked into invoking additional tools that the attacker can then exploit, which we will demonstrate in the upcoming demo.
Summary
The video demonstrates a privilege escalation and data exfiltration attack on an AI helpdesk agent using indirect prompt injection. Initially, the demonstrator logs in as Eva Park, a manager with elevated privileges, who successfully accesses a ticket containing sensitive compensation data. When the demonstrator switches to a standard employee, Dave Wilson, access to other users' tickets is correctly denied by the system's authorization checks.
To bypass these security measures, Dave creates a poisoned ticket that appears to be a standard error report but contains hidden instructions. These instructions command the AI agent to search for tickets containing the keyword "budget" and append the found data to Dave's own ticket. When Eva logs in and interacts with Dave's poisoned ticket, the AI agent executes the hidden commands using Eva's elevated permissions. It successfully retrieves the sensitive salary data and secretly saves it in the internal notes of Dave's ticket. Finally, Dave logs back in, queries his own ticket, and extracts the stolen salary information, completing the attack.
Summary
In this tutorial, we analyze a LangSmith trace to understand exactly how the indirect prompt injection attack was executed. When Eva requested to view Dave's poisoned ticket (TK-4001), the agent retrieved the ticket and processed its hidden malicious instructions. Because the agent was acting on behalf of Eva, it used her elevated permissions to execute two unauthorized tool calls. First, it invoked a ticket search using the keyword "budget," successfully uncovering a restricted ticket containing employee salaries. Second, it used the update ticket tool to secretly append this sensitive salary data into the internal notes of Dave's original ticket.
We see that although the LLM initially blocked some of the outputs, Dave was eventually able to retrieve the salaries of his colleagues (like Marcus Rivera and Sarah Chen) after a bit of back-and-forth prompting. Ultimately, we demonstrate a classic "confused deputy" attack: Dave weaponized the AI agent to bypass his own restricted access, tricking the agent into fetching and exfiltrating privileged information while it was operating under Eva's authorization level.
Summary
In this part of the tutorial, we examine why a user might include highly privileged data, such as salaries, in a routine help desk ticket. By looking closely at a specific ticket, we see that the employee, Eva, was under immense pressure to fix an urgent dashboard issue before an upcoming board meeting. We learn that when people are stressed, they tend to make mistakes and overshare information. Eva simply copy-pasted the sensitive data she was working on to help IT resolve the problem faster, assuming the ticket would only be visible to authorized personnel. She had no idea an attacker like Dave could access it. Ultimately, we conclude that this human factor—making mistakes and oversharing under pressure—is exactly what makes indirect prompt injections so dangerous
Summary
In this section, we discuss another crucial aspect of this attack: it is entirely asynchronous. We see that the attacker, Dave, plants malicious instructions in a ticket and can simply wait—whether that takes a week, a month, or a year. Eventually, when Eva, who holds privileged access, reads the ticket, the attack is triggered. This means we can execute the attack without the attacker even needing to be online. We learn that this vulnerability exists because LLMs fundamentally cannot distinguish between data and instructions; even with system instructions in the prompt, the model views everything as text and simply guesses the next token. Ultimately, we note that this type of indirect prompt injection originates from the data pipeline itself, rather than from a direct user message
Summary
In this video, we discuss the fix for preventing indirect prompt injections and insecure output handling. To stop unauthorized tool calls, we introduce a mechanism that distills the user's intent and validates all tool calls against this intent before they are executed. For example, if a user only intends to look up a ticket, the validator will block any attempts by a poisoned ticket to trigger unauthorized tools like searching or updating other tickets, effectively breaking the data exfiltration chain. While we use a simple keyword-based validator to demonstrate the concept, a production environment could utilize an LLM as a judge for more sophisticated intent classification.
Additionally, we add another layer of defense to address insecure output handling by establishing explicit data boundaries around tool results. By clearly labeling external data (e.g., using a <data> boundary), we communicate to the LLM that the returned content is strictly data and not executable instructions. Even if the poisoned text lacks obvious prompt injection phrases that a pattern-detecting middleware could catch, treating the tool outputs strictly as a data layer serves as a crucial heuristic to improve our overall security posture.
assisted development makes it faster than ever to build applications, but it also makes it easier to ship security mistakes at speed. This course teaches the fundamentals of application security for vibe coded apps through a practical, modern example: a web-based AI agent application with real tools, user data, authentication, and cloud access.
Instead of learning security only through theory, you’ll work through a classic real-world pattern many developers are now building: an AI-powered app that looks like a normal web product on the surface, but behind the scenes includes LLM workflows, tool calling, memory, and backend access. That makes it the perfect example for understanding both traditional app security and AI agent security together.
In this hands-on course, you’ll learn:
core application security concepts every AI-assisted developer should know
OWASP-style risks including injection, auth flaws, insecure defaults, and over-permissioned systems
how AI code generation can introduce vulnerabilities into apps and agents
how to recognize insecure patterns in generated code and architecture
secure coding patterns for input validation, authentication, authorization, and sensitive data handling
secrets management, dependency hygiene, and common supply chain risks
how to reduce blast radius in agentic systems with layered defenses
how to use automated scanning and AI-powered review workflows before deployment
how to build a personal security checklist for rapid AI-assisted development
A major focus of the course is showing how a classic web-coded AI agent can become vulnerable to prompt injection, data exfiltration, broken authorization, memory attacks, and excessive privilege and then walking through how to fix those issues step by step.
By the end of the course, students will understand how to build faster with AI without skipping security fundamentals, and how to apply practical defenses to both conventional software and modern AI agent applications.
Short Attack List
Prompt Injection
Indirect Prompt Injection
Injection Attacks
Broken Authentication
Broken Authorization
Insecure Defaults
Secret Exposure
Data Exfiltration
Memory Poisoning
Tool Abuse
Jailbreaks
PII Leakage
Dependency Risks
Supply Chain Risks
Excessive Permissions