Module 14

AI Security Vulnerabilities

Last updated 2026-06-02

Key points

Lesson 1: What is AI Security Vulnerabilities and why it matters

AI security vulnerabilities are flaws in software that attackers can exploit, and they matter immensely for AI development because AI systems are now both creating and finding these flaws at unprecedented speed. According to research cited in the transcripts, 48% of AI-generated code contains security vulnerabilities, and AI coding assistants are writing more code than ever before, expanding the "attack surface" (the total points where an attacker can try to enter or extract data) faster than human teams can review. This means every AI-generated function or autocompleted block is a potential vulnerability needing inspection.

More concerning, advanced AI models have demonstrated the ability to independently discover over 500 high-severity vulnerabilities in production open-source software that humans missed. AI agent traffic has grown roughly 7,800% year-over-year, yet most security teams cannot detect or stop AI agents before they act. As one transcript states, every person with bad intentions now has a tool better at finding exploits than most professional security teams. This creates a dangerous dynamic where AI can both introduce vulnerabilities and exploit them.

For developers, the key takeaway is that critical evaluation skills are essential. Treat AI output like code from a junior developer - review it carefully, test thoroughly, and never assume it's correct. Human review remains essential. AI accelerates, but humans validate. The combination is powerful, but either alone is incomplete. Security tools that reason about code the way attackers do are becoming necessary, but the window where AI helps defenders more than attackers is open right now and may not stay open long.

Sources

Lesson 2: How to use AI Security Vulnerabilities: step-by-step

To use AI security vulnerabilities step by step, start with prompt injection (tricking an AI into following malicious instructions). An injected prompt can make an AI agent steal SSH keys, drain credentials, or exfiltrate your codebase. The scariest part is that traditional scanners miss these attacks — one AI found 500 zero-day vulnerabilities that every other tool failed to detect by simply reading code. You need to isolate your agents.

The concrete fix is Docker Sandboxes (isolated microVMs for AI agents). Run `docker sandbox run claude /your-project-path` to start. This creates a private Docker daemon, file system, and network stack per sandbox. The agent can install packages and spin up containers inside its VM — but it cannot touch your host machine or see your host’s containers. Network isolation prevents an injected agent from phoning home to an attacker. Sandboxes cannot talk to each other or access services on your host’s localhost. An HTTP filtering proxy controls which external endpoints agents reach.

Use `docker sandbox exec <sandbox-name>` to get a bash shell for debugging or installing tools. Workspaces sync bidirectionally at the same absolute path. When done, `docker sandbox remove` cleans everything. This approach supports Claude, Codex, Gemini, and Kira. Traditional containers share the host kernel, creating a kernel escape risk — Docker Sandboxes contain the blast radius completely. Even if an agent goes rogue, your production containers stay untouched.

Sources

Lesson 3: Best practices and pitfalls

AI security vulnerabilities often come down to three areas: prompt injection, insecure tool access, and insufficient isolation. Prompt injection (tricking an AI into following malicious instructions) can make an AI agent steal your SSH keys, exfiltrate your codebase, or phone home to an attacker. Traditional containers share the host kernel, which is a security risk for AI agents. A compromised agent can exploit kernel vulnerabilities to escape and access your host machine. Docker sandboxes solve this by running each agent in a lightweight microVM (an isolated virtual machine with its own kernel). On Mac OS, it uses Apple's virtualization framework; on Windows, Hyper-V. Each sandbox gets its own private Docker demon, file system, and network stack. Even if an agent goes rogue, it cannot see your host's containers or access your host's services. Network isolation is also critical — sandboxes enforce strict boundaries and include an HTTP filtering proxy to control which external endpoints agents can reach. To use Docker sandboxes, run `docker sandbox run claude` then your project path. Your workspace syncs automatically. If the agent needs debugging or tool installation, use `docker sandbox exec`. Full capabilities, zero host access. The scariest pitfall is assuming traditional scanning is enough. AI-generated code expands the attack surface faster than human security teams can review. Tools like cloud code security now reason about code the way attackers do, constructing proofs to confirm whether a vulnerability is exploitable. Nothing deploys without human approval — AI finds the bugs, humans make the decisions. For self-hosted setups, point the Continue VS Code extension at a local endpoint and switch between models. Use disposable Ubuntu VMs (virtual machines) through Multipass to test anything safely. Canonical’s LTS anything program keeps every dependency patched for up to 15 years, even if the original vendor disappears. Prompt injection through tool descriptions and data exfiltration through tool chaining are real concerns; a human-in-the-loop API with request user interaction is a solid start. Most security teams are not equipped to detect or stop AI agents before they act — 92% of security leaders lack the tools to respond in time. The best practice is defense in depth: give AI agents real autonomy only when truly isolated, and never skip human review for any deployed fix.

Sources