Back to browse
Teapot – A methodology for pen testing voice AI agents

Teapot – A methodology for pen testing voice AI agents

by xmhatx·Feb 18, 2026·7 points·10 comments

AI Analysis

●●SolidBig BrainNiche Gem

Voice-specific prompt injection framework, but testing methodology alone isn't a shipping product.

Strengths
  • Identifies real attack surface unique to STT+LLM+TTS pipeline that text-based testing misses
  • Systematizes attack patterns across six concrete phases (Transcription, Exploration, Attack surface, Prompt injection, Output, Tool abuse)
  • Addresses genuine gap: existing OWASP LLM Top 10 assumes text-only interfaces
Weaknesses
  • Published as documentation only; no open-source test harness, automation tools, or proof-of-concept code shipped
  • Unclear if methodology is novel enough to warrant a full brand—borrows structure from standard pentesting (recon, attack, eval)
Category
Target Audience

Security researchers, penetration testers, AI product teams

Similar To

OWASP LLM Top 10 · Standard prompt injection frameworks (e.g., Garak) · Voice AI red-teaming practices at frontier labs

Post Description

Hello HN, I am Brian Cardinale, a penetration tester and security researcher at SecureCoders. We have been performing more and more AI based security assessments. We were presented a unique challenge of testing a system where the only interface was voice based, and as much as I like talking on the phone , we decided to create a test harness to facilitate the actual testing in a more systematic way. The technical test harness was the easy part, though. Creating test goals and attack strategies to help facilitate repeated and comprehensive testing became the real challenge. As such, we have been working on documenting our processes to share with the greater community and as a starting point for discussion. These systems present unique challenges where cleverness appears to be the name of the game. Such as suggesting for the agent to share its thoughts in “Inner Monologue” tags instead of “thinking” tags because those were specifically excluded in the agents prompt. Ya know, just silly things. Anyway, if reading is not your thing, I also did a walkthrough video of this methodology here: https://www.youtube.com/watch?v=XNmqCXsEc8Y

tl;dr: AI testing is tricky, we are documenting and sharing our tricks

Do you have any favorite AI jailbreak tricks?

Similar Projects

AI/ML●●Solid

AI agents that run real user interviews

MCP integration brings real voice interviews to agents stuck guessing user needs.

Ship ItSolve My Problem
jtccc
342mo ago