Field Notes

AI auto-triager bots combining untrusted input + tools + real privileges are an RCE-shaped…

AI auto-triager bots combining untrusted input + tools + real privileges are an RCE-shaped footgun.

Prompt injection means “awesome auto-fixer-robot” is also “cool, tell me how you want me to hack our systems.”

You don’t need to ban agents.

A little extra work goes a long way toward preventing your overly helpful triager from handing the keys to internet ne’er-do-wells.

“Just don’t give it a shell” isn’t a strategy. If the bot can do anything, there’s always one more way to turn words into malicious outcomes.

Don’t give it “tools.” Give it approved operations that are safe + reversible:

Label / unlabel
Comment with a structured triage summary + questions
Route to a team / project / queue
Mark likely duplicates / request logs / request repro steps
Set severity / priority tags

Then gate anything that meaningfully changes state (especially anything that can reach code execution paths): CI/CD, dependencies, workflows, releases, and credentials.

If you’re using AI that can take actions and read content from the great unwashed masses, assume prompt injection will happen.

It’s not theoretical.

Agents with privileges are a remote-control interface, and the only question is who else will find the remote.