Field Notes

A new Black Hat talk shows how to force LLMs to leak their training data. The headlines are calling it a "breach."

It's not.

This is the kind of security misdirection that wastes our time and budgets. It gets teams chasing phantom risks while the real dangers go ignored.

The "attack" is a lab experiment in which the researchers had to know the exact information they wanted to extract before they could coax the model into repeating it. That's an interesting result for intellectual-property debates, perhaps, but it's not a practical security threat.

More importantly, the coverage conflates two distinct problems: what ends up in a model's training set, and how the application around the model handles data. If your confidential corporate strategy is in a public model's training set, you already have a catastrophic data leak that has nothing to do with this "attack."
The vast majority of security and privacy risk isn't in the model's weights; it's in the insecure application architectures we build around them.

The real danger is a poorly designed system with what Simon Willison perfectly calls the "Lethal Trifecta":

😱 It passes untrusted data through the LLM.
🕵️ It has access to confidential data.
🌐 It can act or communicate externally.
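To make the trifecta concrete, here is a minimal sketch (all names hypothetical, not from any real framework) of how a review checklist for an agent's capabilities might flag the dangerous combination: any single property is usually fine, but all three together give injected instructions both something to steal and a way to send it out.

```python
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    reads_untrusted_input: bool       # e.g. summarizes inbound email or web pages
    accesses_confidential_data: bool  # e.g. can query internal documents
    communicates_externally: bool     # e.g. can send requests or messages out

def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """All three capabilities together let a prompt-injected agent
    exfiltrate the confidential data it was trusted with."""
    return (caps.reads_untrusted_input
            and caps.accesses_confidential_data
            and caps.communicates_externally)

# A doc summarizer with no outbound channel: risky input, but no exfil path.
summarizer = AgentCapabilities(True, True, False)
assert not has_lethal_trifecta(summarizer)

# An inbox agent that reads untrusted mail, sees internal docs, AND can
# send email is exactly the architecture to redesign.
inbox_agent = AgentCapabilities(True, True, True)
assert has_lethal_trifecta(inbox_agent)
```

The point of a check like this isn't the code; it's that the fix is architectural: drop one leg of the trifecta (isolate the untrusted input, remove the data access, or cut the outbound channel) and the exfiltration path disappears.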

Security teams need to stop asking "Do you train on our data?" and start learning what makes for a secure agentic system. That's exactly why we developed the trait-based guide to building secure agentic systems with the Cloud Security Alliance (linked in the comments): to focus on the architectural principles that prevent actual harm.

Your takeaway shouldn't be to fear the model. It should be to go threat model your application's architecture today. That's where the real vulnerabilities that will cause tomorrow's breaches and leaks are hiding.

#AIsecurity #ApplicationSecurity #RiskManagement