Field Notes

A new Blackhat talk shows how to force LLMs to leak their training data in what headlines call a…

By Nate Lee, co-founder of TrustMind — 12 Aug 2025

A new Blackhat talk shows how to force LLMs to leak their training data in what headlines call a "breach."

It's not.

This is the kind of misdirection that wastes time and budgets. It gets teams chasing phantom risks while the real dangers go ignored.

Let's break it down:

The "attack" is a lab experiment. Researchers had to know an exact sentence they wanted to extract beforehand to coax the model into repeating it and even then, it worked less than 1% of the time.

It’s an interesting parlor trick for IP debates maybe, but it's not a practical security threat.

More importantly, it fundamentally confuses two different things.

If your confidential corporate strategy is in a public model's training set, you have a catastrophic data leak that has nothing to do with this "attack."
The vast majority of security and privacy risk isn't in the model's weights; it's in the insecure application architectures we build around them.

The real danger is a poorly designed system with what Simon Willison perfectly calls the "Lethal Trifecta":

It takes in untrusted data.
It has access to confidential data.
It has the ability to act or communicate externally.

Security teams need to stop asking "Do you train on our data?" and start learning what makes for a secure agentic system. That’s exactly why we developed the trait based agentic system security guide with the Cloud Security Alliance (Linked in comments!), to focus on the architectural principles that prevent actual harm.

Your takeaway shouldn't be to fear the model. It should be to go threat model your application's architecture today. That's where the real vulnerabilities that will cause tomorrow's breaches and leaks are hiding.

hashtag#AIsecurity hashtag#ApplicationSecurity hashtag#RiskManagement