Field Notes

Earlier this week Claude needed an API key, couldn't find it in the project, found one in a .env file in a different project on my laptop, and used it. No bad intent. Just a good agent doing its job however it could.

Intent isn't binary, and trying to infer it is a real security problem. An agent can't tell the difference between doing its job and being used to do its job. It doesn't know when it's being weaponized because, from its perspective, it isn't. It's just solving the problem in front of it.

Researchers this week showed that instructions embedded in README files triggered data exfiltration 85% of the time. Zero humans caught them in review.

Everyone is loaded with more to do and expected to do it faster than ever, so expecting people to closely read every line of a README instead of handing it to an LLM is about as likely as me getting in more than 2,500 steps in a day 😭

We keep expanding what we have agents do because the value is real. Every capability we add, though, is also blast radius we can't fully contain, because the agent will use whatever it can reach to accomplish what it's been told to do.

What has your coding agent done that you never explicitly authorized and couldn't fully explain afterward?