The standard security framing on agents borrows from API security, which gets the threat model wrong. API security assumes you can enumerate what a system can do — access controls work because the action space is static. Agents are different: the action space expands dynamically based on the tools available and the instructions given at runtime. The priority for NIST should be distinguishing authorization (who can invoke an agent) from action scope control (what any invocation can trigger). Those are different security primitives, and most current frameworks don't address the second one.
totetsu 38 days ago [-]
With this renaming of AISI to CAISI [1] and the resignation of its founding director, Elizabeth Kelly [2], it seems the position has shifted to: don't let any concerns about social harms stop tech companies from doing whatever they want, and let's make a show of how bad China is. I think any public comment outside the narrow definition of AI risk as risk to national security might fall on deaf ears.
[1] https://www.commerce.gov/news/press-releases/2025/06/stateme...
[2] https://www.reuters.com/technology/us-ai-safety-institute-di...
The NIST focus on "agent registration/tracking" is the right instinct but the wrong abstraction. Registration is a compliance checkbox — it tells you an agent exists, not what it's doing.
What we actually need is runtime behavioral monitoring: what files is the agent accessing? What network calls is it making? What credentials can it reach? That's where the real threat surface lives.
We've been building exactly this with ClawMoat (open source, MIT) — host-level security that monitors agent behavior in real time. Permission tiers, forbidden zones, credential isolation, network egress monitoring. Think AppArmor for AI agents.
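A minimal in-process sketch of what that monitoring can look like, using Python's audit hooks (an illustration only, not ClawMoat's actual mechanism; the paths and hosts are hypothetical, and real enforcement belongs at the OS layer, e.g. AppArmor or eBPF):

    import sys

    FORBIDDEN_PATHS = ("/etc/shadow", "/root/.ssh")   # hypothetical forbidden zones
    ALLOWED_HOSTS = {"api.example.com"}               # hypothetical egress allowlist

    def watch(event, args):
        # Fires on every file open and outbound socket connect (Python 3.8+).
        if event == "open":
            path = str(args[0])
            if any(path.startswith(p) for p in FORBIDDEN_PATHS):
                print(f"[ALERT] agent touched forbidden zone: {path}")
        elif event == "socket.connect":
            addr = args[1]
            host = addr[0] if isinstance(addr, tuple) else addr
            if host not in ALLOWED_HOSTS:
                print(f"[ALERT] unexpected egress to: {host}")

    sys.addaudithook(watch)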
The gap in NIST's framing: they're treating agents like software to be certified, but agents are more like employees to be supervised. You don't just background-check an employee once — you give them appropriate access levels and monitor for anomalies.
For anyone planning to submit comments to NIST: the deadline is March 9. Would love to see the community push for runtime monitoring requirements, not just pre-deployment certification.
digitr33 34 days ago [-]
Point 5 is the one nobody's actually doing yet. Everyone seems to agree we need to measure blast radius, but where's the tooling?
I've been running AI models against real vulnerable targets, giving them a Kali box and an objective, letting them go autonomous. Every model I tested popped almost every OWASP Top 10 challenge we had. The interesting part is the cost of getting there. One model solved a JWT forgery in 16 seconds and 5K tokens. Another took 170 seconds and 210K tokens. Same result, completely different blast pattern.
If we're serious about measuring agent risk, we need to stop theorizing about what they can do and start actually benchmarking it.
Note: on the other hand, we had a lab that a junior pentester would have caught in 10 minutes, and the best models couldn't figure it out.
saltpath 20 days ago [-]
The token cost difference is the metric nobody's capturing. 5K vs 210K tokens for the same JWT forgery isn't just efficiency — it's the blast surface. A contained agent leaves a narrow call trace. A thrashing one touches five APIs, retries three times, leaks context in every hop. If your proxy logs the full call chain with timestamps and response sizes per hop, that cost delta becomes a measurable risk signal, not just a billing line. The hard part isn't the instrumentation, it's getting teams to route agent traffic through anything they don't own.
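A sketch of what capturing that signal could look like, assuming a proxy that already sees every hop (all names here are hypothetical):

    import time
    from dataclasses import dataclass, field

    @dataclass
    class Hop:
        api: str
        tokens: int
        response_bytes: int
        ts: float = field(default_factory=time.time)

    @dataclass
    class Trace:
        hops: list = field(default_factory=list)

        def log(self, api, tokens, response_bytes):
            self.hops.append(Hop(api, tokens, response_bytes))

        def blast_summary(self):
            # The contained agent (5K tokens, one API) and the thrashing one
            # (210K tokens across five APIs) produce very different summaries.
            return {
                "distinct_apis": len({h.api for h in self.hops}),
                "total_tokens": sum(h.tokens for h in self.hops),
                "total_bytes": sum(h.response_bytes for h in self.hops),
                "hop_count": len(self.hops),
            }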
saltpath 23 days ago [-]
The blast radius framing is right, but the tooling gap is worse than a debugging problem: it's about third-party verifiability. A regulator or auditor can't trust a log produced by the same operator who runs the agent.
Spent the last few months on this specific problem: a chain hash per outbound call plus an external timestamp, so anyone can independently verify what the agent called, when, and what it got back. Works across providers, which matters when you're chaining Claude -> Mistral -> internal endpoint.
Early days, but if useful for the NIST response: https://arkforge.tech/trust/v1/proof/prf_20260310_182226_cbc...
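A minimal sketch of the chain-hash idea (hypothetical code, not the actual implementation): each record commits to the previous record's hash, so tampering with any entry breaks every later link, and an external timestamp authority would countersign each digest out-of-band.

    import hashlib, json, time

    def append_entry(chain, endpoint, request, response):
        body = {
            "endpoint": endpoint,
            "request_sha256": hashlib.sha256(request.encode()).hexdigest(),
            "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
            "ts": time.time(),  # in practice: an externally anchored timestamp
            "prev": chain[-1]["hash"] if chain else "0" * 64,
        }
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        chain.append(body)
        return body

    def verify(chain):
        prev = "0" * 64
        for entry in chain:
            body = {k: v for k, v in entry.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev or digest != entry["hash"]:
                return False
            prev = entry["hash"]
        return True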
NIST is requesting public input on security practices for AI agent systems - autonomous AI that can take actions affecting real-world systems (trading bots, automated operations, multi-agent coordination).
Key focus areas:
- Novel threats: prompt injection, behavioral hijacking, cascade failures
- How existing security frameworks (STRIDE, attack trees) need to adapt
- Technical controls and assessment methodologies
- Agent registration/tracking (analogous to drone registration)
This is specifically about agentic AI security, not general ML security - one of the first formal government RFIs on autonomous agents.
Comments from practitioners deploying these systems would be valuable.
Deadline: March 9, 2026, 11:59 PM ET
Submit: https://www.regulations.gov/commenton/NIST-2025-0035-0001
Priority questions (if limited time): 1(a), 1(d), 2(a), 2(e), 3(a), 3(b), 4(a), 4(b), 4(d)
Full 43-question RFI at link above.
A more recent release: Announcing the "AI Agent Standards Initiative" for Interoperable and Secure Innovation
https://www.nist.gov/news-events/news/2026/02/announcing-ai-...
NIST is asking for agent security comments right as the agent stack is splitting into layers with completely different threat models.
A model-layer vulnerability looks nothing like a tool-use-layer vulnerability, but most frameworks still treat "the AI system" as one blob. And probably nobody owns the audit trail when an agent chain spans six vendors.
Wrote about this layering in an article last month: https://philippdubach.com/posts/dont-go-monolithic-the-agent...
The best security is a proper liability process for damages caused by publicly accessible LLMs, followed by their users.
jksmith 38 days ago [-]
1. The attack surface of an agent is tantamount to that of a virus.
2. Any way for an agent to touch something is a potential compromise vector.
3. The mitigation is controlling the blast radius.
4. Sandboxing capability will have to be baked into the architecture.
5. Mitigation includes measuring the cost of the blast radius.
6. All agent orchestration will likely require an andon cord (see the sketch after this list).
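As a toy illustration of point 6 (names are hypothetical), an andon cord can be as simple as a shared stop flag that every tool call checks before executing, so a human or a tripped monitor can halt the whole chain mid-run:

    import threading

    class AndonCord:
        def __init__(self):
            self._stop = threading.Event()

        def pull(self, reason):
            print(f"[ANDON] halting all agents: {reason}")
            self._stop.set()

        def check(self):
            if self._stop.is_set():
                raise RuntimeError("andon cord pulled; aborting agent action")

    cord = AndonCord()

    def run_tool(tool, *args):
        cord.check()        # gate every action on the cord
        return tool(*args)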
beej71 38 days ago [-]
War Operations Plan Response.
niyikiza 38 days ago [-]
Good distinction, but I wonder if it's worth going further: context integrity may be fundamentally unsolvable. Agents consume untrusted input by design. Trying to guarantee the model won't be tricked seems like the wrong layer to bet on.
What seems more promising is accepting that the model will be tricked and constraining what it can do when that happens. Authorization at the tool boundary, scoped to the task and delegation chain rather than the agent's identity. If a child agent gets compromised, it still can't exceed the authority that was delegated to it. Contain the blast radius instead of trying to prevent the confusion.
(Disclaimer: working on this problem at tenuo.ai)
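A minimal sketch of that delegation-scoped containment (an illustration under my own naming, not tenuo.ai's implementation): a child's authority is always the intersection of what the parent holds and what was requested, so it can only shrink down the chain.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Warrant:
        task_id: str
        tools: frozenset

        def delegate(self, requested):
            # Monotonic narrowing: a child can never hold more than its parent.
            return Warrant(self.task_id, self.tools & frozenset(requested))

        def authorize(self, tool):
            return tool in self.tools

    root = Warrant("task-42", frozenset({"search", "read_file", "send_email"}))
    child = root.delegate({"search", "delete_db"})   # delete_db silently dropped
    assert child.authorize("search")
    assert not child.authorize("delete_db")          # compromised child stays contained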
jensbontinck 28 days ago [-]
This is exactly right. We went down this path, and the practical implementation ends up looking like capability tokens: short-lived, cryptographically signed credentials that encode what the agent is authorized to do for this specific task.
The key insight: the token isn't just authorization, it's evidence.
When you issue an ES256-signed token that says "this agent was scanned for PII, classified as INTERNAL, and is authorized to call [search, read_file] for the next 60 seconds", that token becomes the audit artifact. The auditor doesn't need to trust the agent or the operator; they verify the token chain.
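A sketch of such a token using PyJWT with an ES256 key (the claim names here are made up for illustration, not a standard):

    import datetime
    import jwt  # pip install pyjwt cryptography
    from cryptography.hazmat.primitives.asymmetric import ec

    private_key = ec.generate_private_key(ec.SECP256R1())
    public_key = private_key.public_key()

    claims = {
        "sub": "agent-7f3a",
        "tools": ["search", "read_file"],   # action scope, not just identity
        "classification": "INTERNAL",
        "pii_scanned": True,
        "exp": datetime.datetime.now(datetime.timezone.utc)
               + datetime.timedelta(seconds=60),
    }
    token = jwt.encode(claims, private_key, algorithm="ES256")

    # The verifier needs only the public key, not trust in the operator;
    # expiry is checked automatically during decode.
    verified = jwt.decode(token, public_key, algorithms=["ES256"])
    assert "read_file" in verified["tools"]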
On "contain the blast radius instead of preventing the confusion"
agreed, but you need both. Containment (scoped permissions, delegation
chains) handles authorization. But you still need a detection layer
for data protection: PII flowing to an external model is a GDPR or EU AI Act (def. in europe) violation regardless of whether the agent was "authorized" to make that call. We found deterministic scanning (regex + normalization, not LLM judges) at the proxy layer catches this at ~250ms without the reliability problems of using another model to judge the first one.
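A sketch of that deterministic scanning layer (the patterns are illustrative and nowhere near exhaustive; real deployments need locale-specific rules):

    import re
    import unicodedata

    PII_PATTERNS = {
        "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    }

    def scan(payload):
        # Normalize first so width/compatibility tricks don't slip past the regexes.
        text = unicodedata.normalize("NFKC", payload)
        return [name for name, rx in PII_PATTERNS.items() if rx.search(text)]

    assert scan("contact: alice@example.com") == ["email"]
    assert scan("the weather is nice") == []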
The ergonomics point tucnak raised is real too. We use OPA/Rego for the policy layer, with presets so operators don't have to write Rego from scratch: pick a security posture and tune from there. The governance tax has to be near zero or teams just bypass it.
tucnak 38 days ago [-]
What you're talking about exists, and it's called Relationship-based Access Control, or ReBAC. There are a few implementations (the Zanzibar paper, etc.). The issue is not the capability system, it's governance. The operator needs to write policies, of course! They don't want to read policies, write policies, or audit other people's policies.
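For concreteness, a toy version of a Zanzibar-style check over relation tuples (real systems like SpiceDB add schema, rewrite rules, caching, and consistency guarantees):

    TUPLES = {
        ("doc:report", "viewer", "agent:summarizer"),
        ("doc:report", "owner", "user:alice"),
    }

    def check(resource, relation, subject):
        # Direct relation, plus one tiny rewrite rule: owner implies viewer.
        if (resource, relation, subject) in TUPLES:
            return True
        return relation == "viewer" and (resource, "owner", subject) in TUPLES

    assert check("doc:report", "viewer", "agent:summarizer")
    assert check("doc:report", "viewer", "user:alice")    # via owner rewrite
    assert not check("doc:report", "viewer", "agent:rogue")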
mrkmarron 37 days ago [-]
What is your take on the usability of these systems? In practice they seem rather unergonomic, and usage devolves into "require everything".
As agentic systems seem to mainly interoperate with REST-style systems, I suspect that using URIs for resource-use descriptions would be more natural.
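A sketch of what URI-pattern grants might look like (a hypothetical policy format, just to make the suggestion concrete):

    from fnmatch import fnmatch

    GRANTS = [
        ("GET",  "https://api.example.com/v1/users/*"),
        ("POST", "https://api.example.com/v1/search"),
    ]

    def allowed(method, uri):
        return any(method == m and fnmatch(uri, pattern) for m, pattern in GRANTS)

    assert allowed("GET", "https://api.example.com/v1/users/42")
    assert not allowed("DELETE", "https://api.example.com/v1/users/42")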
tucnak 36 days ago [-]
You're right on ergonomics.
CodeAct is one way to abstract away some things and bring others to the forefront. That matters especially for anything requiring a sidecar for mTLS, or for things agents must be aware of, like error handling for when some call fails deep inside the stack. Troubleshooting access issues is key, both during tool development and when using the tool in production. For many, many things, CodeAct is simply superior to the naive calling conventions you see around MCP clients (think OpenAPI).
jzelinskie 38 days ago [-]
Sorry to piggyback, but if this is of interest to you, feel free to reach out to me over email (contact info in my profile). I'm one of the founders of the most popular ReBAC solution, SpiceDB, which secures quite a few AI products, including big players like OpenAI. I'm always interested in hearing about more use cases or where folks are struggling the most.
tucnak 36 days ago [-]
Hi Jimmy, happy to talk about my experience. I reached out to you over email.