Everyone's shipping agents. Almost nobody's pricing the risk.
Operator, Claude Code, Manus, Devin, Cursor — the agent gold rush is on. The uncomfortable part: deployments don't fail on model quality. They fail on the controls nobody bothered to wire up.
Marcus Reyes
Writes about AI infrastructure economics
Pick your fighter: OpenAI's Operator for computer use, Claude Code for big refactors, Cursor for daily work, Manus for full autonomy, Devin if you want a black box that ships features. In 2026 they all produce strong output. Which means the model is now the least interesting variable in whether your deployment survives.
The numbers say everyone's in. A widely cited Gartner figure from early 2026, reported in a 2026 enterprise-adoption roundup, put the share of enterprises running at least one production agent at roughly 80%, up from about a third two years earlier. Treat the exact percentage as directional; the direction is not in doubt.
Here's the part that doesn't make the launch posts: most of those agents never get past pilot, and the enterprise playbooks are blunt about why. The deployments that die don't lose on model quality; they lose on identity, logging, and incident controls that were never wired in.
The risk nobody prices into the demo
Strip away the branding and every one of these products has the same three holes when you drop it into a real company:
- →Identity. Agents share an API key or run under one engineer's creds. When something goes wrong, your audit log can't tell you which agent, which human, which session — so you can't revoke and you can't explain.
- →Egress. The agent can call anything its network can reach. A poisoned file or a confused plan turns into a request to a host you never approved, and nothing stops it.
- →Cost + audit. No per-run ceiling, no record of why it did what it did. You find out from the invoice or the incident, not before.
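To make the identity point concrete, here is a minimal Python sketch. It assumes a hypothetical in-process `CredentialIssuer` and `RunIdentity`, not any real product's API. Each agent run gets its own credential tagged with agent, human, and session, so any request can be traced back and a single session can be revoked without cutting off everyone else.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class RunIdentity:
    """Who is acting: one agent, on behalf of one human, inside one session."""

    agent_id: str
    human_id: str
    session_id: str
    token: str  # per-run credential; never shared between agents or runs


class CredentialIssuer:
    """Hypothetical issuer that mints a distinct credential for every agent run."""

    def __init__(self) -> None:
        self._active: dict[str, RunIdentity] = {}

    def issue(self, agent_id: str, human_id: str, session_id: str | None = None) -> RunIdentity:
        ident = RunIdentity(
            agent_id=agent_id,
            human_id=human_id,
            # Start a new session unless the caller is continuing an existing one.
            session_id=session_id or secrets.token_hex(8),
            token=secrets.token_urlsafe(24),
        )
        self._active[ident.token] = ident
        return ident

    def resolve(self, token: str) -> RunIdentity | None:
        # Answers "which agent, which human, which session" for any request.
        return self._active.get(token)

    def revoke_session(self, session_id: str) -> int:
        # Revoke every credential in one session; other runs keep working.
        doomed = [t for t, ident in self._active.items() if ident.session_id == session_id]
        for t in doomed:
            del self._active[t]
        return len(doomed)
```

In practice the store would live in your identity system rather than a dict. The sketch only shows the property that matters: no shared key, so "which agent, which human, which session" always has an answer, and revocation is scoped.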
Even the frontier labs say this out loud. OpenAI's own guidance on running Codex safely, at least as it has been summarized, frames control as only half the job: once agents are deployed, security teams also need visibility into what those agents are doing and why. If that's true inside the lab that built the thing, it's true for you.
What actually separates the survivors
The deployments that scale share a boring control stack, and it has nothing to do with which logo is on the agent:
- →Deny-by-default egress — an allowlist of the handful of hosts a task actually needs, everything else dropped.
- →Per-run spend ceilings that halt execution, not alerts that arrive after the money's gone.
- →A real audit trail — what was attempted, what policy applied, what was allowed or blocked — that you can hand to a security review.
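Here is a minimal Python sketch of those three controls wired into one per-run guard. `RunGuard`, `AuditEvent`, `EgressBlocked`, and `BudgetExceeded` are hypothetical names, not Vantio's or any vendor's API. Costs are tracked in integer cents to avoid float drift, and hosts are matched exactly against the allowlist (no wildcard subdomains).

```python
import json
from dataclasses import asdict, dataclass, field
from urllib.parse import urlparse


class EgressBlocked(PermissionError):
    """Raised when a request targets a host that is not on the allowlist."""


class BudgetExceeded(RuntimeError):
    """Raised to halt the run itself, rather than alerting after the fact."""


@dataclass
class AuditEvent:
    run_id: str
    action: str    # what was attempted, e.g. "http_request"
    target: str    # host or cost, never the request payload or prompt
    policy: str    # which rule applied
    decision: str  # "allowed" or "blocked"


@dataclass
class RunGuard:
    """Hypothetical per-run guard: deny-by-default egress, hard spend cap, audit trail."""

    run_id: str
    allowed_hosts: frozenset[str]
    spend_cap_cents: int
    spent_cents: int = 0
    audit: list[AuditEvent] = field(default_factory=list)

    def _record(self, action: str, target: str, policy: str, decision: str) -> None:
        self.audit.append(AuditEvent(self.run_id, action, target, policy, decision))

    def check_egress(self, url: str) -> None:
        host = urlparse(url).hostname or ""  # urlparse lowercases the hostname
        if host in self.allowed_hosts:
            self._record("http_request", host, "egress-allowlist", "allowed")
            return
        # Anything not explicitly allowed is dropped: deny by default.
        self._record("http_request", host, "egress-default-deny", "blocked")
        raise EgressBlocked(f"{host!r} is not allowlisted for run {self.run_id}")

    def charge(self, cost_cents: int) -> None:
        # Check before committing the spend, so the cap is never crossed.
        target = f"{cost_cents} cents"
        if self.spent_cents + cost_cents > self.spend_cap_cents:
            self._record("billable_step", target, "per-run-spend-cap", "blocked")
            raise BudgetExceeded(
                f"run {self.run_id}: {self.spent_cents} + {cost_cents} cents "
                f"exceeds cap of {self.spend_cap_cents}"
            )
        self.spent_cents += cost_cents
        self._record("billable_step", target, "per-run-spend-cap", "allowed")

    def audit_jsonl(self) -> str:
        # One JSON object per line: easy to ship to a log store or a reviewer.
        return "\n".join(json.dumps(asdict(e)) for e in self.audit)
```

Call `check_egress` before every outbound request and `charge` before every billable step. Both raise, so the run stops instead of paging someone afterwards. The audit events carry only host or cost, policy, and decision, which keeps the trail safe to hand to a security review.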
Where Vantio fits
Vantio doesn't replace your agent — pick whichever product you like. It's the layer underneath that makes the choice safe: host allow/block rules so an agent only talks to hosts you approved (off-policy calls stopped client-side and logged), per-run spend caps that actually stop the bleed, and a metadata-only audit trail of every action — no prompts, no completions, just what happened. Start free to watch what your agents already do; add the guardrails when you put them in front of anything that matters. The product is the engine. This is the seatbelt.
Sources