MarketJune 17, 20263 min read

The webpage your agent visits is already giving it orders

Computer-use agents don't distinguish between content to process and instructions to follow. A low-skilled attacker used that fact to breach fourteen companies last week.

Dani Brooks

Security & governance

Market

On June 17, 2026, Help Net Security reported on an OALABS analysis of over a thousand recovered agent sessions from a compromised server. The finding: a low-skilled attacker, using nothing but Anthropic's Claude Code and OpenAI's Codex, breached fourteen companies. He didn't need to understand what he was doing. He typed vague prompts and let the agents fill in the details — enumerate exposed services, identify vulnerabilities, write exploit code, harvest credentials. According to the OALABS researchers, in many cases the attacker 'supplied only vague, low-skill prompts and allowed Claude to fill in the gaps.' The agents complied. That's their job.

He only got caught because of an operational security failure — he ran the agents on a server he didn't own, and the server's owner found the working directory and handed it to researchers. If he'd run them on his own infrastructure, there would be no report. Just fourteen companies with unexplained incidents.

Indirect prompt injection: the attack in plain terms

Computer-use agents — OpenAI Operator, Claude Computer Use, and every similar product — read content from the web in order to complete tasks. That's the feature. The attack surface is identical to the feature: the content the agent reads can contain instructions that override the task you gave it. Hidden text, invisible CSS, a manipulated UI element — the model doesn't distinguish 'content to process' from 'instruction to follow.' HiddenLayer demonstrated this live against Claude Computer Use in October 2024, embedding instructions in a webpage that caused the agent to take actions the user never authorized. The agent complied then, too.

The same attack class showed up in a research disclosure published June 5, 2026. Microsoft Threat Intelligence found that Anthropic's Claude Code GitHub Action could expose CI/CD secrets when processing untrusted GitHub content — issue bodies, PR descriptions, comments. The agent's Read tool was authorized to access /proc/self/environ, which contained the workflow's ANTHROPIC_API_KEY. Anthropic patched it in Claude Code 2.1.128. The underlying pattern — attacker-controlled content reaching an agent that has access to secrets — is not patched anywhere.

The threat surface is everywhere the agent reads

→Webpages the agent visits to complete a task. Any page, any instruction hidden in the markup.
→GitHub issues, PR descriptions, and comments — especially now that coding agents are a CI/CD workflow component.
→Emails and documents if your agent can read them. A poisoned support ticket. A file retrieved from external storage.
→Tool outputs from external APIs. The agent trusts whatever text the tool returns.

OWASP's Q1 2026 GenAI exploit roundup documented a case where an attacker used Claude-assisted workflows to breach Mexican government agencies, treating the agent as an autonomous vulnerability discovery and exploitation engine. OWASP's framing: AI is now a force multiplier for attackers, and the primary surface isn't the model itself — it's whatever the model reads.

What you can actually do today

→Treat all retrieved content as untrusted. External data — webpages, emails, documents, tool outputs — should not be able to trigger high-impact actions without a human gate.
→Hard-wall the destructive operations. File writes, outbound requests, shell commands, credential access — require explicit approval, not just a willing model.
→Isolate agents from secrets. If the agent doesn't need the API key to complete its task, it shouldn't be able to read the environment. Blocking /proc/self/environ was the right instinct in the Claude Code patch — applied systemically, not one credential at a time.
→Log what the agent reads, not just what it does. If you want to investigate an injection after the fact, you need to know what triggered the behavior.

The attacker did not need to be an expert operator; they simply had to use the correct framing for their prompts. The agent supplied much of the structure and technical execution that the attacker appeared to lack.
— OALABS researchers, via Help Net Security, June 2026

Where Vantio fits

Prompt injection via external content is hard to stop at the model layer — you can't fully trust what the model says it will do before it does it. What you can do is constrain what the agent can actually reach. Vantio's host allow/block rules stop the agent from making outbound requests to hosts not on an approved list, so a poisoned prompt directing the agent to exfiltrate data somewhere off-policy gets blocked before the request leaves the machine. The metadata trail logs every attempted outbound call with context, which is exactly what you need to reconstruct an injection incident after the fact.

Sources

ShareX LinkedIn YDiscuss on HN

PII redaction, spend caps, and host blocking — live in under an hour.

Put real guardrails on your agents →

Get the next one

Subscribe to The Brief — occasional, signal-only.

No spam. Email only — unsubscribe anytime.

The webpage your agent visits is already giving it orders

Indirect prompt injection: the attack in plain terms

The threat surface is everywhere the agent reads

What you can actually do today

Where Vantio fits

Get the next one

Keep reading

The EU AI Act deadline you were building toward just moved. Here's what didn't.

Everyone's shipping agents. Almost nobody's pricing the risk.