AI Sandboxes Keep Failing at the Same Boundary

The uncomfortable part of the Cohere Terrarium bug is not that a sandbox had a bug. Sandboxes have bugs. The uncomfortable part is that the failure looks familiar.

Terrarium tried to run LLM-generated Python safely. OpenAI Codex CLI tried to let an agent run commands without letting that agent redraw the filesystem boundary. Different stacks, different vendors, different months. The shared mistake is smaller and nastier: code produced or steered by a model was allowed to touch the rules that were supposed to contain it.

That should bother anyone building agentic systems. We keep using the word "sandbox" as if it names a security property. In practice, it often names a hopeful wrapper around a language runtime, a current working directory, or a container that still trusts too much input.

What is verified

CERT/CC published VU#414811 for Cohere's cohere-terrarium on April 21, 2026. The note says Terrarium allowed arbitrary code execution with root privileges on the host Node.js process through JavaScript prototype chain traversal in Pyodide's WebAssembly environment.

The technical root cause is specific. Terrarium configured Pyodide's jsglobals with mock DOM objects created as ordinary JavaScript object literals. Those objects inherited from Object.prototype. Sandbox code could walk from the fake object to .constructor.constructor, reach the host Function constructor, return the real globalThis, and then access Node.js internals such as require().
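The traversal is short enough to reproduce outside Pyodide. Here is a minimal TypeScript sketch of the same prototype walk in plain Node.js. The name mockDocument is a hypothetical stand-in, not Terrarium's actual mock, and the Pyodide foreign-function layer is left out entirely.

```typescript
// Hypothetical stand-in for a mock DOM object built as an ordinary literal.
const mockDocument = { title: "fake page" };

// Step 1: every ordinary object inherits `constructor` from Object.prototype.
const objectCtor = mockDocument.constructor; // the Object constructor

// Step 2: Object is itself a function, so its `constructor` is Function.
const functionCtor = objectCtor.constructor; // the Function constructor

// Step 3: Function compiles a string into code that runs in the host realm,
// so it can hand back the real global object.
const hostGlobal = functionCtor("return globalThis")();

const reachedFunction: boolean = functionCtor === Function;
const reachedHostGlobal: boolean = hostGlobal === globalThis;
console.log({ reachedFunction, reachedHostGlobal });
```

Nothing in the walk needs a bug in the engine. Two property reads and one call are enough once an ordinary object crosses the boundary.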

Cohere released cohere-terrarium v1.0.1 on April 22. The GitHub release says the fix changed exposed mock objects to Object.create(null), froze read-only mocks, and added a regression test for CVE-2026-5752. The same release says the project remains unmaintained beyond that security release and users should migrate to a maintained sandbox. GitHub also shows the repository as archived.
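To see why that style of fix closes this particular path, here is a small TypeScript sketch comparing a literal with a null-prototype, frozen object. The helper reachesHostFunction and the mock names are hypothetical, and this is not the code from the Terrarium release. Note the assumption it makes visible: the check covers only the object itself, and any function or nested object exposed as a property still carries its own prototype chain.

```typescript
// Returns true when two property reads reach the host Function constructor.
function reachesHostFunction(candidate: any): boolean {
  return candidate?.constructor?.constructor === Function;
}

// Ordinary literal: inherits from Object.prototype.
const literalMock = { title: "fake page" };

// Null-prototype object: no inherited `constructor` to walk through.
const hardenedMock = Object.create(null);
hardenedMock.title = "fake page";
Object.freeze(hardenedMock); // read-only from here on

// A null-prototype object that still exposes an ordinary function.
const mockWithHelper = Object.create(null);
mockWithHelper.helper = () => 0;

const literalLeaks = reachesHostFunction(literalMock);
const hardenedLeaks = reachesHostFunction(hardenedMock);
const hardenedIsFrozen = Object.isFrozen(hardenedMock);
const helperLeaks = reachesHostFunction(mockWithHelper.helper);
console.log({ literalLeaks, hardenedLeaks, hardenedIsFrozen, helperLeaks });
```

The last case is the reason a regression test matters: removing the prototype from one object does nothing for the values hanging off it.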

The disclosure trail matters. CERT says Cohere was notified on February 19. Its vendor block says CERT had not received a vendor statement. The CERT addendum says cohere-terrarium has been archived, v1.0.1 is the final release, and there will be no further patches.

OpenAI's older Codex CLI advisory is the useful comparison. GHSA-w5fx-fh39-j5rw says Codex CLI versions 0.2.0 through 0.38.0 could treat a model-generated cwd as the sandbox's writable root, including paths outside the folder where the user started the session. OpenAI says this enabled arbitrary file writes and command execution where the Codex process had permissions, although it did not bypass the network-disabled sandbox restriction. The fix shipped in Codex CLI 0.39.0 and the Codex IDE extension 0.4.12.

The real pattern: policy must not be model-shaped

The Terrarium bug is easy to file under "JavaScript weirdness." That would miss the lesson.

Pyodide is a Python distribution for the browser and Node.js, not a magic security boundary. Its foreign-function interface exists so Python can interact with JavaScript. If a server hands Pyodide a fake JavaScript world and that fake world has a path back to the real one, the sandbox boundary is gone.

Codex CLI's bug is not the same mechanism, but it rhymes. The advisory says the CLI let a model-generated current working directory influence the writable root of the sandbox policy. In other words, an untrusted actor could help define the scope of its own confinement.
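One way to avoid that shape, sketched in TypeScript under stated assumptions: fix the writable root once, from the directory where the user started the session, and treat any model-supplied cwd as a request to validate against it. SandboxPolicy, createPolicy, and resolveRequestedCwd are hypothetical names, and this is not OpenAI's patch. The check is lexical only; a real implementation would also need to resolve symlinks before comparing.

```typescript
import * as path from "node:path";

// Hypothetical policy shape: the root is set once and never changes.
interface SandboxPolicy {
  readonly writableRoot: string;
}

// Built from trusted input only: the directory the user launched from.
function createPolicy(sessionStartDir: string): SandboxPolicy {
  return Object.freeze({ writableRoot: path.resolve(sessionStartDir) });
}

// A model-supplied cwd is validated against the policy; it never updates it.
function resolveRequestedCwd(policy: SandboxPolicy, requestedCwd: string): string {
  const candidate = path.resolve(policy.writableRoot, requestedCwd);
  const relative = path.relative(policy.writableRoot, candidate);
  const escapesRoot =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapesRoot) {
    throw new Error("requested cwd is outside the writable root: " + requestedCwd);
  }
  return candidate;
}
```

The design choice is the direction of the data flow. The policy is an input to the path check, and the path is never an input to the policy.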

That is the mistake to watch for in agent systems:

a model chooses a path, and the path becomes a policy boundary;
a model sees a config file, and the config changes tool permissions;
a model calls a helper object, and the helper object reaches the host runtime;
a model reads an issue, README, or web page, and those bytes become instructions for code execution.

The attack surface is not only prompt injection. It is prompt injection plus ambient authority.

Containers are not the end of the discussion

CERT's note says the Terrarium issue allowed root command execution inside the container, and it warns about access to sensitive files, environment variables, and services reachable from the container network. That is already bad. Whether a specific deployment could break out of the container depends on its runtime, privileges, mounts, network, secrets, and host hardening.

So the right claim is not "this gave attackers the host." CERT describes root in the Node.js process that hosts Pyodide and root inside the container context. The practical risk is that many production containers are full of credentials and internal reachability. A container boundary is useful, but it is not a reason to treat a language runtime as safe.

The better question is: what isolation primitive sits under the agent's code executor? A language-level sandbox is a weak answer. A plain container is a better answer, but it still needs tightly scoped capabilities, mount rules, seccomp, network policy, and secret handling. MicroVMs, gVisor-style user-space kernels, and similar layers cost more, but they at least start from the premise that generated code is hostile.

Why Reddit cared

The r/netsec thread took off because this is not another abstract AI safety debate. It is a bug class that lands on developer machines and production agent infrastructure.

The most useful public reaction came from security researchers and aggregators, not from a giant social pile-on. Hacker News search did not show a matching discussion at research time. X discovery was thin too: my search there was limited, and the only public metadata I could retrieve was a Spanish-language security post by Securízame linking to coverage of the Terrarium flaw. That thinness is itself a signal. The issue is serious, but it has not crossed into mainstream developer discourse yet.

That gap is dangerous. Agent builders are moving faster than the shared vocabulary for agent confinement. Users see a checkbox labeled sandbox and assume the hard part is solved.

What remains uncertain

There are a few boundaries worth keeping clean.

First, I have not found public evidence of exploitation in the wild. The verified record is disclosure, advisory, patch, archive, and third-party reporting.

Second, the severity numbers differ by source. GitHub's global advisory for CVE-2026-5752 lists a CVSS 3.1 score of 9.4. Cohere's release notes call it 9.3. The exact decimal matters less than the classification: critical sandbox escape.

Third, the disclosure coordination story is incomplete. CERT says Cohere was notified on February 19 and that it had not received a vendor statement. Cohere did release v1.0.1 after CERT's publication and archived the repository. Without a vendor statement, motive and internal timeline should stay out of the article.

Finally, Codex CLI and Terrarium should not be collapsed into one incident. Codex was patched months earlier and its advisory says network sandboxing was not bypassed. The comparison is about the design pattern, not equal blast radius.

Practical takeaways

If an agent runs generated code, audit what defines the sandbox policy. Do not let model-controlled fields, tool arguments, retrieved files, or generated paths decide where the boundary is.
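One way to make that audit mechanical is to tag values with where they came from and refuse model-origin values at the point where policy is set. The TypeScript below is a minimal sketch of that idea. Sourced, fromOperator, fromModel, and chooseWritableRoot are hypothetical names, not part of any product discussed here, and a runtime tag is only as trustworthy as the code that applies it.

```typescript
// Where a value entered the system.
type Origin = "operator" | "model";

// A value paired with its origin, so policy code can check provenance.
interface Sourced<T> {
  readonly value: T;
  readonly origin: Origin;
}

// Wrap values read from operator-controlled configuration or the launch context.
function fromOperator<T>(value: T): Sourced<T> {
  return { value, origin: "operator" };
}

// Wrap anything the model produced or steered: tool arguments, generated
// paths, content pulled from retrieved files.
function fromModel<T>(value: T): Sourced<T> {
  return { value, origin: "model" };
}

// Policy-setting code accepts operator-origin values only.
function chooseWritableRoot(input: Sourced<string>): string {
  if (input.origin !== "operator") {
    throw new Error("writable root must come from operator configuration");
  }
  return input.value;
}
```

For example, chooseWritableRoot(fromOperator("/srv/project")) returns the path, while chooseWritableRoot(fromModel("/")) throws before any policy changes.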

If you use Pyodide, vm2-style JavaScript isolation, or a same-process interpreter as the main security layer, treat that as a bug waiting for a proof of concept. These tools can be useful. They are not a substitute for process, kernel, or VM isolation.

If you run containers for agent execution, inspect what is inside them. Environment variables, cloud tokens, package registry credentials, internal service access, Docker sockets, host mounts, and writable config paths are the difference between "root in a disposable box" and "incident report."
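For the environment-variable part of that inspection, a rough first pass can be scripted. The TypeScript below is a sketch under assumptions: the name patterns are illustrative and far from exhaustive, findSensitiveEnvNames and buildChildEnv are hypothetical helpers, and none of this covers mounts, sockets, or network reachability.

```typescript
// Illustrative patterns only; real deployments will need their own list.
const SENSITIVE_NAME_PATTERNS: RegExp[] = [
  /TOKEN/i,
  /SECRET/i,
  /PASSWORD/i,
  /API_?KEY/i,
  /CREDENTIAL/i,
];

// Report variable names that look like credentials. Values are never returned.
function findSensitiveEnvNames(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => SENSITIVE_NAME_PATTERNS.some((pattern) => pattern.test(name)))
    .sort();
}

// Build the executor's environment from an explicit allowlist instead of
// inheriting everything the parent process happens to hold.
function buildChildEnv(
  env: Record<string, string | undefined>,
  allowedNames: readonly string[]
): Record<string, string> {
  const childEnv: Record<string, string> = {};
  for (const name of allowedNames) {
    const value = env[name];
    if (Object.prototype.hasOwnProperty.call(env, name) && typeof value === "string") {
      childEnv[name] = value;
    }
  }
  return childEnv;
}
```

In Node.js, passing process.env to findSensitiveEnvNames gives a quick list to review, and the result of buildChildEnv can be handed to whatever spawns the executor. The allowlist is the part that matters; pattern matching only finds the secrets someone named honestly.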

The headline is a Cohere CVE. The broader story is that AI code execution keeps failing where product language is softest. "Sandbox" is not an architecture. It is a claim that needs a boundary, a threat model, and proof that the model cannot help rewrite either one.

Sources

Reddit r/netsec hot thread: "Cohere Terrarium (CVE-2026-5752) and OpenAI Codex CLI (CVE-2025-59532): a cross-CVE analysis of AI code sandbox escapes"

https://old.reddit.com/r/netsec/comments/1suh47t/cohere_terrarium_cve20265752_and_openai_codex_cli/

Barrack AI analysis: "Cohere, OpenAI, and the broken sandbox problem"

https://blog.barrack.ai/pyodide-sandbox-escape-cohere-terrarium-openai-codex/

CERT/CC VU#414811: "Terrarium contains a vulnerability that allows arbitrary code execution"

https://www.kb.cert.org/vuls/id/414811

GitHub global advisory for CVE-2026-5752 / GHSA-cmpr-pw8g-6q6c

https://github.com/advisories/GHSA-cmpr-pw8g-6q6c

Cohere Terrarium v1.0.1 release notes

https://github.com/cohere-ai/cohere-terrarium/releases/tag/v1.0.1

OpenAI Codex CLI advisory GHSA-w5fx-fh39-j5rw / CVE-2025-59532

https://github.com/advisories/GHSA-w5fx-fh39-j5rw

The Hacker News: "Cohere AI Terrarium Sandbox Flaw Enables Root Code Execution, Container Escape"

https://thehackernews.com/2026/04/cohere-ai-terrarium-sandbox-flaw.html

Public X metadata for Securízame post linking Spanish coverage

https://x.com/Securizame/status/2047218918037622942