Historically, we spawned the Seatbelt and Landlock sandboxes in
substantially different ways:
For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
specified as an arg followed by the original command:
d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)
For **Landlock/Seccomp**, we would do
`tokio::runtime::Builder::new_current_thread()`, _invoke
Landlock/Seccomp APIs to modify the permissions of that new thread_, and
then spawn the command:
d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)
While it is neat that Landlock/Seccomp supports applying a policy to
only one thread without having to apply it to the entire process, it
requires us to maintain two different codepaths and is a bit harder to
reason about. The tipping point was
https://github.com/openai/codex/pull/1061, in which we had to start
building up the `env` in an unexpected way for the existing
Landlock/Seccomp approach to continue to work.
This PR overhauls things so that we do similar things for Mac and Linux.
It turned out that we were already building our own "helper binary"
comparable to Mac's `sandbox-exec` as part of the `cli` crate:
d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)
We originally created this to build a small binary to include with the
Node.js version of the Codex CLI to provide support for Linux
sandboxing.
Though the sticky bit is that, at this point, we still want to deploy
the Rust version of Codex as a single, standalone binary rather than a
CLI and a supporting sandboxing binary. To satisfy this goal, we use
"the arg0 trick," in which we:
* use `std::env::current_exe()` to get the path to the CLI that is
currently running
* use the CLI as the `program` for the `Command`
* set `"codex-linux-sandbox"` as arg0 for the `Command`
A CLI that supports sandboxing should check arg0 at the start of the
program. If it is `"codex-linux-sandbox"`, it must invoke
`codex_linux_sandbox::run_main()`, which runs the CLI as if it were
`codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn
the original command, so do _replace_ the process rather than spawn a
subprocess. Incidentally, we do this before starting the Tokio runtime,
so the process should only have one thread when `execvp(3)` is called.
Because the `core` crate that needs to spawn the Linux sandboxing is not
a CLI in its own right, this means that every CLI that includes `core`
and relies on this behavior has to (1) implement it and (2) provide the
path to the sandboxing executable. While the path is almost always
`std::env::current_exe()`, we needed to make this configurable for
integration tests, so `Config` now has a `codex_linux_sandbox_exe:
Option<PathBuf>` property to facilitate threading this through,
introduced in https://github.com/openai/codex/pull/1089.
This common pattern is now captured in
`codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
functions that should use it have been updated as part of this PR.
The `codex-linux-sandbox` crate added to the Cargo workspace as part of
this PR now has the bulk of the Landlock/Seccomp logic, which makes
`core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
`core/src/landlock.rs` were removed/ported as part of this PR. I also
moved the unit tests for this code into an integration test,
`linux-sandbox/tests/landlock.rs`, in which I use
`env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
`codex_linux_sandbox_exe` since `std::env::current_exe()` is not
appropriate in that case.
https://github.com/openai/codex/pull/1086 is a work-in-progress to make
Linux sandboxing work more like Seatbelt where, for the command we want
to sandbox, we build up the command and then hand it, and some sandbox
configuration flags, to another command to set up the sandbox and then
run it.
In the case of Seatbelt, macOS provides this helper binary and provides
it at `/usr/bin/sandbox-exec`. For Linux, we have to build our own and
pass it through (which is what #1086 does), so this makes the new
`codex_linux_sandbox_exe` available on `Config` so that it will later be
available in `exec.rs` when we need it in #1086.
To date, when handling `shell` and `local_shell` tool calls, we were
spawning new processes using the environment inherited from the Codex
process itself. This means that the sensitive `OPENAI_API_KEY` that
Codex needs to talk to OpenAI models was made available to everything
run by `shell` and `local_shell`. While there are cases where that might
be useful, it does not seem like a good default.
This PR introduces a complex `shell_environment_policy` config option to
control the `env` used with these tool calls. It is inevitably a bit
complex so that it is possible to override individual components of the
policy so without having to restate the entire thing.
Details are in the updated `README.md` in this PR, but here is the
relevant bit that explains the individual fields of
`shell_environment_policy`:
| Field | Type | Default | Description |
| ------------------------- | -------------------------- | ------- |
-----------------------------------------------------------------------------------------------------------------------------------------------
|
| `inherit` | string | `core` | Starting template for the
environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full
parent env), or `none` (start empty). |
| `ignore_default_excludes` | boolean | `false` | When `false`, Codex
removes any var whose **name** contains `KEY`, `SECRET`, or `TOKEN`
(case-insensitive) before other rules run. |
| `exclude` | array<string> | `[]` | Case-insensitive glob
patterns to drop after the default filter.<br>Examples: `"AWS_*"`,
`"AZURE_*"`. |
| `set` | table<string,string> | `{}` | Explicit key/value
overrides or additions – always win over inherited values. |
| `include_only` | array<string> | `[]` | If non-empty, a
whitelist of patterns; only variables that match _one_ pattern survive
the final step. (Generally used with `inherit = "all"`.) |
In particular, note that the default is `inherit = "core"`, so:
* if you have extra env variables that you want to inherit from the
parent process, use `inherit = "all"` and then specify `include_only`
* if you have extra env variables where you want to hardcode the values,
the default `inherit = "core"` will work fine, but then you need to
specify `set`
This configuration is not battle-tested, so we will probably still have
to play with it a bit. `core/src/exec_env.rs` has the critical business
logic as well as unit tests.
Though if nothing else, previous to this change:
```
$ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY
# ...prints OPENAI_API_KEY...
```
But after this change it does not print anything (as desired).
One final thing to call out about this PR is that the
`configure_command!` macro we use in `core/src/exec.rs` has to do some
complex logic with respect to how it builds up the `env` for the process
being spawned under Landlock/seccomp. Specifically, doing
`cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably
the most intuitive way to do it) caused the Landlock unit tests to fail
because the processes spawned by the unit tests started failing in
unexpected ways! If we forgo `env_clear()` in favor of updating env vars
one at a time, the tests still pass. The comment in the code talks about
this a bit, and while I would like to investigate this more, I need to
move on for the moment, but I do plan to come back to it to fully
understand what is going on. For example, this suggests that we might
not be able to spawn a C program that calls `env_clear()`, which would
be...weird. We may still have to fiddle with our Landlock config if that
is the case.
Previously, running Codex as an MCP server required a standalone binary
in our Cargo workspace, but this PR makes it available as a subcommand
(`mcp`) of the main CLI.
Ran this with:
```
RUST_LOG=debug npx @modelcontextprotocol/inspector cargo run --bin codex -- mcp
```
and verified it worked as expected in the inspector at
`http://127.0.0.1:6274/`.
Adds `expect()` as a denied lint. Same deal applies with `unwrap()`
where we now need to put `#[expect(...` on ones that we legit want. Took
care to enable `expect()` in test contexts.
# Tests
```
cargo fmt
cargo clippy --all-features --all-targets --no-deps -- -D warnings
cargo test
```
When using Codex to develop Codex itself, I noticed that sometimes it
would try to add `#[ignore]` to the following tests:
```
keeps_previous_response_id_between_tasks()
retries_on_early_close()
```
Both of these tests start a `MockServer` that launches an HTTP server on
an ephemeral port and requires network access to hit it, which the
Seatbelt policy associated with `--full-auto` correctly denies. If I
wasn't paying attention to the code that Codex was generating, one of
these `#[ignore]` annotations could have slipped into the codebase,
effectively disabling the test for everyone.
To that end, this PR enables an experimental environment variable named
`CODEX_SANDBOX_NETWORK_DISABLED` that is set to `1` if the
`SandboxPolicy` used to spawn the process does not have full network
access. I say it is "experimental" because I'm not convinced this API is
quite right, but we need to start somewhere. (It might be more
appropriate to have an env var like `CODEX_SANDBOX=full-auto`, but the
challenge is that our newer `SandboxPolicy` abstraction does not map to
a simple set of enums like in the TypeScript CLI.)
We leverage this new functionality by adding the following code to the
aforementioned tests as a way to "dynamically disable" them:
```rust
if std::env::var(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR).is_ok() {
println!(
"Skipping test because it cannot execute when network is disabled in a Codex sandbox."
);
return;
}
```
We can use the `debug seatbelt --full-auto` command to verify that
`cargo test` fails when run under Seatbelt prior to this change:
```
$ cargo run --bin codex -- debug seatbelt --full-auto -- cargo test
---- keeps_previous_response_id_between_tasks stdout ----
thread 'keeps_previous_response_id_between_tasks' panicked at /Users/mbolin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.3/src/mock_server/builder.rs:107:46:
Failed to bind an OS port for a mock server.: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
keeps_previous_response_id_between_tasks
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `-p codex-core --test previous_response_id`
```
Though after this change, the above command succeeds! This means that,
going forward, when Codex operates on Codex itself, when it runs `cargo
test`, only "real failures" should cause the command to fail.
As part of this change, I decided to tighten up the codepaths for
running `exec()` for shell tool calls. In particular, we do it in `core`
for the main Codex business logic itself, but we also expose this logic
via `debug` subcommands in the CLI in the `cli` crate. The logic for the
`debug` subcommands was not quite as faithful to the true business logic
as I liked, so I:
* refactored a bit of the Linux code, splitting `linux.rs` into
`linux_exec.rs` and `landlock.rs` in the `core` crate.
* gating less code behind `#[cfg(target_os = "linux")]` because such
code does not get built by default when I develop on Mac, which means I
either have to build the code in Docker or wait for CI signal
* introduced `macro_rules! configure_command` in `exec.rs` so we can
have both sync and async versions of this code. The synchronous version
seems more appropriate for straight threads or potentially fork/exec.
Sets submodules to use workspace lints. Added denying unwrap as a
workspace level lint, which found a couple of cases where we could have
propagated errors. Also manually labeled ones that were fine by my eye.
I discovered that I accidentally introduced a change in
https://github.com/openai/codex/pull/829 where we load a fresh `Config`
in the middle of `codex.rs`:
c3e10e180a/codex-rs/core/src/codex.rs (L515-L522)
This is not good because the `Config` could differ from the one that has
the user's overrides specified from the CLI. Also, in unit tests, it
means the `Config` was picking up my personal settings as opposed to
using a vanilla config, which was problematic.
This PR cleans things up by moving the common case where
`Op::ConfigureSession` is derived from `Config` (originally done in
`codex_wrapper.rs`) and making it the standard way to initialize `Codex`
by putting it in `Codex::spawn()`. Note this also eliminates quite a bit
of boilerplate from the tests and relieves the caller of the
responsibility of minting out unique IDs when invoking `submit()`.
Some effects of this change:
- New formatting changes across many files. No functionality changes
should occur from that.
- Calls to `set_env` are considered unsafe, since this only happens in
tests we wrap them in `unsafe` blocks
I started this PR because I wanted to share the `format_duration()`
utility function in `codex-rs/exec/src/event_processor.rs` with the TUI.
The question was: where to put it?
`core` should have as few dependencies as possible, so moving it there
would introduce a dependency on `chrono`, which seemed undesirable.
`core` already had this `cli` feature to deal with a similar situation
around sharing common utility functions, so I decided to:
* make `core` feature-free
* introduce `common`
* `common` can have as many "special interest" features as it needs,
each of which can declare their own deps
* the first two features of common are `cli` and `elapsed`
In practice, this meant updating a number of `Cargo.toml` files,
replacing this line:
```toml
codex-core = { path = "../core", features = ["cli"] }
```
with these:
```toml
codex-core = { path = "../core" }
codex-common = { path = "../common", features = ["cli"] }
```
Moving `format_duration()` into its own file gave it some "breathing
room" to add a unit test, so I had Codex generate some tests and new
support for durations over 1 minute.
In order to expose Codex via an MCP server, I realized that we should be
taking `cwd` as a parameter rather than assuming
`std::env::current_dir()` as the `cwd`. Specifically, the user may want
to start a session in a directory other than the one where the MCP
server has been started.
This PR makes `cwd: PathBuf` a required field of `Session` and threads
it all the way through, though I think there is still an issue with not
honoring `workdir` for `apply_patch`, which is something we also had to
fix in the TypeScript version: https://github.com/openai/codex/pull/556.
This also adds `-C`/`--cd` to change the cwd via the command line.
To test, I ran:
```
cargo run --bin codex -- exec -C /tmp 'show the output of ls'
```
and verified it showed the contents of my `/tmp` folder instead of
`$PWD`.
@oai-ragona and I discussed it, and we feel the REPL crate has served
its purpose, so we're going to delete the code and future archaeologists
can find it in Git history.
This introduces a standalone executable that run the equivalent of the
`codex debug landlock` subcommand and updates `rust-release.yml` to
include it in the release.
The idea is that we will include this small binary with the TypeScript
CLI to provide support for Linux sandboxing.
Taking a pass at building artifacts per platform so we can consider
different distribution strategies that don't require users to install
the full `cargo` toolchain.
Right now this grabs just the `codex-repl` and `codex-tui` bins for 5
different targets and bundles them into a draft release. I think a
clearly marked pre-release set of artifacts will unblock the next step
of testing.
Previous to this PR, `SandboxPolicy` was a bit difficult to work with:
237f8a11e1/codex-rs/core/src/protocol.rs (L98-L108)
Specifically:
* It was an `enum` and therefore options were mutually exclusive as
opposed to additive.
* It defined things in terms of what the agent _could not_ do as opposed
to what they _could_ do. This made things hard to support because we
would prefer to build up a sandbox config by starting with something
extremely restrictive and only granting permissions for things the user
as explicitly allowed.
This PR changes things substantially by redefining the policy in terms
of two concepts:
* A `SandboxPermission` enum that defines permissions that can be
granted to the agent/sandbox.
* A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`,
but externally exposes a simpler API that can be used to configure
Seatbelt/Landlock.
Previous to this PR, we supported a `--sandbox` flag that effectively
mapped to an enum value in `SandboxPolicy`. Though now that
`SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single
`--sandbox` flag no longer makes sense. While I could have turned it
into a flag that the user can specify multiple times, I think the
current values to use with such a flag are long and potentially messy,
so for the moment, I have dropped support for `--sandbox` altogether and
we can bring it back once we have figured out the naming thing.
Since `--sandbox` is gone, users now have to specify `--full-auto` to
get a sandbox that allows writes in `cwd`. Admittedly, there is no clean
way to specify the equivalent of `--full-auto` in your `config.toml`
right now, so we will have to revisit that, as well.
Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy`
changed considerably, I had to overhaul how config loading works, as
well. There are now two distinct concepts, `ConfigToml` and `Config`:
* `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one
might expect, every field is `Optional` and it is `#[derive(Deserialize,
Default)]`. Consistent use of `Optional` makes it clear what the user
has specified explicitly.
* `Config` is the "normalized config" and is produced by merging
`ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw
`Option<Vec<SandboxPermission>>`, `Config` presents only the final
`SandboxPolicy`.
The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra
special attention to ensure we are faithfully mapping the
`SandboxPolicy` to the Seatbelt and Landlock configs, respectively.
Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been
renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that
`(allow file-read*)` has been removed from the `.sbpl` file as now this
is added to the policy in `core/src/exec.rs` when
`sandbox_policy.has_full_disk_read_access()` is `true`.
This PR adds a `debug landlock` subcommand to the Codex CLI for testing
how Codex would execute a command using the specified sandbox policy.
Built and ran this code in the `rust:latest` Docker container. In the
container, hitting the network with vanilla `curl` succeeds:
```
$ curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
```
whereas this fails, as expected:
```
$ cargo run -- debug landlock -s network-restricted -- curl google.com
curl: (6) getaddrinfo() thread failed to start
```
Originally, the `interactive` crate was going to be a placeholder for
building out a UX that was comparable to that of the existing TypeScript
CLI. Though after researching how Ratatui works, that seems difficult to
do because it is designed around the idea that it will redraw the full
screen buffer each time (and so any scrolling should be "internal" to
your Ratatui app) whereas the TypeScript CLI expects to render the full
history of the conversation every time(*) (which is why you can use your
terminal scrollbar to scroll it).
While it is possible to use Ratatui in a way that acts more like what
the TypeScript CLI is doing, it is awkward and seemingly results in
tedious code, so I think we should abandon that approach. As such, this
PR deletes the `interactive/` folder and the code that depended on it.
Further, since we added support for mousewheel scrolling in the TUI in
https://github.com/openai/codex/pull/641, it certainly feels much better
and the need for scroll support via the terminal scrollbar is greatly
diminished. This is now a more appropriate default UX for the
"multitool" CLI.
(*) Incidentally, I haven't verified this, but I think this results in
O(N^2) work in rendering, which seems potentially problematic for long
conversations.
This changes how instantiating `Config` works and also adds
`approval_policy` and `sandbox_policy` as fields. The idea is:
* All fields of `Config` have appropriate default values.
* `Config` is initially loaded from `~/.codex/config.toml`, so values in
`config.toml` will override those defaults.
* Clients must instantiate `Config` via
`Config::load_with_overrides(ConfigOverrides)` where `ConfigOverrides`
has optional overrides that are expected to be settable based on CLI
flags.
The `Config` should be defined early in the program and then passed
down. Now functions like `init_codex()` take fewer individual parameters
because they can just take a `Config`.
Also, `Config::load()` used to fail silently if `~/.codex/config.toml`
had a parse error and fell back to the default config. This seemed
really bad because it wasn't clear why the values in my `config.toml`
weren't getting picked up. I changed things so that
`load_with_overrides()` returns `Result<Config>` and verified that the
various CLIs print a reasonable error if `config.toml` is malformed.
Finally, I also updated the TUI to show which **sandbox** value is being
used, as we do for other key values like **model** and **approval**.
This was also a reminder that the various values of `--sandbox` are
honored on Linux but not macOS today, so I added some TODOs about fixing
that.
As stated in `codex-rs/README.md`:
Today, Codex CLI is written in TypeScript and requires Node.js 22+ to
run it. For a number of users, this runtime requirement inhibits
adoption: they would be better served by a standalone executable. As
maintainers, we want Codex to run efficiently in a wide range of
environments with minimal overhead. We also want to take advantage of
operating system-specific APIs to provide better sandboxing, where
possible.
To that end, we are moving forward with a Rust implementation of Codex
CLI contained in this folder, which has the following benefits:
- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to
[seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and
[landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in
order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption
and better, more predictable performance.
Currently, the Rust implementation is materially behind the TypeScript
implementation in functionality, so continue to use the TypeScript
implmentation for the time being. We will publish native executables via
GitHub Releases as soon as we feel the Rust version is usable.