Commit Graph

5 Commits

Author SHA1 Message Date
Michael Bolin
d60f350cf8 feat: add support for -c/--config to override individual config items (#1137)
This PR introduces support for `-c`/`--config` so users can override
individual config values on the command line using `--config
name=value`. Example:

```
codex --config model=o4-mini
```

Making it possible to set arbitrary config values on the command line
results in a more flexible configuration scheme and makes it easier to
provide single-line examples that can be copy-pasted from documentation.

Effectively, it means there are four levels of configuration for some
values:

- Default value (e.g., `model` currently defaults to `o4-mini`)
- Value in `config.toml` (e.g., user could override the default to be
`model = "o3"` in their `config.toml`)
- Specifying `-c` or `--config` to override `model` (e.g., user can
include `-c model=o3` in their list of args to Codex)
- If available, a config-specific flag can be used, which takes
precedence over `-c` (e.g., user can specify `--model o3` in their list
of args to Codex)

Now that it is possible to specify anything that could be configured in
`config.toml` on the command line using `-c`, we do not need to have a
custom flag for every possible config option (which can clutter the
output of `--help`). To that end, as part of this PR, we drop support
for the `--disable-response-storage` flag, as users can now specify `-c
disable_response_storage=true` to get the equivalent functionality.

Under the hood, this works by loading the `config.toml` into a
`toml::Value`. Then for each `key=value`, we create a small synthetic
TOML file with `value` so that we can run the TOML parser to get the
equivalent `toml::Value`. We then parse `key` to determine the point in
the original `toml::Value` to do the insert/replace. Once all of the
overrides from `-c` args have been applied, the `toml::Value` is
deserialized into a `ConfigToml` and then the `ConfigOverrides` are
applied, as before.
2025-05-27 23:11:44 -07:00
Michael Bolin
89ef4efdcf fix: overhaul how we spawn commands under seccomp/landlock on Linux (#1086)
Historically, we spawned the Seatbelt and Landlock sandboxes in
substantially different ways:

For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
specified as an arg followed by the original command:


d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)

For **Landlock/Seccomp**, we would do
`tokio::runtime::Builder::new_current_thread()`, _invoke
Landlock/Seccomp APIs to modify the permissions of that new thread_, and
then spawn the command:


d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)

While it is neat that Landlock/Seccomp supports applying a policy to
only one thread without having to apply it to the entire process, it
requires us to maintain two different codepaths and is a bit harder to
reason about. The tipping point was
https://github.com/openai/codex/pull/1061, in which we had to start
building up the `env` in an unexpected way for the existing
Landlock/Seccomp approach to continue to work.

This PR overhauls things so that we do similar things for Mac and Linux.
It turned out that we were already building our own "helper binary"
comparable to Mac's `sandbox-exec` as part of the `cli` crate:


d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)

We originally created this to build a small binary to include with the
Node.js version of the Codex CLI to provide support for Linux
sandboxing.

Though the sticky bit is that, at this point, we still want to deploy
the Rust version of Codex as a single, standalone binary rather than a
CLI and a supporting sandboxing binary. To satisfy this goal, we use
"the arg0 trick," in which we:

* use `std::env::current_exe()` to get the path to the CLI that is
currently running
* use the CLI as the `program` for the `Command`
* set `"codex-linux-sandbox"` as arg0 for the `Command`

A CLI that supports sandboxing should check arg0 at the start of the
program. If it is `"codex-linux-sandbox"`, it must invoke
`codex_linux_sandbox::run_main()`, which runs the CLI as if it were
`codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn
the original command, so do _replace_ the process rather than spawn a
subprocess. Incidentally, we do this before starting the Tokio runtime,
so the process should only have one thread when `execvp(3)` is called.

Because the `core` crate that needs to spawn the Linux sandboxing is not
a CLI in its own right, this means that every CLI that includes `core`
and relies on this behavior has to (1) implement it and (2) provide the
path to the sandboxing executable. While the path is almost always
`std::env::current_exe()`, we needed to make this configurable for
integration tests, so `Config` now has a `codex_linux_sandbox_exe:
Option<PathBuf>` property to facilitate threading this through,
introduced in https://github.com/openai/codex/pull/1089.

This common pattern is now captured in
`codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
functions that should use it have been updated as part of this PR.

The `codex-linux-sandbox` crate added to the Cargo workspace as part of
this PR now has the bulk of the Landlock/Seccomp logic, which makes
`core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
`core/src/landlock.rs` were removed/ported as part of this PR. I also
moved the unit tests for this code into an integration test,
`linux-sandbox/tests/landlock.rs`, in which I use
`env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
`codex_linux_sandbox_exe` since `std::env::current_exe()` is not
appropriate in that case.
2025-05-23 11:37:07 -07:00
Michael Bolin
fde48aaa0d feat: experimental env var: CODEX_SANDBOX_NETWORK_DISABLED (#879)
When using Codex to develop Codex itself, I noticed that sometimes it
would try to add `#[ignore]` to the following tests:

```
keeps_previous_response_id_between_tasks()
retries_on_early_close()
```

Both of these tests start a `MockServer` that launches an HTTP server on
an ephemeral port and requires network access to hit it, which the
Seatbelt policy associated with `--full-auto` correctly denies. If I
wasn't paying attention to the code that Codex was generating, one of
these `#[ignore]` annotations could have slipped into the codebase,
effectively disabling the test for everyone.

To that end, this PR enables an experimental environment variable named
`CODEX_SANDBOX_NETWORK_DISABLED` that is set to `1` if the
`SandboxPolicy` used to spawn the process does not have full network
access. I say it is "experimental" because I'm not convinced this API is
quite right, but we need to start somewhere. (It might be more
appropriate to have an env var like `CODEX_SANDBOX=full-auto`, but the
challenge is that our newer `SandboxPolicy` abstraction does not map to
a simple set of enums like in the TypeScript CLI.)

We leverage this new functionality by adding the following code to the
aforementioned tests as a way to "dynamically disable" them:

```rust
if std::env::var(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR).is_ok() {
    println!(
        "Skipping test because it cannot execute when network is disabled in a Codex sandbox."
    );
    return;
}
```

We can use the `debug seatbelt --full-auto` command to verify that
`cargo test` fails when run under Seatbelt prior to this change:

```
$ cargo run --bin codex -- debug seatbelt --full-auto -- cargo test
---- keeps_previous_response_id_between_tasks stdout ----

thread 'keeps_previous_response_id_between_tasks' panicked at /Users/mbolin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.3/src/mock_server/builder.rs:107:46:
Failed to bind an OS port for a mock server.: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    keeps_previous_response_id_between_tasks

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `-p codex-core --test previous_response_id`
```

Though after this change, the above command succeeds! This means that,
going forward, when Codex operates on Codex itself, when it runs `cargo
test`, only "real failures" should cause the command to fail.

As part of this change, I decided to tighten up the codepaths for
running `exec()` for shell tool calls. In particular, we do it in `core`
for the main Codex business logic itself, but we also expose this logic
via `debug` subcommands in the CLI in the `cli` crate. The logic for the
`debug` subcommands was not quite as faithful to the true business logic
as I liked, so I:

* refactored a bit of the Linux code, splitting `linux.rs` into
`linux_exec.rs` and `landlock.rs` in the `core` crate.
* gating less code behind `#[cfg(target_os = "linux")]` because such
code does not get built by default when I develop on Mac, which means I
either have to build the code in Docker or wait for CI signal
* introduced `macro_rules! configure_command` in `exec.rs` so we can
have both sync and async versions of this code. The synchronous version
seems more appropriate for straight threads or potentially fork/exec.
2025-05-09 18:29:34 -07:00
Michael Bolin
c577e94b67 chore: introduce codex-common crate (#843)
I started this PR because I wanted to share the `format_duration()`
utility function in `codex-rs/exec/src/event_processor.rs` with the TUI.
The question was: where to put it?

`core` should have as few dependencies as possible, so moving it there
would introduce a dependency on `chrono`, which seemed undesirable.
`core` already had this `cli` feature to deal with a similar situation
around sharing common utility functions, so I decided to:

* make `core` feature-free
* introduce `common`
* `common` can have as many "special interest" features as it needs,
each of which can declare their own deps
* the first two features of common are `cli` and `elapsed`

In practice, this meant updating a number of `Cargo.toml` files,
replacing this line:

```toml
codex-core = { path = "../core", features = ["cli"] }
```

with these:

```toml
codex-core = { path = "../core" }
codex-common = { path = "../common", features = ["cli"] }
```

Moving `format_duration()` into its own file gave it some "breathing
room" to add a unit test, so I had Codex generate some tests and new
support for durations over 1 minute.
2025-05-06 17:38:56 -07:00
Michael Bolin
411bfeb410 feat: codex-linux-sandbox standalone executable (#740)
This introduces a standalone executable that run the equivalent of the
`codex debug landlock` subcommand and updates `rust-release.yml` to
include it in the release.

The idea is that we will include this small binary with the TypeScript
CLI to provide support for Linux sandboxing.
2025-04-29 19:21:26 -07:00