valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	8595237505	fix: ensure cwd for conversation and sandbox are separate concerns (#3874) Prior to this PR, both of these functions took a single `cwd`: `71038381aa/codex-rs/core/src/seatbelt.rs (L19-L25)` `71038381aa/codex-rs/core/src/landlock.rs (L16-L23)` whereas `cwd` and `sandbox_cwd` should be set independently (fixed in this PR). Added `sandbox_distinguishes_command_and_policy_cwds()` to `codex-rs/exec/tests/suite/sandbox.rs` to verify this.	2025-09-18 14:37:06 -07:00
Michael Bolin	2ecca79663	fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318 ) The high-order bit on this PR is that it makes it so `sandbox.rs` tests both Mac and Linux, as we introduce a general `spawn_command_under_sandbox()` function with platform-specific implementations for testing. An important, and interesting, discovery in porting the test to Linux is that (for reasons cited in the code comments), `/dev/shm` has to be added to `writable_roots` on Linux in order for `multiprocessing.Lock` to work there. Granting write access to `/dev/shm` comes with some degree of risk, so we do not make this the default for Codex CLI. Piggybacking on top of #2317, this moves the `python_multiprocessing_lock_works` test yet again, moving `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs` because in `codex-rs/exec/tests` we can use `cargo_bin()` like so: ``` let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec"); ``` which is necessary so we can use `codex_linux_sandbox_exe` and therefore `spawn_command_under_linux_sandbox` in an integration test. This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs` and into `landlock.rs`, which makes things more consistent with `seatbelt.rs` in `codex-core`. For reference, https://github.com/openai/codex/pull/1808 is the PR that made the change to Seatbelt to get this test to pass on Mac.	2025-08-14 15:47:48 -07:00
Michael Bolin	89ef4efdcf	fix: overhaul how we spawn commands under seccomp/landlock on Linux (#1086 ) Historically, we spawned the Seatbelt and Landlock sandboxes in substantially different ways: For Seatbelt, we would run `/usr/bin/sandbox-exec` with our policy specified as an arg followed by the original command: `d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)` For Landlock/Seccomp, we would do `tokio::runtime::Builder::new_current_thread()`, _invoke Landlock/Seccomp APIs to modify the permissions of that new thread_, and then spawn the command: `d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)` While it is neat that Landlock/Seccomp supports applying a policy to only one thread without having to apply it to the entire process, it requires us to maintain two different codepaths and is a bit harder to reason about. The tipping point was https://github.com/openai/codex/pull/1061, in which we had to start building up the `env` in an unexpected way for the existing Landlock/Seccomp approach to continue to work. This PR overhauls things so that we do similar things for Mac and Linux. It turned out that we were already building our own "helper binary" comparable to Mac's `sandbox-exec` as part of the `cli` crate: `d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)` We originally created this to build a small binary to include with the Node.js version of the Codex CLI to provide support for Linux sandboxing. Though the sticky bit is that, at this point, we still want to deploy the Rust version of Codex as a single, standalone binary rather than a CLI and a supporting sandboxing binary. To satisfy this goal, we use "the arg0 trick," in which we: * use `std::env::current_exe()` to get the path to the CLI that is currently running * use the CLI as the `program` for the `Command` * set `"codex-linux-sandbox"` as arg0 for the `Command` A CLI that supports sandboxing should check arg0 at the start of the program. 
If it is `"codex-linux-sandbox"`, it must invoke `codex_linux_sandbox::run_main()`, which runs the CLI as if it were `codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the appropriate Landlock/Seccomp API calls and then use `execvp(3)` to run the original command, so we _replace_ the process rather than spawn a subprocess. Incidentally, we do this before starting the Tokio runtime, so the process should only have one thread when `execvp(3)` is called. Because the `core` crate that needs to spawn the Linux sandbox is not a CLI in its own right, every CLI that includes `core` and relies on this behavior has to (1) implement it and (2) provide the path to the sandboxing executable. While the path is almost always `std::env::current_exe()`, we needed to make this configurable for integration tests, so `Config` now has a `codex_linux_sandbox_exe: Option<PathBuf>` property to facilitate threading this through, introduced in https://github.com/openai/codex/pull/1089. This common pattern is now captured in `codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs` functions that should use it have been updated as part of this PR. The `codex-linux-sandbox` crate added to the Cargo workspace as part of this PR now has the bulk of the Landlock/Seccomp logic, which makes `core` a bit simpler. Indeed, `core/src/exec_linux.rs` and `core/src/landlock.rs` were removed/ported as part of this PR. I also moved the unit tests for this code into an integration test, `linux-sandbox/tests/landlock.rs`, in which I use `env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for `codex_linux_sandbox_exe` since `std::env::current_exe()` is not appropriate in that case.	2025-05-23 11:37:07 -07:00
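The "arg0 trick" described in this commit can be sketched roughly as follows. This is a minimal illustration rather than the code from the PR: the helper names `invoked_as_sandbox` and `sandbox_command` are hypothetical, and the branch that would apply Landlock/Seccomp and call `execvp(3)` is only a placeholder.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// The arg0 value named in the commit message.
const SANDBOX_ARG0: &str = "codex-linux-sandbox";

/// Hypothetical helper: true when the process was launched under the sandbox
/// helper's name. Only the final path component is compared, so an absolute
/// path ending in "codex-linux-sandbox" also matches.
fn invoked_as_sandbox(arg0: &OsStr) -> bool {
    Path::new(arg0).file_name() == Some(OsStr::new(SANDBOX_ARG0))
}

/// Hypothetical helper: build a command that re-executes `exe` (normally the
/// result of `std::env::current_exe()`) with arg0 overridden. Unix only.
#[cfg(unix)]
fn sandbox_command(exe: &Path, original: &[&str]) -> Command {
    use std::os::unix::process::CommandExt;
    let mut cmd = Command::new(exe);
    cmd.arg0(SANDBOX_ARG0).args(original);
    cmd
}

fn main() {
    // Check arg0 first, before any async runtime or extra threads exist.
    let arg0 = std::env::args_os().next().unwrap_or_default();
    if invoked_as_sandbox(&arg0) {
        // Placeholder: the real helper would apply the sandbox policy and
        // then replace the process with the original command.
        println!("acting as the sandbox helper");
        return;
    }

    #[cfg(unix)]
    {
        if let Ok(exe) = std::env::current_exe() {
            let cmd = sandbox_command(&exe, &["echo", "hello"]);
            println!("would spawn: {:?}", cmd);
        }
    }
}
```

Keeping the arg0 check as a pure function of the argument makes it easy to test without actually re-executing the binary.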
Michael Bolin	cb379d7797	feat: introduce support for shell_environment_policy in config.toml (#1061) To date, when handling `shell` and `local_shell` tool calls, we were spawning new processes using the environment inherited from the Codex process itself. This means that the sensitive `OPENAI_API_KEY` that Codex needs to talk to OpenAI models was made available to everything run by `shell` and `local_shell`. While there are cases where that might be useful, it does not seem like a good default.

This PR introduces a `shell_environment_policy` config option to control the `env` used with these tool calls. It is inevitably a bit complex so that it is possible to override individual components of the policy without having to restate the entire thing.

Details are in the updated `README.md` in this PR, but here is the relevant bit that explains the individual fields of `shell_environment_policy`:

| Field | Type | Default | Description |
| ------------------------- | -------------------------- | ------- | ----------- |
| `inherit` | string | `core` | Starting template for the environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full parent env), or `none` (start empty). |
| `ignore_default_excludes` | boolean | `false` | When `false`, Codex removes any var whose name contains `KEY`, `SECRET`, or `TOKEN` (case-insensitive) before other rules run. |
| `exclude` | array<string> | `[]` | Case-insensitive glob patterns to drop after the default filter.<br>Examples: `"AWS_"`, `"AZURE_"`. |
| `set` | table<string,string> | `{}` | Explicit key/value overrides or additions – always win over inherited values. |
| `include_only` | array<string> | `[]` | If non-empty, a whitelist of patterns; only variables that match _one_ pattern survive the final step. (Generally used with `inherit = "all"`.) |

In particular, note that the default is `inherit = "core"`, so:

* if you have extra env variables that you want to inherit from the parent process, use `inherit = "all"` and then specify `include_only`
* if you have extra env variables where you want to hardcode the values, the default `inherit = "core"` will work fine, but then you need to specify `set`

This configuration is not battle-tested, so we will probably still have to play with it a bit. `core/src/exec_env.rs` has the critical business logic as well as unit tests. Though if nothing else, prior to this change:

```
$ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY
# ...prints OPENAI_API_KEY...
```

But after this change it does not print anything (as desired).

One final thing to call out about this PR is that the `configure_command!` macro we use in `core/src/exec.rs` has to do some complex logic with respect to how it builds up the `env` for the process being spawned under Landlock/seccomp. Specifically, doing `cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably the most intuitive way to do it) caused the Landlock unit tests to fail because the processes spawned by the unit tests started failing in unexpected ways! If we forgo `env_clear()` in favor of updating env vars one at a time, the tests still pass. The comment in the code talks about this a bit, and while I would like to investigate this more, I need to move on for the moment, but I do plan to come back to it to fully understand what is going on. For example, this suggests that we might not be able to spawn a C program that calls `env_clear()`, which would be...weird. We may still have to fiddle with our Landlock config if that is the case.	2025-05-22 09:51:19 -07:00
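To make the order of the policy steps concrete, here is a rough sketch of how the fields in the table might compose. It is not the logic in `core/src/exec_env.rs`. The `Policy` struct, `build_env`, and `glob_match` are hypothetical names, the matcher only understands `*`, and `CORE_VARS` lists just the three variables the table names, since the full `core` set is elided in the source.

```rust
use std::collections::HashMap;

/// Mirrors the `inherit` field: which template the environment starts from.
enum Inherit { Core, All, None }

/// Hypothetical in-memory form of `shell_environment_policy`.
struct Policy {
    inherit: Inherit,
    ignore_default_excludes: bool,
    exclude: Vec<String>,
    set: HashMap<String, String>,
    include_only: Vec<String>,
}

/// Assumption: only the variables named in the table; the real list is longer.
const CORE_VARS: [&str; 3] = ["HOME", "PATH", "USER"];

/// Simplified case-insensitive matcher that supports `*` only.
fn glob_match(pattern: &str, name: &str) -> bool {
    let (p, n) = (pattern.to_uppercase(), name.to_uppercase());
    let parts: Vec<&str> = p.split('*').collect();
    if parts.len() == 1 {
        return p == n; // no wildcard: exact match
    }
    let mut rest = n.as_str();
    for (i, part) in parts.iter().enumerate() {
        if i == 0 {
            if !rest.starts_with(part) { return false; }
            rest = &rest[part.len()..];
        } else if i == parts.len() - 1 {
            return rest.ends_with(part);
        } else {
            match rest.find(part) {
                Some(idx) => rest = &rest[idx + part.len()..],
                None => return false,
            }
        }
    }
    true
}

fn build_env(parent: &HashMap<String, String>, policy: &Policy) -> HashMap<String, String> {
    // 1. Start from the `inherit` template.
    let mut env: HashMap<String, String> = match policy.inherit {
        Inherit::All => parent.clone(),
        Inherit::None => HashMap::new(),
        Inherit::Core => parent
            .iter()
            .filter(|(k, _)| CORE_VARS.contains(&k.as_str()))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect(),
    };
    // 2. Default excludes: drop names containing KEY, SECRET, or TOKEN.
    if !policy.ignore_default_excludes {
        env.retain(|k, _| {
            let upper = k.to_uppercase();
            !["KEY", "SECRET", "TOKEN"].iter().any(|s| upper.contains(*s))
        });
    }
    // 3. User-supplied exclude patterns.
    env.retain(|k, _| !policy.exclude.iter().any(|p| glob_match(p, k)));
    // 4. Explicit overrides always win over inherited values.
    for (k, v) in &policy.set {
        env.insert(k.clone(), v.clone());
    }
    // 5. If non-empty, `include_only` acts as a final whitelist.
    if !policy.include_only.is_empty() {
        env.retain(|k, _| policy.include_only.iter().any(|p| glob_match(p, k)));
    }
    env
}

fn main() {
    let parent: HashMap<String, String> = std::env::vars().collect();
    let policy = Policy {
        inherit: Inherit::Core,
        ignore_default_excludes: false,
        exclude: vec![],
        set: HashMap::new(),
        include_only: vec![],
    };
    let mut names: Vec<String> = build_env(&parent, &policy).into_keys().collect();
    names.sort();
    println!("{names:?}");
}
```

With `inherit = "all"` and the default filter left on, a variable such as `OPENAI_API_KEY` is dropped in step 2 unless `set` reintroduces it.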
Michael Bolin	399e819c9b	fix: increase timeout for test_dev_null_write (#933 ) After updating this test in https://github.com/openai/codex/pull/923, I have been getting some timeouts with this test in CI, so increasing the timeout to match that of `test_writable_root`: `327cf41f0f/codex-rs/core/src/landlock.rs (L211-L213)`	2025-05-14 10:06:14 -07:00
Michael Bolin	5bf9445351	fix: test_dev_null_write() was not using echo as intended (#923) I believe this test was meant to verify that echoing content to `/dev/null` succeeded, but instead it was testing the equivalent of `echo 'blah > /dev/null'`.	2025-05-13 21:40:26 -07:00
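The pitfall described in this commit can be illustrated with a small sketch. This is not the test from the PR; the two helper functions are hypothetical and it assumes a Unix-like system with `echo` and `sh` on the `PATH`. Passing `>` as an argument does not redirect anything, because redirection is a shell feature.

```rust
use std::process::Command;

/// Passes ">" and "/dev/null" as literal arguments. No shell is involved,
/// so nothing is redirected and echo simply prints the three words.
fn echo_without_shell() -> String {
    let out = Command::new("echo")
        .args(["blah", ">", "/dev/null"])
        .output()
        .expect("failed to run echo");
    String::from_utf8_lossy(&out.stdout).into_owned()
}

/// Lets a shell interpret the redirection, so the write really goes to
/// /dev/null and stdout stays empty.
fn echo_with_shell() -> (bool, String) {
    let out = Command::new("sh")
        .args(["-c", "echo blah > /dev/null"])
        .output()
        .expect("failed to run sh");
    (out.status.success(), String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() {
    println!("without shell: {:?}", echo_without_shell());
    println!("with shell:    {:?}", echo_with_shell());
}
```

Only the second form exercises a write to `/dev/null`, which is what a sandbox test for that path needs.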
jcoens-openai	f3bd143867	Disallow expect via lints (#865 ) Adds `expect()` as a denied lint. Same deal applies with `unwrap()` where we now need to put `#[expect(...` on ones that we legit want. Took care to enable `expect()` in test contexts. # Tests ``` cargo fmt cargo clippy --all-features --all-targets --no-deps -- -D warnings cargo test ```	2025-05-12 08:45:46 -07:00
Michael Bolin	fde48aaa0d	feat: experimental env var: CODEX_SANDBOX_NETWORK_DISABLED (#879 ) When using Codex to develop Codex itself, I noticed that sometimes it would try to add `#[ignore]` to the following tests: ``` keeps_previous_response_id_between_tasks() retries_on_early_close() ``` Both of these tests start a `MockServer` that launches an HTTP server on an ephemeral port and requires network access to hit it, which the Seatbelt policy associated with `--full-auto` correctly denies. If I wasn't paying attention to the code that Codex was generating, one of these `#[ignore]` annotations could have slipped into the codebase, effectively disabling the test for everyone. To that end, this PR enables an experimental environment variable named `CODEX_SANDBOX_NETWORK_DISABLED` that is set to `1` if the `SandboxPolicy` used to spawn the process does not have full network access. I say it is "experimental" because I'm not convinced this API is quite right, but we need to start somewhere. (It might be more appropriate to have an env var like `CODEX_SANDBOX=full-auto`, but the challenge is that our newer `SandboxPolicy` abstraction does not map to a simple set of enums like in the TypeScript CLI.) We leverage this new functionality by adding the following code to the aforementioned tests as a way to "dynamically disable" them: ```rust if std::env::var(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR).is_ok() { println!( "Skipping test because it cannot execute when network is disabled in a Codex sandbox." 
); return; } ``` We can use the `debug seatbelt --full-auto` command to verify that `cargo test` fails when run under Seatbelt prior to this change: ``` $ cargo run --bin codex -- debug seatbelt --full-auto -- cargo test ---- keeps_previous_response_id_between_tasks stdout ---- thread 'keeps_previous_response_id_between_tasks' panicked at /Users/mbolin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.3/src/mock_server/builder.rs:107:46: Failed to bind an OS port for a mock server.: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" } note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace failures: keeps_previous_response_id_between_tasks test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s error: test failed, to rerun pass `-p codex-core --test previous_response_id` ``` Though after this change, the above command succeeds! This means that, going forward, when Codex operates on Codex itself and runs `cargo test`, only "real failures" should cause the command to fail. As part of this change, I decided to tighten up the codepaths for running `exec()` for shell tool calls. In particular, we do it in `core` for the main Codex business logic itself, but we also expose this logic via `debug` subcommands in the CLI in the `cli` crate. The logic for the `debug` subcommands was not quite as faithful to the true business logic as I would have liked, so I: * refactored a bit of the Linux code, splitting `linux.rs` into `linux_exec.rs` and `landlock.rs` in the `core` crate. * gated less code behind `#[cfg(target_os = "linux")]` because such code does not get built by default when I develop on Mac, which means I either have to build the code in Docker or wait for CI signal * introduced `macro_rules! configure_command` in `exec.rs` so we can have both sync and async versions of this code. 
The synchronous version seems more appropriate for straight threads or potentially fork/exec.	2025-05-09 18:29:34 -07:00
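The producer side of `CODEX_SANDBOX_NETWORK_DISABLED` might look something like the sketch below. The commit only shows the consumer side (the early return in the tests), so everything here is an assumption: the one-field `SandboxPolicy` stand-in, `command_for_policy`, and `network_disabled_flag` are hypothetical, and the constant's value is taken to be the variable name given in the commit message.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Constant name from the commit message; the value is assumed to be the
/// environment variable name it describes.
const CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR: &str = "CODEX_SANDBOX_NETWORK_DISABLED";

/// Hypothetical stand-in for the real `SandboxPolicy`, reduced to one flag.
struct SandboxPolicy {
    full_network_access: bool,
}

/// Builds a command and marks it when the policy lacks full network access.
fn command_for_policy(program: &str, policy: &SandboxPolicy) -> Command {
    let mut cmd = Command::new(program);
    if !policy.full_network_access {
        cmd.env(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR, "1");
    }
    cmd
}

/// Reports whether the variable was explicitly set on the command.
fn network_disabled_flag(cmd: &Command) -> bool {
    cmd.get_envs().any(|(key, value)| {
        key == OsStr::new(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR) && value.is_some()
    })
}

fn main() {
    let restricted = SandboxPolicy { full_network_access: false };
    let cmd = command_for_policy("cargo", &restricted);
    println!("network disabled flag set: {}", network_disabled_flag(&cmd));
}
```

A test running inside such a child process can then check the variable with `std::env::var(...)` and return early, as the snippet in the commit message does.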

8 Commits