Files
llmx/codex-rs/cli/src/main.rs

127 lines
3.5 KiB
Rust
Raw Normal View History

use std::path::PathBuf;
use clap::Parser;
use codex_cli::LandlockCommand;
use codex_cli::SeatbeltCommand;
use codex_cli::create_sandbox_policy;
use codex_cli::proto;
use codex_cli::seatbelt;
feat: introduce support for shell_environment_policy in config.toml (#1061) To date, when handling `shell` and `local_shell` tool calls, we were spawning new processes using the environment inherited from the Codex process itself. This means that the sensitive `OPENAI_API_KEY` that Codex needs to talk to OpenAI models was made available to everything run by `shell` and `local_shell`. While there are cases where that might be useful, it does not seem like a good default. This PR introduces a complex `shell_environment_policy` config option to control the `env` used with these tool calls. It is inevitably a bit complex so that it is possible to override individual components of the policy so without having to restate the entire thing. Details are in the updated `README.md` in this PR, but here is the relevant bit that explains the individual fields of `shell_environment_policy`: | Field | Type | Default | Description | | ------------------------- | -------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | | `inherit` | string | `core` | Starting template for the environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full parent env), or `none` (start empty). | | `ignore_default_excludes` | boolean | `false` | When `false`, Codex removes any var whose **name** contains `KEY`, `SECRET`, or `TOKEN` (case-insensitive) before other rules run. | | `exclude` | array&lt;string&gt; | `[]` | Case-insensitive glob patterns to drop after the default filter.<br>Examples: `"AWS_*"`, `"AZURE_*"`. | | `set` | table&lt;string,string&gt; | `{}` | Explicit key/value overrides or additions – always win over inherited values. | | `include_only` | array&lt;string&gt; | `[]` | If non-empty, a whitelist of patterns; only variables that match _one_ pattern survive the final step. (Generally used with `inherit = "all"`.) | In particular, note that the default is `inherit = "core"`, so: * if you have extra env variables that you want to inherit from the parent process, use `inherit = "all"` and then specify `include_only` * if you have extra env variables where you want to hardcode the values, the default `inherit = "core"` will work fine, but then you need to specify `set` This configuration is not battle-tested, so we will probably still have to play with it a bit. `core/src/exec_env.rs` has the critical business logic as well as unit tests. Though if nothing else, previous to this change: ``` $ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY # ...prints OPENAI_API_KEY... ``` But after this change it does not print anything (as desired). One final thing to call out about this PR is that the `configure_command!` macro we use in `core/src/exec.rs` has to do some complex logic with respect to how it builds up the `env` for the process being spawned under Landlock/seccomp. Specifically, doing `cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably the most intuitive way to do it) caused the Landlock unit tests to fail because the processes spawned by the unit tests started failing in unexpected ways! If we forgo `env_clear()` in favor of updating env vars one at a time, the tests still pass. The comment in the code talks about this a bit, and while I would like to investigate this more, I need to move on for the moment, but I do plan to come back to it to fully understand what is going on. For example, this suggests that we might not be able to spawn a C program that calls `env_clear()`, which would be...weird. We may still have to fiddle with our Landlock config if that is the case.
2025-05-22 09:51:19 -07:00
use codex_core::config::Config;
use codex_core::config::ConfigOverrides;
use codex_exec::Cli as ExecCli;
use codex_tui::Cli as TuiCli;
use crate::proto::ProtoCli;
/// Codex CLI
///
/// If no subcommand is specified, options will be forwarded to the interactive CLI.
#[derive(Debug, Parser)]
#[clap(
author,
version,
// If a subcommand is given, ignore requirements of the default args.
subcommand_negates_reqs = true
)]
struct MultitoolCli {
#[clap(flatten)]
interactive: TuiCli,
#[clap(subcommand)]
subcommand: Option<Subcommand>,
}
#[derive(Debug, clap::Subcommand)]
enum Subcommand {
/// Run Codex non-interactively.
#[clap(visible_alias = "e")]
Exec(ExecCli),
/// Experimental: run Codex as an MCP server.
Mcp,
/// Run the Protocol stream via stdin/stdout
#[clap(visible_alias = "p")]
Proto(ProtoCli),
/// Internal debugging commands.
Debug(DebugArgs),
}
#[derive(Debug, Parser)]
struct DebugArgs {
#[command(subcommand)]
cmd: DebugCommand,
}
#[derive(Debug, clap::Subcommand)]
enum DebugCommand {
/// Run a command under Seatbelt (macOS only).
Seatbelt(SeatbeltCommand),
/// Run a command under Landlock+seccomp (Linux only).
Landlock(LandlockCommand),
}
#[derive(Debug, Parser)]
struct ReplProto {}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let codex_linux_sandbox_exe: Option<PathBuf> = if cfg!(target_os = "linux") {
std::env::current_exe().ok()
} else {
None
};
let cli = MultitoolCli::parse();
match cli.subcommand {
None => {
codex_tui::run_main(cli.interactive, codex_linux_sandbox_exe)?;
}
Some(Subcommand::Exec(exec_cli)) => {
codex_exec::run_main(exec_cli, codex_linux_sandbox_exe).await?;
}
Some(Subcommand::Mcp) => {
codex_mcp_server::run_main(codex_linux_sandbox_exe).await?;
}
Some(Subcommand::Proto(proto_cli)) => {
proto::run_main(proto_cli).await?;
}
Some(Subcommand::Debug(debug_args)) => match debug_args.cmd {
DebugCommand::Seatbelt(SeatbeltCommand {
command,
sandbox,
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
full_auto,
}) => {
let sandbox_policy = create_sandbox_policy(full_auto, sandbox);
feat: introduce support for shell_environment_policy in config.toml (#1061) To date, when handling `shell` and `local_shell` tool calls, we were spawning new processes using the environment inherited from the Codex process itself. This means that the sensitive `OPENAI_API_KEY` that Codex needs to talk to OpenAI models was made available to everything run by `shell` and `local_shell`. While there are cases where that might be useful, it does not seem like a good default. This PR introduces a complex `shell_environment_policy` config option to control the `env` used with these tool calls. It is inevitably a bit complex so that it is possible to override individual components of the policy so without having to restate the entire thing. Details are in the updated `README.md` in this PR, but here is the relevant bit that explains the individual fields of `shell_environment_policy`: | Field | Type | Default | Description | | ------------------------- | -------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | | `inherit` | string | `core` | Starting template for the environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full parent env), or `none` (start empty). | | `ignore_default_excludes` | boolean | `false` | When `false`, Codex removes any var whose **name** contains `KEY`, `SECRET`, or `TOKEN` (case-insensitive) before other rules run. | | `exclude` | array&lt;string&gt; | `[]` | Case-insensitive glob patterns to drop after the default filter.<br>Examples: `"AWS_*"`, `"AZURE_*"`. | | `set` | table&lt;string,string&gt; | `{}` | Explicit key/value overrides or additions – always win over inherited values. | | `include_only` | array&lt;string&gt; | `[]` | If non-empty, a whitelist of patterns; only variables that match _one_ pattern survive the final step. (Generally used with `inherit = "all"`.) | In particular, note that the default is `inherit = "core"`, so: * if you have extra env variables that you want to inherit from the parent process, use `inherit = "all"` and then specify `include_only` * if you have extra env variables where you want to hardcode the values, the default `inherit = "core"` will work fine, but then you need to specify `set` This configuration is not battle-tested, so we will probably still have to play with it a bit. `core/src/exec_env.rs` has the critical business logic as well as unit tests. Though if nothing else, previous to this change: ``` $ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY # ...prints OPENAI_API_KEY... ``` But after this change it does not print anything (as desired). One final thing to call out about this PR is that the `configure_command!` macro we use in `core/src/exec.rs` has to do some complex logic with respect to how it builds up the `env` for the process being spawned under Landlock/seccomp. Specifically, doing `cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably the most intuitive way to do it) caused the Landlock unit tests to fail because the processes spawned by the unit tests started failing in unexpected ways! If we forgo `env_clear()` in favor of updating env vars one at a time, the tests still pass. The comment in the code talks about this a bit, and while I would like to investigate this more, I need to move on for the moment, but I do plan to come back to it to fully understand what is going on. For example, this suggests that we might not be able to spawn a C program that calls `env_clear()`, which would be...weird. We may still have to fiddle with our Landlock config if that is the case.
2025-05-22 09:51:19 -07:00
let config = Config::load_with_overrides(ConfigOverrides {
sandbox_policy: Some(sandbox_policy),
..Default::default()
})?;
seatbelt::run_seatbelt(command, &config).await?;
}
feat: experimental env var: CODEX_SANDBOX_NETWORK_DISABLED (#879) When using Codex to develop Codex itself, I noticed that sometimes it would try to add `#[ignore]` to the following tests: ``` keeps_previous_response_id_between_tasks() retries_on_early_close() ``` Both of these tests start a `MockServer` that launches an HTTP server on an ephemeral port and requires network access to hit it, which the Seatbelt policy associated with `--full-auto` correctly denies. If I wasn't paying attention to the code that Codex was generating, one of these `#[ignore]` annotations could have slipped into the codebase, effectively disabling the test for everyone. To that end, this PR enables an experimental environment variable named `CODEX_SANDBOX_NETWORK_DISABLED` that is set to `1` if the `SandboxPolicy` used to spawn the process does not have full network access. I say it is "experimental" because I'm not convinced this API is quite right, but we need to start somewhere. (It might be more appropriate to have an env var like `CODEX_SANDBOX=full-auto`, but the challenge is that our newer `SandboxPolicy` abstraction does not map to a simple set of enums like in the TypeScript CLI.) We leverage this new functionality by adding the following code to the aforementioned tests as a way to "dynamically disable" them: ```rust if std::env::var(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR).is_ok() { println!( "Skipping test because it cannot execute when network is disabled in a Codex sandbox." ); return; } ``` We can use the `debug seatbelt --full-auto` command to verify that `cargo test` fails when run under Seatbelt prior to this change: ``` $ cargo run --bin codex -- debug seatbelt --full-auto -- cargo test ---- keeps_previous_response_id_between_tasks stdout ---- thread 'keeps_previous_response_id_between_tasks' panicked at /Users/mbolin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.3/src/mock_server/builder.rs:107:46: Failed to bind an OS port for a mock server.: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" } note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace failures: keeps_previous_response_id_between_tasks test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s error: test failed, to rerun pass `-p codex-core --test previous_response_id` ``` Though after this change, the above command succeeds! This means that, going forward, when Codex operates on Codex itself, when it runs `cargo test`, only "real failures" should cause the command to fail. As part of this change, I decided to tighten up the codepaths for running `exec()` for shell tool calls. In particular, we do it in `core` for the main Codex business logic itself, but we also expose this logic via `debug` subcommands in the CLI in the `cli` crate. The logic for the `debug` subcommands was not quite as faithful to the true business logic as I liked, so I: * refactored a bit of the Linux code, splitting `linux.rs` into `linux_exec.rs` and `landlock.rs` in the `core` crate. * gating less code behind `#[cfg(target_os = "linux")]` because such code does not get built by default when I develop on Mac, which means I either have to build the code in Docker or wait for CI signal * introduced `macro_rules! configure_command` in `exec.rs` so we can have both sync and async versions of this code. The synchronous version seems more appropriate for straight threads or potentially fork/exec.
2025-05-09 18:29:34 -07:00
#[cfg(unix)]
DebugCommand::Landlock(LandlockCommand {
command,
sandbox,
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
full_auto,
}) => {
let sandbox_policy = create_sandbox_policy(full_auto, sandbox);
feat: introduce support for shell_environment_policy in config.toml (#1061) To date, when handling `shell` and `local_shell` tool calls, we were spawning new processes using the environment inherited from the Codex process itself. This means that the sensitive `OPENAI_API_KEY` that Codex needs to talk to OpenAI models was made available to everything run by `shell` and `local_shell`. While there are cases where that might be useful, it does not seem like a good default. This PR introduces a complex `shell_environment_policy` config option to control the `env` used with these tool calls. It is inevitably a bit complex so that it is possible to override individual components of the policy so without having to restate the entire thing. Details are in the updated `README.md` in this PR, but here is the relevant bit that explains the individual fields of `shell_environment_policy`: | Field | Type | Default | Description | | ------------------------- | -------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | | `inherit` | string | `core` | Starting template for the environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full parent env), or `none` (start empty). | | `ignore_default_excludes` | boolean | `false` | When `false`, Codex removes any var whose **name** contains `KEY`, `SECRET`, or `TOKEN` (case-insensitive) before other rules run. | | `exclude` | array&lt;string&gt; | `[]` | Case-insensitive glob patterns to drop after the default filter.<br>Examples: `"AWS_*"`, `"AZURE_*"`. | | `set` | table&lt;string,string&gt; | `{}` | Explicit key/value overrides or additions – always win over inherited values. | | `include_only` | array&lt;string&gt; | `[]` | If non-empty, a whitelist of patterns; only variables that match _one_ pattern survive the final step. (Generally used with `inherit = "all"`.) | In particular, note that the default is `inherit = "core"`, so: * if you have extra env variables that you want to inherit from the parent process, use `inherit = "all"` and then specify `include_only` * if you have extra env variables where you want to hardcode the values, the default `inherit = "core"` will work fine, but then you need to specify `set` This configuration is not battle-tested, so we will probably still have to play with it a bit. `core/src/exec_env.rs` has the critical business logic as well as unit tests. Though if nothing else, previous to this change: ``` $ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY # ...prints OPENAI_API_KEY... ``` But after this change it does not print anything (as desired). One final thing to call out about this PR is that the `configure_command!` macro we use in `core/src/exec.rs` has to do some complex logic with respect to how it builds up the `env` for the process being spawned under Landlock/seccomp. Specifically, doing `cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably the most intuitive way to do it) caused the Landlock unit tests to fail because the processes spawned by the unit tests started failing in unexpected ways! If we forgo `env_clear()` in favor of updating env vars one at a time, the tests still pass. The comment in the code talks about this a bit, and while I would like to investigate this more, I need to move on for the moment, but I do plan to come back to it to fully understand what is going on. For example, this suggests that we might not be able to spawn a C program that calls `env_clear()`, which would be...weird. We may still have to fiddle with our Landlock config if that is the case.
2025-05-22 09:51:19 -07:00
let config = Config::load_with_overrides(ConfigOverrides {
sandbox_policy: Some(sandbox_policy),
..Default::default()
})?;
codex_cli::landlock::run_landlock(command, &config)?;
}
feat: experimental env var: CODEX_SANDBOX_NETWORK_DISABLED (#879) When using Codex to develop Codex itself, I noticed that sometimes it would try to add `#[ignore]` to the following tests: ``` keeps_previous_response_id_between_tasks() retries_on_early_close() ``` Both of these tests start a `MockServer` that launches an HTTP server on an ephemeral port and requires network access to hit it, which the Seatbelt policy associated with `--full-auto` correctly denies. If I wasn't paying attention to the code that Codex was generating, one of these `#[ignore]` annotations could have slipped into the codebase, effectively disabling the test for everyone. To that end, this PR enables an experimental environment variable named `CODEX_SANDBOX_NETWORK_DISABLED` that is set to `1` if the `SandboxPolicy` used to spawn the process does not have full network access. I say it is "experimental" because I'm not convinced this API is quite right, but we need to start somewhere. (It might be more appropriate to have an env var like `CODEX_SANDBOX=full-auto`, but the challenge is that our newer `SandboxPolicy` abstraction does not map to a simple set of enums like in the TypeScript CLI.) We leverage this new functionality by adding the following code to the aforementioned tests as a way to "dynamically disable" them: ```rust if std::env::var(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR).is_ok() { println!( "Skipping test because it cannot execute when network is disabled in a Codex sandbox." ); return; } ``` We can use the `debug seatbelt --full-auto` command to verify that `cargo test` fails when run under Seatbelt prior to this change: ``` $ cargo run --bin codex -- debug seatbelt --full-auto -- cargo test ---- keeps_previous_response_id_between_tasks stdout ---- thread 'keeps_previous_response_id_between_tasks' panicked at /Users/mbolin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.3/src/mock_server/builder.rs:107:46: Failed to bind an OS port for a mock server.: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" } note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace failures: keeps_previous_response_id_between_tasks test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s error: test failed, to rerun pass `-p codex-core --test previous_response_id` ``` Though after this change, the above command succeeds! This means that, going forward, when Codex operates on Codex itself, when it runs `cargo test`, only "real failures" should cause the command to fail. As part of this change, I decided to tighten up the codepaths for running `exec()` for shell tool calls. In particular, we do it in `core` for the main Codex business logic itself, but we also expose this logic via `debug` subcommands in the CLI in the `cli` crate. The logic for the `debug` subcommands was not quite as faithful to the true business logic as I liked, so I: * refactored a bit of the Linux code, splitting `linux.rs` into `linux_exec.rs` and `landlock.rs` in the `core` crate. * gating less code behind `#[cfg(target_os = "linux")]` because such code does not get built by default when I develop on Mac, which means I either have to build the code in Docker or wait for CI signal * introduced `macro_rules! configure_command` in `exec.rs` so we can have both sync and async versions of this code. The synchronous version seems more appropriate for straight threads or potentially fork/exec.
2025-05-09 18:29:34 -07:00
#[cfg(not(unix))]
DebugCommand::Landlock(_) => {
anyhow::bail!("Landlock is only supported on Linux.");
}
},
}
Ok(())
}