valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
easong-openai	9285350842	Introduce `--oss` flag to use gpt-oss models (#1848 ) This adds support for easily running Codex backed by a local Ollama instance running our new open source models. See https://github.com/openai/gpt-oss for details. If you pass in `--oss` you'll be prompted to install/launch ollama, and it will automatically download the 20b model and attempt to use it. We'll likely want to expand this with some options later to make the experience smoother for users who can't run the 20b or want to run the 120b. Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-08-05 11:31:11 -07:00
Gabriel Peal	8828f6f082	Add an experimental plan tool (#1726 ) This adds a tool the model can call to update a plan. The tool doesn't actually _do_ anything but it gives clients a chance to read and render the structured plan. We will likely iterate on the prompt and tools exposed for planning over time.	2025-07-29 14:22:02 -04:00
aibrahim-oai	01c0896f0f	Adding interrupt Support to MCP (#1646 )	2025-07-22 20:33:49 +00:00
pakrym-oai	6d82907082	Add support for custom base instructions (#1645 ) Allows providing custom instructions file as a config parameter and custom instruction text via MCP tool call.	2025-07-22 09:42:22 -07:00
Dylan	18b2b30841	[mcp-server] Add reply tool call (#1643 ) ## Summary Adds a new mcp tool call, `codex-reply`, so we can continue existing sessions. This is a first draft and does not yet support sessions from previous processes. ## Testing - [x] tested with mcp client	2025-07-21 21:01:56 -07:00
Michael Bolin	d49d802b06	test: add integration test for MCP server (#1633 ) This PR introduces a single integration test for `cargo mcp`, though it also introduces a number of reusable components so that it should be easier to introduce more integration tests going forward. The new test is introduced in `codex-rs/mcp-server/tests/elicitation.rs` and the reusable pieces are in `codex-rs/mcp-server/tests/common`. The test itself verifies new functionality around elicitations introduced in https://github.com/openai/codex/pull/1623 (and the fix introduced in https://github.com/openai/codex/pull/1629) by doing the following: - starts a mock model provider with canned responses for `/v1/chat/completions` - starts the MCP server with a `config.toml` to use that model provider (and `approval_policy = "untrusted"`) - sends the `codex` tool call which causes the mock model provider to request a shell call for `git init` - the MCP server sends an elicitation to the client to approve the request - the client replies to the elicitation with `"approved"` - the MCP server runs the command and re-samples the model, getting a `"finish_reason": "stop"` - in turn, the MCP server sends the final response to the original `codex` tool call - verifies that `git init` ran as expected To test: ``` cargo test shell_command_approval_triggers_elicitation ``` In writing this test, I discovered that `ExecApprovalResponse` does not conform to `ElicitResult`, so I added a TODO to fix that, since I think that should be updated in a separate PR. As it stands, this PR does not update any business logic, though it does make a number of members of the `mcp-server` crate `pub` so they can be used in the test. One additional learning from this PR is that `std::process::Command::cargo_bin()` from the `assert_cmd` trait is only available for `std::process::Command`, but we really want to use `tokio::process::Command` so that everything is async and we can leverage utilities like `tokio::time::timeout()`. The trick I came up with was to use `cargo_bin()` to locate the program, and then to use `std::process::Command::get_program()` when constructing the `tokio::process::Command`.	2025-07-21 10:27:07 -07:00
Michael Bolin	e78ec00e73	chore: support MCP schema 2025-06-18 (#1621 ) This updates the schema in `generate_mcp_types.py` from `2025-03-26` to `2025-06-18`, regenerates `mcp-types/src/lib.rs`, and then updates all the code that uses `mcp-types` to honor the changes. Ran ``` npx @modelcontextprotocol/inspector just codex mcp ``` and verified that I was able to invoke the `codex` tool, as expected. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1621). * #1623 * #1622 * __->__ #1621	2025-07-19 00:09:34 -04:00
Michael Bolin	e0c08cea4f	feat: add support for --sandbox flag (#1476 ) On a high-level, we try to design `config.toml` so that you don't have to "comment out a lot of stuff" when testing different options. Previously, defining a sandbox policy was somewhat at odds with this principle because you would define the policy as attributes of `[sandbox]` like so: ```toml [sandbox] mode = "workspace-write" writable_roots = [ "/tmp" ] ``` but if you wanted to temporarily change to a read-only sandbox, you might feel compelled to modify your file to be: ```toml [sandbox] mode = "read-only" # mode = "workspace-write" # writable_roots = [ "/tmp" ] ``` Technically, commenting out `writable_roots` would not be strictly necessary, as `mode = "read-only"` would ignore `writable_roots`, but it's still a reasonable thing to do to keep things tidy. Currently, the various values for `mode` do not support that many attributes, so this is not that hard to maintain, but one could imagine this becoming more complex in the future. In this PR, we change Codex CLI so that it no longer recognizes `[sandbox]`. Instead, it introduces a top-level option, `sandbox_mode`, and `[sandbox_workspace_write]` is used to further configure the sandbox when when `sandbox_mode = "workspace-write"` is used: ```toml sandbox_mode = "workspace-write" [sandbox_workspace_write] writable_roots = [ "/tmp" ] ``` This feels a bit more future-proof in that it is less tedious to configure different sandboxes: ```toml sandbox_mode = "workspace-write" [sandbox_read_only] # read-only options here... [sandbox_workspace_write] writable_roots = [ "/tmp" ] [sandbox_danger_full_access] # danger-full-access options here... ``` In this scheme, you never need to comment out the configuration for an individual sandbox type: you only need to redefine `sandbox_mode`. Relatedly, previous to this change, a user had to do `-c sandbox.mode=read-only` to change the mode on the command line. With this change, things are arguably a bit cleaner because the equivalent option is `-c sandbox_mode=read-only` (and now `-c sandbox_workspace_write=...` can be set separately). Though more importantly, we introduce the `-s/--sandbox` option to the CLI, which maps directly to `sandbox_mode` in `config.toml`, making config override behavior easier to reason about. Moreover, as you can see in the updates to the various Markdown files, it is much easier to explain how to configure sandboxing when things like `--sandbox read-only` can be used as an example. Relatedly, this cleanup also made it straightforward to add support for a `sandbox` option for Codex when used as an MCP server (see the changes to `mcp-server/src/codex_tool_config.rs`). Fixes https://github.com/openai/codex/issues/1248.	2025-07-07 22:31:30 -07:00
Michael Bolin	72082164c1	chore: rename AskForApproval::UnlessAllowListed to AskForApproval::UnlessTrusted (#1385 ) We could just rename to `Untrusted` instead of `UnlessTrusted`, but I think `AskForApproval::UnlessTrusted` reads a bit better.	2025-06-25 12:26:13 -07:00
Michael Bolin	86d5a9d80d	chore: rename unless-allow-listed to untrusted (#1378 ) For the `approval_policy` config option, renames `unless-allow-listed` to `untrusted`. In general, when it comes to exec'ing commands, I think "trusted" is a more accurate term than "safe." Also drops the `AskForApproval::AutoEdit` variant, as we were not really making use of it, anyway. Fixes https://github.com/openai/codex/issues/1250. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1378). * #1379 * __->__ #1378	2025-06-24 22:19:21 -07:00
Michael Bolin	0776d78357	feat: redesign sandbox config (#1373 ) This is a major redesign of how sandbox configuration works and aims to fix https://github.com/openai/codex/issues/1248. Specifically, it replaces `sandbox_permissions` in `config.toml` (and the `-s`/`--sandbox-permission` CLI flags) with a "table" with effectively three variants: ```toml # Safest option: full disk is read-only, but writes and network access are disallowed. [sandbox] mode = "read-only" # The cwd of the Codex task is writable, as well as $TMPDIR on macOS. # writable_roots can be used to specify additional writable folders. [sandbox] mode = "workspace-write" writable_roots = [] # Optional, defaults to the empty list. network_access = false # Optional, defaults to false. # Disable sandboxing: use at your own risk!!! [sandbox] mode = "danger-full-access" ``` This should make sandboxing easier to reason about. While we have dropped support for `-s`, the way it works now is: - no flags => `read-only` - `--full-auto` => `workspace-write` - currently, there is no way to specify `danger-full-access` via a CLI flag, but we will revisit that as part of https://github.com/openai/codex/issues/1254 Outstanding issue: - As noted in the `TODO` on `SandboxPolicy::is_unrestricted()`, we are still conflating sandbox preferences with approval preferences in that case, which needs to be cleaned up.	2025-06-24 16:59:47 -07:00
Michael Bolin	d60f350cf8	feat: add support for -c/--config to override individual config items (#1137 ) This PR introduces support for `-c`/`--config` so users can override individual config values on the command line using `--config name=value`. Example: ``` codex --config model=o4-mini ``` Making it possible to set arbitrary config values on the command line results in a more flexible configuration scheme and makes it easier to provide single-line examples that can be copy-pasted from documentation. Effectively, it means there are four levels of configuration for some values: - Default value (e.g., `model` currently defaults to `o4-mini`) - Value in `config.toml` (e.g., user could override the default to be `model = "o3"` in their `config.toml`) - Specifying `-c` or `--config` to override `model` (e.g., user can include `-c model=o3` in their list of args to Codex) - If available, a config-specific flag can be used, which takes precedence over `-c` (e.g., user can specify `--model o3` in their list of args to Codex) Now that it is possible to specify anything that could be configured in `config.toml` on the command line using `-c`, we do not need to have a custom flag for every possible config option (which can clutter the output of `--help`). To that end, as part of this PR, we drop support for the `--disable-response-storage` flag, as users can now specify `-c disable_response_storage=true` to get the equivalent functionality. Under the hood, this works by loading the `config.toml` into a `toml::Value`. Then for each `key=value`, we create a small synthetic TOML file with `value` so that we can run the TOML parser to get the equivalent `toml::Value`. We then parse `key` to determine the point in the original `toml::Value` to do the insert/replace. Once all of the overrides from `-c` args have been applied, the `toml::Value` is deserialized into a `ConfigToml` and then the `ConfigOverrides` are applied, as before.	2025-05-27 23:11:44 -07:00
Michael Bolin	d1de7bb383	feat: add `codex_linux_sandbox_exe: Option<PathBuf>` field to Config (#1089 ) https://github.com/openai/codex/pull/1086 is a work-in-progress to make Linux sandboxing work more like Seatbelt where, for the command we want to sandbox, we build up the command and then hand it, and some sandbox configuration flags, to another command to set up the sandbox and then run it. In the case of Seatbelt, macOS provides this helper binary and provides it at `/usr/bin/sandbox-exec`. For Linux, we have to build our own and pass it through (which is what #1086 does), so this makes the new `codex_linux_sandbox_exe` available on `Config` so that it will later be available in `exec.rs` when we need it in #1086.	2025-05-22 21:52:28 -07:00
Michael Bolin	3c03c25e56	feat: introduce --profile for Rust CLI (#921 ) This introduces a much-needed "profile" concept where users can specify a collection of options under one name and then pass that via `--profile` to the CLI. This PR introduces the `ConfigProfile` struct and makes it a field of `CargoToml`. It further updates `Config::load_from_base_config_with_overrides()` to respect `ConfigProfile`, overriding default values where appropriate. A detailed unit test is added at the end of `config.rs` to verify this behavior. Details on how to use this feature have also been added to `codex-rs/README.md`.	2025-05-13 16:52:52 -07:00
jcoens-openai	f3bd143867	Disallow expect via lints (#865 ) Adds `expect()` as a denied lint. Same deal applies with `unwrap()` where we now need to put `#[expect(...` on ones that we legit want. Took care to enable `expect()` in test contexts. # Tests ``` cargo fmt cargo clippy --all-features --all-targets --no-deps -- -D warnings cargo test ```	2025-05-12 08:45:46 -07:00
Michael Bolin	86022f097e	feat: read `model_provider` and `model_providers` from config.toml (#853 ) This is the first step in supporting other model providers in the Rust CLI. Specifically, this PR adds support for the new entries in `Config` and `ConfigOverrides` to specify a `ModelProviderInfo`, which is the basic config needed for an LLM provider. This PR does not get us all the way there yet because `client.rs` still categorically appends `/responses` to the URL and expects the endpoint to support the OpenAI Responses API. Will fix that next!	2025-05-07 17:38:28 -07:00
jcoens-openai	8a89d3aeda	Update cargo to 2024 edition (#842 ) Some effects of this change: - New formatting changes across many files. No functionality changes should occur from that. - Calls to `set_env` are considered unsafe, since this only happens in tests we wrap them in `unsafe` blocks	2025-05-07 08:37:48 -07:00
Michael Bolin	2b72d05c5e	feat: make Codex available as a tool when running it as an MCP server (#811 ) This PR replaces the placeholder `"echo"` tool call in the MCP server with a `"codex"` tool that calls Codex. Events such as `ExecApprovalRequest` and `ApplyPatchApprovalRequest` are not handled properly yet, but I have `approval_policy = "never"` set in my `~/.codex/config.toml` such that those codepaths are not exercised. The schema for this MPC tool is defined by a new `CodexToolCallParam` struct introduced in this PR. It is fairly similar to `ConfigOverrides`, as the param is used to help create the `Config` used to start the Codex session, though it also includes the `prompt` used to kick off the session. This PR also introduces the use of the third-party `schemars` crate to generate the JSON schema, which is verified in the `verify_codex_tool_json_schema()` unit test. Events that are dispatched during the Codex session are sent back to the MCP client as MCP notifications. This gives the client a way to monitor progress as the tool call itself may take minutes to complete depending on the complexity of the task requested by the user. In the video below, I launched the server via: ```shell mcp-server$ RUST_LOG=debug npx @modelcontextprotocol/inspector cargo run -- ``` In the video, you can see the flow of: * requesting the list of tools * choosing the codex tool * entering a value for prompt and then making the tool call Note that I left the other fields blank because when unspecified, the values in my `~/.codex/config.toml` were used: https://github.com/user-attachments/assets/1975058c-b004-43ef-8c8d-800a953b8192 Note that while using the inspector, I did run into https://github.com/modelcontextprotocol/inspector/issues/293, though the tip about ensuring I had only one instance of the MCP Inspector tab open in my browser seemed to fix things.	2025-05-05 07:16:19 -07:00

18 Commits