Commit Graph

77 Commits

Author SHA1 Message Date
Michael Bolin
c6fcec55fe fix: always send full instructions when using the Responses API (#1207)
This fixes a longstanding error in the Rust CLI where `codex.rs`
contained an errant `is_first_turn` check that would exclude the user
instructions on subsequent "turns" of a conversation when using the
Responses API (i.e., when `previous_response_id` existed).

While here, renames `Prompt.instructions` to `Prompt.user_instructions`
since we now have quite a few levels of instructions floating around.
It also removes an unnecessary use of `clone()` in
`Prompt.get_full_instructions()`.
2025-06-03 09:40:19 -07:00
Michael Bolin
6fcc528a43 fix: provide tolerance for apply_patch tool (#993)
As explained in detail in the doc comment for `ParseMode::Lenient`, we
have observed that GPT-4.1 does not always generate a valid invocation
of `apply_patch`. Fortunately, the error is predictable, so we introduce
some new logic to the `codex-apply-patch` crate to recover from this
error.

Because we would like to avoid this becoming a de facto standard (it
would be incompatible if `apply_patch` were provided as an actual
executable, unless we also introduced the lenient behavior there), we
require passing `ParseMode::Lenient` to `parse_patch_text()` to make it
clear that the caller is opting into supporting this special case.
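
For illustration, here is a hedged sketch of the opt-in call site; the patch text follows the `apply_patch` envelope format, and the exact signature and return type of `parse_patch_text()` are assumptions:

```rust
use codex_apply_patch::{parse_patch_text, ParseMode};

fn main() {
    // Strict parsing remains the default elsewhere; `Lenient` is an explicit
    // opt-in for recovering from the predictable GPT-4.1 malformation.
    let patch = r#"*** Begin Patch
*** Update File: src/main.rs
@@
-fn main() {}
+fn main() { println!("hello"); }
*** End Patch"#;

    match parse_patch_text(patch, ParseMode::Lenient) {
        Ok(hunks) => println!("parsed {} hunk(s)", hunks.len()),
        Err(err) => eprintln!("patch rejected even in lenient mode: {err}"),
    }
}
```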

Note the analogous change to the TypeScript CLI was
https://github.com/openai/codex/pull/930. In addition to changing the
accepted input to `apply_patch`, it also introduced additional
instructions for the model, which we include in this PR.

Note that `apply-patch` does not depend on either `regex` or
`regex-lite`, so some of the checks are slightly more verbose to avoid
introducing this dependency.

That said, this PR does not leverage the existing
`extract_heredoc_body_from_apply_patch_command()`, which depends on
`tree-sitter` and `tree-sitter-bash`:


5a5aa89914/codex-rs/apply-patch/src/lib.rs (L191-L246)

though perhaps it should.
2025-06-03 09:06:38 -07:00
Michael Bolin
0f3cc8f842 feat: make reasoning effort/summaries configurable (#1199)
Prior to this PR, we always set `reasoning` when making a request
using the Responses API:


d7245cbbc9/codex-rs/core/src/client.rs (L108-L111)

Though if you tried to use the Rust CLI with `--model gpt-4.1`, this
would fail with:

```shell
"Unsupported parameter: 'reasoning.effort' is not supported with this model."
```

We take a cue from the TypeScript CLI, which does a check on the model
name:


d7245cbbc9/codex-cli/src/utils/agent/agent-loop.ts (L786-L789)

This PR does a similar check, though also adds support for the following
config options:

```
model_reasoning_effort = "low" | "medium" | "high" | "none"
model_reasoning_summary = "auto" | "concise" | "detailed" | "none"
```

This way, if you have a model whose name happens to start with `"o"` (or
`"codex"`?), you can set these to `"none"` to explicitly disable
reasoning, if necessary. (That said, it seems unlikely anyone would use
the Responses API with non-OpenAI models, but we provide an escape
hatch, anyway.)
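
For example, a hypothetical `config.toml` for such a provider (the model name is illustrative):

```toml
# A third-party model whose name happens to start with "o" but that does
# not support the reasoning parameters.
model = "o-llama-70b"
model_reasoning_effort = "none"
model_reasoning_summary = "none"
```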

This PR also updates both the TUI and `codex exec` to show `reasoning
effort` and `reasoning summaries` in the header.
2025-06-02 16:01:34 -07:00
Michael Bolin
d7245cbbc9 fix: chat completions API now also passes tools along (#1167)
Prior to this PR, there were two big misses in `chat_completions.rs`:

1. The loop in `stream_chat_completions()` was only including items of
type `ResponseItem::Message` when building up the `"messages"` JSON for
the `POST` request to the `chat/completions` endpoint. This fixes things
by ensuring other variants (`FunctionCall`, `LocalShellCall`, and
`FunctionCallOutput`) are included, as well.
2. In `process_chat_sse()`, we were not recording tool calls and were
only emitting items of type
`ResponseEvent::OutputItemDone(ResponseItem::Message)` to the stream.
Now we introduce `FunctionCallState` (sketched below), which accumulates
the `delta`s of type `tool_calls` so that we can ultimately emit a
`ResponseItem::FunctionCall` when appropriate.
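
A rough sketch of that accumulator; the field and method names here are illustrative rather than the actual implementation:

```rust
/// Accumulates streamed `tool_calls` deltas until the function call is
/// complete and can be emitted as a single `ResponseItem::FunctionCall`.
#[derive(Default)]
struct FunctionCallState {
    call_id: Option<String>,
    name: Option<String>,
    arguments: String, // JSON argument fragments, concatenated across deltas
}

impl FunctionCallState {
    /// Fold one SSE delta into the accumulated state. The id and name
    /// typically arrive on the first delta; arguments trickle in afterwards.
    fn apply_delta(&mut self, id: Option<&str>, name: Option<&str>, args: Option<&str>) {
        if let Some(id) = id {
            self.call_id.get_or_insert_with(|| id.to_string());
        }
        if let Some(name) = name {
            self.name.get_or_insert_with(|| name.to_string());
        }
        if let Some(fragment) = args {
            self.arguments.push_str(fragment);
        }
    }
}
```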

While function calling now appears to work for chat completions with my
local testing, I believe that there are still edge cases that are not
covered and that this codepath would benefit from a battery of
integration tests. (As part of that further cleanup, we should also work
to support streaming responses in the UI.)

The other important part of this PR is some cleanup in
`core/src/codex.rs`. In particular, it was hard to reason about how
`run_task()` was building up the list of messages to include in a
request across the various cases:

- Responses API
- Chat Completions API
- Responses API used in concert with ZDR

I like to think things are a bit cleaner now where:

- `zdr_transcript` (if present) contains all messages in the history of
the conversation, which includes function call outputs that have not
been sent back to the model yet
- `pending_input` includes any messages the user has submitted while the
turn is in flight that need to be injected as part of the next `POST` to
the model
- `input_for_next_turn` includes the tool call outputs that have not
been sent back to the model yet
2025-06-02 13:47:51 -07:00
Michael Bolin
e40f86b446 chore: logging cleanup (#1196)
Update what we log to make `RUST_LOG=debug` a bit easier to work with.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1196).
* #1167
* __->__ #1196
2025-06-02 13:31:33 -07:00
Michael Bolin
e81327e5f4 feat: add hide_agent_reasoning config option (#1181)
This PR introduces a `hide_agent_reasoning` config option (that defaults
to `false`) that users can enable to make the output less verbose by
suppressing reasoning output.

To test, verified that this includes agent reasoning in the output:

```
echo hello | just exec
```

whereas this does not:

```
echo hello | just exec --config hide_agent_reasoning=true
```
2025-05-30 23:14:56 -07:00
Michael Bolin
1bf82056b3 fix: introduce create_tools_json() and share it with chat_completions.rs (#1177)
The main motivator behind this PR is that `stream_chat_completions()`
was not adding the `"tools"` entry to the payload posted to the
`/chat/completions` endpoint. This (1) refactors the existing logic to
build up the `"tools"` JSON from `client.rs` into `openai_tools.rs`, and
(2) updates the use of responses API (`client.rs`) and chat completions
API (`chat_completions.rs`) to both use it.

Note this PR alone is not sufficient to get tool calling from chat
completions working: that is done in
https://github.com/openai/codex/pull/1167.
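
For reference, a sketch of roughly what a `"tools"` entry for the `shell` function tool looks like in a chat completions payload; the parameter schema here is an assumption, not the exact JSON the crate emits:

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "shell",
        "description": "Runs a shell command and returns its output.",
        "parameters": {
          "type": "object",
          "properties": {
            "command": { "type": "array", "items": { "type": "string" } }
          },
          "required": ["command"]
        }
      }
    }
  ]
}
```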

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1177).
* #1167
* __->__ #1177
2025-05-30 14:07:03 -07:00
Fouad Matin
828e2062c2 fix(codex-rs): use codex-mini-latest as default (#1164) 2025-05-29 16:55:19 -07:00
Michael Bolin
a768a6a41d fix: introduce ResponseInputItem::McpToolCallOutput variant (#1151)
The output of an MCP server tool call can be one of several types, but
to date, we treated all outputs as text by showing the serialized JSON
as the "tool output" in Codex:


25a9949c49/codex-rs/mcp-types/src/lib.rs (L96-L101)

This PR adds support for the `ImageContent` variant so we can now
display an image output from an MCP tool call.

In making this change, we introduce a new
`ResponseInputItem::McpToolCallOutput` variant so that we can work with
the `mcp_types::CallToolResult` directly when the function call is made
to an MCP server.

Though arguably the more significant change is the introduction of
`HistoryCell::CompletedMcpToolCallWithImageOutput`, which is a cell that
uses `ratatui_image` to render an image into the terminal. To support
this, we introduce `ImageRenderCache`, cache a
`ratatui_image::picker::Picker`, and add `ensure_image_cache()`, which
caches the appropriately scaled image data and dimensions based on the
current terminal size.

To test, I created a minimal `package.json`:

```json
{
  "name": "kitty-mcp",
  "version": "1.0.0",
  "type": "module",
  "description": "MCP that returns image of kitty",
  "main": "index.js",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.12.0"
  }
}
```

with the following `index.js` to define the MCP server:

```js
#!/usr/bin/env node

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { readFile } from "node:fs/promises";
import { join } from "node:path";

const IMAGE_URI = "image://Ada.png";

const server = new McpServer({
  name: "Demo",
  version: "1.0.0",
});

server.tool(
  "get-cat-image",
  "If you need a cat image, this tool will provide one.",
  async () => ({
    content: [
      { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" },
    ],
  })
);

server.resource("Ada the Cat", IMAGE_URI, async (uri) => {
  const base64Image = await getAdaPngBase64();
  return {
    contents: [
      {
        uri: uri.href,
        mimeType: "image/png",
        blob: base64Image,
      },
    ],
  };
});

async function getAdaPngBase64() {
  const __dirname = new URL(".", import.meta.url).pathname;
  // From 9705ce2c59/assets/Ada.png
  const filePath = join(__dirname, "Ada.png");
  const imageData = await readFile(filePath);
  const base64Image = imageData.toString("base64");
  return base64Image;
}

const transport = new StdioServerTransport();
await server.connect(transport);
```

With the local changes from this PR, I added the following to my
`config.toml`:

```toml
[mcp_servers.kitty]
command = "node"
args = ["/Users/mbolin/code/kitty-mcp/index.js"]
```

Running the TUI from source:

```
cargo run --bin codex -- --model o3 'I need a picture of a cat'
```

I get:

<img width="732" alt="image"
src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869"
/>

Now, that said, I have only tested in iTerm and there is definitely some
funny business with getting an accurate character-to-pixel ratio
(sometimes the `CompletedMcpToolCallWithImageOutput` thinks it needs 10
rows to render instead of 4), so there is still work to be done here.
2025-05-28 19:03:17 -07:00
Michael Bolin
25a9949c49 fix: ensure inputSchema for MCP tool always has "properties" field when talking to OpenAI (#1150)
As noted in the comment introduced in this PR, this is analogous to the
issue reported in
https://github.com/openai/openai-agents-python/issues/449. This seems to
work now.
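
In other words, a tool whose `inputSchema` omits `properties` is normalized to something like the following before being sent to OpenAI (a sketch):

```json
{ "type": "object", "properties": {} }
```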
2025-05-28 17:17:21 -07:00
Michael Bolin
d60f350cf8 feat: add support for -c/--config to override individual config items (#1137)
This PR introduces support for `-c`/`--config` so users can override
individual config values on the command line using `--config
name=value`. Example:

```
codex --config model=o4-mini
```

Making it possible to set arbitrary config values on the command line
results in a more flexible configuration scheme and makes it easier to
provide single-line examples that can be copy-pasted from documentation.

Effectively, it means there are four levels of configuration for some
values:

- Default value (e.g., `model` currently defaults to `o4-mini`)
- Value in `config.toml` (e.g., user could override the default to be
`model = "o3"` in their `config.toml`)
- Specifying `-c` or `--config` to override `model` (e.g., user can
include `-c model=o3` in their list of args to Codex)
- If available, a config-specific flag can be used, which takes
precedence over `-c` (e.g., user can specify `--model o3` in their list
of args to Codex)

Now that it is possible to specify anything that could be configured in
`config.toml` on the command line using `-c`, we do not need to have a
custom flag for every possible config option (which can clutter the
output of `--help`). To that end, as part of this PR, we drop support
for the `--disable-response-storage` flag, as users can now specify `-c
disable_response_storage=true` to get the equivalent functionality.

Under the hood, this works by loading the `config.toml` into a
`toml::Value`. Then for each `key=value`, we create a small synthetic
TOML file with `value` so that we can run the TOML parser to get the
equivalent `toml::Value`. We then parse `key` to determine the point in
the original `toml::Value` to do the insert/replace. Once all of the
overrides from `-c` args have been applied, the `toml::Value` is
deserialized into a `ConfigToml` and then the `ConfigOverrides` are
applied, as before.
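
A condensed sketch of that merge; the function and variable names are mine, not the PR's:

```rust
use toml::value::Table;
use toml::Value;

/// Apply one `-c key=value` override to the parsed `config.toml` document.
fn apply_override(root: &mut Value, key: &str, raw: &str) {
    // Parse the value via a tiny synthetic TOML document; if it is not valid
    // TOML on its own (e.g. an unquoted string), fall back to a plain string.
    let value = format!("x = {raw}")
        .parse::<Value>()
        .map(|doc| doc["x"].clone())
        .unwrap_or_else(|_| Value::String(raw.to_string()));

    // Walk dotted segments (e.g. `tui.disable_mouse_capture`), creating
    // intermediate tables as needed, then insert/replace the final key.
    let segments: Vec<&str> = key.split('.').collect();
    let Some((last, parents)) = segments.split_last() else {
        return;
    };
    let mut cursor = root;
    for segment in parents {
        let Some(table) = cursor.as_table_mut() else {
            return; // an intermediate node is not a table; give up
        };
        cursor = table
            .entry(segment.to_string())
            .or_insert(Value::Table(Table::new()));
    }
    if let Some(table) = cursor.as_table_mut() {
        table.insert(last.to_string(), value);
    }
}
```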
2025-05-27 23:11:44 -07:00
Michael Bolin
29d154cb13 fix: use o4-mini as the default model (#1135)
Rollback of https://github.com/openai/codex/pull/972.
2025-05-27 09:12:55 -07:00
Michael Bolin
89ef4efdcf fix: overhaul how we spawn commands under seccomp/landlock on Linux (#1086)
Historically, we spawned the Seatbelt and Landlock sandboxes in
substantially different ways:

For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
specified as an arg followed by the original command:


d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)

For **Landlock/Seccomp**, we would do
`tokio::runtime::Builder::new_current_thread()`, _invoke
Landlock/Seccomp APIs to modify the permissions of that new thread_, and
then spawn the command:


d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)

While it is neat that Landlock/Seccomp supports applying a policy to
only one thread without having to apply it to the entire process, it
requires us to maintain two different codepaths and is a bit harder to
reason about. The tipping point was
https://github.com/openai/codex/pull/1061, in which we had to start
building up the `env` in an unexpected way for the existing
Landlock/Seccomp approach to continue to work.

This PR overhauls things so that we do similar things for Mac and Linux.
It turned out that we were already building our own "helper binary"
comparable to Mac's `sandbox-exec` as part of the `cli` crate:


d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)

We originally created this to build a small binary to include with the
Node.js version of the Codex CLI to provide support for Linux
sandboxing.

Though the sticky bit is that, at this point, we still want to deploy
the Rust version of Codex as a single, standalone binary rather than a
CLI and a supporting sandboxing binary. To satisfy this goal, we use
"the arg0 trick," in which we:

* use `std::env::current_exe()` to get the path to the CLI that is
currently running
* use the CLI as the `program` for the `Command`
* set `"codex-linux-sandbox"` as arg0 for the `Command`

A CLI that supports sandboxing should check arg0 at the start of the
program. If it is `"codex-linux-sandbox"`, it must invoke
`codex_linux_sandbox::run_main()`, which runs the CLI as if it were
`codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
appropriate Landlock/Seccomp API calls and then use `execvp(3)` to run
the original command, so we _replace_ the process rather than spawn a
subprocess. Incidentally, we do this before starting the Tokio runtime,
so the process should only have one thread when `execvp(3)` is called.
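
A minimal sketch of the arg0 trick; the helper entry point and overall wiring are simplified from the description above:

```rust
use std::env;
use std::io;
use std::os::unix::process::CommandExt; // for Command::arg0
use std::process::{Child, Command};

/// Re-invoke the current executable as the sandbox helper via arg0.
fn spawn_under_linux_sandbox(command: &[String]) -> io::Result<Child> {
    let self_exe = env::current_exe()?;
    Command::new(self_exe)
        .arg0("codex-linux-sandbox") // the helper checks this at startup
        .args(command)
        .spawn()
}

fn main() {
    // Check arg0 before anything else, including starting the Tokio runtime.
    if env::args().next().as_deref() == Some("codex-linux-sandbox") {
        // codex_linux_sandbox::run_main(); // Landlock/seccomp, then execvp(3)
        return;
    }
    // ...normal CLI entry point...
}
```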

Because the `core` crate that needs to spawn the Linux sandboxing is not
a CLI in its own right, this means that every CLI that includes `core`
and relies on this behavior has to (1) implement it and (2) provide the
path to the sandboxing executable. While the path is almost always
`std::env::current_exe()`, we needed to make this configurable for
integration tests, so `Config` now has a `codex_linux_sandbox_exe:
Option<PathBuf>` property to facilitate threading this through,
introduced in https://github.com/openai/codex/pull/1089.

This common pattern is now captured in
`codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
functions that should use it have been updated as part of this PR.

The `codex-linux-sandbox` crate added to the Cargo workspace as part of
this PR now has the bulk of the Landlock/Seccomp logic, which makes
`core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
`core/src/landlock.rs` were removed/ported as part of this PR. I also
moved the unit tests for this code into an integration test,
`linux-sandbox/tests/landlock.rs`, in which I use
`env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
`codex_linux_sandbox_exe` since `std::env::current_exe()` is not
appropriate in that case.
2025-05-23 11:37:07 -07:00
Michael Bolin
d1de7bb383 feat: add codex_linux_sandbox_exe: Option<PathBuf> field to Config (#1089)
https://github.com/openai/codex/pull/1086 is a work-in-progress to make
Linux sandboxing work more like Seatbelt where, for the command we want
to sandbox, we build up the command and then hand it, and some sandbox
configuration flags, to another command to set up the sandbox and then
run it.

In the case of Seatbelt, macOS provides this helper binary at
`/usr/bin/sandbox-exec`. For Linux, we have to build our own and
pass it through (which is what #1086 does), so this makes the new
`codex_linux_sandbox_exe` available on `Config` so that it will later be
available in `exec.rs` when we need it in #1086.
2025-05-22 21:52:28 -07:00
Michael Bolin
cb379d7797 feat: introduce support for shell_environment_policy in config.toml (#1061)
To date, when handling `shell` and `local_shell` tool calls, we were
spawning new processes using the environment inherited from the Codex
process itself. This means that the sensitive `OPENAI_API_KEY` that
Codex needs to talk to OpenAI models was made available to everything
run by `shell` and `local_shell`. While there are cases where that might
be useful, it does not seem like a good default.

This PR introduces a `shell_environment_policy` config option to
control the `env` used with these tool calls. It is inevitably a bit
complex so that it is possible to override individual components of the
policy without having to restate the entire thing.

Details are in the updated `README.md` in this PR, but here is the
relevant bit that explains the individual fields of
`shell_environment_policy`:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `inherit` | string | `core` | Starting template for the environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full parent env), or `none` (start empty). |
| `ignore_default_excludes` | boolean | `false` | When `false`, Codex removes any var whose **name** contains `KEY`, `SECRET`, or `TOKEN` (case-insensitive) before other rules run. |
| `exclude` | array&lt;string&gt; | `[]` | Case-insensitive glob patterns to drop after the default filter.<br>Examples: `"AWS_*"`, `"AZURE_*"`. |
| `set` | table&lt;string,string&gt; | `{}` | Explicit key/value overrides or additions – always win over inherited values. |
| `include_only` | array&lt;string&gt; | `[]` | If non-empty, a whitelist of patterns; only variables that match _one_ pattern survive the final step. (Generally used with `inherit = "all"`.) |


In particular, note that the default is `inherit = "core"`, so:

* if you have extra env variables that you want to inherit from the
parent process, use `inherit = "all"` and then specify `include_only`
* if you have extra env variables where you want to hardcode the values,
the default `inherit = "core"` will work fine, but then you need to
specify `set`
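
For instance (the variable names here are hypothetical):

```toml
[shell_environment_policy]
inherit = "all"
include_only = ["PATH", "HOME", "USER", "MY_APP_*"]

# Alternatively, start from the core template and hardcode extra values:
# inherit = "core"
# set = { MY_APP_HOME = "/opt/my-app" }
```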

This configuration is not battle-tested, so we will probably still have
to play with it a bit. `core/src/exec_env.rs` has the critical business
logic as well as unit tests.

Though if nothing else, prior to this change:

```
$ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY
# ...prints OPENAI_API_KEY...
```

But after this change it does not print anything (as desired).

One final thing to call out about this PR is that the
`configure_command!` macro we use in `core/src/exec.rs` has to do some
complex logic with respect to how it builds up the `env` for the process
being spawned under Landlock/seccomp. Specifically, doing
`cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably
the most intuitive way to do it) caused the Landlock unit tests to fail
because the processes spawned by the unit tests started failing in
unexpected ways! If we forgo `env_clear()` in favor of updating env vars
one at a time, the tests still pass. The comment in the code talks about
this a bit, and while I would like to investigate this more, I need to
move on for the moment, but I do plan to come back to it to fully
understand what is going on. For example, this suggests that we might
not be able to spawn a C program that calls `env_clear()`, which would
be...weird. We may still have to fiddle with our Landlock config if that
is the case.
2025-05-22 09:51:19 -07:00
Michael Bolin
5746561428 chore: move types out of config.rs into config_types.rs (#1054)
`config.rs` is already quite long without these definitions. Since they
have no real dependencies of their own, let's move them to their own
file so `config.rs` can focus on the business logic of loading a config.
2025-05-20 11:55:25 -07:00
Michael Bolin
d766e845b3 feat: experimental --output-last-message flag to exec subcommand (#1037)
This introduces an experimental `--output-last-message` flag that can be
used to identify a file where the final message from the agent will be
written. Two use cases:

- Ultimately, we will likely add a `--quiet` option to `exec`, but even
if the user does not want any output written to the terminal, they
probably want to know what the agent did. Writing the output to a file
makes it possible to get that information in a clean way.
- Relatedly, when using `exec` in CI, it is easier to review the
transcript written "normally" (i.e., not as JSON or something with
extra escapes), but getting programmatic access to the last message is
likely helpful, so writing the last message to a file gets the best of
both worlds.

I am calling this "experimental" because it is possible that we are
overfitting and will want a more general solution to this problem that
would justify removing this flag.
2025-05-19 16:08:18 -07:00
Michael Bolin
1dc14cefa1 fix: make codex-mini-latest the default model in the Rust TUI (#972)
It's time to make `codex-mini-latest` the new default, as this should be
an "evergreen" model pointer.

* Equivalent change in TypeScript
https://github.com/openai/codex/pull/951
* See some notes about using `codex-mini-latest` with MCP in
https://github.com/openai/codex/pull/961
2025-05-16 17:08:18 -07:00
Michael Bolin
7ca84087e6 feat: make it possible to toggle mouse mode in the Rust TUI (#971)
I did a bit of research to understand why I could not use my mouse to
drag to select text to copy to the clipboard in iTerm.

Apparently https://github.com/openai/codex/pull/641 to enable mousewheel
scrolling broke this functionality. It seems that, unless we put in a
bit of effort, we can have drag-to-select or scrolling, but not both.
Though if you know the trick of holding down `Option` while dragging
with the mouse in iTerm, you can probably get by with this. (I did not
know about this option prior to researching this issue.)

Nevertheless, users may still prefer to disable mouse capture
altogether, so this PR introduces:

* the ability to set `tui.disable_mouse_capture = true` in `config.toml`
to disable mouse capture
* a new command, `/toggle-mouse-mode` to toggle mouse capture
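
The config option looks like this in `config.toml`:

```toml
[tui]
disable_mouse_capture = true  # restores drag-to-select at the cost of scrolling
```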
2025-05-16 16:16:50 -07:00
Michael Bolin
f48dd99f22 feat: add support for OpenAI tool type, local_shell (#961)
The new `codex-mini-latest` model expects a new tool with `{"type":
"local_shell"}`. Its contract is similar to the existing `function` tool
with `"name": "shell"`, so this takes the `local_shell` tool call into
`ExecParams` and sends it through the existing
`handle_container_exec_with_params()` code path.

This also adds the following logic when adding the default set of tools
to a request:

```rust
let default_tools = if self.model.starts_with("codex") {
    &DEFAULT_CODEX_MODEL_TOOLS
} else {
    &DEFAULT_TOOLS
};
```

That is, if the model name starts with `"codex"`, we add `{"type":
"local_shell"}` to the list of tools; otherwise, we add the
aforementioned `shell` tool.

To test this, I ran the TUI with `-m codex-mini-latest` and verified
that it used the `local_shell` tool. Though I also had some entries in
`[mcp_servers]` in my personal `config.toml`. The `codex-mini-latest`
model seemed eager to try the tools from the MCP servers first, so I
have personally commented them out for now, so keep an eye out if you're
testing `codex-mini-latest`!

Perhaps we should include more details with `{"type": "local_shell"}` or
update the following:


fd0b1b0208/codex-rs/core/prompt.md

For reference, the corresponding change in the TypeScript CLI is
https://github.com/openai/codex/pull/951.
2025-05-16 14:38:08 -07:00
Michael Bolin
dfd54e1433 chore: refactor handle_function_call() into smaller functions (#965)
Overall, `codex.rs` is still far too large, but at least there's less
indenting now that things have been moved into smaller functions.

This will also make it easier to introduce the `local_shell` tool in
https://github.com/openai/codex/pull/961.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/965).
* #961
* __->__ #965
2025-05-16 14:17:10 -07:00
Michael Bolin
1e39189393 feat: add support for file_opener option in Rust, similar to #911 (#957)
This ports the enhancement introduced in
https://github.com/openai/codex/pull/911 (and the fixes in
https://github.com/openai/codex/pull/919) for the TypeScript CLI to the
Rust one.
2025-05-16 11:33:08 -07:00
Michael Bolin
3d9f4fcd8a fix: introduce ExtractHeredocError that implements PartialEq (#958) 2025-05-16 09:42:27 -07:00
Michael Bolin
ce2ecbe72f feat: record messages from user in ~/.codex/history.jsonl (#939)
This is a large change to support a "history" feature like you would
expect in a shell like Bash.

History events are recorded in `$CODEX_HOME/history.jsonl`. Because it
is a JSONL file, it is straightforward to append new entries (as opposed
to the TypeScript CLI, which uses `$CODEX_HOME/history.json`; to keep it
valid JSON, each new entry entails rewriting the entire file). Because
it is possible for there to be multiple instances of Codex CLI writing
to `history.jsonl` at once, we use advisory file locking when working
with `history.jsonl` in `codex-rs/core/src/message_history.rs`.
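
The append path looks roughly like this (a sketch; the `fs2` crate stands in for whatever the real code uses for advisory locking):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

use fs2::FileExt;

fn append_history_entry(path: &Path, json_line: &str) -> std::io::Result<()> {
    // Open append-only, creating the file with mode 0o600 so that other
    // users on the system cannot read the history.
    let file = OpenOptions::new()
        .create(true)
        .append(true)
        .mode(0o600)
        .open(path)?;

    // An advisory exclusive lock guards against interleaved writes from
    // concurrent Codex CLI sessions; it is released when `file` is dropped.
    file.lock_exclusive()?;
    writeln!(&file, "{json_line}")
}
```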

Because we believe history is a sufficiently useful feature, we enable
it by default. Though to provide some safety, we set the file
permissions of `history.jsonl` to be `o600` so that other users on the
system cannot read the user's history. We do not yet support a default
list of `SENSITIVE_PATTERNS` as the TypeScript CLI does:


3fdf9df133/codex-cli/src/utils/storage/command-history.ts (L10-L17)

We are going to take a more conservative approach to this list in the
Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude
sensitive information like API tokens, it would also exclude valuable
information such as references to Git commits.

As noted in the updated documentation, users can opt-out of history by
adding the following to `config.toml`:

```toml
[history]
persistence = "none" 
```

Because `history.jsonl` could, in theory, be quite large, we take a[n
arguably overly pedantic] approach in reading history entries into
memory. Specifically, we start by telling the client the current number
of entries in the history file (`history_entry_count`) as well as the
inode (`history_log_id`) of `history.jsonl` (see the new fields on
`SessionConfiguredEvent`).

The client is responsible for keeping new entries in memory to create a
"local history," but if the user hits up enough times to go "past" the
end of local history, then the client should use the new
`GetHistoryEntryRequest` in the protocol to fetch older entries.
Specifically, it should pass the `history_log_id` it was given
originally and work backwards from `history_entry_count`. (It should
really fetch history in batches rather than one-at-a-time, but that is
something we can improve upon in subsequent PRs.)

The motivation behind this crazy scheme is that it is designed to defend
against:

* The `history.jsonl` being truncated during the session such that the
index into the history is no longer consistent with what had been read
up to that point. We do not yet have logic to enforce a `max_bytes` for
`history.jsonl`, but once we do, we will aspire to implement it in a way
that should result in a new inode for the file on most systems.
* New items from concurrent Codex CLI sessions appending to the history.
Because, in the absence of truncation, `history.jsonl` is an append-only
log, so long as the client reads backwards from `history_entry_count`,
it should always get a consistent view of history. (That said, it will
not be able to read _new_ commands from concurrent sessions, but perhaps
we will introduce a `/` command to reload latest history or something
down the road.)

Admittedly, my testing of this feature thus far has been fairly light. I
expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
Michael Bolin
ec5e82b77c chore: pin Rust version to 1.86 and use io::Error::other to prepare for 1.87 (#947)
Previously, our GitHub actions specified the Rust toolchain as
`dtolnay/rust-toolchain@stable`, which meant the version could change
out from under us. In this case, the move from 1.86 to 1.87 introduced
new clippy warnings, causing build failures.

Because it will take a little time to fix all the new clippy warnings,
this PR pins things to 1.86 for now to unbreak the build.

It also replaces `io::Error::new(io::ErrorKind::Other)` with
`io::Error::other()` in preparation for 1.87.
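
The mechanical replacement, for reference:

```rust
use std::io;

fn example() {
    // Before: the long-standing idiom, flagged by newer clippy.
    let _old = io::Error::new(io::ErrorKind::Other, "something failed");
    // After: the equivalent, shorter form (stable since Rust 1.74).
    let _new = io::Error::other("something failed");
}
```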
2025-05-15 14:07:16 -07:00
Michael Bolin
5fc9fc3e3e chore: expose codex_home via Config (#941) 2025-05-15 00:30:13 -07:00
Michael Bolin
34aa1991f1 chore: handle all cases for EventMsg (#936)
For now, this removes the `#[non_exhaustive]` attribute on `EventMsg` so
that we are forced to handle all `EventMsg` variants exhaustively. (We may
revisit this if/when we publish `core/` as a `lib` crate.) For now, it is
helpful to have this as a forcing function because we have effectively
two UIs (`tui` and `exec`) and usually when we add a new variant to
`EventMsg`, we want to be sure that we update both.
2025-05-14 13:36:43 -07:00
Michael Bolin
399e819c9b fix: increase timeout for test_dev_null_write (#933)
After updating this test in https://github.com/openai/codex/pull/923, I
have been getting some timeouts with this test in CI, so increasing the
timeout to match that of `test_writable_root`:


327cf41f0f/codex-rs/core/src/landlock.rs (L211-L213)
2025-05-14 10:06:14 -07:00
Yaroslav Halchenko
327cf41f0f Add codespell support (config, workflow to detect/not fix) and make it fix some typos (#903)
More about codespell: https://github.com/codespell-project/codespell .

I have personally introduced it to dozens if not hundreds of projects
already, and so far the feedback has been only positive.

The CI workflow has 'permissions' set only to 'read', so it should also
be safe.

Let me know if you would rather just take the typo fixes and drop the CI
workflow.

---------

Signed-off-by: Yaroslav O. Halchenko <debian@onerussian.com>
2025-05-14 09:39:49 -07:00
Michael Bolin
5bf9445351 fix: test_dev_null_write() was not using echo as intended (#923)
I believe this test meant to verify that echoing content to `/dev/null`
succeeded, but instead it was testing the equivalent of `echo
'blah > /dev/null'`.
2025-05-13 21:40:26 -07:00
Michael Bolin
a5f3a34827 fix: change EventMsg enum so every variant takes a single struct (#925)
https://github.com/openai/codex/pull/922 did this for the
`SessionConfigured` enum variant, and I think it is generally helpful to
be able to work with each variant's payload as its own type, so this
converts the remaining variants and updates all of the call sites.

Added a simple unit test to verify that the JSON-serialized version of
`Event` does not have any unexpected nesting.
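
The resulting shape, roughly (payload fields elided; other variants follow the same pattern):

```rust
// Each variant's payload is now a nameable type of its own.
struct SessionConfiguredEvent {
    // session UUID, history_entry_count, history_log_id, ...
}

enum EventMsg {
    SessionConfigured(SessionConfiguredEvent),
    // ...every other variant likewise wraps exactly one struct...
}
```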
2025-05-13 20:44:42 -07:00
Michael Bolin
e6c206d19d fix: tighten up some logic around session timestamps and ids (#922)
* update `SessionConfigured` event to include the UUID for the session
* show the UUID in the Rust TUI
* use local timestamps in log files instead of UTC
* include timestamps in log file names for easier discovery
2025-05-13 19:22:16 -07:00
Michael Bolin
3c03c25e56 feat: introduce --profile for Rust CLI (#921)
This introduces a much-needed "profile" concept where users can specify
a collection of options under one name and then pass that via
`--profile` to the CLI.

This PR introduces the `ConfigProfile` struct and makes it a field of
`ConfigToml`. It further updates
`Config::load_from_base_config_with_overrides()` to respect
`ConfigProfile`, overriding default values where appropriate. A detailed
unit test is added at the end of `config.rs` to verify this behavior.

Details on how to use this feature have also been added to
`codex-rs/README.md`.
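
A hypothetical `config.toml` illustrating the idea (table and key names approximate):

```toml
# Selected when no --profile flag is given.
profile = "everyday"

[profiles.everyday]
model = "o4-mini"

[profiles.deep-work]
model = "o3"
```

Then `codex --profile deep-work` picks up the second set of options.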
2025-05-13 16:52:52 -07:00
Michael Bolin
61b881d4e5 fix: agent instructions were not being included when ~/.codex/instructions.md was empty (#908)
I had seen issues where `codex-rs` would not always write files without
me pressuring it to do so, and between that and the report of
https://github.com/openai/codex/issues/900, I decided to look into this
further. I found two serious issues with agent instructions:

(1) We were only sending agent instructions on the first turn, but
looking at the TypeScript code, we should be sending them on every turn.

(2) There was a serious issue where the agent instructions were
frequently lost:

* The TypeScript CLI appears to keep writing `~/.codex/instructions.md`:
55142e3e6c/codex-cli/src/utils/config.ts (L586)
* If `instructions.md` is present, the Rust CLI uses the contents of it
INSTEAD OF the default prompt, even if `instructions.md` is empty:
55142e3e6c/codex-rs/core/src/config.rs (L202-L203)

The combination of these two things means that I have been using
`codex-rs` without these key instructions:
https://github.com/openai/codex/blob/main/codex-rs/core/prompt.md

Looking at the TypeScript code, it appears we should be concatenating
these three items every time (if they exist):

* `prompt.md`
* `~/.codex/instructions.md`
* nearest `AGENTS.md`

This PR fixes things so that:

* `Config.instructions` is `None` if `instructions.md` is empty
* `Payload.instructions` is now `&'a str` instead of `Option<&'a
String>` because we should always have _something_ to send
* `Prompt` now has a `get_full_instructions()` helper returning a
`Cow<str>` that always includes the agent instructions first (sketched
below).
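
Roughly, with `AGENT_INSTRUCTIONS` standing in for the embedded `prompt.md` contents (a sketch, not the exact code):

```rust
use std::borrow::Cow;

const AGENT_INSTRUCTIONS: &str = "..."; // embedded prompt.md (stand-in)

struct Prompt {
    user_instructions: Option<String>,
}

impl Prompt {
    /// Always lead with the agent instructions; borrow when there is
    /// nothing to concatenate.
    fn get_full_instructions(&self) -> Cow<'_, str> {
        match &self.user_instructions {
            Some(user) => Cow::Owned(format!("{AGENT_INSTRUCTIONS}\n{user}")),
            None => Cow::Borrowed(AGENT_INSTRUCTIONS),
        }
    }
}
```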
2025-05-12 17:24:44 -07:00
Michael Bolin
115fb0b95d fix: navigate initialization phase before tools/list request in MCP client (#904)
Apparently the MCP server implemented in JavaScript did not require the
`initialize` handshake before responding to `tools/list` or `tools/call`
requests, so I missed this.
2025-05-12 15:15:26 -07:00
jcoens-openai
f3bd143867 Disallow expect via lints (#865)
Adds `expect()` as a denied lint. The same deal applies as with
`unwrap()`: we now need to put `#[expect(...)]` on the ones we
legitimately want. Took care to allow `expect()` in test contexts.

# Tests

```
cargo fmt
cargo clippy --all-features --all-targets --no-deps -- -D warnings
cargo test
```
2025-05-12 08:45:46 -07:00
Michael Bolin
b4785b5f88 feat: include "reasoning" messages in Rust TUI (#892)
As shown in the screenshot, we now include reasoning messages from the
model in the TUI under the heading "codex reasoning":


![image](https://github.com/user-attachments/assets/d8eb3dc3-2f9f-4e95-847e-d24b421249a8)

To ensure these are visible by default when using `o4-mini`, this also
changes the default value for `summary` (formerly `generate_summary`,
which is deprecated in favor of `summary` according to the docs) from
unset to `"auto"`.
2025-05-10 21:43:27 -07:00
Michael Bolin
2b122da087 feat: add support for AGENTS.md in Rust CLI (#885)
The TypeScript CLI already has support for including the contents of
`AGENTS.md` in the instructions sent with the first turn of a
conversation. This PR brings this functionality to the Rust CLI.

To be considered, `AGENTS.md` must be in the `cwd` of the session, or in
one of the parent folders up to a Git/filesystem root (whichever is
encountered first).

By default, a maximum of 32 KiB of `AGENTS.md` will be included, though
this is configurable using the new-in-this-PR `project_doc_max_bytes`
option in `config.toml`.
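
For example, to double the limit:

```toml
project_doc_max_bytes = 65536  # default is 32768 (32 KiB)
```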
2025-05-10 17:52:59 -07:00
Michael Bolin
fde48aaa0d feat: experimental env var: CODEX_SANDBOX_NETWORK_DISABLED (#879)
When using Codex to develop Codex itself, I noticed that sometimes it
would try to add `#[ignore]` to the following tests:

```
keeps_previous_response_id_between_tasks()
retries_on_early_close()
```

Both of these tests start a `MockServer` that launches an HTTP server on
an ephemeral port and requires network access to hit it, which the
Seatbelt policy associated with `--full-auto` correctly denies. If I
wasn't paying attention to the code that Codex was generating, one of
these `#[ignore]` annotations could have slipped into the codebase,
effectively disabling the test for everyone.

To that end, this PR enables an experimental environment variable named
`CODEX_SANDBOX_NETWORK_DISABLED` that is set to `1` if the
`SandboxPolicy` used to spawn the process does not have full network
access. I say it is "experimental" because I'm not convinced this API is
quite right, but we need to start somewhere. (It might be more
appropriate to have an env var like `CODEX_SANDBOX=full-auto`, but the
challenge is that our newer `SandboxPolicy` abstraction does not map to
a simple set of enums like in the TypeScript CLI.)

We leverage this new functionality by adding the following code to the
aforementioned tests as a way to "dynamically disable" them:

```rust
if std::env::var(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR).is_ok() {
    println!(
        "Skipping test because it cannot execute when network is disabled in a Codex sandbox."
    );
    return;
}
```

We can use the `debug seatbelt --full-auto` command to verify that
`cargo test` fails when run under Seatbelt prior to this change:

```
$ cargo run --bin codex -- debug seatbelt --full-auto -- cargo test
---- keeps_previous_response_id_between_tasks stdout ----

thread 'keeps_previous_response_id_between_tasks' panicked at /Users/mbolin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.3/src/mock_server/builder.rs:107:46:
Failed to bind an OS port for a mock server.: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    keeps_previous_response_id_between_tasks

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `-p codex-core --test previous_response_id`
```

Though after this change, the above command succeeds! This means that,
going forward, when Codex operates on Codex itself, when it runs `cargo
test`, only "real failures" should cause the command to fail.

As part of this change, I decided to tighten up the codepaths for
running `exec()` for shell tool calls. In particular, we do it in `core`
for the main Codex business logic itself, but we also expose this logic
via `debug` subcommands in the CLI in the `cli` crate. The logic for the
`debug` subcommands was not quite as faithful to the true business logic
as I liked, so I:

* refactored a bit of the Linux code, splitting `linux.rs` into
`linux_exec.rs` and `landlock.rs` in the `core` crate.
* gated less code behind `#[cfg(target_os = "linux")]`, because such
code does not get built by default when I develop on a Mac, which means I
either have to build the code in Docker or wait for CI signal
* introduced `macro_rules! configure_command` in `exec.rs` so we can
have both sync and async versions of this code. The synchronous version
seems more appropriate for straight threads or potentially fork/exec.
2025-05-09 18:29:34 -07:00
Michael Bolin
93817643ee chore: refactor exec() into spawn_child() and consume_truncated_output() (#878)
This PR is a straight refactor so that creating the `Child` process for
a `shell` tool call and consuming its output can be separate concerns.
For the actual tool call, we will always apply
`consume_truncated_output()`, but for the top-level debug commands in
the CLI (e.g., `debug seatbelt` and `debug landlock`), we only want to
use the `spawn_child()` part of `exec()`.

We want the subcommands to match the `shell` tool call usage as
faithfully as possible. This becomes more important when we introduce a
new parameter to `spawn_child()` in
https://github.com/openai/codex/pull/879.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/878).
* #879
* __->__ #878
2025-05-09 11:03:58 -07:00
Michael Bolin
27198bfe11 fix: make McpConnectionManager tolerant of MCPs that fail to start (#854)
I added a typo in my `config.toml` such that the `command` for one of my
`mcp_servers` did not exist and I verified that the error was surfaced
in the TUI (and that I was still able to use Codex).


![image](https://github.com/user-attachments/assets/f13cc08c-f4c6-40ec-9ab4-a9d75e03152f)
2025-05-08 23:45:54 -07:00
Michael Bolin
b940adae8e fix: get responses API working again in Rust (#872)
I inadvertently regressed support for the Responses API when adding
support for the chat completions API in
https://github.com/openai/codex/pull/862. This should get both APIs
working again, but the chat completions codepath seems more complex than
necessary. I'll try to clean that up shortly, but I want to get things
working again ASAP.
2025-05-08 22:49:15 -07:00
Michael Bolin
e924070cee feat: support the chat completions API in the Rust CLI (#862)
This is a substantial PR to add support for the chat completions API,
which in turn makes it possible to use non-OpenAI model providers (just
like in the TypeScript CLI):

* It moves a number of structs from `client.rs` to `client_common.rs` so
they can be shared.
* It introduces support for the chat completions API in
`chat_completions.rs`.
* It updates `ModelProviderInfo` so that `env_key` is `Option<String>`
instead of `String` (e.g., for ollama) and adds a `wire_api` field
* It updates `client.rs` to choose between `stream_responses()` and
`stream_chat_completions()` based on the `wire_api` for the
`ModelProviderInfo`
* It updates the `exec` and TUI CLIs to no longer fail if the
`OPENAI_API_KEY` environment variable is not set
* It updates the TUI so that `EventMsg::Error` is displayed more
prominently when it occurs, particularly now that it is important to
alert users to the `CodexErr::EnvVar` variant.
* `CodexErr::EnvVar` was updated to include an optional `instructions`
field so we can preserve the behavior where we direct users to
https://platform.openai.com if `OPENAI_API_KEY` is not set.
* Cleaned up the "welcome message" in the TUI to ensure the model
provider is displayed.
* Updated the docs in `codex-rs/README.md`.

To exercise the chat completions API from OpenAI models, I added the
following to my `config.toml`:

```toml
model = "gpt-4o"
model_provider = "openai-chat-completions"

[model_providers.openai-chat-completions]
name = "OpenAI using Chat Completions"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
wire_api = "chat"
```

Though to test a non-OpenAI provider, I installed ollama with mistral
locally on my Mac because ChatGPT said that would be a good match for my
hardware:

```shell
brew install ollama
ollama serve
ollama pull mistral
```

Then I added the following to my `~/.codex/config.toml`:

```toml
model = "mistral"
model_provider = "ollama"
```

Note this code could certainly use more test coverage, but I want to get
this in so folks can start playing with it.

For reference, I believe https://github.com/openai/codex/pull/247 was
roughly the comparable PR on the TypeScript side.
2025-05-08 21:46:06 -07:00
Michael Bolin
a9adb4175c fix: enable clippy on tests (#870)
https://github.com/openai/codex/pull/855 added the clippy warning to
disallow `unwrap()`, but apparently we were not verifying that tests
were "clippy clean" in CI, so I ended up with a lot of local errors in
VS Code.

This turns on the check in CI and fixes the offenders.
2025-05-08 16:02:56 -07:00
jcoens-openai
87cf120873 Workspace lints and disallow unwrap (#855)
Sets submodules to use workspace lints. Added denying `unwrap()` as a
workspace-level lint, which found a couple of cases where we could have
propagated errors. Also manually labeled the ones that looked fine to my
eye.
2025-05-08 09:46:18 -07:00
Michael Bolin
86022f097e feat: read model_provider and model_providers from config.toml (#853)
This is the first step in supporting other model providers in the Rust
CLI. Specifically, this PR adds support for the new entries in `Config`
and `ConfigOverrides` to specify a `ModelProviderInfo`, which is the
basic config needed for an LLM provider. This PR does not get us all the
way there yet because `client.rs` still categorically appends
`/responses` to the URL and expects the endpoint to support the OpenAI
Responses API. Will fix that next!
2025-05-07 17:38:28 -07:00
Michael Bolin
cfe50c7107 fix: creating an instance of Codex requires a Config (#859)
I discovered that I accidentally introduced a change in
https://github.com/openai/codex/pull/829 where we load a fresh `Config`
in the middle of `codex.rs`:


c3e10e180a/codex-rs/core/src/codex.rs (L515-L522)

This is not good because the `Config` could differ from the one that has
the user's overrides specified from the CLI. Also, in unit tests, it
means the `Config` was picking up my personal settings as opposed to
using a vanilla config, which was problematic.

This PR cleans things up by moving the common case where
`Op::ConfigureSession` is derived from `Config` (originally done in
`codex_wrapper.rs`) and making it the standard way to initialize `Codex`
by putting it in `Codex::spawn()`. Note this also eliminates quite a bit
of boilerplate from the tests and relieves the caller of the
responsibility of minting unique IDs when invoking `submit()`.
2025-05-07 16:33:28 -07:00
Michael Bolin
c3e10e180a fix: remove CodexBuilder and Recorder (#858)
These abstractions were originally created exclusively for the REPL,
which was removed in https://github.com/openai/codex/pull/754.
Currently, they create some unnecessary Tokio tasks, so we are better off
without them. (We can always bring them back if we have a new use case.)
2025-05-07 16:11:42 -07:00
Michael Bolin
42617f8726 feat: save session transcripts when using Rust CLI (#845)
This adds support for saving transcripts when using the Rust CLI. Like
the TypeScript CLI, it saves the transcript to `~/.codex/sessions`,
though it uses JSONL for the file format (and `.jsonl` for the file
extension) so that even if Codex crashes, what was written to the
`.jsonl` file should generally still be valid JSONL content.
2025-05-07 13:49:15 -07:00
Michael Bolin
9da6ebef3f fix: add optional timeout to McpClient::send_request() (#852)
We now impose a 10s timeout on the initial `tools/list` request to an
MCP server. We do not apply a timeout for other types of requests yet,
but we should start enforcing those, as well.
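
The pattern, roughly (a sketch; the real code threads the optional timeout through `McpClient::send_request()` itself):

```rust
use std::future::Future;
use std::time::Duration;
use tokio::time::timeout;

/// Bound an MCP request future with a 10s deadline.
async fn with_timeout<T>(
    request: impl Future<Output = T>,
) -> Result<T, tokio::time::error::Elapsed> {
    timeout(Duration::from_secs(10), request).await
}
```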
2025-05-07 12:56:38 -07:00