valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	c02d25fbad	fix: include codex-linux-sandbox-aarch64-unknown-linux-musl in the set of release artifacts (#1230 ) This was missed in https://github.com/openai/codex/pull/1225. Once we create a new GitHub Release with this change, we can use the URL from the workflow that triggered the release in https://github.com/openai/codex/pull/1228.	2025-06-05 22:03:07 -07:00
Michael Bolin	9db53b33aa	fix: support arm64 build for Linux (#1225 ) Users were running into issues with glibc mismatches on arm64 linux. In the past, we did not provide a musl build for arm64 Linux because we had trouble getting the openssl dependency to build correctly. Though today I just tried the same trick in `Cargo.toml` that we were doing for `x86_64-unknown-linux-musl` (using `openssl-sys` with `features = ["vendored"]`), so I'm not sure what problem we had in the past; the builds "just worked" today! Though one tweak that did have to be made is that the integration tests for Seccomp/Landlock empirically require longer timeouts on arm64 linux, or at least on the `ubuntu-24.04-arm` GitHub Runner. As such, we change the timeouts for arm64 in `codex-rs/linux-sandbox/tests/landlock.rs`. Though in solving this problem, I decided I needed a turnkey solution for testing the Linux build(s) from my Mac laptop, so this PR introduces `.devcontainer/Dockerfile` and `.devcontainer/devcontainer.json` to facilitate this. Detailed instructions are in `.devcontainer/README.md`. We will update `dotslash-config.json` and other release-related scripts in a follow-up PR.	2025-06-05 20:29:46 -07:00
Michael Bolin	515b6331bd	feat: add support for login with ChatGPT (#1212 ) This does not implement the full Login with ChatGPT experience, but it should unblock people. What works * The `codex` multitool now has a `login` subcommand, so you can run `codex login`, which should write `CODEX_HOME/auth.json` if you complete the flow successfully. The TUI will now read the `OPENAI_API_KEY` from `auth.json`. * The TUI should refresh the token if it has expired and the necessary information is in `auth.json`. * There is a `LoginScreen` in the TUI that tells you to run `codex login` if both (1) your model provider expects to use `OPENAI_API_KEY` as its env var, and (2) `OPENAI_API_KEY` is not set. What does not work * The `LoginScreen` does not support the login flow from within the TUI. Instead, it tells you to quit, run `codex login`, and then run `codex` again. * `codex exec` does not read from `auth.json` yet, nor does it direct the user to go through the login flow if `OPENAI_API_KEY` is not found. * The `maybeRedeemCredits()` function from `get-api-key.tsx` has not been ported from TypeScript to `login_with_chatgpt.py` yet: `a67a67f325/codex-cli/src/utils/get-api-key.tsx (L84-L89)` Implementation Currently, the OAuth flow requires running a local webserver on `127.0.0.1:1455`. It seemed wasteful to incur the additional binary cost of a webserver dependency in the Rust CLI just to support login, so instead we implement this logic in Python, as Python has a `http.server` module as part of its standard library. Specifically, we bundle the contents of a single Python file as a string in the Rust CLI and then use it to spawn a subprocess as `python3 -c {{SOURCE_FOR_PYTHON_SERVER}}`. 
As such, the most significant files in this PR are: ``` codex-rs/login/src/login_with_chatgpt.py codex-rs/login/src/lib.rs ``` Now that the CLI may load `OPENAI_API_KEY` from the environment _or_ `CODEX_HOME/auth.json`, we need a new abstraction for reading/writing this variable, so we introduce: ``` codex-rs/core/src/openai_api_key.rs ``` Note that `std::env::set_var()` is [rightfully] `unsafe` in Rust 2024, so we use a LazyLock<RwLock<Option<String>>> to store `OPENAI_API_KEY` so it is read in a thread-safe manner. Ultimately, it should be possible to go through the entire login flow from the TUI. This PR introduces a placeholder `LoginScreen` UI for that right now, though the new `codex login` subcommand introduced in this PR should be a viable workaround until the UI is ready. Testing Because the login flow is currently implemented in a standalone Python file, you can test it without building any Rust code as follows: ``` rm -rf /tmp/codex_home && mkdir /tmp/codex_home CODEX_HOME=/tmp/codex_home python3 codex-rs/login/src/login_with_chatgpt.py ``` For reference: * the original TypeScript implementation was introduced in https://github.com/openai/codex/pull/963 * support for redeeming credits was later added in https://github.com/openai/codex/pull/974	2025-06-04 08:44:17 -07:00
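The thread-safe storage described in #1212 could look roughly like the following. This is a minimal sketch, not the contents of `openai_api_key.rs`: only the `LazyLock<RwLock<Option<String>>>` shape comes from the commit message, while the function names `get_openai_api_key` and `set_openai_api_key` and the rule of ignoring empty values are assumptions made for illustration.

```rust
use std::env;
use std::sync::{LazyLock, RwLock};

/// Process-wide copy of the key, seeded once from the environment on first use.
/// Keeping it here avoids `std::env::set_var()`, which is `unsafe` in Rust 2024.
static OPENAI_API_KEY: LazyLock<RwLock<Option<String>>> = LazyLock::new(|| {
    // Assumption: an empty variable is treated the same as an unset one.
    let from_env = env::var("OPENAI_API_KEY").ok().filter(|v| !v.is_empty());
    RwLock::new(from_env)
});

/// Hypothetical accessor: returns a clone so no lock is held by the caller.
pub fn get_openai_api_key() -> Option<String> {
    OPENAI_API_KEY.read().unwrap().clone()
}

/// Hypothetical setter, e.g. called after reading `auth.json`.
pub fn set_openai_api_key(value: String) {
    if !value.is_empty() {
        *OPENAI_API_KEY.write().unwrap() = Some(value);
    }
}
```

Readers take a shared lock and writers an exclusive one, so a token refresh can update the key while other threads keep reading it.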
Reilly Wood	a67a67f325	codex-rs: make tool calls prettier (#1211 ) This PR overhauls how active tool calls and completed tool calls are displayed: 1. More use of colour to indicate success/failure and distinguish between components like tool name+arguments 2. Previously, the entire `CallToolResult` was serialized to JSON and pretty-printed. Now, we extract each individual `CallToolResultContent` and print those 1. The previous solution was wasting space by unnecessarily showing details of the `CallToolResult` struct to users, without formatting the actual tool call results nicely 2. We're now able to show users more information from tool results in less space, with nicer formatting when tools return JSON results ### Before: <img width="1251" alt="Screenshot 2025-06-03 at 11 24 26" src="https://github.com/user-attachments/assets/5a58f222-219c-4c53-ace7-d887194e30cf" /> ### After: <img width="1265" alt="image" src="https://github.com/user-attachments/assets/99fe54d0-9ebe-406a-855b-7aa529b91274" /> ## Future Work 1. Integrate image tool result handling better. We should be able to display images even if they're not the first `CallToolResultContent` 2. Users should have some way to view the full version of truncated tool results 3. It would be nice to add some left padding for tool results, make it more clear that they are results. This is doable, just a little fiddly due to the way `first_visible_line` scrolling works 4. There's almost certainly a better way to format JSON than "all on 1 line with spaces to make Ratatui wrapping work". But I think that works OK for now.	2025-06-03 14:29:26 -07:00
Michael Bolin	c6fcec55fe	fix: always send full instructions when using the Responses API (#1207 ) This fixes a longstanding error in the Rust CLI where `codex.rs` contained an errant `is_first_turn` check that would exclude the user instructions for subsequent "turns" of a conversation when using the responses API (i.e., when `previous_response_id` existed). While here, renames `Prompt.instructions` to `Prompt.user_instructions` since we now have quite a few levels of instructions floating around. Also removed an unnecessary use of `clone()` in `Prompt.get_full_instructions()`.	2025-06-03 09:40:19 -07:00
Michael Bolin	6fcc528a43	fix: provide tolerance for apply_patch tool (#993 ) As explained in detail in the doc comment for `ParseMode::Lenient`, we have observed that GPT-4.1 does not always generate a valid invocation of `apply_patch`. Fortunately, the error is predictable, so we introduce some new logic to the `codex-apply-patch` crate to recover from this error. Because we would like to avoid this becoming a de facto standard (as it would be incompatible if `apply_patch` were provided as an actual executable, unless we also introduced the lenient behavior in the executable, as well), we require passing `ParseMode::Lenient` to `parse_patch_text()` to make it clear that the caller is opting into supporting this special case. Note the analogous change to the TypeScript CLI was https://github.com/openai/codex/pull/930. In addition to changing the accepted input to `apply_patch`, it also introduced additional instructions for the model, which we include in this PR. Note that `apply-patch` does not depend on either `regex` or `regex-lite`, so some of the checks are slightly more verbose to avoid introducing this dependency. That said, this PR does not leverage the existing `extract_heredoc_body_from_apply_patch_command()`, which depends on `tree-sitter` and `tree-sitter-bash`: `5a5aa89914/codex-rs/apply-patch/src/lib.rs (L191-L246)` though perhaps it should.	2025-06-03 09:06:38 -07:00
Michael Bolin	5a5aa89914	chore: replace regex with regex-lite, where appropriate (#1200 ) As explained on https://crates.io/crates/regex-lite, `regex-lite` is a lighter alternative to `regex` and seems to be sufficient for our purposes.	2025-06-02 17:11:45 -07:00
Michael Bolin	0f3cc8f842	feat: make reasoning effort/summaries configurable (#1199 ) Previous to this PR, we always set `reasoning` when making a request using the Responses API: `d7245cbbc9/codex-rs/core/src/client.rs (L108-L111)` Though if you tried to use the Rust CLI with `--model gpt-4.1`, this would fail with: ```shell "Unsupported parameter: 'reasoning.effort' is not supported with this model." ``` We take a cue from the TypeScript CLI, which does a check on the model name: `d7245cbbc9/codex-cli/src/utils/agent/agent-loop.ts (L786-L789)` This PR does a similar check, though also adds support for the following config options: ``` model_reasoning_effort = "low" \| "medium" \| "high" \| "none" model_reasoning_summary = "auto" \| "concise" \| "detailed" \| "none" ``` This way, if you have a model whose name happens to start with `"o"` (or `"codex"`?), you can set these to `"none"` to explicitly disable reasoning, if necessary. (That said, it seems unlikely anyone would use the Responses API with non-OpenAI models, but we provide an escape hatch, anyway.) This PR also updates both the TUI and `codex exec` to show `reasoning effort` and `reasoning summaries` in the header.	2025-06-02 16:01:34 -07:00
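The gating logic described in #1199 can be sketched as below. The type and function names are hypothetical, and the prefix check (`"o"` or `"codex"`) is taken from the commit message's description rather than from the actual `client.rs`.

```rust
/// Mirrors the `model_reasoning_effort` config values listed in the commit message.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ReasoningEffort {
    Low,
    Medium,
    High,
    None,
}

/// Assumption: a model is treated as reasoning-capable purely by name prefix.
pub fn model_supports_reasoning(model: &str) -> bool {
    model.starts_with('o') || model.starts_with("codex")
}

/// Returns the value to send as `reasoning.effort`, or `None` to omit `reasoning`.
pub fn effort_for_request(model: &str, configured: ReasoningEffort) -> Option<&'static str> {
    if !model_supports_reasoning(model) {
        return None;
    }
    match configured {
        ReasoningEffort::Low => Some("low"),
        ReasoningEffort::Medium => Some("medium"),
        ReasoningEffort::High => Some("high"),
        // The explicit escape hatch: the user asked for no reasoning at all.
        ReasoningEffort::None => None,
    }
}
```

With this shape, `--model gpt-4.1` never sends the unsupported parameter, and a model whose name merely happens to start with `"o"` can still opt out via `"none"`.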
Michael Bolin	d7245cbbc9	fix: chat completions API now also passes tools along (#1167 ) Prior to this PR, there were two big misses in `chat_completions.rs`: 1. The loop in `stream_chat_completions()` was only including items of type `ResponseItem::Message` when building up the `"messages"` JSON for the `POST` request to the `chat/completions` endpoint. This fixes things by ensuring other variants (`FunctionCall`, `LocalShellCall`, and `FunctionCallOutput`) are included, as well. 2. In `process_chat_sse()`, we were not recording tool calls and were only emitting items of type `ResponseEvent::OutputItemDone(ResponseItem::Message)` to the stream. Now we introduce `FunctionCallState`, which is used to accumulate the `delta`s of type `tool_calls`, so we can ultimately emit a `ResponseItem::FunctionCall`, when appropriate. While function calling now appears to work for chat completions with my local testing, I believe that there are still edge cases that are not covered and that this codepath would benefit from a battery of integration tests. (As part of that further cleanup, we should also work to support streaming responses in the UI.) The other important part of this PR is some cleanup in `core/src/codex.rs`. In particular, it was hard to reason about how `run_task()` was building up the list of messages to include in a request across the various cases: - Responses API - Chat Completions API - Responses API used in concert with ZDR I like to think things are a bit cleaner now where: - `zdr_transcript` (if present) contains all messages in the history of the conversation, which includes function call outputs that have not been sent back to the model yet - `pending_input` includes any messages the user has submitted while the turn is in flight that need to be injected as part of the next `POST` to the model - `input_for_next_turn` includes the tool call outputs that have not been sent back to the model yet	2025-06-02 13:47:51 -07:00
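The accumulation of `tool_calls` deltas described in #1167 might be sketched as follows. Only the name `FunctionCallState` appears in the commit message; the field layout, the `ToolCallDelta` and `FunctionCall` structs, and the `push`/`finish` methods are assumptions for illustration, not the code in `chat_completions.rs`.

```rust
/// One streamed fragment of a tool call (hypothetical shape).
#[derive(Clone, Debug, Default)]
pub struct ToolCallDelta {
    pub id: Option<String>,
    pub name: Option<String>,
    pub arguments: Option<String>,
}

/// The completed call that would be emitted once the stream finishes.
#[derive(Debug, PartialEq)]
pub struct FunctionCall {
    pub call_id: String,
    pub name: String,
    pub arguments: String,
}

/// Accumulates deltas until the call is complete.
#[derive(Debug, Default)]
pub struct FunctionCallState {
    call_id: Option<String>,
    name: Option<String>,
    arguments: String,
}

impl FunctionCallState {
    /// Merge one delta: the id and name arrive once, arguments arrive in pieces.
    pub fn push(&mut self, delta: ToolCallDelta) {
        if let Some(id) = delta.id {
            self.call_id = Some(id);
        }
        if let Some(name) = delta.name {
            self.name = Some(name);
        }
        if let Some(fragment) = delta.arguments {
            self.arguments.push_str(&fragment);
        }
    }

    /// Returns a call only if both the id and the name were seen.
    pub fn finish(self) -> Option<FunctionCall> {
        Some(FunctionCall {
            call_id: self.call_id?,
            name: self.name?,
            arguments: self.arguments,
        })
    }
}
```

The argument JSON is concatenated as raw text and left unparsed until the call is complete, since an individual fragment is generally not valid JSON on its own.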
Michael Bolin	e40f86b446	chore: logging cleanup (#1196 ) Update what we log to make `RUST_LOG=debug` a bit easier to work with. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1196). * #1167 * __->__ #1196	2025-06-02 13:31:33 -07:00
Michael Bolin	7896b1089d	chore: update the WORKFLOW_URL in install_native_deps.sh to the latest release (#1190 )	2025-05-31 10:30:50 -07:00
Michael Bolin	1410ae95ca	fix: set `--config hide_agent_reasoning=true` in the GitHub Action (#1185 ) Whoops, I had this flipped in https://github.com/openai/codex/pull/1183.	2025-05-30 23:57:05 -07:00
Michael Bolin	fccf5f3221	fix: disable agent reasoning output by default in the GitHub Action (#1183 )	2025-05-30 23:49:48 -07:00
Michael Bolin	1159eaf04f	feat: show the version when starting Codex (#1182 ) The TypeScript version of the CLI shows the version when it starts up, which is helpful when users share screenshots (and nice to know, as a user).	2025-05-30 23:24:36 -07:00
Michael Bolin	e81327e5f4	feat: add hide_agent_reasoning config option (#1181 ) This PR introduces a `hide_agent_reasoning` config option (that defaults to `false`) that users can enable to make the output less verbose by suppressing reasoning output. To test, verified that this includes agent reasoning in the output: ``` echo hello \| just exec ``` whereas this does not: ``` echo hello \| just exec --config hide_agent_reasoning=true ```	2025-05-30 23:14:56 -07:00
Michael Bolin	4f3d294762	feat: dim the timestamp in the exec output (#1180 ) This required changing `ts_println!()` to take `$self:ident`, which is a bit more verbose, but the usability improvement seems worth it. Also eliminated an unnecessary `.to_string()` while here.	2025-05-30 16:27:37 -07:00
Michael Bolin	cf1d070538	feat: grab-bag of improvements to `exec` output (#1179 ) Fixes: * Instantiate `EventProcessor` earlier in `lib.rs` so `print_config_summary()` can be an instance method of it and leverage its various `Style` fields to ensure it honors `with_ansi` properly. * After printing the config summary, print out user's prompt with the heading `User instructions:`. As noted in the comment, now that we can read the instructions via stdin as of #1178, it is helpful to the user to ensure they know what instructions were given to Codex. * Use same colors/bold/italic settings for headers as the TUI, making the output a bit easier to read.	2025-05-30 16:22:10 -07:00
Michael Bolin	ae743d56b0	feat: for `codex exec`, if PROMPT is not specified, read from stdin if not a TTY (#1178 ) This attempts to make `codex exec` more flexible in how the prompt can be passed: * as before, it can be passed as a single string argument * if `-` is passed as the value, the prompt is read from stdin * if no argument is passed _and stdin is a tty_, prints a warning to stderr that no prompt was specified and exits non-zero. * if no argument is passed _and stdin is NOT a tty_, prints `Reading prompt from stdin...` to stderr to let the user know that Codex will wait until it reads EOF from stdin to proceed. (You can repro this case by doing `yes \| just exec` since stdin is not a TTY in that case but it also never reaches EOF).	2025-05-30 14:41:55 -07:00
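The four cases listed in #1178 can be sketched as a small decision function. The names `PromptSource`, `choose_prompt_source`, and `read_prompt` are hypothetical, and the choice to stay silent when `-` is passed explicitly is an assumption; the commit message only says the notice is printed when no argument is given and stdin is not a TTY.

```rust
use std::io::{self, IsTerminal, Read};

/// Where the prompt should come from (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum PromptSource {
    Argument(String),
    /// `announce` is true when the user should be told we are waiting on stdin.
    Stdin { announce: bool },
    Missing,
}

/// Pure decision logic, kept separate from I/O so it is easy to test.
pub fn choose_prompt_source(arg: Option<&str>, stdin_is_tty: bool) -> PromptSource {
    match arg {
        // Assumption: an explicit `-` reads stdin without the notice.
        Some("-") => PromptSource::Stdin { announce: false },
        Some(text) => PromptSource::Argument(text.to_string()),
        None if stdin_is_tty => PromptSource::Missing,
        None => PromptSource::Stdin { announce: true },
    }
}

/// Resolves the prompt, reading stdin to EOF when required.
pub fn read_prompt(arg: Option<&str>) -> io::Result<String> {
    match choose_prompt_source(arg, io::stdin().is_terminal()) {
        PromptSource::Argument(text) => Ok(text),
        PromptSource::Missing => Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "no prompt was specified",
        )),
        PromptSource::Stdin { announce } => {
            if announce {
                eprintln!("Reading prompt from stdin...");
            }
            let mut buffer = String::new();
            io::stdin().read_to_string(&mut buffer)?;
            Ok(buffer)
        }
    }
}
```

A caller would map the `Err` case to a warning on stderr and a non-zero exit code.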
Michael Bolin	1bf82056b3	fix: introduce `create_tools_json()` and share it with chat_completions.rs (#1177 ) The main motivator behind this PR is that `stream_chat_completions()` was not adding the `"tools"` entry to the payload posted to the `/chat/completions` endpoint. This (1) refactors the existing logic to build up the `"tools"` JSON from `client.rs` into `openai_tools.rs`, and (2) updates the use of responses API (`client.rs`) and chat completions API (`chat_completions.rs`) to both use it. Note this PR alone is not sufficient to get tool calling from chat completions working: that is done in https://github.com/openai/codex/pull/1167. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1177). * #1167 * __->__ #1177	2025-05-30 14:07:03 -07:00
Michael Bolin	e207f20f64	fix: add extra debugging to GitHub Action (#1173 ) https://github.com/openai/codex/actions/runs/15352839832/job/43205041563 appeared to fail around `postComment()`, but I don't see the output from `fail()` in the logs. Adding a bit more info.	2025-05-30 11:16:30 -07:00
Michael Bolin	0f40ef5a10	fix: missed a step in #1171 for codex.yml (#1172 ) Missed in my copy/paste.	2025-05-30 11:04:41 -07:00
Michael Bolin	8676185389	fix: update outdated repo setup in codex.yml (#1171 ) We should do some work to share the setup logic across `codex.yml`, `ci.yml`, and `rust-ci.yml`.	2025-05-30 10:58:57 -07:00
Michael Bolin	baa92f37e0	feat: initial import of experimental GitHub Action (#1170 ) This is a first cut at a GitHub Action that lets you define prompt templates in `.md` files under `.github/codex/labels` that will run Codex with the associated prompt when the label is added to a GitHub pull request. For example, this PR includes these files: ``` .github/codex/labels/codex-attempt.md .github/codex/labels/codex-code-review.md .github/codex/labels/codex-investigate-issue.md ``` And the new `.github/workflows/codex.yml` workflow declares the following triggers: ```yaml on: issues: types: [opened, labeled] pull_request: branches: [main] types: [labeled] ``` as well as the following expression to gate the action: ``` jobs: codex: if: \| (github.event_name == 'issues' && ( (github.event.action == 'labeled' && (github.event.label.name == 'codex-attempt' \|\| github.event.label.name == 'codex-investigate-issue')) )) \|\| (github.event_name == 'pull_request' && github.event.action == 'labeled' && github.event.label.name == 'codex-code-review') ``` Note the "actor" who added the label must have write access to the repo for the action to take effect. After adding a label, the action will "ack" the request by replacing the original label (e.g., `codex-review`) with an `-in-progress` suffix (e.g., `codex-review-in-progress`). When it is finished, it will swap the `-in-progress` label with a `-completed` one (e.g., `codex-review-completed`). Users of the action are responsible for providing an `OPENAI_API_KEY` and making it available as a secret to the action.	2025-05-30 10:55:28 -07:00
Michael Bolin	a0239c3cd6	fix: enable `set positional-arguments` in justfile (#1169 ) The way these definitions worked before, they did not handle quoted args with spaces properly. For example, if you had `/tmp/test-just/printlen.py` as: ```python #!/usr/bin/env python3 import sys print(len(sys.argv)) ``` and your `justfile` was: ``` printlen args: /tmp/test-just/printlen.py {{args}} ``` Then: ```shell $ just printlen foo bar 3 $ just printlen 'foo bar' 3 ``` which is not what we want: `'foo bar'` should be treated as one argument. The fix is to use [positional-arguments](`515e806b51/README.md (L1131)`): ``` set positional-arguments printlen args: /tmp/test-just/printlen.py "$@" ```	2025-05-30 09:11:53 -07:00
Michael Bolin	bdfa95ed31	docs: split the config-related portion of codex-rs/README.md into its own config.md file (#1165 ) Also updated the overview on `codex-rs/README.md` while here.	2025-05-29 16:59:35 -07:00
Fouad Matin	828e2062c2	fix(codex-rs): use codex-mini-latest as default (#1164 )	2025-05-29 16:55:19 -07:00
Michael Bolin	92957c47fb	fix: update justfile to facilitate running CLIs from source and formatting source code (#1163 )	2025-05-29 15:35:14 -07:00
Michael Bolin	8c1902b562	chore: update GitHub workflow for native artifacts for npm release (#1162 ) Among other things, this picks up this UI treatment fix: https://github.com/openai/codex/pull/1161	2025-05-29 15:34:06 -07:00
Michael Bolin	a32d305ae6	fix: update UI treatment of slash command menu to match that of the TS CLI (#1161 ) Uses the same colors as in the TypeScript CLI: ![image](https://github.com/user-attachments/assets/919cd472-ffb4-4654-a46a-d84f0cd9c097) Now it is also readable on a light theme, e.g., in Ghostty: ![image](https://github.com/user-attachments/assets/468c37b0-ea63-4455-9b48-73dc2c95f0f6)	2025-05-29 14:57:55 -07:00
Michael Bolin	a768a6a41d	fix: introduce ResponseInputItem::McpToolCallOutput variant (#1151 ) The output of an MCP server tool call can be one of several types, but to date, we treated all outputs as text by showing the serialized JSON as the "tool output" in Codex: `25a9949c49/codex-rs/mcp-types/src/lib.rs (L96-L101)` This PR adds support for the `ImageContent` variant so we can now display an image output from an MCP tool call. In making this change, we introduce a new `ResponseInputItem::McpToolCallOutput` variant so that we can work with the `mcp_types::CallToolResult` directly when the function call is made to an MCP server. Though arguably the more significant change is the introduction of `HistoryCell::CompletedMcpToolCallWithImageOutput`, which is a cell that uses `ratatui_image` to render an image into the terminal. To support this, we introduce `ImageRenderCache`, cache a `ratatui_image::picker::Picker`, and `ensure_image_cache()` to cache the appropriate scaled image data and dimensions based on the current terminal size. 
To test, I created a minimal `package.json`: ```json { "name": "kitty-mcp", "version": "1.0.0", "type": "module", "description": "MCP that returns image of kitty", "main": "index.js", "dependencies": { "@modelcontextprotocol/sdk": "^1.12.0" } } ``` with the following `index.js` to define the MCP server: ```js #!/usr/bin/env node import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; import { readFile } from "node:fs/promises"; import { join } from "node:path"; const IMAGE_URI = "image://Ada.png"; const server = new McpServer({ name: "Demo", version: "1.0.0", }); server.tool( "get-cat-image", "If you need a cat image, this tool will provide one.", async () => ({ content: [ { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" }, ], }) ); server.resource("Ada the Cat", IMAGE_URI, async (uri) => { const base64Image = await getAdaPngBase64(); return { contents: [ { uri: uri.href, mimeType: "image/png", blob: base64Image, }, ], }; }); async function getAdaPngBase64() { const __dirname = new URL(".", import.meta.url).pathname; // From `9705ce2c59/assets/Ada.png` const filePath = join(__dirname, "Ada.png"); const imageData = await readFile(filePath); const base64Image = imageData.toString("base64"); return base64Image; } const transport = new StdioServerTransport(); await server.connect(transport); ``` With the local changes from this PR, I added the following to my `config.toml`: ```toml [mcp_servers.kitty] command = "node" args = ["/Users/mbolin/code/kitty-mcp/index.js"] ``` Running the TUI from source: ``` cargo run --bin codex -- --model o3 'I need a picture of a cat' ``` I get: <img width="732" alt="image" src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869" /> Now, that said, I have only tested in iTerm and there is definitely some funny business with getting an accurate character-to-pixel ratio (sometimes the 
`CompletedMcpToolCallWithImageOutput` thinks it needs 10 rows to render instead of 4), so there is still work to be done here.	2025-05-28 19:03:17 -07:00
Michael Bolin	25a9949c49	fix: ensure inputSchema for MCP tool always has "properties" field when talking to OpenAI (#1150 ) As noted in the comment introduced in this PR, this is analogous to the issue reported in https://github.com/openai/openai-agents-python/issues/449. This seems to work now.	2025-05-28 17:17:21 -07:00
Michael Bolin	392fdd7db6	fix: honor RUST_LOG in mcp-client CLI and default to DEBUG (#1149 ) We had `debug!()` logging statements already, but they weren't being printed because `tracing_subscriber` was not set up.	2025-05-28 17:10:06 -07:00
Michael Bolin	ae1a83f095	feat: introduce CellWidget trait (#1148 ) The motivation behind this PR is to make it so a `HistoryCell` is more like a `WidgetRef` that knows how to render itself into a `Rect` so that it can be backed by something other than a `Vec<Line>`. Because a `HistoryCell` is intended to appear in a scrollable list, we want to ensure the stack of cells can be scrolled one `Line` at a time even if the `HistoryCell` is not backed by a `Vec<Line>` itself. To this end, we introduce the `CellWidget` trait whose key method is: ``` fn render_window(&self, first_visible_line: usize, area: Rect, buf: &mut Buffer); ``` The `first_visible_line` param is what differs from `WidgetRef::render_ref()`, as a `CellWidget` needs to know the offset into its "full view" at which it should start rendering. The bookkeeping in `ConversationHistoryWidget` has been updated accordingly to ensure each `CellWidget` in the history is rendered appropriately.	2025-05-28 14:03:19 -07:00
Michael Bolin	d60f350cf8	feat: add support for -c/--config to override individual config items (#1137 ) This PR introduces support for `-c`/`--config` so users can override individual config values on the command line using `--config name=value`. Example: ``` codex --config model=o4-mini ``` Making it possible to set arbitrary config values on the command line results in a more flexible configuration scheme and makes it easier to provide single-line examples that can be copy-pasted from documentation. Effectively, it means there are four levels of configuration for some values: - Default value (e.g., `model` currently defaults to `o4-mini`) - Value in `config.toml` (e.g., user could override the default to be `model = "o3"` in their `config.toml`) - Specifying `-c` or `--config` to override `model` (e.g., user can include `-c model=o3` in their list of args to Codex) - If available, a config-specific flag can be used, which takes precedence over `-c` (e.g., user can specify `--model o3` in their list of args to Codex) Now that it is possible to specify anything that could be configured in `config.toml` on the command line using `-c`, we do not need to have a custom flag for every possible config option (which can clutter the output of `--help`). To that end, as part of this PR, we drop support for the `--disable-response-storage` flag, as users can now specify `-c disable_response_storage=true` to get the equivalent functionality. Under the hood, this works by loading the `config.toml` into a `toml::Value`. Then for each `key=value`, we create a small synthetic TOML file with `value` so that we can run the TOML parser to get the equivalent `toml::Value`. We then parse `key` to determine the point in the original `toml::Value` to do the insert/replace. Once all of the overrides from `-c` args have been applied, the `toml::Value` is deserialized into a `ConfigToml` and then the `ConfigOverrides` are applied, as before.	2025-05-27 23:11:44 -07:00
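The override mechanism described in #1137 (split `key=value`, parse the value, then insert at the dotted key path) can be sketched with the standard library alone. The `Value` enum below is a simplified stand-in for `toml::Value`, and `parse_scalar` is a hand-rolled substitute for running the real TOML parser on a synthetic file; `apply_override`, `insert_at`, and `lookup` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `toml::Value` (assumption: only these four kinds).
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    String(String),
    Integer(i64),
    Boolean(bool),
    Table(BTreeMap<String, Value>),
}

/// Stand-in for the TOML parser: booleans, integers, otherwise a string.
fn parse_scalar(raw: &str) -> Value {
    let raw = raw.trim();
    if let Ok(b) = raw.parse::<bool>() {
        Value::Boolean(b)
    } else if let Ok(n) = raw.parse::<i64>() {
        Value::Integer(n)
    } else {
        Value::String(raw.trim_matches('"').to_string())
    }
}

/// Walks `path`, creating intermediate tables, and replaces the leaf.
fn insert_at(node: &mut Value, path: &[&str], new_value: Value) {
    match path {
        [] => *node = new_value,
        [head, rest @ ..] => {
            if !matches!(*node, Value::Table(_)) {
                *node = Value::Table(BTreeMap::new());
            }
            if let Value::Table(table) = node {
                let child = table
                    .entry(head.to_string())
                    .or_insert_with(|| Value::Table(BTreeMap::new()));
                insert_at(child, rest, new_value);
            }
        }
    }
}

/// Applies one `-c key=value` override to the loaded config.
pub fn apply_override(root: &mut Value, spec: &str) -> Result<(), String> {
    let (key, raw) = spec
        .split_once('=')
        .ok_or_else(|| format!("expected key=value, got {spec:?}"))?;
    let path: Vec<&str> = key.trim().split('.').collect();
    if path.iter().any(|segment| segment.is_empty()) {
        return Err(format!("invalid key in {spec:?}"));
    }
    insert_at(root, &path, parse_scalar(raw));
    Ok(())
}

/// Reads a value back by dotted path (helper for inspection only).
pub fn lookup<'a>(root: &'a Value, key: &str) -> Option<&'a Value> {
    key.split('.').try_fold(root, |node, segment| match node {
        Value::Table(table) => table.get(segment),
        _ => None,
    })
}
```

Applying the overrides to the generic value before deserializing is what lets any key from `config.toml` be set on the command line without a dedicated flag.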
Michael Bolin	eba0e32909	fix: update install_native_deps.sh to pick up the latest release (#1136 )	2025-05-27 10:06:41 -07:00
Michael Bolin	29d154cb13	fix: use o4-mini as the default model (#1135 ) Rollback of https://github.com/openai/codex/pull/972.	2025-05-27 09:12:55 -07:00
Michael Bolin	6b5b184f21	fix: TUI was not honoring --skip-git-repo-check correctly (#1105 ) I discovered that if I ran `codex <PROMPT>` in a cwd that was not a Git repo, Codex did not automatically run `<PROMPT>` after I accepted the Git warning. It appears that we were not managing the `AppState` transition correctly, so this fixes the bug and ensures the Codex session does not start until the user accepts the Git warning. In particular, we now create the `ChatWidget` lazily and store it in the `AppState::Chat` variant.	2025-05-24 08:33:49 -07:00
Michael Bolin	4bf81373a7	fix: forgot to pass codex_linux_sandbox_exe through in cli/src/debug_sandbox.rs (#1095 ) I accidentally missed this in https://github.com/openai/codex/pull/1086.	2025-05-23 11:53:13 -07:00
Michael Bolin	89ef4efdcf	fix: overhaul how we spawn commands under seccomp/landlock on Linux (#1086 ) Historically, we spawned the Seatbelt and Landlock sandboxes in substantially different ways: For Seatbelt, we would run `/usr/bin/sandbox-exec` with our policy specified as an arg followed by the original command: `d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)` For Landlock/Seccomp, we would do `tokio::runtime::Builder::new_current_thread()`, _invoke Landlock/Seccomp APIs to modify the permissions of that new thread_, and then spawn the command: `d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)` While it is neat that Landlock/Seccomp supports applying a policy to only one thread without having to apply it to the entire process, it requires us to maintain two different codepaths and is a bit harder to reason about. The tipping point was https://github.com/openai/codex/pull/1061, in which we had to start building up the `env` in an unexpected way for the existing Landlock/Seccomp approach to continue to work. This PR overhauls things so that we do similar things for Mac and Linux. It turned out that we were already building our own "helper binary" comparable to Mac's `sandbox-exec` as part of the `cli` crate: `d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)` We originally created this to build a small binary to include with the Node.js version of the Codex CLI to provide support for Linux sandboxing. Though the sticky bit is that, at this point, we still want to deploy the Rust version of Codex as a single, standalone binary rather than a CLI and a supporting sandboxing binary. To satisfy this goal, we use "the arg0 trick," in which we: * use `std::env::current_exe()` to get the path to the CLI that is currently running * use the CLI as the `program` for the `Command` * set `"codex-linux-sandbox"` as arg0 for the `Command` A CLI that supports sandboxing should check arg0 at the start of the program. 
If it is `"codex-linux-sandbox"`, it must invoke `codex_linux_sandbox::run_main()`, which runs the CLI as if it were `codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn the original command, so we do _replace_ the process rather than spawn a subprocess. Incidentally, we do this before starting the Tokio runtime, so the process should only have one thread when `execvp(3)` is called. Because the `core` crate that needs to spawn the Linux sandboxing is not a CLI in its own right, this means that every CLI that includes `core` and relies on this behavior has to (1) implement it and (2) provide the path to the sandboxing executable. While the path is almost always `std::env::current_exe()`, we needed to make this configurable for integration tests, so `Config` now has a `codex_linux_sandbox_exe: Option<PathBuf>` property to facilitate threading this through, introduced in https://github.com/openai/codex/pull/1089. This common pattern is now captured in `codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs` functions that should use it have been updated as part of this PR. The `codex-linux-sandbox` crate added to the Cargo workspace as part of this PR now has the bulk of the Landlock/Seccomp logic, which makes `core` a bit simpler. Indeed, `core/src/exec_linux.rs` and `core/src/landlock.rs` were removed/ported as part of this PR. I also moved the unit tests for this code into an integration test, `linux-sandbox/tests/landlock.rs`, in which I use `env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for `codex_linux_sandbox_exe` since `std::env::current_exe()` is not appropriate in that case.	2025-05-23 11:37:07 -07:00
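"The arg0 trick" from #1086 can be sketched on Unix with `std::os::unix::process::CommandExt::arg0`. The helper names `invoked_as_sandbox` and `sandbox_command` are hypothetical, and the sketch leaves out the sandbox configuration flags that the real helper also receives.

```rust
use std::ffi::OsStr;
use std::io;
use std::os::unix::process::CommandExt;
use std::path::Path;
use std::process::Command;

const SANDBOX_ARG0: &str = "codex-linux-sandbox";

/// To be called at the very start of `main()` with the process's own arg0,
/// before any runtime or extra threads are started.
pub fn invoked_as_sandbox(arg0: &OsStr) -> bool {
    Path::new(arg0).file_name() == Some(OsStr::new(SANDBOX_ARG0))
}

/// Builds a `Command` that re-runs the current executable under the sandbox
/// name, followed by the original command to run (hypothetical helper).
pub fn sandbox_command(original: &[String]) -> io::Result<Command> {
    let exe = std::env::current_exe()?;
    let mut command = Command::new(exe);
    // The program on disk is the CLI itself; only argv[0] changes.
    command.arg0(SANDBOX_ARG0);
    command.args(original);
    Ok(command)
}
```

In `main()`, a CLI would check `std::env::args_os().next()` with `invoked_as_sandbox` and, on a match, hand control to the sandbox entry point instead of its normal startup path.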
Michael Bolin	d1de7bb383	feat: add `codex_linux_sandbox_exe: Option<PathBuf>` field to Config (#1089 ) https://github.com/openai/codex/pull/1086 is a work-in-progress to make Linux sandboxing work more like Seatbelt where, for the command we want to sandbox, we build up the command and then hand it, and some sandbox configuration flags, to another command to set up the sandbox and then run it. In the case of Seatbelt, macOS provides this helper binary and provides it at `/usr/bin/sandbox-exec`. For Linux, we have to build our own and pass it through (which is what #1086 does), so this makes the new `codex_linux_sandbox_exe` available on `Config` so that it will later be available in `exec.rs` when we need it in #1086.	2025-05-22 21:52:28 -07:00
Michael Bolin	63deb7c369	fix: for the @native release of the Node module, use the Rust version by default (#1084 ) Added logic so that when we run `./scripts/stage_release.sh --native` (for the `@native` version of the Node module), we drop a `use-native` file next to `codex.js`. If present, `codex.js` will now run the Rust CLI. Ran `./scripts/stage_release.sh --native` and verified that when running `codex.js` in the staged folder: ``` $ /var/folders/wm/f209bc1n2bd_r0jncn9s6j_00000gp/T/tmp.efvEvBlSN6/bin/codex.js --version codex-cli 0.0.2505220956 ``` it ran the expected Rust version of the CLI, as desired. While here, I also updated the Rust version to one that I cut today, which includes the new shell environment policy config option: https://github.com/openai/codex/pull/1061. Note this may "break" some users if the processes spawned by Codex need extra environment variables. (We are still working to determine what the right defaults should be for this option.)	2025-05-22 13:42:55 -07:00
Michael Bolin	cb379d7797	feat: introduce support for shell_environment_policy in config.toml (#1061 ) To date, when handling `shell` and `local_shell` tool calls, we were spawning new processes using the environment inherited from the Codex process itself. This means that the sensitive `OPENAI_API_KEY` that Codex needs to talk to OpenAI models was made available to everything run by `shell` and `local_shell`. While there are cases where that might be useful, it does not seem like a good default.

This PR introduces a complex `shell_environment_policy` config option to control the `env` used with these tool calls. It is inevitably a bit complex so that it is possible to override individual components of the policy without having to restate the entire thing. Details are in the updated `README.md` in this PR, but here is the relevant bit that explains the individual fields of `shell_environment_policy`:

| Field | Type | Default | Description |
| ------------------------- | -------------------------- | ------- | ----------- |
| `inherit` | string | `core` | Starting template for the environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full parent env), or `none` (start empty). |
| `ignore_default_excludes` | boolean | `false` | When `false`, Codex removes any var whose name contains `KEY`, `SECRET`, or `TOKEN` (case-insensitive) before other rules run. |
| `exclude` | array<string> | `[]` | Case-insensitive glob patterns to drop after the default filter.<br>Examples: `"AWS_"`, `"AZURE_"`. |
| `set` | table<string,string> | `{}` | Explicit key/value overrides or additions – always win over inherited values. |
| `include_only` | array<string> | `[]` | If non-empty, a whitelist of patterns; only variables that match _one_ pattern survive the final step. (Generally used with `inherit = "all"`.) |

In particular, note that the default is `inherit = "core"`, so:

* if you have extra env variables that you want to inherit from the parent process, use `inherit = "all"` and then specify `include_only`
* if you have extra env variables where you want to hardcode the values, the default `inherit = "core"` will work fine, but then you need to specify `set`

This configuration is not battle-tested, so we will probably still have to play with it a bit. `core/src/exec_env.rs` has the critical business logic as well as unit tests. Though if nothing else, prior to this change:

```
$ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY
# ...prints OPENAI_API_KEY...
```

But after this change it does not print anything (as desired).

One final thing to call out about this PR is that the `configure_command!` macro we use in `core/src/exec.rs` has to do some complex logic with respect to how it builds up the `env` for the process being spawned under Landlock/seccomp. Specifically, doing `cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably the most intuitive way to do it) caused the Landlock unit tests to fail because the processes spawned by the unit tests started failing in unexpected ways! If we forgo `env_clear()` in favor of updating env vars one at a time, the tests still pass. The comment in the code talks about this a bit, and while I would like to investigate this more, I need to move on for the moment, but I do plan to come back to it to fully understand what is going on. For example, this suggests that we might not be able to spawn a C program that calls `env_clear()`, which would be...weird. We may still have to fiddle with our Landlock config if that is the case.	2025-05-22 09:51:19 -07:00
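To make the order of the rules in the field table concrete, here is a simplified sketch of how such a policy could be applied. It is not the logic in `core/src/exec_env.rs`: the `Policy` struct, `CORE_VARS`, and `build_env()` are hypothetical names, `CORE_VARS` lists only the three variables the table names explicitly, and the glob-based `exclude` and `include_only` steps as well as `inherit = "none"` are left out.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified mirror of the documented fields.
struct Policy {
    /// true models `inherit = "all"`, false models `inherit = "core"`.
    inherit_all: bool,
    /// When false, names containing KEY, SECRET, or TOKEN are dropped.
    ignore_default_excludes: bool,
    /// Explicit overrides or additions; these always win.
    set: HashMap<String, String>,
}

/// Assumed subset of the "core" template; the table elides the full list.
const CORE_VARS: [&str; 3] = ["HOME", "PATH", "USER"];

fn build_env(parent: &HashMap<String, String>, policy: &Policy) -> HashMap<String, String> {
    // Step 1: start from the chosen template of the parent environment.
    let mut env: HashMap<String, String> = parent
        .iter()
        .filter(|(k, _)| policy.inherit_all || CORE_VARS.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();

    // Step 2: apply the default case-insensitive filter unless disabled.
    if !policy.ignore_default_excludes {
        env.retain(|k, _| {
            let upper = k.to_uppercase();
            !(upper.contains("KEY") || upper.contains("SECRET") || upper.contains("TOKEN"))
        });
    }

    // Step 3: apply `set` last so explicit values override inherited ones.
    for (k, v) in &policy.set {
        env.insert(k.clone(), v.clone());
    }
    env
}
```

Under these assumptions, a parent environment containing `OPENAI_API_KEY` yields a child environment without it for both templates, which matches the `printenv` behavior described in the commit message.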
Michael Bolin	ef7208359f	feat: show Config overview at start of exec (#1073 ) Now the `exec` output starts with something like: ``` -------- workdir: /Users/mbolin/code/codex/codex-rs model: o3 provider: openai approval: Never sandbox: SandboxPolicy { permissions: [DiskFullReadAccess, DiskWritePlatformUserTempFolder, DiskWritePlatformGlobalTempFolder, DiskWriteCwd, DiskWriteFolder { folder: "/Users/mbolin/.pyenv/shims" }] } -------- ``` which makes it easier to reason about when looking at logs.	2025-05-21 22:53:02 -07:00
Michael Bolin	5746561428	chore: move types out of config.rs into config_types.rs (#1054 ) `config.rs` is already quite long without these definitions. Since they have no real dependencies of their own, let's move them to their own file so `config.rs` can focus on the business logic of loading a config.	2025-05-20 11:55:25 -07:00
Michael Bolin	d766e845b3	feat: experimental --output-last-message flag to exec subcommand (#1037 ) This introduces an experimental `--output-last-message` flag that can be used to identify a file where the final message from the agent will be written. Two use cases: - Ultimately, we will likely add a `--quiet` option to `exec`, but even if the user does not want any output written to the terminal, they probably want to know what the agent did. Writing the output to a file makes it possible to get that information in a clean way. - Relatedly, when using `exec` in CI, it is easier to review the transcript written "normally," (i.e., not as JSON or something with extra escapes), but getting programmatic access to the last message is likely helpful, so writing the last message to a file gets the best of both worlds. I am calling this "experimental" because it is possible that we are overfitting and will want a more general solution to this problem that would justify removing this flag.	2025-05-19 16:08:18 -07:00
Michael Bolin	a4bfdf6779	chore: produce .tar.gz versions of artifacts in addition to .zst (#1036 ) For sparse containers/environments that do not have `zstd`, provide `.tar.gz` as alternative archive format.	2025-05-19 15:17:45 -07:00
Fouad Matin	44022db8d0	bump(version): 0.1.2505172129 (#1008 ) ## `0.1.2505172129` ### 🪲 Bug Fixes - Add node version check (#1007) - Persist token after refresh (#1006)	2025-05-17 21:35:54 -07:00
Fouad Matin	a86270f581	fix: add node version check (#1007 )	2025-05-17 21:27:41 -07:00
Fouad Matin	835eb77a7d	fix: persist token after refresh (#1006 ) After a token refresh/exchange, persist the new refresh and id token	2025-05-17 21:27:02 -07:00
Fouad Matin	dbc0ad348e	bump(version): 0.1.2505171619 (#1001 ) ## `0.1.2505171619` - `codex --login` + `codex --free` (#998)	2025-05-17 16:25:21 -07:00


574 Commits