valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
pakrym-oai	4a80059b1b	Add turn.failed and rename session created to thread started (#4478) Don't produce `turn.completed` when the turn fails.	2025-09-29 18:38:04 -07:00
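The sketch below shows how a consumer of `codex exec` JSON output might treat `turn.completed` and `turn.failed` as mutually exclusive ends of a turn, and derive an exit code from them. It uses a simplified, hypothetical `ThreadEvent` enum (not the crate's real type) and covers only the events named in this commit and in #4309.

```rust
/// Hypothetical, simplified model of the top-level `codex exec` JSON events.
#[derive(Debug, Clone, PartialEq)]
pub enum ThreadEvent {
    ThreadStarted { thread_id: String },
    TurnStarted,
    TurnCompleted { input_tokens: u64, cached_input_tokens: u64, output_tokens: u64 },
    TurnFailed { message: String },
}

/// Derives an exit code from an event stream: 1 if any turn failed or the
/// stream ended while a turn was still open, 0 otherwise.
pub fn exit_code_for(events: &[ThreadEvent]) -> i32 {
    let mut turn_open = false;
    for event in events {
        match event {
            ThreadEvent::ThreadStarted { .. } => {}
            ThreadEvent::TurnStarted => turn_open = true,
            // A successful turn closes normally.
            ThreadEvent::TurnCompleted { .. } => turn_open = false,
            // A failed turn is terminal; no `turn.completed` follows it.
            ThreadEvent::TurnFailed { .. } => return 1,
        }
    }
    if turn_open { 1 } else { 0 }
}
```

Returning 1 for a stream that ends mid-turn is a choice made by this sketch, not documented CLI behavior.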
vishnu-oai	04c1782e52	OpenTelemetry events (#2103)
## otel
Codex can emit [OpenTelemetry](https://opentelemetry.io/) log events that describe each run: outbound API requests, streamed responses, user input, tool-approval decisions, and the result of every tool invocation. Export is disabled by default so local runs remain self-contained. Opt in by adding an `[otel]` table and choosing an exporter.
```toml
[otel]
environment = "staging"   # defaults to "dev"
exporter = "none"         # defaults to "none"; set to otlp-http or otlp-grpc to send events
log_user_prompt = false   # defaults to false; redact prompt text unless explicitly enabled
```
Codex tags every exported event with `service.name = "codex-cli"`, the CLI version, and an `env` attribute so downstream collectors can distinguish dev/staging/prod traffic. Only telemetry produced inside the `codex_otel` crate (the events listed below) is forwarded to the exporter.
### Event catalog
Every event shares a common set of metadata fields: `event.timestamp`, `conversation.id`, `app.version`, `auth_mode` (when available), `user.account_id` (when available), `terminal.type`, `model`, and `slug`.
With OTEL enabled, Codex emits the following event types (in addition to the metadata above):
- `codex.api_request`
  - `cf_ray` (optional)
  - `attempt`
  - `duration_ms`
  - `http.response.status_code` (optional)
  - `error.message` (failures)
- `codex.sse_event`
  - `event.kind`
  - `duration_ms`
  - `error.message` (failures)
  - `input_token_count` (completion only)
  - `output_token_count` (completion only)
  - `cached_token_count` (completion only, optional)
  - `reasoning_token_count` (completion only, optional)
  - `tool_token_count` (completion only)
- `codex.user_prompt`
  - `prompt_length`
  - `prompt` (redacted unless `log_user_prompt = true`)
- `codex.tool_decision`
  - `tool_name`
  - `call_id`
  - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
  - `source` (`config` or `user`)
- `codex.tool_result`
  - `tool_name`
  - `call_id`
  - `arguments`
  - `duration_ms` (execution time for the tool)
  - `success` (`"true"` or `"false"`)
  - `output`
### Choosing an exporter
Set `otel.exporter` to control where events go:
- `none` – leaves instrumentation active but skips exporting. This is the default.
- `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify the endpoint, protocol, and headers your collector expects:
  ```toml
  [otel]
  exporter = { otlp-http = { endpoint = "https://otel.example.com/v1/logs", protocol = "binary", headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" } }}
  ```
- `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint and any metadata headers:
  ```toml
  [otel]
  exporter = { otlp-grpc = { endpoint = "https://otel.example.com:4317", headers = { "x-otlp-meta" = "abc123" } }}
  ```
If the exporter is `none`, nothing is written anywhere; otherwise you must run or point to your own collector. All exporters run on a background batch worker that is flushed on shutdown. If you build Codex from source, the OTEL crate is still behind an `otel` feature flag; the official prebuilt binaries ship with the feature enabled.
When the feature is disabled, the telemetry hooks become no-ops, so the CLI continues to function without the extra dependencies. --------- Co-authored-by: Anton Panasenko <apanasenko@openai.com>	2025-09-29 11:30:55 -07:00
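The following is a minimal sketch of the `[otel]` settings and the prompt-redaction rule described above. It relies on a hypothetical `OtelSettings` struct and `user_prompt_attributes` helper, neither of which is the real `codex_otel` API. Two details are also assumptions, since the commit does not specify them: the `[REDACTED]` placeholder, and counting `prompt_length` in characters.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the `otel.exporter` choices described above.
#[derive(Debug, Clone, PartialEq)]
pub enum OtelExporter {
    None,
    OtlpHttp { endpoint: String, protocol: String, headers: BTreeMap<String, String> },
    OtlpGrpc { endpoint: String, headers: BTreeMap<String, String> },
}

/// Hypothetical mirror of the `[otel]` table.
#[derive(Debug, Clone)]
pub struct OtelSettings {
    pub environment: String,
    pub exporter: OtelExporter,
    pub log_user_prompt: bool,
}

impl Default for OtelSettings {
    // Documented defaults: env "dev", exporter "none", prompt text redacted.
    fn default() -> Self {
        Self { environment: "dev".to_string(), exporter: OtelExporter::None, log_user_prompt: false }
    }
}

/// Assumed placeholder for redacted prompts.
const REDACTED: &str = "[REDACTED]";

/// Builds the `env` tag plus the event-specific attributes of a `codex.user_prompt` event.
pub fn user_prompt_attributes(settings: &OtelSettings, prompt: &str) -> Vec<(&'static str, String)> {
    let prompt_value = if settings.log_user_prompt { prompt.to_string() } else { REDACTED.to_string() };
    vec![
        ("env", settings.environment.clone()),
        // In this sketch the length is recorded even when the text is redacted.
        ("prompt_length", prompt.chars().count().to_string()),
        ("prompt", prompt_value),
    ]
}

/// `none` keeps instrumentation active but exports nothing.
pub fn exports_anything(settings: &OtelSettings) -> bool {
    !matches!(settings.exporter, OtelExporter::None)
}
```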
pakrym-oai	cc1b21e47f	Add turn started/completed events and correct exit code on error (#4309) Adds new `turn.started` and `turn.completed` events; `turn.completed` includes usage. Also ensures we return exit code 1 on failures.
```
{ "type": "session.created", "session_id": "019987a7-93e7-7b20-9e05-e90060e411ea" }
{ "type": "turn.started" }
...
{ "type": "turn.completed", "usage": { "input_tokens": 78913, "cached_input_tokens": 65280, "output_tokens": 1099 } }
```	2025-09-26 16:21:50 -07:00
pakrym-oai	ea095e30c1	Add todo-list tool support (#4255) Adds a one-per-turn todo-list item and an `item.updated` event:
```jsonl
{"type":"item.started","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan now","completed":false},{"text":"Update progress to next step","completed":false}]}}
{"type":"item.updated","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan now","completed":true},{"text":"Update progress to next step","completed":false}]}}
{"type":"item.completed","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan now","completed":true},{"text":"Update progress to next step","completed":false}]}}
```	2025-09-26 09:35:47 -07:00
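Each todo-list event carries the full `items` array, so a consumer can compute progress from the latest snapshot alone. Here is a small sketch that assumes a hypothetical `TodoEntry` type mirroring the `text`/`completed` fields in the JSONL above.

```rust
/// One entry of a `todo_list` item (hypothetical mirror of the JSON fields).
#[derive(Debug, Clone, PartialEq)]
pub struct TodoEntry {
    pub text: String,
    pub completed: bool,
}

/// Returns (completed, total) for a todo-list snapshot.
pub fn todo_progress(items: &[TodoEntry]) -> (usize, usize) {
    let done = items.iter().filter(|e| e.completed).count();
    (done, items.len())
}

/// Renders a snapshot as a plain-text checklist, one entry per line.
pub fn render_todo(items: &[TodoEntry]) -> String {
    items
        .iter()
        .map(|e| format!("[{}] {}", if e.completed { "x" } else { " " }, e.text))
        .collect::<Vec<_>>()
        .join("\n")
}
```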
pakrym-oai	8e3a048fec	Add codex exec testing helpers (#4254) Add a shortcut to create working directories and run codex exec with a fake server.	2025-09-25 17:12:45 -07:00
pakrym-oai	67aab04c66	[codex exec] Add item.started and support it for command execution (#4250) Adds a new `item.started` event to `codex exec` and implements it for the `command_execution` item type.
```jsonl
{"type":"session.created","session_id":"019982d1-75f0-7920-b051-e0d3731a5ed8"}
{"type":"item.completed","item":{"id":"item_0","item_type":"reasoning","text":"Executing commands securely\n\nI'm thinking about how the default harness typically uses \"bash -lc,\" while historically \"bash\" is what we've been using. The command should be executed as a string in our CLI, so using \"bash -lc 'echo hello'\" is optimal but calling \"echo hello\" directly feels safer. The sandbox makes sure environment variables like CODEX_SANDBOX_NETWORK_DISABLED=1 are set, so I won't ask for approval. I just need to run \"echo hello\" and correctly present the output."}}
{"type":"item.completed","item":{"id":"item_1","item_type":"reasoning","text":"Preparing for tool calls\n\nI realize that I need to include a preamble before making any tool calls. So, I'll first state the preamble in the commentary channel, then proceed with the tool call. After that, I need to present the final message along with the output. It's possible that the CLI will show the output inline, but I must ensure that I present the result clearly regardless. Let's move forward and get this organized!"}}
{"type":"item.completed","item":{"id":"item_2","item_type":"assistant_message","text":"Running `echo` to confirm shell access and print output."}}
{"type":"item.started","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"","exit_code":null,"status":"in_progress"}}
{"type":"item.completed","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"hello\n","exit_code":0,"status":"completed"}}
{"type":"item.completed","item":{"id":"item_4","item_type":"assistant_message","text":"hello"}}
```	2025-09-25 22:25:02 +00:00
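The `item.started` / `item.completed` pair for `command_execution` amounts to a small state transition. While `status` is `in_progress`, `exit_code` is `null`; both fields are filled in on completion. The sketch below uses hypothetical types. Its `Failed` variant for nonzero exit codes is an assumption, since the example only shows `in_progress` and `completed`.

```rust
/// Status values seen in the example (`in_progress`, `completed`), plus an
/// assumed `Failed` state for nonzero exit codes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CommandStatus {
    InProgress,
    Completed,
    Failed,
}

impl CommandStatus {
    pub fn as_str(self) -> &'static str {
        match self {
            CommandStatus::InProgress => "in_progress",
            CommandStatus::Completed => "completed",
            CommandStatus::Failed => "failed",
        }
    }
}

/// Hypothetical mirror of a `command_execution` item.
#[derive(Debug, Clone, PartialEq)]
pub struct CommandExecution {
    pub command: String,
    pub aggregated_output: String,
    pub exit_code: Option<i32>,
    pub status: CommandStatus,
}

impl CommandExecution {
    /// The shape carried by `item.started`: no output and no exit code yet.
    pub fn started(command: &str) -> Self {
        Self {
            command: command.to_string(),
            aggregated_output: String::new(),
            exit_code: None,
            status: CommandStatus::InProgress,
        }
    }

    /// The transition applied when `item.completed` arrives.
    pub fn finish(mut self, output: &str, exit_code: i32) -> Self {
        self.aggregated_output.push_str(output);
        self.exit_code = Some(exit_code);
        self.status = if exit_code == 0 { CommandStatus::Completed } else { CommandStatus::Failed };
        self
    }
}
```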
Jeremy Rose	4a5f05c136	make tests pass cleanly in sandbox (#4067) This changes the reqwest client used in tests to be sandbox-friendly, and skips a bunch of other tests that don't work inside the sandbox/without network.	2025-09-25 13:11:14 -07:00
pakrym-oai	344d4a1d68	Add explicit codex exec events (#4177) This pull request adds a new experimental JSON output format. You can try it using `codex exec --experimental-json`. The design takes a lot of inspiration from Responses API items and stream format. # Session and items Each invocation of `codex exec` starts or resumes a session. A session contains multiple high-level item types: 1. Assistant message 2. Assistant thinking 3. Command execution 4. File changes 5. To-do lists 6. etc. # Events Sessions and items go through life cycles, which are represented by events. Session events are `session.created` or `session.resumed`. Item events are `item.added`, `item.updated`, `item.completed`, `item.require_approval` (or other event types like `item.output_delta` when we need streaming). So a typical session can look like: <details> ``` { "type": "session.created", "session_id": "01997dac-9581-7de3-b6a0-1df8256f2752" } { "type": "item.completed", "item": { "id": "itm_0", "item_type": "assistant_message", "text": "I’ll locate the top-level README and remove its first line. Then I’ll show a quick summary of what changed."
} } { "type": "item.completed", "item": { "id": "itm_1", "item_type": "command_execution", "command": "bash -lc ls -la | sed -n '1,200p'", "aggregated_output": "pyenv: cannot rehash: /Users/pakrym/.pyenv/shims isn't writable\ntotal 192\ndrwxr-xr-x@ 33 pakrym staff 1056 Sep 24 14:36 .\ndrwxr-xr-x 41 pakrym staff 1312 Sep 24 09:17 ..\n-rw-r--r--@ 1 pakrym staff 6 Jul 9 16:16 .codespellignore\n-rw-r--r--@ 1 pakrym staff 258 Aug 13 09:40 .codespellrc\ndrwxr-xr-x@ 5 pakrym staff 160 Jul 23 08:26 .devcontainer\n-rw-r--r--@ 1 pakrym staff 6148 Jul 22 10:03 .DS_Store\ndrwxr-xr-x@ 15 pakrym staff 480 Sep 24 14:38 .git\ndrwxr-xr-x@ 12 pakrym staff 384 Sep 2 16:00 .github\n-rw-r--r--@ 1 pakrym staff 778 Jul 9 16:16 .gitignore\ndrwxr-xr-x@ 3 pakrym staff 96 Aug 11 09:37 .husky\n-rw-r--r--@ 1 pakrym staff 104 Jul 9 16:16 .npmrc\n-rw-r--r--@ 1 pakrym staff 96 Sep 2 08:52 .prettierignore\n-rw-r--r--@ 1 pakrym staff 170 Jul 9 16:16 .prettierrc.toml\ndrwxr-xr-x@ 5 pakrym staff 160 Sep 14 17:43 .vscode\ndrwxr-xr-x@ 2 pakrym staff 64 Sep 11 11:37 2025-09-11\n-rw-r--r--@ 1 pakrym staff 5505 Sep 18 09:28 AGENTS.md\n-rw-r--r--@ 1 pakrym staff 92 Sep 2 08:52 CHANGELOG.md\n-rw-r--r--@ 1 pakrym staff 1145 Jul 9 16:16 cliff.toml\ndrwxr-xr-x@ 11 pakrym staff 352 Sep 24 13:03 codex-cli\ndrwxr-xr-x@ 38 pakrym staff 1216 Sep 24 14:38 codex-rs\ndrwxr-xr-x@ 18 pakrym staff 576 Sep 23 11:01 docs\n-rw-r--r--@ 1 pakrym staff 2038 Jul 9 16:16 flake.lock\n-rw-r--r--@ 1 pakrym staff 1434 Jul 9 16:16 flake.nix\n-rw-r--r--@ 1 pakrym staff 10926 Jul 9 16:16 LICENSE\ndrwxr-xr-x@ 465 pakrym staff 14880 Jul 15 07:36 node_modules\n-rw-r--r--@ 1 pakrym staff 242 Aug 5 08:25 NOTICE\n-rw-r--r--@ 1 pakrym staff 578 Aug 14 12:31 package.json\n-rw-r--r--@ 1 pakrym staff 498 Aug 11 09:37 pnpm-lock.yaml\n-rw-r--r--@ 1 pakrym staff 58 Aug 11 09:37 pnpm-workspace.yaml\n-rw-r--r--@ 1 pakrym staff 2402 Jul 9 16:16 PNPM.md\n-rw-r--r--@ 1 pakrym staff 4393 Sep 12 14:36 README.md\ndrwxr-xr-x@ 4 pakrym staff 128 Sep 18
09:28 scripts\ndrwxr-xr-x@ 2 pakrym staff 64 Sep 11 11:34 tmp\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_2", "item_type": "reasoning", "text": "Reviewing README.md file\n\nI've located the README.md file at the root, and it’s 4393 bytes. Now, I need to remove the first line, but first, I should check its content to make sure I’m patching it correctly. I’ll use sed to display the first 20 lines. By reviewing those lines, I can determine exactly what needs to be removed before I proceed with the editing. Let's do this carefully!" } } { "type": "item.completed", "item": { "id": "itm_3", "item_type": "command_execution", "command": "bash -lc sed -n '1,40p' README.md", "aggregated_output": "<h1 align=\"center\">OpenAI Codex CLI</h1>\n\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. 
If you use npm:\n\n```shell\nnpm install -g @openai/codex\n```\n\nAlternatively, if you use Homebrew:\n\n```shell\nbrew install codex\n```\n\nThen simply run `codex` to get started:\n\n```shell\ncodex\n```\n\n<details>\n<summary>You can also go to the <a href=\"https://github.com/openai/codex/releases/latest\">latest GitHub Release</a> and download the appropriate binary for your platform.</summary>\n\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_4", "item_type": "reasoning", "text": "Updating README file\n\nThe user wants to update the README by removing the first line, which is an HTML h1 tag. I'll proceed to delete that line while keeping the rest intact. To do this, I need to use the apply_patch tool and create a patch in the diff format. I’ll take care to include context lines around the change. After removing the header, I’m considering whether to delete the following blank line as well to keep it tidy." } } { "type": "item.completed", "item": { "id": "itm_5", "item_type": "reasoning", "text": "Crafting the patch\n\nThe user has asked to remove just the first line, without mentioning the blank lines. Technically, leaving a blank line at the start is acceptable, but it's not ideal. Ideally, I'd prefer to remove the first line while maintaining the subsequent blank line. However, if I can also remove that blank line, it could tidy things up even more. Still, since the user's request is strictly to remove only the first line, I'll stick to that and craft the patch accordingly." } } { "type": "item.completed", "item": { "id": "itm_6", "item_type": "assistant_message", "text": "Applying the change by removing the first line from `README.md`." 
} } { "type": "item.completed", "item": { "id": "itm_7", "item_type": "file_change", "changes": [ { "path": "/Users/pakrym/code/codex/README.md", "kind": "update" } ], "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_8", "item_type": "reasoning", "text": "Validating README Update\n\nI updated the README.md and now I need to validate that the first line has been successfully removed. To do this, I'll check the first 10 lines of the document to ensure everything looks correct. It’s always good to double-check these kinds of changes, so I’m going to take that extra step to make sure the update was executed properly. Let's see what those initial lines say!" } } { "type": "item.completed", "item": { "id": "itm_9", "item_type": "command_execution", "command": "bash -lc sed -n '1,20p' README.md", "aggregated_output": "\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_10", "item_type": "assistant_message", "text": "- Updated `README.md:1` by removing the opening H1 line.\n- The README now starts with the install instructions paragraph.\n\nWant me to also remove the leading blank line at the top?" 
} } ``` </details> The idea is to give users fully formatted items they can use directly in their rendering/application logic, so they don't have to build up items manually from events (unless they want to for streaming). This PR implements only the `item.completed` payload for some item types; more event types and item types are to come. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-09-25 17:47:09 +00:00
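Every event carries a fully formatted item, so a client can keep only the latest snapshot per item id instead of merging deltas. The sketch below illustrates that idea. `ItemStore` and `Stage` are hypothetical names, and the item payload is left generic rather than modeling the real item types.

```rust
use std::collections::HashMap;

/// Lifecycle stage taken from the event `type` (`item.started`, `item.updated`, `item.completed`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stage {
    Started,
    Updated,
    Completed,
}

/// Keeps the latest snapshot of each item by id, in first-seen order.
pub struct ItemStore<T> {
    order: Vec<String>,
    latest: HashMap<String, (Stage, T)>,
}

impl<T> ItemStore<T> {
    pub fn new() -> Self {
        Self { order: Vec::new(), latest: HashMap::new() }
    }

    /// Every event carries the full item, so the new snapshot replaces the old one.
    pub fn apply(&mut self, id: &str, stage: Stage, item: T) {
        if !self.latest.contains_key(id) {
            self.order.push(id.to_string());
        }
        self.latest.insert(id.to_string(), (stage, item));
    }

    pub fn stage_of(&self, id: &str) -> Option<Stage> {
        self.latest.get(id).map(|(stage, _)| *stage)
    }

    /// Completed items in the order they first appeared.
    pub fn completed(&self) -> Vec<&T> {
        self.order
            .iter()
            .filter_map(|id| self.latest.get(id))
            .filter(|(stage, _)| *stage == Stage::Completed)
            .map(|(_, item)| item)
            .collect()
    }
}
```

Replacing rather than merging keeps the client simple. Clients that want incremental rendering can still react to `item.started` and `item.updated` as they arrive.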
Dylan	594248f415	[exec] add include-plan-tool flag and print it nicely (#3461) ### Summary Sometimes in exec runs, we want to allow the model to use the `update_plan` tool, but that's not easily configurable. This change adds a flag for this and formats the plan output so it's human-readable. ## Test Plan <img width="1280" height="354" alt="Screenshot 2025-09-11 at 12 39 44 AM" src="https://github.com/user-attachments/assets/72e11070-fb98-47f5-a784-5123ca7333d9" />	2025-09-23 16:50:59 -07:00
pakrym-oai	fdb8dadcae	Add exec output-schema parameter (#4079) Adds structured output to `exec` via the `--output-schema` parameter.	2025-09-23 13:59:16 -07:00
jif-oai	be366a31ab	chore: clippy on redundant closure (#4058) Add the redundant-closure clippy rules and let Codex fix the violations, minimising fully qualified paths (FQP).	2025-09-22 19:30:16 +00:00
jif-oai	e5fe50d3ce	chore: unify cargo versions (#4044) Unify cargo versions at the root.	2025-09-22 16:47:01 +00:00
pakrym-oai	14a115d488	Add non_sandbox_test helper (#3880) Makes tests shorter.	2025-09-22 14:50:41 +00:00
pakrym-oai	9b18875a42	Use helpers instead of fixtures (#3888) Move to using test helper methods everywhere.	2025-09-19 06:46:25 -07:00
Michael Bolin	8595237505	fix: ensure cwd for conversation and sandbox are separate concerns (#3874) Prior to this PR, both of these functions took a single `cwd`: `71038381aa/codex-rs/core/src/seatbelt.rs (L19-L25)` `71038381aa/codex-rs/core/src/landlock.rs (L16-L23)` whereas `cwd` and `sandbox_cwd` should be set independently (fixed in this PR). Added `sandbox_distinguishes_command_and_policy_cwds()` to `codex-rs/exec/tests/suite/sandbox.rs` to verify this.	2025-09-18 14:37:06 -07:00
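Here is a sketch of why the two directories are separate inputs. The command runs in `command_cwd`, while relative writable roots from the sandbox policy resolve against `sandbox_policy_cwd`. `prepare_spawn` and `SandboxedSpawn` are hypothetical and do not reflect the actual `seatbelt.rs`/`landlock.rs` signatures. The example paths assume a Unix-style filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical spawn parameters: where the process runs, and which
/// directories the sandbox lets it write.
#[derive(Debug)]
pub struct SandboxedSpawn {
    pub command_cwd: PathBuf,
    pub writable_roots: Vec<PathBuf>,
}

/// `command_cwd` only decides where the command runs; relative writable
/// roots from the policy are resolved against `sandbox_policy_cwd` instead.
pub fn prepare_spawn(command_cwd: &Path, sandbox_policy_cwd: &Path, policy_roots: &[&str]) -> SandboxedSpawn {
    let writable_roots = policy_roots
        .iter()
        .map(|root| {
            let root = Path::new(root);
            if root.is_absolute() {
                root.to_path_buf()
            } else {
                sandbox_policy_cwd.join(root)
            }
        })
        .collect();
    SandboxedSpawn { command_cwd: command_cwd.to_path_buf(), writable_roots }
}
```

With a single shared `cwd`, running a command from a subdirectory would silently move the sandbox's writable area as well. Keeping both inputs explicit avoids that.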
dedrisian-oai	62258df92f	feat: /review (#3774) Adds a `/review` action to the TUI <img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM" src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d" />	2025-09-18 14:14:16 -07:00
dependabot[bot]	fdf4a68646	chore(deps): bump tracing-subscriber from 0.3.19 to 0.3.20 in /codex-rs (#3620) Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.3.19 to 0.3.20. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/tokio-rs/tracing/releases">tracing-subscriber's releases</a>.</em></p> <blockquote> <h2>tracing-subscriber 0.3.20</h2> <p><strong>Security Fix</strong>: ANSI Escape Sequence Injection (CVE-TBD)</p> <h2>Impact</h2> <p>Previous versions of tracing-subscriber were vulnerable to ANSI escape sequence injection attacks. Untrusted user input containing ANSI escape sequences could be injected into terminal output when logged, potentially allowing attackers to:</p> <ul> <li>Manipulate terminal title bars</li> <li>Clear screens or modify terminal display</li> <li>Potentially mislead users through terminal manipulation</li> </ul> <p>In isolation, impact is minimal, however security issues have been found in terminal emulators that enabled an attacker to use ANSI escape sequences via logs to exploit vulnerabilities in the terminal emulator.</p> <h2>Solution</h2> <p>Version 0.3.20 fixes this vulnerability by escaping ANSI control characters when writing events to destinations that may be printed to the terminal.</p> <h2>Affected Versions</h2> <p>All versions of tracing-subscriber prior to 0.3.20 are affected by this vulnerability.</p> <h2>Recommendations</h2> <p>Immediate Action Required: We recommend upgrading to tracing-subscriber 0.3.20 immediately, especially if your application:</p> <ul> <li>Logs user-provided input (form data, HTTP headers, query parameters, etc.)</li> <li>Runs in environments where terminal output is displayed to users</li> </ul> <h2>Migration</h2> <p>This is a patch release with no breaking API changes.
Simply update your Cargo.toml:</p> <pre lang="toml"><code>[dependencies] tracing-subscriber = "0.3.20" </code></pre> <h2>Acknowledgments</h2> <p>We would like to thank <a href="http://github.com/zefr0x">zefr0x</a> who responsibly reported the issue at <code>security@tokio.rs</code>.</p> <p>If you believe you have found a security vulnerability in any tokio-rs project, please email us at <code>security@tokio.rs</code>.</p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`4c52ca5266`"><code>4c52ca5</code></a> fmt: fix ANSI escape sequence injection vulnerability (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3368">#3368</a>)</li> <li><a href="`f71cebe41e`"><code>f71cebe</code></a> subscriber: impl Clone for EnvFilter (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3360">#3360</a>)</li> <li><a href="`3a1f571102`"><code>3a1f571</code></a> Fix CI (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3361">#3361</a>)</li> <li><a href="`e63ef57f3d`"><code>e63ef57</code></a> chore: prepare tracing-attributes 0.1.30 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3316">#3316</a>)</li> <li><a href="`6e59a13b1a`"><code>6e59a13</code></a> attributes: fix tracing::instrument regression around shadowing (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3311">#3311</a>)</li> <li><a href="`e4df761275`"><code>e4df761</code></a> tracing: update core to 0.1.34 and attributes to 0.1.29 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3305">#3305</a>)</li> <li><a href="`643f392ebb`"><code>643f392</code></a> chore: prepare tracing-attributes 0.1.29 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3304">#3304</a>)</li> <li><a href="`d08e7a6eea`"><code>d08e7a6</code></a> chore: prepare tracing-core 0.1.34 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3302">#3302</a>)</li> <li><a href="`6e70c571d3`"><code>6e70c57</code></a> tracing-subscriber: count 
numbers of enters in <code>Timings</code> (<a href="https://redirect.github.com/tokio-rs/tracing/issues/2944">#2944</a>)</li> <li><a href="`c01d4fd9de`"><code>c01d4fd</code></a> fix docs and enable CI on <code>main</code> branch (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3295">#3295</a>)</li> <li>Additional commits viewable in <a href="https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.3.19...tracing-subscriber-0.3.20">compare view</a></li> </ul> </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-15 00:51:33 -07:00
Ahmed Ibrahim	a30e5e40ee	enable-resume (#3537) Adds the ability to resume conversations. We have one verb, `resume`. Behavior: `tui`: `codex resume`: opens a session picker; `codex resume --last`: continues the last conversation; `codex resume <session id>`: continues the conversation with `session id`. `exec`: `codex resume --last`: continues the last conversation; `codex resume <session id>`: continues the conversation with `session id`. Implementation: - Added a function that finds the session file under `~/.codex/sessions/` for a given `UUID`; this is what makes resuming by session id possible. - Added the flags mentioned above. - Added extensive tests.	2025-09-14 19:33:19 -04:00
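The following sketches the lookup described under Implementation: recursively search a sessions directory for a file whose name contains the given UUID. `find_session_by_id` is a hypothetical helper, and matching on the file name is an assumption about how session files are named.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively searches `root` (e.g. `~/.codex/sessions/`) for the first file
/// whose name contains `id`. Entries are visited in sorted order so results are
/// deterministic.
pub fn find_session_by_id(root: &Path, id: &str) -> io::Result<Option<PathBuf>> {
    if !root.is_dir() {
        return Ok(None);
    }
    let mut entries = fs::read_dir(root)?.collect::<Result<Vec<_>, io::Error>>()?;
    entries.sort_by_key(|entry| entry.file_name());
    for entry in entries {
        let path = entry.path();
        if path.is_dir() {
            if let Some(found) = find_session_by_id(&path, id)? {
                return Ok(Some(found));
            }
        } else if entry.file_name().to_string_lossy().contains(id) {
            return Ok(Some(path));
        }
    }
    Ok(None)
}
```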
dedrisian-oai	90a0fd342f	Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }`: spawns a child review task with isolated context, a review-specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining why this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.	2025-09-12 23:25:10 +00:00
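The `ExitedReviewMode` payload can be modeled with plain structs and checked against the constraints stated in the schema: a title of at most 80 characters, confidence in 0.0-1.0, and priority 0-3. The sketch below uses hypothetical types. Its `start <= end` check on `line_range` is an added assumption. JSON parsing itself (for example with serde) is left out to keep the sketch std-only.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct LineRange {
    pub start: u32,
    pub end: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CodeLocation {
    pub absolute_file_path: String,
    pub line_range: LineRange,
}

/// One entry of `findings`, mirroring the field names in the schema above.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub title: String,
    pub body: String,
    pub confidence_score: f32,
    pub priority: u8, // u8 rules out negative priorities
    pub code_location: CodeLocation,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OverallCorrectness {
    Correct,
    Incorrect,
}

impl OverallCorrectness {
    /// Maps the two allowed `overall_correctness` strings.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "patch is correct" => Some(Self::Correct),
            "patch is incorrect" => Some(Self::Incorrect),
            _ => None,
        }
    }
}

/// Checks a finding against the schema's stated constraints.
pub fn validate_finding(f: &Finding) -> Result<(), String> {
    if f.title.chars().count() > 80 {
        return Err("title longer than 80 characters".to_string());
    }
    // `contains` is false for NaN, so NaN scores are rejected too.
    if !(0.0_f32..=1.0).contains(&f.confidence_score) {
        return Err("confidence_score outside 0.0-1.0".to_string());
    }
    if f.priority > 3 {
        return Err("priority outside 0-3".to_string());
    }
    let range = &f.code_location.line_range;
    if range.start > range.end {
        return Err("line_range.start is after line_range.end".to_string());
    }
    Ok(())
}
```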
Michael Bolin	9bbeb75361	feat: include reasoning_effort in NewConversationResponse (#3506) `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.	2025-09-11 21:04:40 -07:00
Michael Bolin	bec51f6c05	chore: enable clippy::redundant_clone (#3489) Created this PR by: - adding `redundant_clone` to `[workspace.lints.clippy]` in `codex-rs/Cargo.toml` - running `cargo clippy --tests --fix` - running `just fmt` I also had to clean up one resulting instance of the following: ```rust let codex = codex; ```	2025-09-11 11:59:37 -07:00
Eric Traut	e13b35ecb0	Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189) This PR does the following: * Adds the ability to paste or type an API key. * Removes the `preferred_auth_method` config option. The last login method is always persisted in auth.json, so this isn't needed. * If the OPENAI_API_KEY env variable is defined, its value is used to prepopulate the new UI. The env variable is otherwise ignored by the CLI. * Adds a new MCP server entry point "login_api_key" so we can implement the same API key behavior for the VS Code extension. <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM" src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9" /> <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM" src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9" />	2025-09-11 09:16:34 -07:00
Ahmed Ibrahim	162e1235a8	Change forking to read the rollout from file (#3440) This PR changes the get-history op into a get-path op, and forking then uses that path. This gives us one unified codepath for resuming and forking conversations, and will also help keep the rollout history in order. It also fixes a bug where the UI was not shown when resuming after forking.	2025-09-10 17:42:54 -07:00
Gabriel Peal	5eab4c7ab4	Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388) The previous config approach had a few issues: 1. It was part of the config but not designed to be used externally. 2. It had to be wired through many places (look at the +/- on this PR). 3. It wasn't guaranteed to be set consistently everywhere, because we don't have a well-defined way that configs stack. For example, the extension would configure it during newConversation, but anything that happened outside of that (like login) wouldn't get it. This env var approach is cleaner and also leaves one less thing to deal with when we come up with a better holistic story around configs. One downside is that I removed the unit test for the override, because I didn't want to deal with setting the global env or spawning child processes and introspecting their originator header. The new code is simple enough, and I tested it end to end, so I feel this is still worth it.	2025-09-09 17:23:23 -04:00
Michael Bolin	2a76a08a9e	fix: include rollout_path in NewConversationResponse (#3352) Adding the `rollout_path` to the `NewConversationResponse` makes it so a client can perform subsequent operations on a `(ConversationId, PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.	2025-09-09 00:11:48 -07:00
jif-oai	a9c68ea270	feat: Run cargo shear during CI (#3338) Run cargo shear as part of CI to ensure there are no unused dependencies.	2025-09-09 01:05:08 +00:00
Justin Lebar	18330c2362	Format large numbers in a more readable way. (#2046) - In the bottom line of the TUI, print the number of tokens to three significant figures with an SI suffix, e.g. "1.23K". - Everywhere else we print a number, I figure it's worthwhile to print the exact number, because, e.g., it's a summary of your session. In those places we print the numbers comma-separated.	2025-09-08 21:48:48 +00:00
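Both formats can be implemented with integer arithmetic: round to three significant figures, then place the decimal point according to the SI group. This is a sketch, not the TUI's actual code. The suffixes beyond `K`/`M` are assumptions.

```rust
/// Formats `n` to three significant figures with an SI suffix, e.g. 1234 -> "1.23K".
/// Values below 1000 are printed exactly.
pub fn format_si_3sf(n: u64) -> String {
    const SUFFIXES: [&str; 7] = ["", "K", "M", "G", "T", "P", "E"];
    if n < 1000 {
        return n.to_string();
    }
    let digits = n.to_string().len() as u32;
    // Round to three significant digits with integer math; u128 avoids overflow near u64::MAX.
    let mut shift = digits - 3;
    let scale = 10u128.pow(shift);
    let mut sig = (n as u128 + scale / 2) / scale; // now in 100..=1000
    if sig == 1000 {
        // Rounding carried into a new digit, e.g. 999_999 -> "1.00M".
        sig = 100;
        shift += 1;
    }
    let exponent = shift + 2; // power of ten of the leading digit
    let suffix = SUFFIXES[(exponent / 3) as usize];
    let int_len = (exponent % 3 + 1) as usize; // digits before the decimal point
    let s = sig.to_string();
    if int_len == 3 {
        format!("{s}{suffix}")
    } else {
        format!("{}.{}{}", &s[..int_len], &s[int_len..], suffix)
    }
}

/// Formats `n` exactly, with commas between groups of three digits.
pub fn format_with_commas(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}
```

For example, the `input_tokens` value 78913 from the `turn.completed` example above would render as `78.9K` in the bottom line and as `78,913` in a session summary.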
Gabriel Peal	c8fab51372	Use ConversationId instead of raw Uuids (#3282) We're trying to migrate from `session_id: Uuid` to `conversation_id: ConversationId`. Not only does this give us more type safety, it also unifies our terminology across Codex; with session resuming implemented, "conversation" (which can span multiple sessions) is the more appropriate term. I started this implementation in https://github.com/openai/codex/pull/3219 as part of getting resume working in the extension, but it's big enough that it should be broken out.	2025-09-07 23:22:25 -04:00
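Here is a std-only sketch of the newtype pattern this change relies on. The real `ConversationId` definition does not appear in this log. This version stores the 128-bit value directly and formats it as a hyphenated UUID, so treat the name and methods as hypothetical.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical newtype: a function taking a `ConversationId` cannot be handed
/// some unrelated 128-bit id or string by mistake.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ConversationId(u128);

impl ConversationId {
    pub fn from_u128(raw: u128) -> Self {
        Self(raw)
    }
}

impl fmt::Display for ConversationId {
    // Canonical 8-4-4-4-12 lowercase hex form.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let h = format!("{:032x}", self.0);
        write!(f, "{}-{}-{}-{}-{}", &h[0..8], &h[8..12], &h[12..16], &h[16..20], &h[20..32])
    }
}

impl FromStr for ConversationId {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let hex: String = s.chars().filter(|c| *c != '-').collect();
        if s.len() != 36 || hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("not a hyphenated UUID: {s}"));
        }
        u128::from_str_radix(&hex, 16).map(Self).map_err(|e| e.to_string())
    }
}
```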
pakrym-oai	0269096229	Move token usage/context information to session level (#3221) Move context information into the main loop so it can be used to interrupt the loop or start auto-compaction.	2025-09-06 15:19:23 +00:00
pakrym-oai	5775174ec2	Never store requests (#3212) When item IDs are sent to the Responses API, it loads them from the database, ignoring the provided values. This adds extra latency. Dropping the mode that stores requests also lets us simplify the code. ## Breaking change The `disable_response_storage` configuration option is removed.	2025-09-05 10:41:47 -07:00
Ahmed Ibrahim	2b96f9f569	Dividing UserMsgs into categories to send it back to the tui (#3127 ) This PR does the following: - divides user msgs into 3 categories: plain, user instructions, and environment context - Centralizes adding user instructions and environment context to a degree - Improve the integration testing Building on top of #3123 Specifically this [comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089). We need to send the user message while ignoring the User Instructions and Environment Context we attach.	2025-09-04 05:34:50 +00:00
Ahmed Ibrahim	f2036572b6	Replay EventMsgs from Response Items when resuming a session with history. (#3123 ) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.	2025-09-04 04:47:00 +00:00
pakrym-oai	c636f821ae	Add a common way to create HTTP client (#3110 ) Ensure User-Agent and originator are always sent.	2025-09-03 10:11:02 -07:00
pakrym-oai	03e2796ca4	Move CodexAuth and AuthManager to the core crate (#3074 ) Fix a long standing layering issue.	2025-09-02 18:36:19 -07:00
Jeremy Rose	e442ecedab	rework message styling (#2877 ) https://github.com/user-attachments/assets/cf07f62b-1895-44bb-b9c3-7a12032eb371	2025-09-02 17:29:58 +00:00
Ahmed Ibrahim	9dbe7284d2	Following up on #2371 post-commit feedback (#2852 ) - Introduce websearch end to complement the begin - Moves the logic of adding the websearch tool to create_tools_json_for_responses_api - Makes it the client's responsibility to toggle the tool on or off - Other misc fixes from the #2371 post-commit feedback - Show the query: https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89	2025-08-28 19:24:38 -07:00
dedrisian-oai	b8e8454b3f	Custom /prompts (#2696 ) Adds custom `/prompts` to `~/.codex/prompts/<command>.md`. Screenshot: https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679 --- Details: 1. Adds `Op::ListCustomPrompts` to core. 2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt` (name, content). 3. TUI calls the operation on load, and populates the custom prompts (excluding prompts that collide with builtins). 4. Selecting the custom prompt automatically sends the prompt to the agent.	2025-08-29 02:16:39 +00:00
Michael Bolin	74d2741729	chore: require uninlined_format_args from clippy (#2845 ) - added `uninlined_format_args` to `[workspace.lints.clippy]` in the `Cargo.toml` for the workspace - ran `cargo clippy --tests --fix` - ran `just fmt`	2025-08-28 11:25:23 -07:00
dedrisian-oai	4e9ad23864	Add "View Image" tool (#2723 ) Adds a "View Image" tool so Codex can find and see images by itself. Screenshot: https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03	2025-08-27 17:41:23 -07:00
Ahmed Ibrahim	d0e06f74e2	send context window with task started (#2752 ) - Send context window with task started - Accounting for changing the model per turn	2025-08-27 00:04:21 -07:00
Dylan	7f7d1e30f3	[exec] Clean up apply-patch tests (#2648 ) ## Summary These tests were getting a bit unwieldy, and they're starting to become load-bearing. Let's clean them up, and get them working solidly so we can easily expand this harness with new tests. ## Test Plan - [x] Tests continue to pass	2025-08-25 15:08:01 -07:00
Jeremy Rose	32bbbbad61	test: faster test execution in codex-core (#2633 ) this dramatically improves time to run `cargo test -p codex-core` (~25x speedup). before: ``` cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total ``` after: ``` cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total ``` both tests measured "hot", i.e. on a 2nd run with no filesystem changes, to exclude compile times. approach inspired by [Delete Cargo Integration Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html), we move all test cases in tests/ into a single suite in order to have a single binary, as there is significant overhead for each test binary executed, and because test execution is only parallelized with a single binary.	2025-08-24 11:10:53 -07:00
Reuben Narad	363636f5eb	Add web search tool (#2371 ) Adds web_search tool, enabling the model to use Responses API web_search tool. - Disabled by default, enabled by --search flag - When --search is passed, exposes web_search_request function tool to the model, which triggers user approval. When approved, the model can use the web_search tool for the remainder of the turn. Screenshot: https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-23 22:58:56 -07:00
Ahmed Ibrahim	957d44918d	send-aggregated output (#2364 ) We want to send the aggregated stdout and stderr output so that clients don't have to combine the two streams themselves, which sometimes loses their relative order. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-23 16:54:31 +00:00
Ahmed Ibrahim	311ad0ce26	fork conversation from a previous message (#2575 ) This can be the underlying logic for starting a conversation from a previous message. It will need some love in the UI. Base for building this: #2588	2025-08-22 17:06:09 -07:00
Jeremy Rose	d994019f3f	tui: coalesce command output; show unabridged commands in transcript (#2590 ) https://github.com/user-attachments/assets/effec7c7-732a-4b61-a2ae-3cb297b6b19b	2025-08-22 16:32:31 -07:00
Dylan	236c4f76a6	[apply_patch] freeform apply_patch tool (#2576 ) ## Summary GPT-5 introduced the concept of [custom tools](https://platform.openai.com/docs/guides/function-calling#custom-tools), which allow the model to send a raw string result back, simplifying json-escape issues. We are migrating gpt-5 to use this by default. However, gpt-oss models do not support custom tools, only normal functions. So we keep both tool definitions, and provide whichever one the model family supports. ## Testing - [x] Tested locally with various models - [x] Unit tests pass	2025-08-22 13:42:34 -07:00
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
easong-openai	8ad56be06e	Parse and expose stream errors (#2540 )	2025-08-21 01:15:24 -07:00
Michael Bolin	50c48e88f5	chore: upgrade to Rust 1.89 (#2465 ) Codex created this PR from the following prompt: > upgrade this entire repo to Rust 1.89. Note that this requires updating codex-rs/rust-toolchain.toml as well as the workflows in .github/. Make sure that things are "clippy clean" as this change will likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests` are sufficient to check for correctness Note this modifies a lot of lines because it folds nested `if` statements using `&&`.	2025-08-19 13:22:02 -07:00

186 Commits