valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
pakrym-oai	344d4a1d68	Add explicit codex exec events (#4177 ) This pull request add a new experimental format of JSON output. You can try it using `codex exec --experimental-json`. Design takes a lot of inspiration from Responses API items and stream format. # Session and items Each invocation of `codex exec` starts or resumes a session. Session contains multiple high-level item types: 1. Assistant message 2. Assistant thinking 3. Command execution 4. File changes 5. To-do lists 6. etc. # Events Session and items are going through their life cycles which is represented by events. Session is `session.created` or `session.resumed` Items are `item.added`, `item.updated`, `item.completed`, `item.require_approval` (or other item types like `item.output_delta` when we need streaming). So a typical session can look like: <details> ``` { "type": "session.created", "session_id": "01997dac-9581-7de3-b6a0-1df8256f2752" } { "type": "item.completed", "item": { "id": "itm_0", "item_type": "assistant_message", "text": "I’ll locate the top-level README and remove its first line. Then I’ll show a quick summary of what changed." } } { "type": "item.completed", "item": { "id": "itm_1", "item_type": "command_execution", "command": "bash -lc ls -la \| sed -n '1,200p'", "aggregated_output": "pyenv: cannot rehash: /Users/pakrym/.pyenv/shims isn't writable\ntotal 192\ndrwxr-xr-x@ 33 pakrym staff 1056 Sep 24 14:36 .\ndrwxr-xr-x 41 pakrym staff 1312 Sep 24 09:17 ..\n-rw-r--r--@ 1 pakrym staff 6 Jul 9 16:16 .codespellignore\n-rw-r--r--@ 1 pakrym staff 258 Aug 13 09:40 .codespellrc\ndrwxr-xr-x@ 5 pakrym staff 160 Jul 23 08:26 .devcontainer\n-rw-r--r--@ 1 pakrym staff 6148 Jul 22 10:03 .DS_Store\ndrwxr-xr-x@ 15 pakrym staff 480 Sep 24 14:38 .git\ndrwxr-xr-x@ 12 pakrym staff 384 Sep 2 16:00 .github\n-rw-r--r--@ 1 pakrym staff 778 Jul 9 16:16 .gitignore\ndrwxr-xr-x@ 3 pakrym staff 96 Aug 11 09:37 .husky\n-rw-r--r--@ 1 pakrym staff 104 Jul 9 16:16 .npmrc\n-rw-r--r--@ 1 pakrym staff 96 Sep 2 08:52 .prettierignore\n-rw-r--r--@ 1 pakrym staff 170 Jul 9 16:16 .prettierrc.toml\ndrwxr-xr-x@ 5 pakrym staff 160 Sep 14 17:43 .vscode\ndrwxr-xr-x@ 2 pakrym staff 64 Sep 11 11:37 2025-09-11\n-rw-r--r--@ 1 pakrym staff 5505 Sep 18 09:28 AGENTS.md\n-rw-r--r--@ 1 pakrym staff 92 Sep 2 08:52 CHANGELOG.md\n-rw-r--r--@ 1 pakrym staff 1145 Jul 9 16:16 cliff.toml\ndrwxr-xr-x@ 11 pakrym staff 352 Sep 24 13:03 codex-cli\ndrwxr-xr-x@ 38 pakrym staff 1216 Sep 24 14:38 codex-rs\ndrwxr-xr-x@ 18 pakrym staff 576 Sep 23 11:01 docs\n-rw-r--r--@ 1 pakrym staff 2038 Jul 9 16:16 flake.lock\n-rw-r--r--@ 1 pakrym staff 1434 Jul 9 16:16 flake.nix\n-rw-r--r--@ 1 pakrym staff 10926 Jul 9 16:16 LICENSE\ndrwxr-xr-x@ 465 pakrym staff 14880 Jul 15 07:36 node_modules\n-rw-r--r--@ 1 pakrym staff 242 Aug 5 08:25 NOTICE\n-rw-r--r--@ 1 pakrym staff 578 Aug 14 12:31 package.json\n-rw-r--r--@ 1 pakrym staff 498 Aug 11 09:37 pnpm-lock.yaml\n-rw-r--r--@ 1 pakrym staff 58 Aug 11 09:37 pnpm-workspace.yaml\n-rw-r--r--@ 1 pakrym staff 2402 Jul 9 16:16 PNPM.md\n-rw-r--r--@ 1 pakrym staff 4393 Sep 12 14:36 README.md\ndrwxr-xr-x@ 4 pakrym staff 128 Sep 18 09:28 scripts\ndrwxr-xr-x@ 2 pakrym staff 64 Sep 11 11:34 tmp\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_2", "item_type": "reasoning", "text": "Reviewing README.md file\n\nI've located the README.md file at the root, and it’s 4393 bytes. Now, I need to remove the first line, but first, I should check its content to make sure I’m patching it correctly. I’ll use sed to display the first 20 lines. By reviewing those lines, I can determine exactly what needs to be removed before I proceed with the editing. Let's do this carefully!" } } { "type": "item.completed", "item": { "id": "itm_3", "item_type": "command_execution", "command": "bash -lc sed -n '1,40p' README.md", "aggregated_output": "<h1 align=\"center\">OpenAI Codex CLI</h1>\n\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n```shell\nnpm install -g @openai/codex\n```\n\nAlternatively, if you use Homebrew:\n\n```shell\nbrew install codex\n```\n\nThen simply run `codex` to get started:\n\n```shell\ncodex\n```\n\n<details>\n<summary>You can also go to the <a href=\"https://github.com/openai/codex/releases/latest\">latest GitHub Release</a> and download the appropriate binary for your platform.</summary>\n\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_4", "item_type": "reasoning", "text": "Updating README file\n\nThe user wants to update the README by removing the first line, which is an HTML h1 tag. I'll proceed to delete that line while keeping the rest intact. To do this, I need to use the apply_patch tool and create a patch in the diff format. I’ll take care to include context lines around the change. After removing the header, I’m considering whether to delete the following blank line as well to keep it tidy." } } { "type": "item.completed", "item": { "id": "itm_5", "item_type": "reasoning", "text": "Crafting the patch\n\nThe user has asked to remove just the first line, without mentioning the blank lines. Technically, leaving a blank line at the start is acceptable, but it's not ideal. Ideally, I'd prefer to remove the first line while maintaining the subsequent blank line. However, if I can also remove that blank line, it could tidy things up even more. Still, since the user's request is strictly to remove only the first line, I'll stick to that and craft the patch accordingly." } } { "type": "item.completed", "item": { "id": "itm_6", "item_type": "assistant_message", "text": "Applying the change by removing the first line from `README.md`." } } { "type": "item.completed", "item": { "id": "itm_7", "item_type": "file_change", "changes": [ { "path": "/Users/pakrym/code/codex/README.md", "kind": "update" } ], "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_8", "item_type": "reasoning", "text": "Validating README Update\n\nI updated the README.md and now I need to validate that the first line has been successfully removed. To do this, I'll check the first 10 lines of the document to ensure everything looks correct. It’s always good to double-check these kinds of changes, so I’m going to take that extra step to make sure the update was executed properly. Let's see what those initial lines say!" } } { "type": "item.completed", "item": { "id": "itm_9", "item_type": "command_execution", "command": "bash -lc sed -n '1,20p' README.md", "aggregated_output": "\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_10", "item_type": "assistant_message", "text": "- Updated `README.md:1` by removing the opening H1 line.\n- The README now starts with the install instructions paragraph.\n\nWant me to also remove the leading blank line at the top?" } } ``` </details> The idea is to give users fully formatted items they can use directly in their rendering/application logic and avoid having them building up items manually based on events (unless they want to for streaming). This PR implements only the `item.completed` payload for some event types, more event types and item types to come. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-09-25 17:47:09 +00:00
Dylan	594248f415	[exec] add include-plan-tool flag and print it nicely (#3461 ) ### Summary Sometimes in exec runs, we want to allow the model to use the `update_plan` tool, but that's not easily configurable. This change adds a feature flag for this, and formats the output so it's human-readable ## Test Plan <img width="1280" height="354" alt="Screenshot 2025-09-11 at 12 39 44 AM" src="https://github.com/user-attachments/assets/72e11070-fb98-47f5-a784-5123ca7333d9" />	2025-09-23 16:50:59 -07:00
pakrym-oai	fdb8dadcae	Add exec output-schema parameter (#4079 ) Adds structured output to `exec` via the `--structured-output` parameter.	2025-09-23 13:59:16 -07:00
jif-oai	be366a31ab	chore: clippy on redundant closure (#4058 ) Add redundant closure clippy rules and let Codex fix it by minimising FQP	2025-09-22 19:30:16 +00:00
jif-oai	e5fe50d3ce	chore: unify cargo versions (#4044 ) Unify cargo versions at root	2025-09-22 16:47:01 +00:00
pakrym-oai	14a115d488	Add non_sandbox_test helper (#3880 ) Makes tests shorter	2025-09-22 14:50:41 +00:00
pakrym-oai	9b18875a42	Use helpers instead of fixtures (#3888 ) Move to using test helper method everywhere.	2025-09-19 06:46:25 -07:00
Michael Bolin	8595237505	fix: ensure cwd for conversation and sandbox are separate concerns (#3874 ) Previous to this PR, both of these functions take a single `cwd`: `71038381aa/codex-rs/core/src/seatbelt.rs (L19-L25)` `71038381aa/codex-rs/core/src/landlock.rs (L16-L23)` whereas `cwd` and `sandbox_cwd` should be set independently (fixed in this PR). Added `sandbox_distinguishes_command_and_policy_cwds()` to `codex-rs/exec/tests/suite/sandbox.rs` to verify this.	2025-09-18 14:37:06 -07:00
dedrisian-oai	62258df92f	feat: /review (#3774 ) Adds `/review` action in TUI <img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM" src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d" />	2025-09-18 14:14:16 -07:00
dependabot[bot]	fdf4a68646	chore(deps): bump tracing-subscriber from 0.3.19 to 0.3.20 in /codex-rs (#3620 ) Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.3.19 to 0.3.20. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/tokio-rs/tracing/releases">tracing-subscriber's releases</a>.</em></p> <blockquote> <h2>tracing-subscriber 0.3.20</h2> <p><strong>Security Fix</strong>: ANSI Escape Sequence Injection (CVE-TBD)</p> <h2>Impact</h2> <p>Previous versions of tracing-subscriber were vulnerable to ANSI escape sequence injection attacks. Untrusted user input containing ANSI escape sequences could be injected into terminal output when logged, potentially allowing attackers to:</p> <ul> <li>Manipulate terminal title bars</li> <li>Clear screens or modify terminal display</li> <li>Potentially mislead users through terminal manipulation</li> </ul> <p>In isolation, impact is minimal, however security issues have been found in terminal emulators that enabled an attacker to use ANSI escape sequences via logs to exploit vulnerabilities in the terminal emulator.</p> <h2>Solution</h2> <p>Version 0.3.20 fixes this vulnerability by escaping ANSI control characters in when writing events to destinations that may be printed to the terminal.</p> <h2>Affected Versions</h2> <p>All versions of tracing-subscriber prior to 0.3.20 are affected by this vulnerability.</p> <h2>Recommendations</h2> <p>Immediate Action Required: We recommend upgrading to tracing-subscriber 0.3.20 immediately, especially if your application:</p> <ul> <li>Logs user-provided input (form data, HTTP headers, query parameters, etc.)</li> <li>Runs in environments where terminal output is displayed to users</li> </ul> <h2>Migration</h2> <p>This is a patch release with no breaking API changes. Simply update your Cargo.toml:</p> <pre lang="toml"><code>[dependencies] tracing-subscriber = "0.3.20" </code></pre> <h2>Acknowledgments</h2> <p>We would like to thank <a href="http://github.com/zefr0x">zefr0x</a> who responsibly reported the issue at <code>security@tokio.rs</code>.</p> <p>If you believe you have found a security vulnerability in any tokio-rs project, please email us at <code>security@tokio.rs</code>.</p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`4c52ca5266`"><code>4c52ca5</code></a> fmt: fix ANSI escape sequence injection vulnerability (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3368">#3368</a>)</li> <li><a href="`f71cebe41e`"><code>f71cebe</code></a> subscriber: impl Clone for EnvFilter (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3360">#3360</a>)</li> <li><a href="`3a1f571102`"><code>3a1f571</code></a> Fix CI (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3361">#3361</a>)</li> <li><a href="`e63ef57f3d`"><code>e63ef57</code></a> chore: prepare tracing-attributes 0.1.30 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3316">#3316</a>)</li> <li><a href="`6e59a13b1a`"><code>6e59a13</code></a> attributes: fix tracing::instrument regression around shadowing (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3311">#3311</a>)</li> <li><a href="`e4df761275`"><code>e4df761</code></a> tracing: update core to 0.1.34 and attributes to 0.1.29 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3305">#3305</a>)</li> <li><a href="`643f392ebb`"><code>643f392</code></a> chore: prepare tracing-attributes 0.1.29 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3304">#3304</a>)</li> <li><a href="`d08e7a6eea`"><code>d08e7a6</code></a> chore: prepare tracing-core 0.1.34 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3302">#3302</a>)</li> <li><a href="`6e70c571d3`"><code>6e70c57</code></a> tracing-subscriber: count numbers of enters in <code>Timings</code> (<a href="https://redirect.github.com/tokio-rs/tracing/issues/2944">#2944</a>)</li> <li><a href="`c01d4fd9de`"><code>c01d4fd</code></a> fix docs and enable CI on <code>main</code> branch (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3295">#3295</a>)</li> <li>Additional commits viewable in <a href="https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.3.19...tracing-subscriber-0.3.20">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tracing-subscriber&package-manager=cargo&previous-version=0.3.19&new-version=0.3.20)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-15 00:51:33 -07:00
Ahmed Ibrahim	a30e5e40ee	enable-resume (#3537 ) Adding the ability to resume conversations. we have one verb `resume`. Behavior: `tui`: `codex resume`: opens session picker `codex resume --last`: continue last message `codex resume <session id>`: continue conversation with `session id` `exec`: `codex resume --last`: continue last conversation `codex resume <session id>`: continue conversation with `session id` Implementation: - I added a function to find the path in `~/.codex/sessions/` with a `UUID`. This is helpful in resuming with session id. - Added the above mentioned flags - Added lots of testing	2025-09-14 19:33:19 -04:00
dedrisian-oai	90a0fd342f	Review Mode (Core) (#3401 ) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining why this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" \| "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.	2025-09-12 23:25:10 +00:00
Michael Bolin	9bbeb75361	feat: include reasoning_effort in NewConversationResponse (#3506 ) `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.	2025-09-11 21:04:40 -07:00
Michael Bolin	bec51f6c05	chore: enable clippy::redundant_clone (#3489 ) Created this PR by: - adding `redundant_clone` to `[workspace.lints.clippy]` in `cargo-rs/Cargol.toml` - running `cargo clippy --tests --fix` - running `just fmt` Though I had to clean up one instance of the following that resulted: ```rust let codex = codex; ```	2025-09-11 11:59:37 -07:00
Eric Traut	e13b35ecb0	Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189 ) This PR does the following: * Adds the ability to paste or type an API key. * Removes the `preferred_auth_method` config option. The last login method is always persisted in auth.json, so this isn't needed. * If OPENAI_API_KEY env variable is defined, the value is used to prepopulate the new UI. The env variable is otherwise ignored by the CLI. * Adds a new MCP server entry point "login_api_key" so we can implement this same API key behavior for the VS Code extension. <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM" src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9" /> <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM" src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9" />	2025-09-11 09:16:34 -07:00
Ahmed Ibrahim	162e1235a8	Change forking to read the rollout from file (#3440 ) This PR changes get history op to get path. Then, forking will use a path. This will help us have one unified codepath for resuming/forking conversations. Will also help in having rollout history in order. It also fixes a bug where you won't see the UI when resuming after forking.	2025-09-10 17:42:54 -07:00
Gabriel Peal	5eab4c7ab4	Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388 ) The previous config approach had a few issues: 1. It is part of the config but not designed to be used externally 2. It had to be wired through many places (look at the +/- on this PR 3. It wasn't guaranteed to be set consistently everywhere because we don't have a super well defined way that configs stack. For example, the extension would configure during newConversation but anything that happened outside of that (like login) wouldn't get it. This env var approach is cleaner and also creates one less thing we have to deal with when coming up with a better holistic story around configs. One downside is that I removed the unit test testing for the override because I don't want to deal with setting the global env or spawning child processes and figuring out how to introspect their originator header. The new code is sufficiently simple and I tested it e2e that I feel as if this is still worth it.	2025-09-09 17:23:23 -04:00
Michael Bolin	2a76a08a9e	fix: include rollout_path in NewConversationResponse (#3352 ) Adding the `rollout_path` to the `NewConversationResponse` makes it so a client can perform subsequent operations on a `(ConversationId, PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352). * #3353 * __->__ #3352	2025-09-09 00:11:48 -07:00
jif-oai	a9c68ea270	feat: Run cargo shear during CI (#3338 ) Run cargo shear as part of the CI to ensure no unused dependencies	2025-09-09 01:05:08 +00:00
Justin Lebar	18330c2362	Format large numbers in a more readable way. (#2046 ) - In the bottom line of the TUI, print the number of tokens to 3 sigfigs with an SI suffix, e.g. "1.23K". - Elsewhere where we print a number, I figure it's worthwhile to print the exact number, because e.g. it's a summary of your session. Here we print the numbers comma-separated.	2025-09-08 21:48:48 +00:00
Gabriel Peal	c8fab51372	Use ConversationId instead of raw Uuids (#3282 ) We're trying to migrate from `session_id: Uuid` to `conversation_id: ConversationId`. Not only does this give us more type safety but it unifies our terminology across Codex and with the implementation of session resuming, a conversation (which can span multiple sessions) is more appropriate. I started this impl on https://github.com/openai/codex/pull/3219 as part of getting resume working in the extension but it's big enough that it should be broken out.	2025-09-07 23:22:25 -04:00
pakrym-oai	0269096229	Move token usage/context information to session level (#3221 ) Move context information into the main loop so it can be used to interrupt the loop or start auto-compaction.	2025-09-06 15:19:23 +00:00
pakrym-oai	5775174ec2	Never store requests (#3212 ) When item ids are sent to Responses API it will load them from the database ignoring the provided values. This adds extra latency. Not having the mode to store requests also allows us to simplify the code. ## Breaking change The `disable_response_storage` configuration option is removed.	2025-09-05 10:41:47 -07:00
Ahmed Ibrahim	2b96f9f569	Dividing UserMsgs into categories to send it back to the tui (#3127 ) This PR does the following: - divides user msgs into 3 categories: plain, user instructions, and environment context - Centralizes adding user instructions and environment context to a degree - Improve the integration testing Building on top of #3123 Specifically this [comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089). We need to send the user message while ignoring the User Instructions and Environment Context we attach.	2025-09-04 05:34:50 +00:00
Ahmed Ibrahim	f2036572b6	Replay EventMsgs from Response Items when resuming a session with history. (#3123 ) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.	2025-09-04 04:47:00 +00:00
pakrym-oai	c636f821ae	Add a common way to create HTTP client (#3110 ) Ensure User-Agent and originator are always sent.	2025-09-03 10:11:02 -07:00
pakrym-oai	03e2796ca4	Move CodexAuth and AuthManager to the core crate (#3074 ) Fix a long standing layering issue.	2025-09-02 18:36:19 -07:00
Jeremy Rose	e442ecedab	rework message styling (#2877 ) https://github.com/user-attachments/assets/cf07f62b-1895-44bb-b9c3-7a12032eb371	2025-09-02 17:29:58 +00:00
Ahmed Ibrahim	9dbe7284d2	Following up on #2371 post commit feedback (#2852 ) - Introduce websearch end to complement the begin - Moves the logic of adding the sebsearch tool to create_tools_json_for_responses_api - Making it the client responsibility to toggle the tool on or off - Other misc in #2371 post commit feedback - Show the query: <img width="1392" height="151" alt="image" src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89" />	2025-08-28 19:24:38 -07:00
dedrisian-oai	b8e8454b3f	Custom /prompts (#2696 ) Adds custom `/prompts` to `~/.codex/prompts/<command>.md`. <img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM" src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679" /> --- Details: 1. Adds `Op::ListCustomPrompts` to core. 2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt` (name, content). 3. TUI calls the operation on load, and populates the custom prompts (excluding prompts that collide with builtins). 4. Selecting the custom prompt automatically sends the prompt to the agent.	2025-08-29 02:16:39 +00:00
Michael Bolin	74d2741729	chore: require uninlined_format_args from clippy (#2845 ) - added `uninlined_format_args` to `[workspace.lints.clippy]` in the `Cargo.toml` for the workspace - ran `cargo clippy --tests --fix` - ran `just fmt`	2025-08-28 11:25:23 -07:00
dedrisian-oai	4e9ad23864	Add "View Image" tool (#2723 ) Adds a "View Image" tool so Codex can find and see images by itself: <img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40 04 AM" src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03" />	2025-08-27 17:41:23 -07:00
Ahmed Ibrahim	d0e06f74e2	send context window with task started (#2752 ) - Send context window with task started - Accounting for changing the model per turn	2025-08-27 00:04:21 -07:00
Dylan	7f7d1e30f3	[exec] Clean up apply-patch tests (#2648 ) ## Summary These tests were getting a bit unwieldy, and they're starting to become load-bearing. Let's clean them up, and get them working solidly so we can easily expand this harness with new tests. ## Test Plan - [x] Tests continue to pass	2025-08-25 15:08:01 -07:00
Jeremy Rose	32bbbbad61	test: faster test execution in codex-core (#2633 ) this dramatically improves time to run `cargo test -p codex-core` (~25x speedup). before: ``` cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total ``` after: ``` cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total ``` both tests measured "hot", i.e. on a 2nd run with no filesystem changes, to exclude compile times. approach inspired by [Delete Cargo Integration Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html), we move all test cases in tests/ into a single suite in order to have a single binary, as there is significant overhead for each test binary executed, and because test execution is only parallelized with a single binary.	2025-08-24 11:10:53 -07:00
Reuben Narad	363636f5eb	Add web search tool (#2371 ) Adds web_search tool, enabling the model to use Responses API web_search tool. - Disabled by default, enabled by --search flag - When --search is passed, exposes web_search_request function tool to the model, which triggers user approval. When approved, the model can use the web_search tool for the remainder of the turn <img width="1033" height="294" alt="image" src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f" /> --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-23 22:58:56 -07:00
Ahmed Ibrahim	957d44918d	send-aggregated output (#2364 ) We want to send an aggregated output of stderr and stdout so we don't have to aggregate it stderr+stdout as we lose order sometimes. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-23 16:54:31 +00:00
Ahmed Ibrahim	311ad0ce26	fork conversation from a previous message (#2575 ) This can be the underlying logic in order to start a conversation from a previous message. will need some love in the UI. Base for building this: #2588	2025-08-22 17:06:09 -07:00
Jeremy Rose	d994019f3f	tui: coalesce command output; show unabridged commands in transcript (#2590 ) https://github.com/user-attachments/assets/effec7c7-732a-4b61-a2ae-3cb297b6b19b	2025-08-22 16:32:31 -07:00
Dylan	236c4f76a6	[apply_patch] freeform apply_patch tool (#2576 ) ## Summary GPT-5 introduced the concept of [custom tools](https://platform.openai.com/docs/guides/function-calling#custom-tools), which allow the model to send a raw string result back, simplifying json-escape issues. We are migrating gpt-5 to use this by default. However, gpt-oss models do not support custom tools, only normal functions. So we keep both tool definitions, and provide whichever one the model family supports. ## Testing - [x] Tested locally with various models - [x] Unit tests pass	2025-08-22 13:42:34 -07:00
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
easong-openai	8ad56be06e	Parse and expose stream errors (#2540 )	2025-08-21 01:15:24 -07:00
Michael Bolin	50c48e88f5	chore: upgrade to Rust 1.89 (#2465 ) Codex created this PR from the following prompt: > upgrade this entire repo to Rust 1.89. Note that this requires updating codex-rs/rust-toolchain.toml as well as the workflows in .github/. Make sure that things are "clippy clean" as this change will likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests` are sufficient to check for correctness Note this modifies a lot of lines because it folds nested `if` statements using `&&`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465). * #2467 * __->__ #2465	2025-08-19 13:22:02 -07:00
Dylan	e7e5fe91c8	[tui] Support /mcp command (#2430 ) ## Summary Adds a `/mcp` command to list active tools. We can extend this command to allow configuration of MCP tools, but for now a simple list command will help debug if your config.toml and your tools are working as expected.	2025-08-19 09:00:31 -07:00
Michael Bolin	712bfa04ac	chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423 ) The existing `wire_format.rs` should share more types with the `codex-protocol` crate (like `AskForApproval` instead of maintaining a parallel `CodexToolCallApprovalPolicy` enum), so this PR moves `wire_format.rs` into `codex-protocol`, renaming it as `mcp-protocol.rs`. We also de-dupe types, where appropriate. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423). * #2424 * __->__ #2423	2025-08-18 09:36:57 -07:00
Michael Bolin	b581498882	fix: introduce EventMsg::TurnAborted (#2365 ) Introduces `EventMsg::TurnAborted` that should be sent in response to `Op::Interrupt`. In the MCP server, updates the handling of a `ClientRequest::InterruptConversation` request such that it sends the `Op::Interrupt` but does not respond to the request until it sees an `EventMsg::TurnAborted`.	2025-08-17 21:40:31 -07:00
Dylan	6df8e35314	[tools] Add apply_patch tool (#2303 ) ## Summary We've been seeing a number of issues and reports with our synthetic `apply_patch` tool, e.g. #802. Let's make this a real tool - in my anecdotal testing, it's critical for GPT-OSS models, but I'd like to make it the standard across GPT-5 and codex models as well. ## Testing - [x] Tested locally - [x] Integration test	2025-08-15 11:55:53 -04:00
Parker Thompson	a075424437	Added `allow-expect-in-tests` / `allow-unwrap-in-tests` (#2328 ) This PR: * Added the clippy.toml to configure allowable expect / unwrap usage in tests * Removed as many expect/allow lines as possible from tests * moved a bunch of allows to expects where possible Note: in integration tests, non `#[test]` helper functions are not covered by this so we had to leave a few lingering `expect(expect_used` checks around	2025-08-14 17:59:01 -07:00
Parker Thompson	c26d42ab69	Fix AF_UNIX, sockpair, recvfrom in linux sandbox (#2309 ) When using codex-tui on a linux system I was unable to run `cargo clippy` inside of codex due to: ``` [pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET\|SOCK_CLOEXEC, 0, <unfinished ...> [pid 3548370] close(8 <unfinished ...> [pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted) ``` And ``` 3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted) ``` This PR: * Fixes a bug that disallowed AF_UNIX to allow it on `socket()` * Adds recvfrom() to the syscall allow list, this should be fine since we disable opening new sockets. But we should validate there is not a open socket inheritance issue. * Allow socketpair to be called for AF_UNIX * Adds tests for AF_UNIX components * All of which allows running `cargo clippy` within the sandbox on linux, and possibly other tooling using a fork server model + AF_UNIX comms.	2025-08-14 17:12:41 -07:00
Michael Bolin	2ecca79663	fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318 ) The high-order bit on this PR is that it makes it so `sandbox.rs` tests both Mac and Linux, as we introduce a general `spawn_command_under_sandbox()` function with platform-specific implementations for testing. An important, and interesting, discovery in porting the test to Linux is that (for reasons cited in the code comments), `/dev/shm` has to be added to `writable_roots` on Linux in order for `multiprocessing.Lock` to work there. Granting write access to `/dev/shm` comes with some degree of risk, so we do not make this the default for Codex CLI. Piggybacking on top of #2317, this moves the `python_multiprocessing_lock_works` test yet again, moving `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs` because in `codex-rs/exec/tests` we can use `cargo_bin()` like so: ``` let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec"); ``` which is necessary so we can use `codex_linux_sandbox_exe` and therefore `spawn_command_under_linux_sandbox` in an integration test. This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs` and into `landlock.rs`, which makes things more consistent with `seatbelt.rs` in `codex-core`. For reference, https://github.com/openai/codex/pull/1808 is the PR that made the change to Seatbelt to get this test to pass on Mac.	2025-08-14 15:47:48 -07:00

1 2 3

129 Commits