This PR does the following:
- Divides user messages into three categories: plain, user instructions,
and environment context
- Centralizes adding user instructions and environment context, to a
degree
- Improves the integration testing
This builds on top of #3123, specifically this
[comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089):
we need to send the user message while ignoring the User Instructions
and Environment Context we attach.
### Overview
This PR introduces the following changes:
1. Adds a unified mechanism to convert ResponseItem into EventMsg (see
the sketch after this list).
2. Ensures that when a session is initialized with initial history, a
vector of EventMsg is sent along with the session configuration. This
allows clients to re-render the UI accordingly.
3. Adds integration testing.
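As a rough illustration of item 1, here is a self-contained sketch with
stand-in types (the real ResponseItem and EventMsg live in codex-core
and carry many more variants):

```rust
// Stand-ins for the real codex-core types, just to make the shape concrete.
enum ResponseItem {
    Message(String),
    Reasoning(String),
    Other,
}

enum EventMsg {
    AgentMessage(String),
    AgentReasoning(String),
}

// Unified conversion: items with no UI representation map to None and
// are simply skipped when replaying history to a client.
fn to_event_msg(item: &ResponseItem) -> Option<EventMsg> {
    match item {
        ResponseItem::Message(text) => Some(EventMsg::AgentMessage(text.clone())),
        ResponseItem::Reasoning(text) => Some(EventMsg::AgentReasoning(text.clone())),
        ResponseItem::Other => None,
    }
}

fn event_msgs_from_history(items: &[ResponseItem]) -> Vec<EventMsg> {
    items.iter().filter_map(to_event_msg).collect()
}
```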
### Caveats
This implementation does not send every EventMsg that was previously
dispatched to clients. The excluded events fall into two categories:
- “Arguably” rolled-out events: examples include tool calls and
apply-patch calls. While these events are conceptually rolled out, we
currently only roll out ResponseItems. These events are already handled
elsewhere and transformed into EventMsg before being sent.
- Non-rolled-out events: certain events, such as TurnDiff, Error, and
TokenCount, are not rolled out at all.
### Future Directions
At present, resuming a session involves maintaining two states:
- UI state: clients can replay most of the important UI from the
provided EventMsg history.
- Model state: the model receives the complete session history to
reconstruct its internal state.
This design provides a solid foundation. If, in the future, more precise
UI reconstruction is needed, we have two potential paths:
1. Introduce a third data structure that allows us to derive both
ResponseItems and EventMsgs.
2. Clearly divide responsibilities: the core system ensures the
integrity of the model state, while clients are responsible for
reconstructing the UI.
In this test, the ChatGPT token path is used, and the auth layer tries
to refresh the token if it thinks the token is “old.” Your helper writes
a fixed last_refresh timestamp that has now aged past the 28-day
threshold, so the code attempts a real refresh against auth.openai.com,
never reaches the mock, and you end up with
`received_requests().await.unwrap()` being empty.
## Summary
It appears that #2108 hit a merge conflict with #2355 - I failed to
notice the path difference when re-reviewing the former. This PR
rectifies that, and consolidates it into the protocol package, in line
with our philosophy of specifying types in one place.
## Testing
- [x] Adds config test for model_verbosity
**What?**
Auto-approve patches when `SandboxPolicy::DangerFullAccess` is enabled
on platforms without sandbox support.
Changes in `codex-rs/core/src/safety.rs`: return
`SafetyCheck::AutoApprove { sandbox_type: SandboxType::None }` when no
sandbox is available and DangerFullAccess is set.
**Why?**
On platforms lacking sandbox support, requiring explicit user approval
despite `DangerFullAccess` being explicitly enabled adds friction
without additional safety. This aligns behavior with the stated policy
intent.
**How?**
Extend the `assess_patch_safety` match (sketched after this list):
* If `get_platform_sandbox()` returns `Some`, keep `AutoApprove {
sandbox_type }`.
* If `None` **and** `SandboxPolicy::DangerFullAccess`, return
`AutoApprove { SandboxType::None }`.
* Otherwise, fall back to `AskUser`.
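A self-contained sketch of that logic (enum variants simplified; the
real definitions live in codex-core):

```rust
enum SandboxType { None, MacosSeatbelt }
enum SandboxPolicy { ReadOnly, WorkspaceWrite, DangerFullAccess }
enum SafetyCheck {
    AutoApprove { sandbox_type: SandboxType },
    AskUser,
}

// Stand-in: returns Some(sandbox) on platforms with sandbox support.
fn get_platform_sandbox() -> Option<SandboxType> { None }

fn assess_patch_safety(policy: &SandboxPolicy) -> SafetyCheck {
    match get_platform_sandbox() {
        // Sandbox available: run the patch under it, as before.
        Some(sandbox_type) => SafetyCheck::AutoApprove { sandbox_type },
        // New: no sandbox, but the user explicitly chose DangerFullAccess.
        None if matches!(policy, SandboxPolicy::DangerFullAccess) => {
            SafetyCheck::AutoApprove { sandbox_type: SandboxType::None }
        }
        // Otherwise, fall back to asking the user.
        None => SafetyCheck::AskUser,
    }
}
```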
**Tests**
* Local checks:
```bash
cargo test && cargo clippy --tests && cargo fmt -- --config imports_granularity=Item
```
(Additionally: `just fmt`, `just fix -p codex-core`, `cargo check -p
codex-core`.)
**Docs**
No user-facing CLI changes. No README/help updates needed.
**Risk/Impact**
Reduces prompts on non-sandboxed platforms when DangerFullAccess is
explicitly chosen; consistent with policy semantics.
---------
Co-authored-by: Michael Bolin <bolinfest@gmail.com>
Correct the `shell` tool description for sandboxed runs and add targeted
tests.
- Fix the WorkspaceWrite description to clearly state that writes
outside the writable roots require escalated permissions; reads are not
restricted. The previous wording/formatting could be read as restricting
reads outside the workspace.
- Render the writable roots list on its own lines, after a newline
following "writable roots:", for clarity.
- Show the "Commands that require network access" note only in
WorkspaceWrite when network is disabled.
- Add focused tests that call `create_shell_tool_for_sandbox` directly
and assert the exact description text for WorkspaceWrite, ReadOnly, and
DangerFullAccess.
- Update AGENTS.md to note that `just fmt` can be run automatically
without asking.
- Move rollout persistence and listing into a dedicated module:
rollout/{recorder,list}.
- Expose lightweight conversation listing that returns file paths plus
the first 5 JSONL records for preview.
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 2.0.12 to
2.0.16.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/thiserror/releases">thiserror's
releases</a>.</em></p>
<blockquote>
<h2>2.0.16</h2>
<ul>
<li>Add to "no-std" crates.io category (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/429">#429</a>)</li>
</ul>
<h2>2.0.15</h2>
<ul>
<li>Prevent <code>Error::provide</code> API becoming unavailable from a
future new compiler lint (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/427">#427</a>)</li>
</ul>
<h2>2.0.14</h2>
<ul>
<li>Allow build-script cleanup failure with NFSv3 output directory to be
non-fatal (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/426">#426</a>)</li>
</ul>
<h2>2.0.13</h2>
<ul>
<li>Documentation improvements</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40b58536cc"><code>40b5853</code></a>
Release 2.0.16</li>
<li><a
href="83dfb5f99b"><code>83dfb5f</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/429">#429</a>
from dtolnay/nostd</li>
<li><a
href="9b4a99fb90"><code>9b4a99f</code></a>
Add to "no-std" crates.io category</li>
<li><a
href="f6145ebe84"><code>f6145eb</code></a>
Release 2.0.15</li>
<li><a
href="2717177976"><code>2717177</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/427">#427</a>
from dtolnay/caplints</li>
<li><a
href="2cd13e6767"><code>2cd13e6</code></a>
Make error_generic_member_access compatible with -Dwarnings</li>
<li><a
href="eea6799e2d"><code>eea6799</code></a>
Release 2.0.14</li>
<li><a
href="a2aa6d7a57"><code>a2aa6d7</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/426">#426</a>
from dtolnay/enotempty</li>
<li><a
href="f00ebc57be"><code>f00ebc5</code></a>
Allow build-script cleanup failure with NFSv3 output directory to be
non-fatal</li>
<li><a
href="61f28da3df"><code>61f28da</code></a>
Release 2.0.13</li>
<li>Additional commits viewable in <a
href="https://github.com/dtolnay/thiserror/compare/2.0.12...2.0.16">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Summary
This PR implements advisory file locking for the message history using
the std::fs::File locking APIs stabilized in Rust 1.89, eliminating the
need for external dependencies.
## Key Changes
- **Stable API Usage**: Uses std::fs::File::try_lock() and
try_lock_shared() APIs stabilized in Rust 1.89
- **Cross-Platform Compatibility**:
- Unix systems use try_lock_shared() for advisory read locks
- Windows systems use try_lock() due to different lock semantics
- **Retry Logic**: Maintains existing retry behavior for concurrent
access scenarios
- **No External Dependencies**: Removes need for external file locking
crates
## Technical Details
The implementation provides advisory file locking to prevent corruption
when multiple Codex processes attempt to write to the message history
file simultaneously. The locking is platform-aware to handle differences
in Windows vs Unix file locking behavior.
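A minimal sketch of the acquisition path, assuming the Rust 1.89 std
signatures (`try_lock`/`try_lock_shared` returning
`Result<(), TryLockError>`); the helper is illustrative, not the actual
codex implementation:

```rust
use std::fs::{File, TryLockError};
use std::io;

// Returns Ok(true) if the advisory lock was acquired, Ok(false) if another
// process currently holds it (caller should retry), or Err on real I/O errors.
fn try_history_lock(file: &File) -> io::Result<bool> {
    // Unix: a shared (read) lock is enough. Windows lock semantics differ,
    // so take an exclusive lock there instead.
    #[cfg(unix)]
    let result = file.try_lock_shared();
    #[cfg(not(unix))]
    let result = file.try_lock();

    match result {
        Ok(()) => Ok(true),
        Err(TryLockError::WouldBlock) => Ok(false),
        Err(TryLockError::Error(e)) => Err(e),
    }
}
```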
## Testing
- ✅ Builds successfully on all platforms
- ✅ Existing message history tests pass
- ✅ File locking retry logic verified
Related to discussion in #2773 about using stabilized Rust APIs instead
of external dependencies.
---------
Co-authored-by: Michael Bolin <bolinfest@gmail.com>
The gpt-oss models require reasoning content to be sent back with
subsequent Chat Completions requests; otherwise the model forgets why
the tools were called. This change fixes that and also adds some missing
documentation on how to handle context windows in Ollama and how to show
the CoT if you want to.
## Summary
Fixes an issue with the lark grammar definition for the apply_patch
freeform tool. This does NOT change the defaults, merely patches the
root cause of the issue we were seeing with empty lines, and an issue
with config flowing through correctly.
Specifically, the following requires that a line is non-empty:
```
add_line: "+" /(.+)/ LF -> line
```
but many changes _should_ involve creating/updating empty lines. The new
definition is:
```
add_line: "+" /(.*)/ LF -> line
```
## Testing
- [x] Tested locally: reproduced the issue without the update and
confirmed that the model produces empty lines with the new lark grammar
We have two ways of loading a conversation with previous history: fork
conversation, and the experimental resume we had before. This PR unifies
their code paths: both now take the history items and record them into a
brand-new conversation. It also constrains the rollout recorder's
responsibilities to recording to disk and loading from disk.
The PR also fixes a bug that occurs with two forks in a row:
History 1:
<Environment Context>
UserMessage_1
UserMessage_2
UserMessage_3
**Fork with n = 1 (only remove one element)**
History 2:
<Environment Context>
UserMessage_1
UserMessage_2
<Environment Context>
**Fork with n = 1 (only remove one element)**
History 3:
<Environment Context>
UserMessage_1
UserMessage_2
**<Environment Context>**
This shouldn't happen, but it did because we were appending the
`<Environment Context>` after each spawn, and it is considered a _user
message_. Now we don't add this message when restoring an old
conversation.
- Introduces a web-search end event to complement the begin event
- Moves the logic of adding the web-search tool into
create_tools_json_for_responses_api
- Makes it the client's responsibility to toggle the tool on or off
- Addresses other misc post-commit feedback from #2371
- Shows the query:
<img width="1392" height="151" alt="image"
src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89"
/>
Adds custom `/prompts`, sourced from `~/.codex/prompts/<command>.md`.
<img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM"
src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679"
/>
---
Details (see the sketch after this list):
1. Adds `Op::ListCustomPrompts` to core.
2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt`
(name, content).
3. TUI calls the operation on load, and populates the custom prompts
(excluding prompts that collide with builtins).
4. Selecting the custom prompt automatically sends the prompt to the
agent.
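A hypothetical sketch of the discovery step (helper name and return
shape assumed; the real op returns `CustomPrompt` values from the
protocol crate):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Scan the prompts directory and collect (name, content) pairs from
// every `<name>.md` file; non-markdown entries are ignored.
fn list_custom_prompts(dir: &Path) -> io::Result<Vec<(String, String)>> {
    let mut prompts = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().is_some_and(|ext| ext == "md") {
            if let Some(name) = path.file_stem().and_then(|s| s.to_str()) {
                prompts.push((name.to_string(), fs::read_to_string(&path)?));
            }
        }
    }
    Ok(prompts)
}
```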
This PR fixes two edge cases in handling burst paste (mainly in
PowerShell).
Bugs:
- A key event was needed after a paste before the pasted items would
render
> ChatComposer::flush_paste_burst_if_due() flushes on timeout. Called:
> - Pre-render in App on TuiEvent::Draw.
> - Via a delayed frame:
> BottomPane::request_redraw_in(ChatComposer::recommended_paste_flush_delay()).
- The first two key events were parsed separately before burst-paste
parsing kicked in
> When the threshold is crossed, pull the preceding burst chars out of
> the textarea and prepend them to paste_burst_buffer, then keep
> buffering.
- Integrates with #2567 to bring image pasting to Windows.
- added `uninlined_format_args` to `[workspace.lints.clippy]` in the
`Cargo.toml` for the workspace
- ran `cargo clippy --tests --fix`
- ran `just fmt`
This is a stopgap solution, but today we are seeing the client get
flooded with events. Since we already truncate the output we send to the
model, it feels reasonable to limit how many deltas we send to the
client.
### What this PR does
This PR introduces a new public method,
remove_conversation(conversation_id: Uuid), to the ConversationManager.
This allows consumers of the codex-core library to manually remove a
conversation from the manager's in-memory storage.
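A minimal sketch of the idea with stand-in types (the actual manager is
async and stores `Arc`-wrapped conversations, so the real signature
differs):

```rust
use std::collections::HashMap;

struct Conversation; // stand-in for CodexConversation

struct ConversationManager {
    conversations: HashMap<u128, Conversation>, // u128 stands in for Uuid
}

impl ConversationManager {
    // Dropping the entry releases the conversation's memory once all
    // other references are gone, bounding growth in long-lived servers.
    fn remove_conversation(&mut self, conversation_id: u128) {
        self.conversations.remove(&conversation_id);
    }
}
```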
### Why this change is needed
I am currently adapting the Codex client to run as a long-lived server
application. In this server environment, ConversationManager instances
persist for extended periods, and new conversations are created for each
incoming user request.
The current implementation of ConversationManager stores all created
conversations in a HashMap indefinitely, with no mechanism for removal.
This leads to unbounded memory growth in a server context, as every new
conversation permanently occupies memory.
While an automatic TTL-based cleanup mechanism could be one solution, a
simpler, more direct remove_conversation method provides the necessary
control for my use case. It allows my server application to explicitly
manage the lifecycle of conversations, such as cleaning them up after a
request is fully processed or after a period of inactivity is detected
at the application level.
This change provides a minimal, non-intrusive way to address the memory
management issue for server-like applications built on top of
codex-core, giving developers the flexibility to implement their own
cleanup logic.
Signed-off-by: M4n5ter <m4n5terrr@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
The CLI supports config settings `stream_max_retries` and
`request_max_retries` that allow users to override the default retry
counts (4 and 5, respectively). However, there's currently no cap placed
on these values. In theory, a user could configure an effectively
infinite retry count which could hammer the server. This PR adds a
reasonable cap (currently 100) to both of these values.
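An illustrative sketch of the clamp (constant and helper names assumed):

```rust
const MAX_RETRY_LIMIT: u64 = 100;

// Resolve a user-configured retry count against its default, capped so a
// misconfigured value cannot hammer the server indefinitely.
fn effective_retries(configured: Option<u64>, default: u64) -> u64 {
    configured.unwrap_or(default).min(MAX_RETRY_LIMIT)
}

fn main() {
    assert_eq!(effective_retries(None, 4), 4); // stream_max_retries default
    assert_eq!(effective_retries(Some(1_000_000), 5), 100); // capped
}
```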
This PR improves the error message presented to the user when logged in
with ChatGPT and a rate-limit error occurs. In particular, it provides
the user with information about when the rate limit will be reset. It
removes older code that attempted to do the same but relied on parsing
of error messages that are not generated by the ChatGPT endpoint. The
new code uses newly-added error fields.
This PR fixes a bug in the token refresh logic. Token refresh is
performed in a retry loop so if we receive a 401 error, we refresh the
token, then we go around the loop again and reissue the fetch with a
fresh token. The bug is that we're not using the updated token on the
second and subsequent times through the loop. The result is that we'll
try to refresh the token a few more times until we hit the retry limit
(default of 4). The 401 error is then passed back up to the caller.
Subsequent calls will use the refreshed token, so the problem clears
itself up.
The fix is straightforward: make sure we use the updated auth
information each time through the retry loop.
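A self-contained sketch of the shape of the fix, with stand-in types
rather than the real codex-core API:

```rust
struct Auth {
    token: String,
}

impl Auth {
    fn current_token(&self) -> String {
        self.token.clone()
    }
    fn refresh(&mut self) {
        self.token = "fresh-token".to_string();
    }
}

// Stand-in for the real HTTP call: rejects anything but the fresh token.
fn fetch(token: &str) -> Result<String, u16> {
    if token == "fresh-token" { Ok("200 OK".into()) } else { Err(401) }
}

// The fix: read the token *inside* the loop, so a refresh performed after
// a 401 is actually used on the next attempt.
fn fetch_with_retries(auth: &mut Auth, max_retries: u32) -> Result<String, u16> {
    for _ in 0..=max_retries {
        let token = auth.current_token(); // re-read every iteration
        match fetch(&token) {
            Err(401) => auth.refresh(), // then go around the loop again
            other => return other,
        }
    }
    Err(401)
}

fn main() {
    let mut auth = Auth { token: "stale-token".into() };
    assert_eq!(fetch_with_retries(&mut auth, 4), Ok("200 OK".into()));
}
```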
**Context**
When running `/compact`, `drain_to_completed` would throw an error if
`token_usage` was `None` in `ResponseEvent::Completed`. This made the
command fail even though everything else had succeeded.
**What changed**
- Instead of erroring, we now just check `if let Some(token_usage)`
before sending the event (see the sketch below).
- If it’s missing, we skip it and move on.
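A tiny illustration of the pattern (the `println!` stands in for the
actual event send):

```rust
fn maybe_report_token_usage(token_usage: Option<u64>) {
    // Only emit the token-count event when usage was actually reported;
    // a missing value no longer aborts the /compact flow.
    if let Some(token_usage) = token_usage {
        println!("tokens used: {token_usage}");
    }
}
```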
**Why**
This makes `AgentTask::compact()` behave in the same way as
`AgentTask::spawn()`, which also doesn’t error out when `token_usage`
isn’t available. Keeps things consistent and avoids unnecessary
failures.
**Fixes**
Closes #2417
---------
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
The `SessionManager` in `exec_command` owns a number of
`ExecCommandSession` objects where `ExecCommandSession` has a
non-trivial implementation of `Drop`, so we want to be able to drop an
individual `SessionManager` to help ensure things get cleaned up in a
timely fashion. To that end, we should have one `SessionManager` per
session rather than one global one for the lifetime of the CLI process.
`ToolsConfig::new()` taking a large number of boolean params was hard to
manage and it finally bit us (see
https://github.com/openai/codex/pull/2660). This changes
`ToolsConfig::new()` so that it takes a struct (and also reduces the
visibility of some members, where possible).
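An illustrative sketch of the pattern (field names assumed), showing why
call sites become self-describing and a new flag can't silently shift
arguments:

```rust
struct ToolsConfigParams {
    include_plan_tool: bool,
    include_apply_patch_tool: bool,
    include_web_search: bool,
}

struct ToolsConfig {
    plan_tool: bool,
    apply_patch_tool: bool,
    web_search: bool,
}

impl ToolsConfig {
    fn new(p: &ToolsConfigParams) -> Self {
        Self {
            plan_tool: p.include_plan_tool,
            apply_patch_tool: p.include_apply_patch_tool,
            web_search: p.include_web_search,
        }
    }
}

fn main() {
    // Each field is named at the call site, unlike `new(true, false, true)`.
    let config = ToolsConfig::new(&ToolsConfigParams {
        include_plan_tool: true,
        include_apply_patch_tool: false,
        include_web_search: true,
    });
    assert!(config.plan_tool && !config.apply_patch_tool && config.web_search);
}
```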
Fixes https://github.com/openai/codex/issues/2610
This PR sorts the tools in `get_openai_tools` by name to ensure a
consistent MCP tool order.
Currently, MCP servers are stored in a HashMap, which does not guarantee
ordering. As a result, the tool order changes across turns, effectively
breaking prompt caching in multi-turn sessions.
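A minimal illustration of the fix (tool names invented): sorting by name
makes the serialized tool list identical across turns, which keeps the
prompt prefix stable for caching:

```rust
fn main() {
    // HashMap iteration order varies run to run; sorting restores determinism.
    let mut tools = vec!["shell", "mcp__docs__search", "apply_patch"];
    tools.sort_unstable();
    assert_eq!(tools, ["apply_patch", "mcp__docs__search", "shell"]);
}
```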
An alternative solution would be to replace the HashMap with an ordered
structure, but that would require a much larger code change. Given that
it is unrealistic to have so many MCP tools that sorting would cause
performance issues, this lightweight fix is chosen instead.
By ensuring deterministic tool order, this change should significantly
improve cache hit rates and prevent users from hitting usage limits too
quickly. (For reference, my own sessions last week reached the limit
unusually fast, with cache hit rates falling below 1%.)
## Result
After this fix, sessions with MCP servers now show caching behavior
almost identical to sessions without MCP servers.
Without MCP | With MCP
:-------------------------:|:-------------------------:
<img width="1368" height="1634" alt="image"
src="https://github.com/user-attachments/assets/26edab45-7be8-4d6a-b471-558016615fc8"
/> | <img width="1356" height="1632" alt="image"
src="https://github.com/user-attachments/assets/5f3634e0-3888-420b-9aaf-deefd9397b40"
/>
Bumps [whoami](https://github.com/ardaku/whoami) from 1.6.0 to 1.6.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/ardaku/whoami/commits">compare view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Summary
We're seeing some issues in the freeform tool - let's disable by default
until it stabilizes.
## Testing
- [x] Ran locally, confirmed codex-cli could make edits
This dramatically improves the time to run `cargo test -p codex-core`
(~25x speedup).
before:
```
cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total
```
after:
```
cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total
```
Both runs were measured "hot", i.e. on a second run with no filesystem
changes, to exclude compile times.
The approach is inspired by [Delete Cargo Integration
Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html):
we move all test cases in tests/ into a single suite so that there is a
single binary, since there is significant overhead for each test binary
executed and test execution is only parallelized within a single binary.
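A sketch of the resulting layout (module names illustrative): tests/all.rs
is the single integration-test entry point, and each former standalone
file under tests/ is pulled in as a module, so cargo builds and links
exactly one test binary.

```rust
// tests/all.rs — every former tests/*.rs file becomes a module of the
// single test crate (file names here are placeholders).
mod client;
mod compact;
mod exec;
mod prompt_caching;
```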