valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	f09170b574	chore: print stderr from MCP server to test output using eprintln! (#2849 ) Related to https://github.com/openai/codex/pull/2848, I don't see the stderr from `codex mcp` colocated with the other stderr from `test_shell_command_approval_triggers_elicitation()` when it fails even though we have `RUST_LOG=debug` set when we spawn `codex mcp`: `1e9e703b96/codex-rs/mcp-server/tests/common/mcp_process.rs (L65)` Let's try this new logic which should be more explicit.	2025-08-28 12:43:13 -07:00
Michael Bolin	1e9e703b96	chore: try to make it easier to debug the flakiness of test_shell_command_approval_triggers_elicitation (#2848 ) `test_shell_command_approval_triggers_elicitation()` is one of a number of integration tests that we have observed to be flaky on GitHub CI, so this PR tries to reduce the flakiness _and_ to provide us with more information when it flakes. Specifically: - Changed the command that we use to trigger the elicitation from `git init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'` because running `git` seems more likely to invite variance. - Increased the timeout to wait for the task response from 10s to 20s. - Added more logging.	2025-08-28 12:33:33 -07:00
Michael Bolin	74d2741729	chore: require uninlined_format_args from clippy (#2845 ) - added `uninlined_format_args` to `[workspace.lints.clippy]` in the `Cargo.toml` for the workspace - ran `cargo clippy --tests --fix` - ran `just fmt`	2025-08-28 11:25:23 -07:00
Jeremy Rose	e5611aab07	disallow some slash commands while a task is running (#2792 ) /new, /init, /models, /approvals, etc. don't work correctly during a turn. disable them.	2025-08-28 10:15:59 -07:00
dedrisian-oai	4e9ad23864	Add "View Image" tool (#2723 ) Adds a "View Image" tool so Codex can find and see images by itself. Screenshot: https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03	2025-08-27 17:41:23 -07:00
Jeremy Rose	3e309805ae	fix cursor after suspend (#2690 ) This was supposed to be fixed by #2569, but I think the actual fix got lost in the refactoring. Intended behavior: pressing ^Z moves the cursor below the viewport before suspending.	2025-08-27 14:17:10 -07:00
Jeremy Rose	488a40211a	fix (most) doubled lines and hanging list markers (#2789 ) This was mostly written by codex under heavy guidance via test cases drawn from logged session data and fuzzing. It also uncovered some bugs in tui_markdown, which will in some cases split a list marker from the list item content. We're not addressing those bugs for now.	2025-08-27 13:55:59 -07:00
Reuben Narad	6e4c9d5243	Added back codex-rs/config.md to link to new location (#2778 ) Quick fix: point old config.md to new location	2025-08-27 18:37:41 +00:00
Reuben Narad	459363e17b	README / docs refactor (#2724 ) This PR cleans up the monolithic README by breaking it into a set of navigable pages under docs/ (install, getting started, configuration, authentication, sandboxing and approvals, platform details, FAQ, ZDR, contributing, license). The top‑level README is now more concise and intuitive (with corrected screenshots). It also consolidates overlapping content from codex-rs/README.md into the top‑level docs and updates links accordingly. The codex-rs README remains in place for now as a pointer and for continuity. Finally, it adds an extensive config reference table at the bottom of docs/config.md. --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-27 10:30:39 -07:00
Michael Bolin	ffe585387b	fix: for now, limit the number of deltas sent back to the UI (#2776 ) This is a stopgap solution, but today, we are seeing the client get flooded with events. Since we already truncate the output we send to the model, it feels reasonable to limit how many deltas we send to the client.	2025-08-27 10:23:25 -07:00
Dylan	0cec0770e2	[mcp-server] Add GetConfig endpoint (#2725 ) ## Summary Adds a GetConfig request to the MCP Protocol, so MCP clients can evaluate the resolved config.toml settings which the harness is using. ## Testing - [x] Added an end to end test of the endpoint	2025-08-27 09:59:03 -07:00
Ahmed Ibrahim	2d2f66f9c5	Bug fix: deduplicate assistant messages (#2758 ) We are treating assistant messages in a different way than other messages which resulted in a duplicated history. See #2698	2025-08-27 01:29:16 -07:00
Ahmed Ibrahim	d0e06f74e2	send context window with task started (#2752 ) - Send context window with task started - Accounting for changing the model per turn	2025-08-27 00:04:21 -07:00
Gabriel Peal	4b6c6ce98f	Make git_diff_against_sha more robust (#2749 ) 1. Ignore custom git diff drivers users may have set 2. Allow diffing against filenames that start with a dash	2025-08-27 01:53:00 -04:00
easong-openai	5df04c8a13	Cache transcript wraps (#2739 ) Previously long transcripts would become unusable.	2025-08-26 22:20:09 -07:00
ae	3d8bca7814	feat: decrease testing when running interactively (#2707 )	2025-08-26 19:57:04 -07:00
Ahmed Ibrahim	3eb11c10d0	Don't send Exec deltas on apply patch (#2742 ) We are now sending exec deltas on apply patch which doesn't make sense.	2025-08-26 19:16:51 -07:00
mattsu	bd65c4db87	Fix crash when backspacing placeholders adjacent to multibyte text (#2674 ) - Prevented panics when deleting placeholders near multibyte characters by clamping the cursor to a valid boundary and using get-based slicing. - Added a regression test to ensure backspacing after multibyte text leaves placeholders intact without crashing. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-26 18:31:49 -07:00
Jeremy Rose	b367790d9b	fix emoji spacing (#2735 ) before: https://github.com/user-attachments/assets/3e876f08-26d0-407e-a995-28fd072e288f after: https://github.com/user-attachments/assets/2a019d52-19ed-40ef-8155-4f02c400796a	2025-08-26 17:34:24 -07:00
Jeremy Rose	435154ce93	fix transcript lines being added to diff view (#2721 ) This fixes a bug where, if you ran /diff while a turn was running, transcript lines would be added to the end of the diff view. Also, refactor to make this kind of issue less likely in the future.	2025-08-27 00:03:11 +00:00
vinaybantupalli	fb3f6456cf	fix issue #2713 : adding support for alt+ctrl+h to delete backward word (#2717 ) This pr addresses the fix for https://github.com/openai/codex/issues/2713 ### Changes: - Added key handler for `Alt+Ctrl+H` → `delete_backward_word()` - Added test coverage in `delete_backward_word_alt_keys()` that verifies both: - Standard `Alt+Backspace` binding continues to work - New `Alt+Ctrl+H` binding works correctly for backward word deletion ### Testing: The test ensures both key combinations produce identical behavior: - Delete the previous word from "hello world" → "hello " - Cursor positioned correctly after deletion ### Backward Compatibility: This change is backward compatible - existing `Alt+Backspace` functionality remains unchanged while adding support for the terminal-specific `Alt+Ctrl+H` variant	2025-08-26 16:37:46 -07:00
Jeremy Rose	f2603a4e50	Esc while there are queued messages drops the messages back into the composer (#2687 ) https://github.com/user-attachments/assets/bbb427c4-cdc7-4997-a4ef-8156e8170742	2025-08-26 16:26:50 -07:00
Jeremy Rose	eb161116f0	tui: render keyboard icon with emoji variation selector (⌨️) (#2728 ) Use emoji variation selector (VS16) for the keyboard icon so it consistently renders as emoji (⌨️) rather than text (⌨) across terminals. Touches TUI command rendering for unknown parsed commands. No behavior change beyond display.	2025-08-26 16:11:21 -07:00
Wang	c229a67312	feat(core): Add `remove_conversation` to `ConversationManager` for ma… (#2613 ) ### What this PR does This PR introduces a new public method, remove_conversation(conversation_id: Uuid), to the ConversationManager. This allows consumers of the codex-core library to manually remove a conversation from the manager's in-memory storage. ### Why this change is needed I am currently adapting the Codex client to run as a long-lived server application. In this server environment, ConversationManager instances persist for extended periods, and new conversations are created for each incoming user request. The current implementation of ConversationManager stores all created conversations in a HashMap indefinitely, with no mechanism for removal. This leads to unbounded memory growth in a server context, as every new conversation permanently occupies memory. While an automatic TTL-based cleanup mechanism could be one solution, a simpler, more direct remove_conversation method provides the necessary control for my use case. It allows my server application to explicitly manage the lifecycle of conversations, such as cleaning them up after a request is fully processed or after a period of inactivity is detected at the application level. This change provides a minimal, non-intrusive way to address the memory management issue for server-like applications built on top of codex-core, giving developers the flexibility to implement their own cleanup logic. Signed-off-by: M4n5ter <m4n5terrr@gmail.com> Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-08-26 15:16:43 -07:00
Jeremy Rose	db98d2ce25	enable alternate scroll in transcript mode (#2686 ) this allows the mouse wheel to scroll the transcript / diff views.	2025-08-26 11:47:00 -07:00
ae	274d9b413f	[feat] Simplify command approval UI (#2708 ) - Removed the plain "No" option, which confused the model, since we already have the "No, provide feedback" option, which works better. # Before https://github.com/user-attachments/assets/6e783d9f-dec9-4610-9cad-8442eb377a90 # After https://github.com/user-attachments/assets/3cdae582-3366-47bc-9753-288930df2324	2025-08-26 10:08:06 -07:00
Eric Traut	d32e4f25cf	Added caps on retry config settings (#2701 ) The CLI supports config settings `stream_max_retries` and `request_max_retries` that allow users to override the default retry counts (4 and 5, respectively). However, there's currently no cap placed on these values. In theory, a user could configure an effectively infinite retry count which could hammer the server. This PR adds a reasonable cap (currently 100) to both of these values.	2025-08-25 22:51:01 -07:00
ae	a4d34235bc	[fix] emoji padding (#2702 ) - We use emojis as bullet icons of sorts, and in some common terminals like Terminal or iTerm, these can render with insufficient padding between the emoji and following text. - This PR makes emoji look better in Terminal and iTerm, at the expense of Ghostty. (All default fonts.) # Terminal https://github.com/user-attachments/assets/93590703-e35a-4781-a697-881d7ec95598 # iTerm https://github.com/user-attachments/assets/f11e6558-d2db-4727-bb7e-2b61eed0a3b1 # Ghostty https://github.com/user-attachments/assets/7a7b021f-5238-4672-8066-16cd1da32dc6	2025-08-25 22:49:19 -07:00
ae	d085f73a2a	[feat] reduce bottom padding to 1 line (#2704 )	2025-08-25 22:47:26 -07:00
Eric Traut	ab9250e714	Improved user message for rate-limit errors (#2695 ) This PR improves the error message presented to the user when logged in with ChatGPT and a rate-limit error occurs. In particular, it provides the user with information about when the rate limit will be reset. It removes older code that attempted to do the same but relied on parsing of error messages that are not generated by the ChatGPT endpoint. The new code uses newly-added error fields.	2025-08-25 21:42:10 -07:00
Jeremy Rose	e5283b6126	single control flow for both Esc and Ctrl+C (#2691 ) Esc and Ctrl+C while a task is running should do the same thing. There were some cases where pressing Esc would leave a "stuck" widget in the history; this fixes that and cleans up the logic so there's just one path for interrupting the task. Also clean up some subtly mishandled key events (e.g. Ctrl+D would quit the app while an approval modal was showing if the textarea was empty). --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-25 20:15:38 -07:00
Eric Traut	d63e44ae29	Fixed a bug that causes token refresh to not work in a seamless manner (#2699 ) This PR fixes a bug in the token refresh logic. Token refresh is performed in a retry loop so if we receive a 401 error, we refresh the token, then we go around the loop again and reissue the fetch with a fresh token. The bug is that we're not using the updated token on the second and subsequent times through the loop. The result is that we'll try to refresh the token a few more times until we hit the retry limit (default of 4). The 401 error is then passed back up to the caller. Subsequent calls will use the refreshed token, so the problem clears itself up. The fix is straightforward — make sure we use the updated auth information each time through the retry loop.	2025-08-25 19:18:16 -07:00
Jeremy Rose	17e5077507	do not show timeouts as "sandbox error"s (#2587 ) 🙅🫸 ``` ✗ Failed (exit -1) └ 🧪 cargo test --all-features -q sandbox error: command timed out ``` 😌👉 ``` ✗ Failed (exit -1) └ 🧪 cargo test --all-features -q error: command timed out ```	2025-08-25 17:52:23 -07:00
Jeremy Rose	b1079187e4	queued messages rendered italic (#2693 ) Screenshot: https://github.com/user-attachments/assets/0f4178c9-6997-4e7a-bb30-0817b98d9748	2025-08-26 00:36:05 +00:00
Jeremy Rose	ae8f772ef2	do not schedule frames for Tui::Draw events in backtrack (#2692 ) this was causing continuous rerendering when a transcript overlay was present	2025-08-26 00:29:24 +00:00
dedrisian-oai	468a8b4c38	Copying / Dragging image files (macOS Terminal + iTerm) (#2567 ) In this PR: - [x] Add support for dragging / copying image files into chat. - [x] Don't remove image placeholders when submitting. - [x] Add tests. Works for: - Image Files - Dragging macOS Screenshots (Terminal, iTerm) Todos: - [ ] In some terminals (VSCode, Windows PowerShell, and remote SSH sessions), copy-pasting a file streams the escaped filepath as individual key events rather than a single Paste event. We'll need to have a function (in a separate PR) for detecting these paste events.	2025-08-25 16:39:42 -07:00
Gabriel Peal	cb32f9c64e	Add auth to send_user_turn (#2688 ) It is there for send_user_message but was omitted from send_user_turn. Presumably this was a mistake	2025-08-25 18:57:20 -04:00
Ahmed Ibrahim	907afc9425	Fix esc (#2661 ) Esc should have other functionalities when it's not used in a backtracking situation. i.e. to cancel pop up menu when selecting model/approvals or to interrupt an active turn.	2025-08-25 15:38:46 -07:00
Dylan	7f7d1e30f3	[exec] Clean up apply-patch tests (#2648 ) ## Summary These tests were getting a bit unwieldy, and they're starting to become load-bearing. Let's clean them up, and get them working solidly so we can easily expand this harness with new tests. ## Test Plan - [x] Tests continue to pass	2025-08-25 15:08:01 -07:00
Michael Bolin	568d6f819f	fix: use backslash as path separator on Windows (#2684 ) I noticed that when running `/status` on Windows, I saw something like: ``` Path: ~/src\codex ``` so now it should be: ``` Path: ~\src\codex ``` Admittedly, `~` is understood by PowerShell but not on Windows, in general, but it's much less verbose than `%USERPROFILE%`.	2025-08-25 14:47:17 -07:00
Jeremy Rose	251c4c2ba9	tui: queue messages (#2637 ) https://github.com/user-attachments/assets/44349aa6-3b97-4029-99e1-5484e9a8775f	2025-08-25 21:38:38 +00:00
Odysseas Yiakoumis	a6c346b9e1	avoid error when /compact response has no token_usage (#2417 ) (#2640 ) Context When running `/compact`, `drain_to_completed` would throw an error if `token_usage` was `None` in `ResponseEvent::Completed`. This made the command fail even though everything else had succeeded. What changed - Instead of erroring, we now just check `if let Some(token_usage)` before sending the event. - If it’s missing, we skip it and move on. Why This makes `AgentTask::compact()` behave in the same way as `AgentTask::spawn()`, which also doesn’t error out when `token_usage` isn’t available. Keeps things consistent and avoids unnecessary failures. Fixes Closes #2417 --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-25 18:42:22 +00:00
Gabriel Peal	e307040f10	Index file (#2678 )	2025-08-25 13:23:32 -04:00
dependabot[bot]	7d67e54628	chore(deps): bump toml_edit from 0.23.3 to 0.23.4 in /codex-rs (#2665 )	2025-08-25 08:20:30 -07:00
Michael Bolin	295ca27e98	fix: Scope ExecSessionManager to Session instead of using global singleton (#2664 ) The `SessionManager` in `exec_command` owns a number of `ExecCommandSession` objects where `ExecCommandSession` has a non-trivial implementation of `Drop`, so we want to be able to drop an individual `SessionManager` to help ensure things get cleaned up in a timely fashion. To that end, we should have one `SessionManager` per session rather than one global one for the lifetime of the CLI process.	2025-08-24 22:52:49 -07:00
Michael Bolin	7b20db942a	fix: build is broken on main; introduce ToolsConfigParams to help fix (#2663 ) `ToolsConfig::new()` taking a large number of boolean params was hard to manage and it finally bit us (see https://github.com/openai/codex/pull/2660). This changes `ToolsConfig::new()` so that it takes a struct (and also reduces the visibility of some members, where possible).	2025-08-24 22:43:42 -07:00
Uhyeon Park	ee2ccb5cb6	Fix cache hit rate by making MCP tools order deterministic (#2611 ) Fixes https://github.com/openai/codex/issues/2610 This PR sorts the tools in `get_openai_tools` by name to ensure a consistent MCP tool order. Currently, MCP servers are stored in a HashMap, which does not guarantee ordering. As a result, the tool order changes across turns, effectively breaking prompt caching in multi-turn sessions. An alternative solution would be to replace the HashMap with an ordered structure, but that would require a much larger code change. Given that it is unrealistic to have so many MCP tools that sorting would cause performance issues, this lightweight fix is chosen instead. By ensuring deterministic tool order, this change should significantly improve cache hit rates and prevent users from hitting usage limits too quickly. (For reference, my own sessions last week reached the limit unusually fast, with cache hit rates falling below 1%.) ## Result After this fix, sessions with MCP servers now show caching behavior almost identical to sessions without MCP servers. Without MCP: https://github.com/user-attachments/assets/26edab45-7be8-4d6a-b471-558016615fc8 With MCP: https://github.com/user-attachments/assets/5f3634e0-3888-420b-9aaf-deefd9397b40	2025-08-24 19:56:24 -07:00
ae	8b49346657	fix: update gpt-5 stats (#2649 ) - To match what's on <https://platform.openai.com/docs/models/gpt-5>.	2025-08-24 16:45:41 -07:00
dependabot[bot]	e49116a4c5	chore(deps): bump whoami from 1.6.0 to 1.6.1 in /codex-rs (#2497 ) Bumps [whoami](https://github.com/ardaku/whoami) from 1.6.0 to 1.6.1. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-08-24 14:38:30 -07:00
Michael Bolin	517ffd00c6	feat: use the arg0 trick with apply_patch (#2646 ) Historically, Codex CLI has treated `apply_patch` (and its occasional misspelling, `applypatch`) as a "virtual CLI," intercepting it when it appears as the first arg to `command` for the `"container.exec"`, `"shell"`, or `"local_shell"` tools. This approach has a known limitation: if, say, the model created a Python script that runs `apply_patch` and then tried to run that script, we would have no insight into what the model is trying to do, and the Python script would fail because `apply_patch` was never really on the `PATH`. One way to solve this problem is to require users to install an `apply_patch` executable alongside the `codex` executable (or at least put it someplace where Codex can discover it). However, to keep Codex CLI a standalone executable, we exploit "the arg0 trick": we create a temporary directory with an entry named `apply_patch` and prepend that directory to the `PATH` for the duration of the invocation of Codex. - On UNIX, `apply_patch` is a symlink to `codex`, which now behaves like `apply_patch` if arg0 is `apply_patch` (or `applypatch`) - On Windows, `apply_patch.bat` is a batch script that runs `codex --codex-run-as-apply-patch %*`, as Codex also changes its behavior if the first argument is `--codex-run-as-apply-patch`.	2025-08-24 14:35:51 -07:00
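
Commit 517ffd00c6 (#2646) describes one executable that changes behavior depending on how it was invoked. The dispatch step might look roughly like the sketch below. It is not the code from the PR: `Mode` and `select_mode` are hypothetical names, and only the two triggers named in the commit message (arg0 being `apply_patch` or `applypatch`, or a first argument of `--codex-run-as-apply-patch`) are modeled.

```rust
use std::path::Path;

/// Which behavior the single executable should adopt (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Mode {
    ApplyPatch,
    Codex,
}

/// Hypothetical helper: choose a mode from the raw argument vector.
fn select_mode(args: &[String]) -> Mode {
    // arg0 is the path the process was invoked as, e.g. "/tmp/shim/apply_patch".
    // Only the file stem matters, so directories and extensions are ignored.
    let invoked_as = args
        .first()
        .and_then(|arg0| Path::new(arg0).file_stem())
        .and_then(|stem| stem.to_str())
        .unwrap_or("");
    if matches!(invoked_as, "apply_patch" | "applypatch") {
        return Mode::ApplyPatch;
    }
    // The Windows batch shim passes an explicit flag instead of relying on arg0.
    if args.get(1).map(String::as_str) == Some("--codex-run-as-apply-patch") {
        return Mode::ApplyPatch;
    }
    Mode::Codex
}
```

In a real binary the slice would typically come from `std::env::args().collect::<Vec<_>>()`, and the check would run before any normal CLI argument parsing.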
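
Commit ee2ccb5cb6 (#2611) fixes prompt caching by sorting tools by name, because iterating a `HashMap` gives no ordering guarantee. A minimal sketch of that idea follows. `Tool` and `tools_in_stable_order` are hypothetical stand-ins, not the types used by `get_openai_tools`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tool definition sent to the model.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Tool {
    name: String,
    description: String,
}

/// Collect tools from an unordered map and sort them by name, so the
/// resulting list is the same on every turn regardless of hash order.
fn tools_in_stable_order(tools: &HashMap<String, Tool>) -> Vec<Tool> {
    let mut ordered: Vec<Tool> = tools.values().cloned().collect();
    ordered.sort_by(|a, b| a.name.cmp(&b.name));
    ordered
}
```

Sorting at the point of use keeps the change small. The alternative mentioned in the commit message, an ordered map in place of the `HashMap`, would give the same determinism at the cost of a larger refactor.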
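
Commit d32e4f25cf (#2701) caps `stream_max_retries` and `request_max_retries` at 100. The capping itself can be sketched as below, assuming the setting is an optional unsigned integer. `MAX_RETRY_LIMIT` and `effective_retries` are hypothetical names, and the default is passed in rather than hard-coded.

```rust
/// Upper bound from the commit message ("currently 100"); the constant name is hypothetical.
const MAX_RETRY_LIMIT: u64 = 100;

/// Resolve a user-configured retry count: fall back to the default when the
/// setting is absent, then clamp so a huge value cannot hammer the server.
fn effective_retries(configured: Option<u64>, default: u64) -> u64 {
    configured.unwrap_or(default).min(MAX_RETRY_LIMIT)
}
```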
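
Commit d63e44ae29 (#2699) fixes a retry loop that refreshed the token after a 401 but kept sending the stale one. The corrected shape of such a loop is sketched below under simplifying assumptions: `Attempt`, `fetch_with_refresh`, and the `send` and `refresh` callbacks are hypothetical stand-ins for the real request and token-refresh code.

```rust
/// Hypothetical outcome of a single request attempt.
#[derive(Debug, PartialEq, Eq)]
enum Attempt {
    Ok,
    Unauthorized,
}

/// Retry loop that reads the current token on every iteration.
/// Returns the zero-based index of the attempt that succeeded.
fn fetch_with_refresh(
    mut token: String,
    max_retries: u32,
    mut send: impl FnMut(&str) -> Attempt,
    mut refresh: impl FnMut() -> String,
) -> Result<u32, String> {
    for attempt in 0..=max_retries {
        // The bug described in the commit amounts to sending a token captured
        // before the loop; here `token` is re-read on each pass.
        match send(&token) {
            Attempt::Ok => return Ok(attempt),
            Attempt::Unauthorized => token = refresh(),
        }
    }
    Err("401 after exhausting retries".to_string())
}
```

With a stale initial token and a working refresh, the second attempt succeeds instead of the loop running until the retry limit is reached.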


646 Commits