valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	c988ce28fe	fix: drop Mutex before calling tx_approve.send() (#2876 )	2025-08-28 22:49:29 -07:00
Ahmed Ibrahim	9dbe7284d2	Following up on #2371 post commit feedback (#2852) - Introduce websearch end to complement the begin - Move the logic of adding the websearch tool to create_tools_json_for_responses_api - Make it the client's responsibility to toggle the tool on or off - Other misc in #2371 post commit feedback - Show the query: <img width="1392" height="151" alt="image" src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89" />	2025-08-28 19:24:38 -07:00
dedrisian-oai	b8e8454b3f	Custom /prompts (#2696 ) Adds custom `/prompts` to `~/.codex/prompts/<command>.md`. <img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM" src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679" /> --- Details: 1. Adds `Op::ListCustomPrompts` to core. 2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt` (name, content). 3. TUI calls the operation on load, and populates the custom prompts (excluding prompts that collide with builtins). 4. Selecting the custom prompt automatically sends the prompt to the agent.	2025-08-29 02:16:39 +00:00
Ahmed Ibrahim	c9ca63dc1e	burst paste edge cases (#2683) This PR fixes two edge cases in managing burst paste (mainly on PowerShell). Bugs: - Needs an event key after paste to render the pasted items > ChatComposer::flush_paste_burst_if_due() flushes on timeout. Called: > - Pre-render in App on TuiEvent::Draw. > - Via a delayed frame > BottomPane::request_redraw_in(ChatComposer::recommended_paste_flush_delay()). - Parses two key events separately before starting parsing burst paste > When threshold is crossed, pull preceding burst chars out of the textarea and prepend to paste_burst_buffer, then keep buffering. - Integrates with #2567 to bring image pasting to Windows.	2025-08-28 12:54:12 -07:00
Ahmed Ibrahim	ed06f90fb3	Race condition in compact (#2746 ) This fixes the flakiness in `summarize_context_three_requests_and_instructions` because we should trim history before sending task complete.	2025-08-28 12:53:00 -07:00
Michael Bolin	74d2741729	chore: require uninlined_format_args from clippy (#2845 ) - added `uninlined_format_args` to `[workspace.lints.clippy]` in the `Cargo.toml` for the workspace - ran `cargo clippy --tests --fix` - ran `just fmt`	2025-08-28 11:25:23 -07:00
dedrisian-oai	4e9ad23864	Add "View Image" tool (#2723 ) Adds a "View Image" tool so Codex can find and see images by itself: <img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40 04 AM" src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03" />	2025-08-27 17:41:23 -07:00
Michael Bolin	ffe585387b	fix: for now, limit the number of deltas sent back to the UI (#2776 ) This is a stopgap solution, but today, we are seeing the client get flooded with events. Since we already truncate the output we send to the model, it feels reasonable to limit how many deltas we send to the client.	2025-08-27 10:23:25 -07:00
Ahmed Ibrahim	2d2f66f9c5	Bug fix: deduplicate assistant messages (#2758 ) We are treating assistant messages in a different way than other messages which resulted in a duplicated history. See #2698	2025-08-27 01:29:16 -07:00
Ahmed Ibrahim	d0e06f74e2	send context window with task started (#2752 ) - Send context window with task started - Accounting for changing the model per turn	2025-08-27 00:04:21 -07:00
Gabriel Peal	4b6c6ce98f	Make git_diff_against_sha more robust (#2749 ) 1. Ignore custom git diff drivers users may have set 2. Allow diffing against filenames that start with a dash	2025-08-27 01:53:00 -04:00
Ahmed Ibrahim	3eb11c10d0	Don't send Exec deltas on apply patch (#2742 ) We are now sending exec deltas on apply patch which doesn't make sense.	2025-08-26 19:16:51 -07:00
Wang	c229a67312	feat(core): Add `remove_conversation` to `ConversationManager` for ma… (#2613 ) ### What this PR does This PR introduces a new public method, remove_conversation(conversation_id: Uuid), to the ConversationManager. This allows consumers of the codex-core library to manually remove a conversation from the manager's in-memory storage. ### Why this change is needed I am currently adapting the Codex client to run as a long-lived server application. In this server environment, ConversationManager instances persist for extended periods, and new conversations are created for each incoming user request. The current implementation of ConversationManager stores all created conversations in a HashMap indefinitely, with no mechanism for removal. This leads to unbounded memory growth in a server context, as every new conversation permanently occupies memory. While an automatic TTL-based cleanup mechanism could be one solution, a simpler, more direct remove_conversation method provides the necessary control for my use case. It allows my server application to explicitly manage the lifecycle of conversations, such as cleaning them up after a request is fully processed or after a period of inactivity is detected at the application level. This change provides a minimal, non-intrusive way to address the memory management issue for server-like applications built on top of codex-core, giving developers the flexibility to implement their own cleanup logic. Signed-off-by: M4n5ter <m4n5terrr@gmail.com> Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-08-26 15:16:43 -07:00
Eric Traut	d32e4f25cf	Added caps on retry config settings (#2701 ) The CLI supports config settings `stream_max_retries` and `request_max_retries` that allow users to override the default retry counts (4 and 5, respectively). However, there's currently no cap placed on these values. In theory, a user could configure an effectively infinite retry count which could hammer the server. This PR adds a reasonable cap (currently 100) to both of these values.	2025-08-25 22:51:01 -07:00
Eric Traut	ab9250e714	Improved user message for rate-limit errors (#2695 ) This PR improves the error message presented to the user when logged in with ChatGPT and a rate-limit error occurs. In particular, it provides the user with information about when the rate limit will be reset. It removes older code that attempted to do the same but relied on parsing of error messages that are not generated by the ChatGPT endpoint. The new code uses newly-added error fields.	2025-08-25 21:42:10 -07:00
Eric Traut	d63e44ae29	Fixed a bug that causes token refresh to not work in a seamless manner (#2699 ) This PR fixes a bug in the token refresh logic. Token refresh is performed in a retry loop so if we receive a 401 error, we refresh the token, then we go around the loop again and reissue the fetch with a fresh token. The bug is that we're not using the updated token on the second and subsequent times through the loop. The result is that we'll try to refresh the token a few more times until we hit the retry limit (default of 4). The 401 error is then passed back up to the caller. Subsequent calls will use the refreshed token, so the problem clears itself up. The fix is straightforward — make sure we use the updated auth information each time through the retry loop.	2025-08-25 19:18:16 -07:00
Jeremy Rose	17e5077507	do not show timeouts as "sandbox error"s (#2587 ) 🙅🫸 ``` ✗ Failed (exit -1) └ 🧪 cargo test --all-features -q sandbox error: command timed out ``` 😌👉 ``` ✗ Failed (exit -1) └ 🧪 cargo test --all-features -q error: command timed out ```	2025-08-25 17:52:23 -07:00
Gabriel Peal	cb32f9c64e	Add auth to send_user_turn (#2688 ) It is there for send_user_message but was omitted from send_user_turn. Presumably this was a mistake	2025-08-25 18:57:20 -04:00
Odysseas Yiakoumis	a6c346b9e1	avoid error when /compact response has no token_usage (#2417 ) (#2640 ) Context When running `/compact`, `drain_to_completed` would throw an error if `token_usage` was `None` in `ResponseEvent::Completed`. This made the command fail even though everything else had succeeded. What changed - Instead of erroring, we now just check `if let Some(token_usage)` before sending the event. - If it’s missing, we skip it and move on. Why This makes `AgentTask::compact()` behave in the same way as `AgentTask::spawn()`, which also doesn’t error out when `token_usage` isn’t available. Keeps things consistent and avoids unnecessary failures. Fixes Closes #2417 --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-25 18:42:22 +00:00
Michael Bolin	295ca27e98	fix: Scope ExecSessionManager to Session instead of using global singleton (#2664 ) The `SessionManager` in `exec_command` owns a number of `ExecCommandSession` objects where `ExecCommandSession` has a non-trivial implementation of `Drop`, so we want to be able to drop an individual `SessionManager` to help ensure things get cleaned up in a timely fashion. To that end, we should have one `SessionManager` per session rather than one global one for the lifetime of the CLI process.	2025-08-24 22:52:49 -07:00
Michael Bolin	7b20db942a	fix: build is broken on main; introduce ToolsConfigParams to help fix (#2663 ) `ToolsConfig::new()` taking a large number of boolean params was hard to manage and it finally bit us (see https://github.com/openai/codex/pull/2660). This changes `ToolsConfig::new()` so that it takes a struct (and also reduces the visibility of some members, where possible).	2025-08-24 22:43:42 -07:00
Uhyeon Park	ee2ccb5cb6	Fix cache hit rate by making MCP tools order deterministic (#2611 ) Fixes https://github.com/openai/codex/issues/2610 This PR sorts the tools in `get_openai_tools` by name to ensure a consistent MCP tool order. Currently, MCP servers are stored in a HashMap, which does not guarantee ordering. As a result, the tool order changes across turns, effectively breaking prompt caching in multi-turn sessions. An alternative solution would be to replace the HashMap with an ordered structure, but that would require a much larger code change. Given that it is unrealistic to have so many MCP tools that sorting would cause performance issues, this lightweight fix is chosen instead. By ensuring deterministic tool order, this change should significantly improve cache hit rates and prevent users from hitting usage limits too quickly. (For reference, my own sessions last week reached the limit unusually fast, with cache hit rates falling below 1%.) ## Result After this fix, sessions with MCP servers now show caching behavior almost identical to sessions without MCP servers. Without MCP \| With MCP :-------------------------:\|:-------------------------: <img width="1368" height="1634" alt="image" src="https://github.com/user-attachments/assets/26edab45-7be8-4d6a-b471-558016615fc8" /> \| <img width="1356" height="1632" alt="image" src="https://github.com/user-attachments/assets/5f3634e0-3888-420b-9aaf-deefd9397b40" />	2025-08-24 19:56:24 -07:00
ae	8b49346657	fix: update gpt-5 stats (#2649 ) - To match what's on <https://platform.openai.com/docs/models/gpt-5>.	2025-08-24 16:45:41 -07:00
Dylan	4157788310	[apply_patch] disable default freeform tool (#2643 ) ## Summary We're seeing some issues in the freeform tool - let's disable by default until it stabilizes. ## Testing - [x] Ran locally, confirmed codex-cli could make edits	2025-08-24 11:12:37 -07:00
Reuben Narad	363636f5eb	Add web search tool (#2371 ) Adds web_search tool, enabling the model to use Responses API web_search tool. - Disabled by default, enabled by --search flag - When --search is passed, exposes web_search_request function tool to the model, which triggers user approval. When approved, the model can use the web_search tool for the remainder of the turn <img width="1033" height="294" alt="image" src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f" /> --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-23 22:58:56 -07:00
Ahmed Ibrahim	957d44918d	send-aggregated output (#2364) We want to send an aggregated output of stderr and stdout so that consumers don't have to combine stderr+stdout themselves, which sometimes loses the ordering. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-23 16:54:31 +00:00
Michael Bolin	e3b03eaccb	feat: StreamableShell with exec_command and write_stdin tools (#2574 )	2025-08-22 18:10:55 -07:00
Ahmed Ibrahim	311ad0ce26	fork conversation from a previous message (#2575) This can be the underlying logic for starting a conversation from a previous message. It will need some love in the UI. Base for building this: #2588	2025-08-22 17:06:09 -07:00
Jeremy Rose	d994019f3f	tui: coalesce command output; show unabridged commands in transcript (#2590 ) https://github.com/user-attachments/assets/effec7c7-732a-4b61-a2ae-3cb297b6b19b	2025-08-22 16:32:31 -07:00
Ahmed Ibrahim	097782c775	Move models.rs to protocol (#2595 ) Moving models.rs to protocol so we can use them in `Codex` operations	2025-08-22 22:18:54 +00:00
Michael Bolin	8ba8089592	fix: prefer sending MCP structuredContent as the function call response, if available (#2594 ) Prior to this change, when we got a `CallToolResult` from an MCP server, we JSON-serialized its `content` field as the `content` to send back to the model as part of the function call output that we send back to the model. This meant that we were dropping the `structuredContent` on the floor. Though reading https://modelcontextprotocol.io/specification/2025-06-18/schema#tool, it appears that if `outputSchema` is specified, then `structuredContent` should be set, which seems to be a "higher-fidelity" response to the function call. This PR updates our handling of `CallToolResult` to prefer using the JSON-serialization of `structuredContent`, if present, using `content` as a fallback. Also, it appears that the sense of `success` was inverted prior to this PR!	2025-08-22 14:10:18 -07:00
Jeremy Rose	57c498159a	test: simplify tests in config.rs (#2586 ) this is much easier to read, thanks @bolinfest for the suggestion.	2025-08-22 14:04:21 -07:00
Dylan	6f0b499594	[config] Detect git worktrees for project trust (#2585 ) ## Summary When resolving our current directory as a project, we want to be a little bit more clever: 1. If we're in a sub-directory of a git repo, resolve our project against the root of the git repo 2. If we're in a git worktree, resolve the project against the root of the git repo ## Testing - [x] Added unit tests - [x] Confirmed locally with a git worktree (the one i was using for this feature)	2025-08-22 13:54:51 -07:00
Dylan	236c4f76a6	[apply_patch] freeform apply_patch tool (#2576 ) ## Summary GPT-5 introduced the concept of [custom tools](https://platform.openai.com/docs/guides/function-calling#custom-tools), which allow the model to send a raw string result back, simplifying json-escape issues. We are migrating gpt-5 to use this by default. However, gpt-oss models do not support custom tools, only normal functions. So we keep both tool definitions, and provide whichever one the model family supports. ## Testing - [x] Tested locally with various models - [x] Unit tests pass	2025-08-22 13:42:34 -07:00
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
vjain419	80b00a193e	feat(gpt5): add model_verbosity for GPT‑5 via Responses API (#2108) Summary - Adds `model_verbosity` config (values: low, medium, high). - Sends `text.verbosity` only for GPT‑5 family models via the Responses API. - Updates docs and adds serialization tests. Motivation - GPT‑5 introduces a verbosity control to steer output length/detail without prompt surgery. - Exposing it as a config knob keeps prompts stable and makes behavior explicit and repeatable. Changes - Config: - Added `Verbosity` enum (low\|medium\|high). - Added optional `model_verbosity` to `ConfigToml`, `Config`, and `ConfigProfile`. - Request wiring: - Extended `ResponsesApiRequest` with optional `text` object. - Populates `text.verbosity` only when model family is `gpt-5`; omitted otherwise. - Tests: - Verifies `text.verbosity` serializes when set and is omitted when not set. - Docs: - Added “GPT‑5 Verbosity” section in `codex-rs/README.md`. - Added `model_verbosity` section to `codex-rs/config.md`. Usage - In `~/.codex/config.toml`: - `model = "gpt-5"` - `model_verbosity = "low"` (or `"medium"` default, `"high"`) - CLI override example: - `codex -c model="gpt-5" -c model_verbosity="high"` API Impact - Requests to GPT‑5 via Responses API include: `text: { verbosity: "low\|medium\|high" }` when configured. - For legacy models or Chat Completions providers, `text` is omitted. Backward Compatibility - Default behavior unchanged when `model_verbosity` is not set (server default “medium”). Testing - Added unit tests for serialization/omission of `text.verbosity`. - Ran `cargo fmt` and `cargo test --all-features` (all green). Docs - `README.md`: new “GPT‑5 Verbosity” note under Config with example. - `config.md`: new `model_verbosity` section. Out of Scope - No changes to temperature/top_p or other GPT‑5 parameters. - No changes to Chat Completions wiring. Risks / Notes - If OpenAI changes the wire shape for verbosity, we may need to update `ResponsesApiRequest`. - Behavior gated to `gpt-5` model family to avoid unexpected effects elsewhere. Checklist - [x] Code gated to GPT‑5 family only - [x] Docs updated (`README.md`, `config.md`) - [x] Tests added and passing - [x] Formatting applied Release note: Add `model_verbosity` config to control GPT‑5 output verbosity via the Responses API (low\|medium\|high).	2025-08-22 09:12:10 -07:00
Dylan	e4c275d615	[apply-patch] Clean up apply-patch tool definitions (#2539) ## Summary We've experienced a bit of drift in system prompting for `apply_patch`: - As pointed out in #2030, our prettier formatting started altering prompt.md in a few ways - We introduced a separate markdown file for apply_patch instructions in #993, but currently duplicate them in the prompt.md file - We added a first-class apply_patch tool in #2303, which has yet another definition This PR starts to consolidate our logic in a few ways: - We now only use [`apply_patch_tool_instructions.md`](https://github.com/openai/codex/compare/dh--apply-patch-tool-definition?expand=1#diff-d4fffee5f85cb1975d3f66143a379e6c329de40c83ed5bf03ffd3829df985bea) for system instructions - We no longer include apply_patch system instructions if the tool is specified I'm leaving the definition in openai_tools.rs as duplicated text for now because we're going to be iterating on the first-class tool soon. ## Testing - [x] Added integration tests to verify prompt stability - [x] Tested locally with several different models (gpt-5, gpt-oss, o4-mini)	2025-08-21 20:07:41 -07:00
Dylan	9f71dcbf57	[shell_tool] Small updates to ensure shell consistency (#2571) ## Summary Small update to hopefully improve some shell edge cases, and make it clearer to the model what the function does. Keeping `timeout` as an alias means that calls with the previous name will still work. ## Test Plan - [x] Tested locally, model still works	2025-08-21 19:58:07 -07:00
Jeremy Rose	750ca9e21d	core: write explicit [projects] tables for trusted projects (#2523 ) all of my trust_level settings in my ~/.codex/config.toml were on one line.	2025-08-21 13:20:36 -07:00
Jeremy Rose	db934e438e	read all AGENTS.md up to git root (#2532 ) This updates our logic for AGENTS.md to match documented behavior, which is to read all AGENTS.md files from cwd up to git root.	2025-08-21 08:52:17 -07:00
easong-openai	8ad56be06e	Parse and expose stream errors (#2540 )	2025-08-21 01:15:24 -07:00
Dylan	d2b2a6d13a	[prompt] xml-format EnvironmentContext (#2272) ## Summary Before we land #2243, let's start printing environment_context in our preferred format. This struct will evolve over time with new information; XML gives us a balance of being human-readable without too much parsing, LLM-readable, and extensible. Also moves us over to an Option-based struct, so we can easily provide diffs to the model. ## Testing - [x] Updated tests to reflect new format	2025-08-20 23:45:16 -07:00
eddy-win	050b9baeb6	Bridge command generation to powershell when on Windows (#2319 ) ## What? Why? How? - When running on Windows, codex often tries to invoke bash commands, which commonly fail (unless WSL is installed) - Fix: Detect if powershell is available and, if so, route commands to it - Also add a shell_name property to environmental context for codex to default to powershell commands when running in that environment ## Testing - Tested within WSL and powershell (e.g. get top 5 largest files within a folder and validated that commands generated were powershell commands) - Tested within Zsh - Updated unit tests --------- Co-authored-by: Eddy Escardo <eddy@openai.com>	2025-08-20 16:30:34 -07:00
Ahmed Ibrahim	c579ae41ae	Fix login for internal employees (#2528) This PR: - fixes login for internal employees, because we currently want to prefer SIWC for them. - fixes retrying forever on unauthorized access. We need to break eventually on max retries.	2025-08-20 14:05:20 -07:00
Jeremy Rose	0ad4e11c84	detect terminal and include in request headers (#2437 ) This adds the terminal version to the UA header.	2025-08-20 16:54:26 +00:00
Michael Bolin	ce434b1219	fix: prefer config var to env var (#2495 )	2025-08-20 04:51:59 +00:00
Ahmed Ibrahim	d1f1e36836	Refresh ChatGPT auth token (#2484) ChatGPT tokens live for only 1 hour. If the session is longer, we don't refresh the token. We should get the expiry timestamp and attempt to refresh before it.	2025-08-19 21:01:31 -07:00
Gabriel Peal	eaae56a1b0	Client headers (#2487 )	2025-08-19 23:32:15 -04:00
Gabriel Peal	77148a5c61	Diff command (#2476 )	2025-08-19 22:50:28 -04:00
Michael Bolin	e58125e6c1	chore: Rust 1.89 promoted file locking to the standard library, so prefer stdlib to fs2 (#2467 ) --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2467). * __->__ #2467 * #2465	2025-08-19 13:22:46 -07:00
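
The deterministic tool ordering described in commit ee2ccb5cb6 (#2611) above can be illustrated with a minimal Rust sketch. This is not the code from that PR: `ToolSpec` and `tools_in_stable_order` are hypothetical names invented for this example, and the only idea taken from the commit message is "tools live in a `HashMap`, so sort them by name before sending them".

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tool definition; the real type in the
/// project is not shown in the commit log.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolSpec {
    pub name: String,
    pub description: String,
}

/// Collects the tools out of a HashMap (whose iteration order is not
/// guaranteed) and returns them sorted by name, so the serialized tool
/// list is identical on every turn.
pub fn tools_in_stable_order(tools: &HashMap<String, ToolSpec>) -> Vec<ToolSpec> {
    let mut ordered: Vec<ToolSpec> = tools.values().cloned().collect();
    ordered.sort_by(|a, b| a.name.cmp(&b.name));
    ordered
}
```

Sorting at the point of use keeps the change small, which matches the trade-off the commit message describes: swapping the `HashMap` for an ordered structure would also work but touches more code.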


286 Commits
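
The token-refresh bug described in commit d63e44ae29 (#2699) in the listing above comes down to reusing a stale token inside a retry loop. The sketch below shows the corrected shape under stated assumptions: `Attempt`, `fetch_with_refresh`, and the closure-based `send`/`refresh` parameters are hypothetical and exist only for illustration; the real client code is not part of this log.

```rust
/// Hypothetical outcome of a single request attempt.
pub enum Attempt<T> {
    Ok(T),
    Unauthorized,
}

/// Retries a request, refreshing the token after each 401-style failure.
/// The key point is that `token` is reassigned, so the next pass through
/// the loop sends the refreshed value rather than the original one.
pub fn fetch_with_refresh<T>(
    initial_token: String,
    max_retries: u32,
    mut send: impl FnMut(&str) -> Attempt<T>,
    mut refresh: impl FnMut() -> String,
) -> Option<T> {
    let mut token = initial_token;
    // One initial attempt plus up to `max_retries` retries.
    for _ in 0..=max_retries {
        match send(&token) {
            Attempt::Ok(value) => return Some(value),
            Attempt::Unauthorized => token = refresh(),
        }
    }
    None
}
```

With the buggy shape (sending `initial_token` on every pass), the loop would exhaust its retries even though the refresh itself succeeded, which is the symptom the commit message reports.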