valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Ahmed Ibrahim	a30e5e40ee	enable-resume (#3537 ) Adding the ability to resume conversations. we have one verb `resume`. Behavior: `tui`: `codex resume`: opens session picker `codex resume --last`: continue last message `codex resume <session id>`: continue conversation with `session id` `exec`: `codex resume --last`: continue last conversation `codex resume <session id>`: continue conversation with `session id` Implementation: - I added a function to find the path in `~/.codex/sessions/` with a `UUID`. This is helpful in resuming with session id. - Added the above mentioned flags - Added lots of testing	2025-09-14 19:33:19 -04:00
dedrisian-oai	b2f6fc3b9a	Fix flaky windows test (#3564 ) There are exactly 4 types of flaky tests in Windows x86 right now: 1. `review_input_isolated_from_parent_history` => Times out waiting for closing events 2. `review_does_not_emit_agent_message_on_structured_output` => Times out waiting for closing events 3. `auto_compact_runs_after_token_limit_hit` => Times out waiting for closing events 4. `auto_compact_runs_after_token_limit_hit` => Also has a problem where auto compact should add a third request, but receives 4 requests. 1, 2, and 3 seem to be solved with increasing threads on windows runner from 2 -> 4. Don't know yet why # 4 is happening, but probably also because of WireMock issues on windows causing races.	2025-09-14 23:20:25 +00:00
pakrym-oai	863d9c237e	Include command output when sending timeout to model (#3576 ) Being able to see the output helps the model decide how to handle the timeout.	2025-09-14 14:38:26 -07:00
Ahmed Ibrahim	bbea6bbf7e	Handle resuming/forking after compact (#3533 ) We need to construct the history different when compact happens. For this, we need to just consider the history after compact and convert compact to a response item. This needs to change and use `build_compact_history` when this #3446 is merged.	2025-09-14 13:23:31 +00:00
Jeremy Rose	4891ee29c5	refactor transcript view to handle HistoryCells (#3538 ) No (intended) functional change. This refactors the transcript view to hold a list of HistoryCells instead of a list of Lines. This simplifies and makes much of the logic more robust, as well as laying the groundwork for future changes, e.g. live-updating history cells in the transcript. Similar to #2879 in goal. Fixes #2755.	2025-09-13 19:23:14 -07:00
pakrym-oai	3d4acbaea0	Preserve IDs for more item types in azure (#3542 ) https://github.com/openai/codex/issues/3509	2025-09-13 01:09:56 +00:00
pakrym-oai	414b8be8b6	Always request encrypted cot (#3539 ) Otherwise future requests will fail with 500	2025-09-12 23:51:30 +00:00
dedrisian-oai	90a0fd342f	Review Mode (Core) (#3401 ) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining why this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" \| "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.	2025-09-12 23:25:10 +00:00
pakrym-oai	e3c6903199	Add Azure Responses API workaround (#3528 ) Azure Responses API doesn't work well with store:false and response items. If store = false and id is sent an error is thrown that ID is not found If store = false and id is not sent an error is thrown that ID is required Add detection for Azure urls and add a workaround to preserve reasoning item IDs and send store:true	2025-09-12 13:52:15 -07:00
jif-oai	ea225df22e	feat: context compaction (#3446 ) ## Compact feature: 1. Stops the model when the context window become too large 2. Add a user turn, asking for the model to summarize 3. Build a bridge that contains all the previous user message + the summary. Rendered from a template 4. Start sampling again from a clean conversation with only that bridge	2025-09-12 13:07:10 -07:00
jif-oai	c6fd056aa6	feat: reasoning effort as optional (#3527 ) Allow the reasoning effort to be optional	2025-09-12 12:06:33 -07:00
jif-oai	bba567cee9	bug: fix model save (#3525 ) Fix those 2 behaviors: 1. The model does not get saved if we don't CTRL + S 2. The reasoning effort get saved	2025-09-12 10:38:12 -07:00
Ahmed Ibrahim	674e3d3c90	Add Compact and Turn Context to the rollout items (#3444 ) Adding compact and turn context to the rollout items based on #3440	2025-09-11 18:08:51 +00:00
jif-oai	114ce9ff4d	NIT unified exec (#3479 ) Fix the default value of the experimental flag of unified_exec	2025-09-11 16:19:12 +00:00
Eric Traut	e13b35ecb0	Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189 ) This PR does the following: * Adds the ability to paste or type an API key. * Removes the `preferred_auth_method` config option. The last login method is always persisted in auth.json, so this isn't needed. * If OPENAI_API_KEY env variable is defined, the value is used to prepopulate the new UI. The env variable is otherwise ignored by the CLI. * Adds a new MCP server entry point "login_api_key" so we can implement this same API key behavior for the VS Code extension. <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM" src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9" /> <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM" src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9" />	2025-09-11 09:16:34 -07:00
Ahmed Ibrahim	162e1235a8	Change forking to read the rollout from file (#3440 ) This PR changes get history op to get path. Then, forking will use a path. This will help us have one unified codepath for resuming/forking conversations. Will also help in having rollout history in order. It also fixes a bug where you won't see the UI when resuming after forking.	2025-09-10 17:42:54 -07:00
jif-oai	c09ed74a16	Unified execution (#3288 ) ## Unified PTY-Based Exec Tool Note: this requires to have this flag in the config: `use_experimental_unified_exec_tool=true` - Adds a PTY-backed interactive exec feature (“unified_exec”) with session reuse via session_id, bounded output (128 KiB), and timeout clamping (≤ 60 s). - Protocol: introduces ResponseItem::UnifiedExec { session_id, arguments, timeout_ms }. - Tools: exposes unified_exec as a function tool (Responses API); excluded from Chat Completions payload while still supported in tool lists. - Path handling: resolves commands via PATH (or explicit paths), with UTF‑8/newline‑aware truncation (truncate_middle). - Tests: cover command parsing, path resolution, session persistence/cleanup, multi‑session isolation, timeouts, and truncation behavior.	2025-09-10 17:38:11 -07:00
Jeremy Rose	f69f07b028	put workspace roots in the environment context (#3375 ) to keep the tool description constant when the writable roots change.	2025-09-10 15:10:52 -07:00
Michael Bolin	51d9e05de7	Back out "feat: POSIX unification and snapshot sessions (#3179 )" (#3430 ) This reverts https://github.com/openai/codex/pull/3179. #3179 appears to introduce a regression where sourcing dotfiles causes a bunch of activity in the title bar (and potentially slows things down?) https://github.com/user-attachments/assets/a68f7fb3-0749-4e0e-a321-2aa6993e01da Verified this no longer happens after backing out #3179. Original commit changeset: `62bd0e3d9d`	2025-09-10 12:40:24 -07:00
Ahmed Ibrahim	45bd5ca4b9	Move initial history to protocol (#3422 ) To fix an edge case of forking then resuming #3419	2025-09-10 10:17:24 -07:00
Ahmed Ibrahim	43809a454e	Introduce rollout items (#3380 ) This PR introduces Rollout items. This enable us to rollout eventmsgs and session meta. This is mostly #3214 with rebase on main	2025-09-09 23:52:33 +00:00
Gabriel Peal	5eab4c7ab4	Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388 ) The previous config approach had a few issues: 1. It is part of the config but not designed to be used externally 2. It had to be wired through many places (look at the +/- on this PR 3. It wasn't guaranteed to be set consistently everywhere because we don't have a super well defined way that configs stack. For example, the extension would configure during newConversation but anything that happened outside of that (like login) wouldn't get it. This env var approach is cleaner and also creates one less thing we have to deal with when coming up with a better holistic story around configs. One downside is that I removed the unit test testing for the override because I don't want to deal with setting the global env or spawning child processes and figuring out how to introspect their originator header. The new code is sufficiently simple and I tested it e2e that I feel as if this is still worth it.	2025-09-09 17:23:23 -04:00
jif-oai	62bd0e3d9d	feat: POSIX unification and snapshot sessions (#3179 ) ## Session snapshot For POSIX shell, the goal is to take a snapshot of the interactive shell environment, store it in a session file located in `.codex/` and only source this file for every command that is run. As a result, if a snapshot files exist, `bash -lc <CALL>` get replaced by `bash -c <CALL>`. This also fixes the issue that `bash -lc` does not source `.bashrc`, resulting in missing env variables and aliases in the codex session. ## POSIX unification Unify `bash` and `zsh` shell into a POSIX shell. The rational is that the tool will not use any `zsh` specific capabilities. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-09-08 18:09:45 -07:00
Jeremy Rose	ac58749bd3	allow mach-lookup for com.apple.system.opendirectoryd.libinfo (#3334 ) in the base sandbox policy. this is [allowed in Chrome renderers](https://source.chromium.org/chromium/chromium/src/+/main:sandbox/policy/mac/common.sb;l=266;drc=7afa0043cfcddb3ef9dafe5acbfc01c2f7e7df01), so I feel it's fairly safe.	2025-09-08 16:28:52 -07:00
Gabriel Peal	5eaaf307e1	Generate more typescript types and return conversation id with ConversationSummary (#3219 ) This PR does multiple things that are necessary for conversation resume to work from the extension. I wanted to make sure everything worked so these changes wound up in one PR: 1. Generate more ts types 2. Resume rollout history files rather than create a new one every time it is resumed so you don't see a duplicate conversation in history for every resume. Chatted with @aibrahim-oai to verify this 3. Return conversation_id in conversation summaries 4. [Cleanup] Use serde and strong types for a lot of the rollout file parsing	2025-09-08 17:54:47 -04:00
Gabriel Peal	c8fab51372	Use ConversationId instead of raw Uuids (#3282 ) We're trying to migrate from `session_id: Uuid` to `conversation_id: ConversationId`. Not only does this give us more type safety but it unifies our terminology across Codex and with the implementation of session resuming, a conversation (which can span multiple sessions) is more appropriate. I started this impl on https://github.com/openai/codex/pull/3219 as part of getting resume working in the extension but it's big enough that it should be broken out.	2025-09-07 23:22:25 -04:00
pakrym-oai	5775174ec2	Never store requests (#3212 ) When item ids are sent to Responses API it will load them from the database ignoring the provided values. This adds extra latency. Not having the mode to store requests also allows us to simplify the code. ## Breaking change The `disable_response_storage` configuration option is removed.	2025-09-05 10:41:47 -07:00
Ahmed Ibrahim	2b96f9f569	Dividing UserMsgs into categories to send it back to the tui (#3127 ) This PR does the following: - divides user msgs into 3 categories: plain, user instructions, and environment context - Centralizes adding user instructions and environment context to a degree - Improve the integration testing Building on top of #3123 Specifically this [comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089). We need to send the user message while ignoring the User Instructions and Environment Context we attach.	2025-09-04 05:34:50 +00:00
Ahmed Ibrahim	f2036572b6	Replay EventMsgs from Response Items when resuming a session with history. (#3123 ) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.	2025-09-04 04:47:00 +00:00
Ahmed Ibrahim	6b83c1c3f3	Fix failing CI (#3130 ) In this test, the ChatGPT token path is used, and the auth layer tries to refresh the token if it thinks the token is “old.” Your helper writes a fixed last_refresh timestamp that has now aged past the 28‑day threshold, so the code attempts a real refresh against auth.openai.com, never reaches the mock, and you end up with received_requests().await.unwrap() being empty.	2025-09-03 22:38:32 +00:00
pakrym-oai	c636f821ae	Add a common way to create HTTP client (#3110 ) Ensure User-Agent and originator are always sent.	2025-09-03 10:11:02 -07:00
Dominik Kundel	b127a3643f	Improve gpt-oss compatibility (#2461 ) The gpt-oss models require reasoning with subsequent Chat Completions requests because otherwise the model forgets why the tools were called. This change fixes that and also adds some additional missing documentation around how to handle context windows in Ollama and how to show the CoT if you desire to.	2025-09-02 19:49:03 -07:00
pakrym-oai	03e2796ca4	Move CodexAuth and AuthManager to the core crate (#3074 ) Fix a long standing layering issue.	2025-09-02 18:36:19 -07:00
Ahmed Ibrahim	431a10fc50	chore: unify history loading (#2736 ) We have two ways of loading conversation with a previous history. Fork conversation and the experimental resume that we had before. In this PR, I am unifying their code path. The path is getting the history items and recording them in a brand new conversation. This PR also constraint the rollout recorder responsibilities to be only recording to the disk and loading from the disk. The PR also fixes a current bug when we have two forking in a row: History 1: <Environment Context> UserMessage_1 UserMessage_2 UserMessage_3 Fork with n = 1 (only remove one element) History 2: <Environment Context> UserMessage_1 UserMessage_2 <Environment Context> Fork with n = 1 (only remove one element) History 2: <Environment Context> UserMessage_1 UserMessage_2 <Environment Context> This shouldn't happen but because we were appending the `<Environment Context>` after each spawning and it's considered as _user message_. Now, we don't add this message if restoring and old conversation.	2025-09-02 22:44:29 +00:00
Michael Bolin	74d2741729	chore: require uninlined_format_args from clippy (#2845 ) - added `uninlined_format_args` to `[workspace.lints.clippy]` in the `Cargo.toml` for the workspace - ran `cargo clippy --tests --fix` - ran `just fmt`	2025-08-28 11:25:23 -07:00
dedrisian-oai	4e9ad23864	Add "View Image" tool (#2723 ) Adds a "View Image" tool so Codex can find and see images by itself: <img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40 04 AM" src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03" />	2025-08-27 17:41:23 -07:00
Ahmed Ibrahim	2d2f66f9c5	Bug fix: deduplicate assistant messages (#2758 ) We are treating assistant messages in a different way than other messages which resulted in a duplicated history. See #2698	2025-08-27 01:29:16 -07:00
Jeremy Rose	32bbbbad61	test: faster test execution in codex-core (#2633 ) this dramatically improves time to run `cargo test -p codex-core` (~25x speedup). before: ``` cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total ``` after: ``` cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total ``` both tests measured "hot", i.e. on a 2nd run with no filesystem changes, to exclude compile times. approach inspired by [Delete Cargo Integration Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html), we move all test cases in tests/ into a single suite in order to have a single binary, as there is significant overhead for each test binary executed, and because test execution is only parallelized with a single binary.	2025-08-24 11:10:53 -07:00
Ahmed Ibrahim	957d44918d	send-aggregated output (#2364 ) We want to send an aggregated output of stderr and stdout so we don't have to aggregate it stderr+stdout as we lose order sometimes. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-23 16:54:31 +00:00
Dylan	236c4f76a6	[apply_patch] freeform apply_patch tool (#2576 ) ## Summary GPT-5 introduced the concept of [custom tools](https://platform.openai.com/docs/guides/function-calling#custom-tools), which allow the model to send a raw string result back, simplifying json-escape issues. We are migrating gpt-5 to use this by default. However, gpt-oss models do not support custom tools, only normal functions. So we keep both tool definitions, and provide whichever one the model family supports. ## Testing - [x] Tested locally with various models - [x] Unit tests pass	2025-08-22 13:42:34 -07:00
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
Dylan	e4c275d615	[apply-patch] Clean up apply-patch tool definitions (#2539 ) ## Summary We've experienced a bit of drift in system prompting for `apply_patch`: - As pointed out in #2030 , our prettier formatting started altering prompt.md in a few ways - We introduced a separate markdown file for apply_patch instructions in #993, but currently duplicate them in the prompt.md file - We added a first-class apply_patch tool in #2303, which has yet another definition This PR starts to consolidate our logic in a few ways: - We now only use `apply_patch_tool_instructions.md](https://github.com/openai/codex/compare/dh--apply-patch-tool-definition?expand=1#diff-d4fffee5f85cb1975d3f66143a379e6c329de40c83ed5bf03ffd3829df985bea) for system instructions - We no longer include apply_patch system instructions if the tool is specified I'm leaving the definition in openai_tools.rs as duplicated text for now because we're going to be iterated on the first-class tool soon. ## Testing - [x] Added integration tests to verify prompt stability - [x] Tested locally with several different models (gpt-5, gpt-oss, o4-mini)	2025-08-21 20:07:41 -07:00
Dylan	d2b2a6d13a	[prompt] xml-format EnvironmentContext (#2272 ) ## Summary Before we land #2243, let's start printing environment_context in our preferred format. This struct will evolve over time with new information, xml gives us a balance of human readable without too much parsing, llm readable, and extensible. Also moves us over to an Option-based struct, so we can easily provide diffs to the model. ## Testing - [x] Updated tests to reflect new format	2025-08-20 23:45:16 -07:00
eddy-win	050b9baeb6	Bridge command generation to powershell when on Windows (#2319 ) ## What? Why? How? - When running on Windows, codex often tries to invoke bash commands, which commonly fail (unless WSL is installed) - Fix: Detect if powershell is available and, if so, route commands to it - Also add a shell_name property to environmental context for codex to default to powershell commands when running in that environment ## Testing - Tested within WSL and powershell (e.g. get top 5 largest files within a folder and validated that commands generated were powershell commands) - Tested within Zsh - Updated unit tests --------- Co-authored-by: Eddy Escardo <eddy@openai.com>	2025-08-20 16:30:34 -07:00
Michael Bolin	ce434b1219	fix: prefer config var to env var (#2495 )	2025-08-20 04:51:59 +00:00
Michael Bolin	50c48e88f5	chore: upgrade to Rust 1.89 (#2465 ) Codex created this PR from the following prompt: > upgrade this entire repo to Rust 1.89. Note that this requires updating codex-rs/rust-toolchain.toml as well as the workflows in .github/. Make sure that things are "clippy clean" as this change will likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests` are sufficient to check for correctness Note this modifies a lot of lines because it folds nested `if` statements using `&&`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465). * #2467 * __->__ #2465	2025-08-19 13:22:02 -07:00
Ahmed Ibrahim	97f995a749	Show login options when not signed in with ChatGPT (#2440 ) Motivation: we have users who uses their API key although they want to use ChatGPT account. We want to give them the chance to always login with their account. This PR displays login options when the user is not signed in with ChatGPT. Even if you have set an OpenAI API key as an environment variable, you will still be prompted to log in with ChatGPT. We’ve also added a new flag, `always_use_api_key_signing` false by default, which ensures you are never asked to log in with ChatGPT and always defaults to using your API key. https://github.com/user-attachments/assets/b61ebfa9-3c5e-4ab7-bf94-395c23a0e0af After ChatGPT sign in: https://github.com/user-attachments/assets/d58b366b-c46a-428f-a22f-2ac230f991c0	2025-08-19 03:22:48 +00:00
Ahmed Ibrahim	ecb388045c	Add cache tests for UserTurn (#2432 )	2025-08-18 21:28:09 +00:00
Ahmed Ibrahim	c283f9f6ce	Add an operation to override current task context (#2431 ) - Added an operation to override current task context - Added a test to check that cache stays the same	2025-08-18 19:59:19 +00:00
Dylan	6df8e35314	[tools] Add apply_patch tool (#2303 ) ## Summary We've been seeing a number of issues and reports with our synthetic `apply_patch` tool, e.g. #802. Let's make this a real tool - in my anecdotal testing, it's critical for GPT-OSS models, but I'd like to make it the standard across GPT-5 and codex models as well. ## Testing - [x] Tested locally - [x] Integration test	2025-08-15 11:55:53 -04:00

1 2 3

117 Commits