valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	2a76a08a9e	fix: include rollout_path in NewConversationResponse (#3352 ) Adding the `rollout_path` to the `NewConversationResponse` makes it so a client can perform subsequent operations on a `(ConversationId, PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352). * #3353 * __->__ #3352	2025-09-09 00:11:48 -07:00
jif-oai	62bd0e3d9d	feat: POSIX unification and snapshot sessions (#3179 ) ## Session snapshot For POSIX shell, the goal is to take a snapshot of the interactive shell environment, store it in a session file located in `.codex/` and only source this file for every command that is run. As a result, if a snapshot files exist, `bash -lc <CALL>` get replaced by `bash -c <CALL>`. This also fixes the issue that `bash -lc` does not source `.bashrc`, resulting in missing env variables and aliases in the codex session. ## POSIX unification Unify `bash` and `zsh` shell into a POSIX shell. The rational is that the tool will not use any `zsh` specific capabilities. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-09-08 18:09:45 -07:00
Gabriel Peal	5eaaf307e1	Generate more typescript types and return conversation id with ConversationSummary (#3219 ) This PR does multiple things that are necessary for conversation resume to work from the extension. I wanted to make sure everything worked so these changes wound up in one PR: 1. Generate more ts types 2. Resume rollout history files rather than create a new one every time it is resumed so you don't see a duplicate conversation in history for every resume. Chatted with @aibrahim-oai to verify this 3. Return conversation_id in conversation summaries 4. [Cleanup] Use serde and strong types for a lot of the rollout file parsing	2025-09-08 17:54:47 -04:00
Gabriel Peal	c8fab51372	Use ConversationId instead of raw Uuids (#3282 ) We're trying to migrate from `session_id: Uuid` to `conversation_id: ConversationId`. Not only does this give us more type safety but it unifies our terminology across Codex and with the implementation of session resuming, a conversation (which can span multiple sessions) is more appropriate. I started this impl on https://github.com/openai/codex/pull/3219 as part of getting resume working in the extension but it's big enough that it should be broken out.	2025-09-07 23:22:25 -04:00
pakrym-oai	0269096229	Move token usage/context information to session level (#3221 ) Move context information into the main loop so it can be used to interrupt the loop or start auto-compaction.	2025-09-06 15:19:23 +00:00
pakrym-oai	5775174ec2	Never store requests (#3212 ) When item ids are sent to Responses API it will load them from the database ignoring the provided values. This adds extra latency. Not having the mode to store requests also allows us to simplify the code. ## Breaking change The `disable_response_storage` configuration option is removed.	2025-09-05 10:41:47 -07:00
Michael Bolin	bd4fa85507	fix: add callback to map before sending request to fix race condition (#3146 ) Last week, I thought I found the smoking gun in our flaky integration tests where holding these locks could have led to potential deadlock: - https://github.com/openai/codex/pull/2876 - https://github.com/openai/codex/pull/2878 Yet even after those PRs went in, we continued to see flakinees in our integration tests! Though with the additional logging added as part of debugging those tests, I now saw things like: ``` read message from stdout: Notification(JSONRPCNotification { jsonrpc: "2.0", method: "codex/event/exec_approval_request", params: Some(Object {"id": String("0"), "msg": Object {"type": String("exec_approval_request"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}, "conversationId": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6")}) }) notification: Notification(JSONRPCNotification { jsonrpc: "2.0", method: "codex/event/exec_approval_request", params: Some(Object {"id": String("0"), "msg": Object {"type": String("exec_approval_request"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}, "conversationId": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6")}) }) read message from stdout: Request(JSONRPCRequest { id: Integer(0), jsonrpc: "2.0", method: "execCommandApproval", params: Some(Object {"conversation_id": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}) }) writing message to stdin: Response(JSONRPCResponse { id: Integer(0), jsonrpc: "2.0", result: Object {"decision": String("approved")} }) in read_stream_until_notification_message(codex/event/task_complete) [mcp stderr] 2025-09-04T00:00:59.738585Z INFO codex_mcp_server::message_processor: <- response: JSONRPCResponse { id: Integer(0), jsonrpc: "2.0", result: Object {"decision": String("approved")} } [mcp stderr] 2025-09-04T00:00:59.738740Z DEBUG codex_core::codex: Submission sub=Submission { id: "1", op: ExecApproval { id: "0", decision: Approved } } [mcp stderr] 2025-09-04T00:00:59.738832Z WARN codex_core::codex: No pending approval found for sub_id: 0 ``` That is, a response was sent for a request, but no callback was in place to handle the response! This time, I think I may have found the underlying issue (though the fixes for holding locks for too long may have also been part of it), which is I found cases where we were sending the request: `234c0a0469/codex-rs/core/src/codex.rs (L597)` before inserting the `Sender` into the `pending_approvals` map (which has to wait on acquiring a mutex): `234c0a0469/codex-rs/core/src/codex.rs (L598-L601)` so it is possible the request could go out and the client could respond before `pending_approvals` was updated! Note this was happening in both `request_command_approval()` and `request_patch_approval()`, which maps to the sorts of errors we have been seeing when these integration tests have been flaking on us. While here, I am also adding some extra logging that prints if inserting into `pending_approvals` overwrites an entry as opposed to purely inserting one. Today, a conversation can have only one pending request at a time, but as we are planning to support parallel tool calls, this invariant may not continue to hold, in which case we need to revisit this abstraction.	2025-09-04 07:38:28 -07:00
Ahmed Ibrahim	2b96f9f569	Dividing UserMsgs into categories to send it back to the tui (#3127 ) This PR does the following: - divides user msgs into 3 categories: plain, user instructions, and environment context - Centralizes adding user instructions and environment context to a degree - Improve the integration testing Building on top of #3123 Specifically this [comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089). We need to send the user message while ignoring the User Instructions and Environment Context we attach.	2025-09-04 05:34:50 +00:00
Ahmed Ibrahim	f2036572b6	Replay EventMsgs from Response Items when resuming a session with history. (#3123 ) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.	2025-09-04 04:47:00 +00:00
pakrym-oai	03e2796ca4	Move CodexAuth and AuthManager to the core crate (#3074 ) Fix a long standing layering issue.	2025-09-02 18:36:19 -07:00
Ahmed Ibrahim	431a10fc50	chore: unify history loading (#2736 ) We have two ways of loading conversation with a previous history. Fork conversation and the experimental resume that we had before. In this PR, I am unifying their code path. The path is getting the history items and recording them in a brand new conversation. This PR also constraint the rollout recorder responsibilities to be only recording to the disk and loading from the disk. The PR also fixes a current bug when we have two forking in a row: History 1: <Environment Context> UserMessage_1 UserMessage_2 UserMessage_3 Fork with n = 1 (only remove one element) History 2: <Environment Context> UserMessage_1 UserMessage_2 <Environment Context> Fork with n = 1 (only remove one element) History 2: <Environment Context> UserMessage_1 UserMessage_2 <Environment Context> This shouldn't happen but because we were appending the `<Environment Context>` after each spawning and it's considered as _user message_. Now, we don't add this message if restoring and old conversation.	2025-09-02 22:44:29 +00:00
Michael Bolin	c988ce28fe	fix: drop Mutex before calling tx_approve.send() (#2876 )	2025-08-28 22:49:29 -07:00
Ahmed Ibrahim	9dbe7284d2	Following up on #2371 post commit feedback (#2852 ) - Introduce websearch end to complement the begin - Moves the logic of adding the sebsearch tool to create_tools_json_for_responses_api - Making it the client responsibility to toggle the tool on or off - Other misc in #2371 post commit feedback - Show the query: <img width="1392" height="151" alt="image" src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89" />	2025-08-28 19:24:38 -07:00
dedrisian-oai	b8e8454b3f	Custom /prompts (#2696 ) Adds custom `/prompts` to `~/.codex/prompts/<command>.md`. <img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM" src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679" /> --- Details: 1. Adds `Op::ListCustomPrompts` to core. 2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt` (name, content). 3. TUI calls the operation on load, and populates the custom prompts (excluding prompts that collide with builtins). 4. Selecting the custom prompt automatically sends the prompt to the agent.	2025-08-29 02:16:39 +00:00
Ahmed Ibrahim	ed06f90fb3	Race condition in compact (#2746 ) This fixes the flakiness in `summarize_context_three_requests_and_instructions` because we should trim history before sending task complete.	2025-08-28 12:53:00 -07:00
dedrisian-oai	4e9ad23864	Add "View Image" tool (#2723 ) Adds a "View Image" tool so Codex can find and see images by itself: <img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40 04 AM" src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03" />	2025-08-27 17:41:23 -07:00
Ahmed Ibrahim	2d2f66f9c5	Bug fix: deduplicate assistant messages (#2758 ) We are treating assistant messages in a different way than other messages which resulted in a duplicated history. See #2698	2025-08-27 01:29:16 -07:00
Ahmed Ibrahim	d0e06f74e2	send context window with task started (#2752 ) - Send context window with task started - Accounting for changing the model per turn	2025-08-27 00:04:21 -07:00
Ahmed Ibrahim	3eb11c10d0	Don't send Exec deltas on apply patch (#2742 ) We are now sending exec deltas on apply patch which doesn't make sense.	2025-08-26 19:16:51 -07:00
Gabriel Peal	cb32f9c64e	Add auth to send_user_turn (#2688 ) It is there for send_user_message but was omitted from send_user_turn. Presumably this was a mistake	2025-08-25 18:57:20 -04:00
Odysseas Yiakoumis	a6c346b9e1	avoid error when /compact response has no token_usage (#2417 ) (#2640 ) Context When running `/compact`, `drain_to_completed` would throw an error if `token_usage` was `None` in `ResponseEvent::Completed`. This made the command fail even though everything else had succeeded. What changed - Instead of erroring, we now just check `if let Some(token_usage)` before sending the event. - If it’s missing, we skip it and move on. Why This makes `AgentTask::compact()` behave in the same way as `AgentTask::spawn()`, which also doesn’t error out when `token_usage` isn’t available. Keeps things consistent and avoids unnecessary failures. Fixes Closes #2417 --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-25 18:42:22 +00:00
Michael Bolin	295ca27e98	fix: Scope ExecSessionManager to Session instead of using global singleton (#2664 ) The `SessionManager` in `exec_command` owns a number of `ExecCommandSession` objects where `ExecCommandSession` has a non-trivial implementation of `Drop`, so we want to be able to drop an individual `SessionManager` to help ensure things get cleaned up in a timely fashion. To that end, we should have one `SessionManager` per session rather than one global one for the lifetime of the CLI process.	2025-08-24 22:52:49 -07:00
Michael Bolin	7b20db942a	fix: build is broken on main; introduce ToolsConfigParams to help fix (#2663 ) `ToolsConfig::new()` taking a large number of boolean params was hard to manage and it finally bit us (see https://github.com/openai/codex/pull/2660). This changes `ToolsConfig::new()` so that it takes a struct (and also reduces the visibility of some members, where possible).	2025-08-24 22:43:42 -07:00
Reuben Narad	363636f5eb	Add web search tool (#2371 ) Adds web_search tool, enabling the model to use Responses API web_search tool. - Disabled by default, enabled by --search flag - When --search is passed, exposes web_search_request function tool to the model, which triggers user approval. When approved, the model can use the web_search tool for the remainder of the turn <img width="1033" height="294" alt="image" src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f" /> --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-23 22:58:56 -07:00
Ahmed Ibrahim	957d44918d	send-aggregated output (#2364 ) We want to send an aggregated output of stderr and stdout so we don't have to aggregate it stderr+stdout as we lose order sometimes. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-23 16:54:31 +00:00
Michael Bolin	e3b03eaccb	feat: StreamableShell with exec_command and write_stdin tools (#2574 )	2025-08-22 18:10:55 -07:00
Ahmed Ibrahim	311ad0ce26	fork conversation from a previous message (#2575 ) This can be the underlying logic in order to start a conversation from a previous message. will need some love in the UI. Base for building this: #2588	2025-08-22 17:06:09 -07:00
Jeremy Rose	d994019f3f	tui: coalesce command output; show unabridged commands in transcript (#2590 ) https://github.com/user-attachments/assets/effec7c7-732a-4b61-a2ae-3cb297b6b19b	2025-08-22 16:32:31 -07:00
Ahmed Ibrahim	097782c775	Move models.rs to protocol (#2595 ) Moving models.rs to protocol so we can use them in `Codex` operations	2025-08-22 22:18:54 +00:00
Michael Bolin	8ba8089592	fix: prefer sending MCP structuredContent as the function call response, if available (#2594 ) Prior to this change, when we got a `CallToolResult` from an MCP server, we JSON-serialized its `content` field as the `content` to send back to the model as part of the function call output that we send back to the model. This meant that we were dropping the `structuredContent` on the floor. Though reading https://modelcontextprotocol.io/specification/2025-06-18/schema#tool, it appears that if `outputSchema` is specified, then `structuredContent` should be set, which seems to be a "higher-fidelity" response to the function call. This PR updates our handling of `CallToolResult` to prefer using the JSON-serialization of `structuredContent`, if present, using `content` as a fallback. Also, it appears that the sense of `success` was inverted prior to this PR!	2025-08-22 14:10:18 -07:00
Dylan	236c4f76a6	[apply_patch] freeform apply_patch tool (#2576 ) ## Summary GPT-5 introduced the concept of [custom tools](https://platform.openai.com/docs/guides/function-calling#custom-tools), which allow the model to send a raw string result back, simplifying json-escape issues. We are migrating gpt-5 to use this by default. However, gpt-oss models do not support custom tools, only normal functions. So we keep both tool definitions, and provide whichever one the model family supports. ## Testing - [x] Tested locally with various models - [x] Unit tests pass	2025-08-22 13:42:34 -07:00
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
easong-openai	8ad56be06e	Parse and expose stream errors (#2540 )	2025-08-21 01:15:24 -07:00
Dylan	d2b2a6d13a	[prompt] xml-format EnvironmentContext (#2272 ) ## Summary Before we land #2243, let's start printing environment_context in our preferred format. This struct will evolve over time with new information, xml gives us a balance of human readable without too much parsing, llm readable, and extensible. Also moves us over to an Option-based struct, so we can easily provide diffs to the model. ## Testing - [x] Updated tests to reflect new format	2025-08-20 23:45:16 -07:00
eddy-win	050b9baeb6	Bridge command generation to powershell when on Windows (#2319 ) ## What? Why? How? - When running on Windows, codex often tries to invoke bash commands, which commonly fail (unless WSL is installed) - Fix: Detect if powershell is available and, if so, route commands to it - Also add a shell_name property to environmental context for codex to default to powershell commands when running in that environment ## Testing - Tested within WSL and powershell (e.g. get top 5 largest files within a folder and validated that commands generated were powershell commands) - Tested within Zsh - Updated unit tests --------- Co-authored-by: Eddy Escardo <eddy@openai.com>	2025-08-20 16:30:34 -07:00
Michael Bolin	50c48e88f5	chore: upgrade to Rust 1.89 (#2465 ) Codex created this PR from the following prompt: > upgrade this entire repo to Rust 1.89. Note that this requires updating codex-rs/rust-toolchain.toml as well as the workflows in .github/. Make sure that things are "clippy clean" as this change will likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests` are sufficient to check for correctness Note this modifies a lot of lines because it folds nested `if` statements using `&&`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465). * #2467 * __->__ #2465	2025-08-19 13:22:02 -07:00
Dylan	e7e5fe91c8	[tui] Support /mcp command (#2430 ) ## Summary Adds a `/mcp` command to list active tools. We can extend this command to allow configuration of MCP tools, but for now a simple list command will help debug if your config.toml and your tools are working as expected.	2025-08-19 09:00:31 -07:00
Ahmed Ibrahim	c283f9f6ce	Add an operation to override current task context (#2431 ) - Added an operation to override current task context - Added a test to check that cache stays the same	2025-08-18 19:59:19 +00:00
Ahmed Ibrahim	c9963b52e9	consolidate reasoning enums into one (#2428 ) We have three enums for each of reasoning summaries and reasoning effort with same values. They can be consolidated into one.	2025-08-18 11:50:17 -07:00
Michael Bolin	b581498882	fix: introduce EventMsg::TurnAborted (#2365 ) Introduces `EventMsg::TurnAborted` that should be sent in response to `Op::Interrupt`. In the MCP server, updates the handling of a `ClientRequest::InterruptConversation` request such that it sends the `Op::Interrupt` but does not respond to the request until it sees an `EventMsg::TurnAborted`.	2025-08-17 21:40:31 -07:00
Michael Bolin	d262244725	fix: introduce codex-protocol crate (#2355 )	2025-08-15 12:44:40 -07:00
Michael Bolin	17aa394ae7	feat: introduce Op:UserTurn (#2329 ) This introduces `Op::UserTurn`, which makes it possible to override many of the fields that were set when the `Session` was originally created when creating a new conversation turn. This is one way we could support changing things like `model` or `cwd` in the middle of the conversation, though we may want to consider making each field optional, or alternatively having a separate `Op` that mutates the `TurnContext` associated with a `submission_loop()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2329). * #2345 * __->__ #2329 * #2343 * #2340 * #2338	2025-08-15 09:56:05 -07:00
Michael Bolin	13ed67cfc1	feat: introduce TurnContext (#2343 ) This PR introduces `TurnContext`, which is designed to hold a set of fields that should be constant for a turn of a conversation. Note that the fields of `TurnContext` were previously governed by `Session`. Ultimately, we want to enable users to change these values between turns (changing model, approval policy, etc.), though in the current implementation, the `TurnContext` is constant for the entire conversation. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345). * #2345 * #2329 * __->__ #2343 * #2340 * #2338	2025-08-15 09:40:02 -07:00
Michael Bolin	6730592433	fix: introduce MutexExt::lock_unchecked() so we stop ignoring unwrap() throughout codex.rs (#2340 ) This way we are sure a dangerous `unwrap()` does not sneak in! --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2340). * #2345 * #2329 * #2343 * __->__ #2340 * #2338	2025-08-15 09:14:44 -07:00
Michael Bolin	26c8373821	fix: tighten up checks against writable folders for SandboxPolicy (#2338 ) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338	2025-08-15 09:06:15 -07:00
Dylan	6df8e35314	[tools] Add apply_patch tool (#2303 ) ## Summary We've been seeing a number of issues and reports with our synthetic `apply_patch` tool, e.g. #802. Let's make this a real tool - in my anecdotal testing, it's critical for GPT-OSS models, but I'd like to make it the standard across GPT-5 and codex models as well. ## Testing - [x] Tested locally - [x] Integration test	2025-08-15 11:55:53 -04:00
Parker Thompson	a075424437	Added `allow-expect-in-tests` / `allow-unwrap-in-tests` (#2328 ) This PR: * Added the clippy.toml to configure allowable expect / unwrap usage in tests * Removed as many expect/allow lines as possible from tests * moved a bunch of allows to expects where possible Note: in integration tests, non `#[test]` helper functions are not covered by this so we had to leave a few lingering `expect(expect_used` checks around	2025-08-14 17:59:01 -07:00
Michael Bolin	8f11652458	fix: parallelize logic in Session::new() (#2305 ) #2291 made it so that `Session::new()` is on the critical path to `Codex::spawn()`, which means it is on the hot path to CLI startup. This refactors `Session::new()` to run a number of async tasks in parallel that were previously run serially to try to reduce latency.	2025-08-14 13:29:58 -07:00
Dylan	544980c008	[context] Store context messages in rollouts (#2243 ) ## Summary Currently, we use request-time logic to determine the user_instructions and environment_context messages. This means that neither of these values can change over time as conversations go on. We want to add in additional details here, so we're migrating these to save these messages to the rollout file instead. This is simpler for the client, and allows us to append additional environment_context messages to each turn if we want ## Testing - [x] Integration test coverage - [x] Tested locally with a few turns, confirmed model could reference environment context and cached token metrics were reasonably high	2025-08-14 14:51:13 -04:00
Michael Bolin	cf7a7e63a3	exploration: create Session as part of Codex::spawn() (#2291 ) Historically, `Codex::spawn()` would create the instance of `Codex` and enforce, by construction, that `Op::ConfigureSession` was the first `Op` submitted via `submit()`. Then over in `submission_loop()`, it would handle the case for taking the parameters of `Op::ConfigureSession` and turning it into a `Session`. This approach has two challenges from a state management perspective: `f968a1327a/codex-rs/core/src/codex.rs (L718)` - The local `sess` variable in `submission_loop()` has to be `mut` and `Option<Arc<Session>>` because it is not invariant that a `Session` is present for the lifetime of the loop, so there is a lot of logic to deal with the case where `sess` is `None` (e.g., the `send_no_session_event` function and all of its callsites). - `submission_loop()` is written in such a way that `Op::ConfigureSession` could be observed multiple times, but in practice, it is only observed exactly once at the start of the loop. In this PR, we try to simplify the state management by _removing_ the `Op::ConfigureSession` enum variant and constructing the `Session` as part of `Codex::spawn()` so that it can be passed to `submission_loop()` as `Arc<Session>`. The original logic from the `Op::ConfigureSession` has largely been moved to the new `Session::new()` constructor. --- Incidentally, I also noticed that the handling of `Op::ConfigureSession` can result in events being dispatched in addition to `EventMsg::SessionConfigured`, as an `EventMsg::Error` is created for every MCP initialization error, so it is important to preserve that behavior: `f968a1327a/codex-rs/core/src/codex.rs (L901-L916)` Though admittedly, I believe this does not play nice with #2264, as these error messages will likely be dispatched before the client has a chance to call `addConversationListener`, so we likely need to make it so `newConversation` automatically creates the subscription, but we must also guarantee that the "ack" from `newConversation` is returned before any other conversation-related notifications are sent so the client knows what `conversation_id` to match on.	2025-08-14 09:55:28 -07:00

1 2 3

144 Commits