valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	cb2f952143	fix: remove unnecessary flush() calls (#2873 ) Because we are writing to a pipe, these `flush()` calls are unnecessary, so removing these saves us one syscall per write in these two cases.	2025-08-28 22:41:10 -07:00
Michael Bolin	970e466ab3	fix: switch to unbounded channel (#2874 ) #2747 encouraged me to audit our codebase for similar issues, as now I am particularly suspicious that our flaky tests are due to a racy deadlock. I asked Codex to audit our code, and one of its suggestions was this: > High-Risk Patterns > > All `send_` methods await on a bounded `mpsc::Sender<OutgoingMessage>`. If the writer blocks, the channel fills and the processor task blocks on send, stops draining incoming requests, and stdin reader eventually blocks on its send. This creates a backpressure deadlock cycle across the three tasks. > > Recommendations* > * Server outgoing path: break the backpressure cycle > * Option A (minimal risk): Change `OutgoingMessageSender` to use an unbounded channel to decouple producer from stdout. Add rate logging so floods are visible. > * Option B (bounded + drop policy): Change `send_` to try_send and drop messages (or coalesce) when the queue is full, logging a warning. This prevents processor stalls at the cost of losing messages under extreme backpressure. > Option C (two-stage buffer): Keep bounded channel, but have a dedicated “egress” task that drains an unbounded internal queue, writing to stdout with retries and a shutdown timeout. This centralizes backpressure policy. So this PR is Option A. Indeed, we previously used a bounded channel with a capacity of `128`, but as we discovered recently with #2776, there are certainly cases where we can get flooded with events. That said, `test_shell_command_approval_triggers_elicitation` just failed one one build when I put up this PR, so clearly we are not out of the woods yet... Update: I think I found the true source of the deadlock! See https://github.com/openai/codex/pull/2876	2025-08-28 22:20:10 -07:00
unship	f7cb2f87a0	Bug fix: clone of incoming_tx can lead to deadlock (#2747 ) POC code ```rust use tokio::sync::mpsc; use std::time::Duration; #[tokio::main] async fn main() { println!("=== Test 1: Simulating original MCP server pattern ==="); test_original_pattern().await; } async fn test_original_pattern() { println!("Testing the original pattern from MCP server..."); // Create channel - this simulates the original incoming_tx/incoming_rx let (tx, mut rx) = mpsc::channel::<String>(10); // Task 1: Simulates stdin reader that will naturally terminate let stdin_task = tokio::spawn({ let tx_clone = tx.clone(); async move { println!(" stdin_task: Started, will send 3 messages then exit"); for i in 0..3 { let msg = format!("Message {}", i); if tx_clone.send(msg.clone()).await.is_err() { println!(" stdin_task: Receiver dropped, exiting"); break; } println!(" stdin_task: Sent {}", msg); tokio::time::sleep(Duration::from_millis(300)).await; } println!(" stdin_task: Finished (simulating EOF)"); // tx_clone is dropped here } }); // Task 2: Simulates message processor let processor_task = tokio::spawn(async move { println!(" processor_task: Started, waiting for messages"); while let Some(msg) = rx.recv().await { println!(" processor_task: Processing {}", msg); tokio::time::sleep(Duration::from_millis(100)).await; } println!(" processor_task: Finished (channel closed)"); }); // Task 3: Simulates stdout writer or other background task let background_task = tokio::spawn(async move { for i in 0..2 { tokio::time::sleep(Duration::from_millis(500)).await; println!(" background_task: Tick {}", i); } println!(" background_task: Finished"); }); println!(" main: Original tx is still alive here"); println!(" main: About to call tokio::join! - will this deadlock?"); // This is the pattern from the original code let _ = tokio::join!(stdin_task, processor_task, background_task); } ``` --------- Co-authored-by: Michael Bolin <bolinfest@gmail.com>	2025-08-28 19:28:17 -07:00
Ahmed Ibrahim	9dbe7284d2	Following up on #2371 post commit feedback (#2852 ) - Introduce websearch end to complement the begin - Moves the logic of adding the sebsearch tool to create_tools_json_for_responses_api - Making it the client responsibility to toggle the tool on or off - Other misc in #2371 post commit feedback - Show the query: <img width="1392" height="151" alt="image" src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89" />	2025-08-28 19:24:38 -07:00
dedrisian-oai	b8e8454b3f	Custom /prompts (#2696 ) Adds custom `/prompts` to `~/.codex/prompts/<command>.md`. <img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM" src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679" /> --- Details: 1. Adds `Op::ListCustomPrompts` to core. 2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt` (name, content). 3. TUI calls the operation on load, and populates the custom prompts (excluding prompts that collide with builtins). 4. Selecting the custom prompt automatically sends the prompt to the agent.	2025-08-29 02:16:39 +00:00
Michael Bolin	f09170b574	chore: print stderr from MCP server to test output using eprintln! (#2849 ) Related to https://github.com/openai/codex/pull/2848, I don't see the stderr from `codex mcp` colocated with the other stderr from `test_shell_command_approval_triggers_elicitation()` when it fails even though we have `RUST_LOG=debug` set when we spawn `codex mcp`: `1e9e703b96/codex-rs/mcp-server/tests/common/mcp_process.rs (L65)` Let's try this new logic which should be more explicit.	2025-08-28 12:43:13 -07:00
Michael Bolin	1e9e703b96	chore: try to make it easier to debug the flakiness of test_shell_command_approval_triggers_elicitation (#2848 ) `test_shell_command_approval_triggers_elicitation()` is one of a number of integration tests that we have observed to be flaky on GitHub CI, so this PR tries to reduce the flakiness _and_ to provide us with more information when it flakes. Specifically: - Changed the command that we use to trigger the elicitation from `git init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'` because running `git` seems more likely to invite variance. - Increased the timeout to wait for the task response from 10s to 20s. - Added more logging.	2025-08-28 12:33:33 -07:00
Michael Bolin	74d2741729	chore: require uninlined_format_args from clippy (#2845 ) - added `uninlined_format_args` to `[workspace.lints.clippy]` in the `Cargo.toml` for the workspace - ran `cargo clippy --tests --fix` - ran `just fmt`	2025-08-28 11:25:23 -07:00
dedrisian-oai	4e9ad23864	Add "View Image" tool (#2723 ) Adds a "View Image" tool so Codex can find and see images by itself: <img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40 04 AM" src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03" />	2025-08-27 17:41:23 -07:00
Dylan	0cec0770e2	[mcp-server] Add GetConfig endpoint (#2725 ) ## Summary Adds a GetConfig request to the MCP Protocol, so MCP clients can evaluate the resolved config.toml settings which the harness is using. ## Testing - [x] Added an end to end test of the endpoint	2025-08-27 09:59:03 -07:00
Ahmed Ibrahim	d0e06f74e2	send context window with task started (#2752 ) - Send context window with task started - Accounting for changing the model per turn	2025-08-27 00:04:21 -07:00
Jeremy Rose	32bbbbad61	test: faster test execution in codex-core (#2633 ) this dramatically improves time to run `cargo test -p codex-core` (~25x speedup). before: ``` cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total ``` after: ``` cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total ``` both tests measured "hot", i.e. on a 2nd run with no filesystem changes, to exclude compile times. approach inspired by [Delete Cargo Integration Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html), we move all test cases in tests/ into a single suite in order to have a single binary, as there is significant overhead for each test binary executed, and because test execution is only parallelized with a single binary.	2025-08-24 11:10:53 -07:00
Reuben Narad	363636f5eb	Add web search tool (#2371 ) Adds web_search tool, enabling the model to use Responses API web_search tool. - Disabled by default, enabled by --search flag - When --search is passed, exposes web_search_request function tool to the model, which triggers user approval. When approved, the model can use the web_search tool for the remainder of the turn <img width="1033" height="294" alt="image" src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f" /> --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-23 22:58:56 -07:00
Ahmed Ibrahim	311ad0ce26	fork conversation from a previous message (#2575 ) This can be the underlying logic in order to start a conversation from a previous message. will need some love in the UI. Base for building this: #2588	2025-08-22 17:06:09 -07:00
Gabriel Peal	697c7cf4bf	Fix flakiness in shell command approval test (#2547 ) ## Summary - read the shell exec approval request's actual id instead of assuming it is always 0 - use that id when validating and responding in the test ## Testing - `cargo test -p codex-mcp-server test_shell_command_approval_triggers_elicitation` ------ https://chatgpt.com/codex/tasks/task_i_68a6ab9c732c832c81522cbf11812be0	2025-08-22 18:46:35 -04:00
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
easong-openai	8ad56be06e	Parse and expose stream errors (#2540 )	2025-08-21 01:15:24 -07:00
Eric Traut	dacff9675a	Added new auth-related methods and events to mcp server (#2496 ) This PR adds the following: * A getAuthStatus method on the mcp server. This returns the auth method currently in use (chatgpt or apikey) or none if the user is not authenticated. It also returns the "preferred auth method" which reflects the `preferred_auth_method` value in the config. * A logout method on the mcp server. If called, it logs out the user and deletes the `auth.json` file — the same behavior in the cli's `/logout` command. * An `authStatusChange` event notification that is sent when the auth status changes due to successful login or logout operations. * Logic to pass command-line config overrides to the mcp server at startup time. This allows use cases like `codex mcp -c preferred_auth_method=apikey`.	2025-08-20 20:36:34 -07:00
Gabriel Peal	77148a5c61	Diff command (#2476 )	2025-08-19 22:50:28 -04:00
Ahmed Ibrahim	e91c3d6d1c	Support changing reasoning effort (#2435 ) https://github.com/user-attachments/assets/50198ee8-5915-47a3-bb71-69af65add1ef Building up on #2431 #2428	2025-08-19 17:55:07 +00:00
Dylan	e7e5fe91c8	[tui] Support /mcp command (#2430 ) ## Summary Adds a `/mcp` command to list active tools. We can extend this command to allow configuration of MCP tools, but for now a simple list command will help debug if your config.toml and your tools are working as expected.	2025-08-19 09:00:31 -07:00
Michael Bolin	2aad3a13b8	fix: remove shutdown_flag param to run_login_server() (#2399 ) In practice, this was always passed in as `None`, so eliminated the param and updated all the call sites. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2399). * __->__ #2399 * #2398 * #2396 * #2395 * #2394 * #2393 * #2389	2025-08-19 01:15:50 +00:00
Michael Bolin	d5b42ba1ac	fix: make ShutdownHandle a private field of LoginServer (#2396 ) Folds the top-level `shutdown()` function into a method of `ShutdownHandle` and then simply stores `ShutdownHandle` on `LoginServer` since the two fields it contains were always being used together, anyway. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2396). * #2399 * #2398 * __->__ #2396 * #2395 * #2394 * #2393 * #2389	2025-08-19 00:57:04 +00:00
Michael Bolin	7f21634165	fix: eliminate ServerOptions.login_timeout and have caller use tokio::time::timeout() instead (#2395 ) https://github.com/openai/codex/pull/2373 introduced `ServerOptions.login_timeout` and `spawn_timeout_watcher()` to use an extra thread to manage the timeout for the login server. Now that we have asyncified the login stack, we can use `tokio::time::timeout()` from "outside" the login library to manage the timeout rather than having to a commit to a specific "timeout" concept from within. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2395). * #2399 * #2398 * #2396 * __->__ #2395 * #2394 * #2393 * #2389	2025-08-19 00:49:13 +00:00
Michael Bolin	6e8c055fd5	fix: async-ify login flow (#2393 ) This replaces blocking I/O with async/non-blocking I/O in a number of cases. This facilitates the use of `tokio::sync::Notify` and `tokio::select!` in #2394. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2393). * #2399 * #2398 * #2396 * #2395 * #2394 * __->__ #2393 * #2389	2025-08-18 17:23:40 -07:00
Michael Bolin	712bfa04ac	chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423 ) The existing `wire_format.rs` should share more types with the `codex-protocol` crate (like `AskForApproval` instead of maintaining a parallel `CodexToolCallApprovalPolicy` enum), so this PR moves `wire_format.rs` into `codex-protocol`, renaming it as `mcp-protocol.rs`. We also de-dupe types, where appropriate. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423). * #2424 * __->__ #2423	2025-08-18 09:36:57 -07:00
Michael Bolin	a269754668	remove mcp-server/src/mcp_protocol.rs and the code that depends on it (#2360 )	2025-08-18 00:29:18 -07:00
Michael Bolin	b581498882	fix: introduce EventMsg::TurnAborted (#2365 ) Introduces `EventMsg::TurnAborted` that should be sent in response to `Op::Interrupt`. In the MCP server, updates the handling of a `ClientRequest::InterruptConversation` request such that it sends the `Op::Interrupt` but does not respond to the request until it sees an `EventMsg::TurnAborted`.	2025-08-17 21:40:31 -07:00
Eric Traut	350b00d54b	Added MCP server command to enable authentication using ChatGPT (#2373 ) This PR adds two new APIs for the MCP server: 1) loginChatGpt, and 2) cancelLoginChatGpt. The first starts a login server and returns a local URL that allows for browser-based authentication, and the second provides a way to cancel the login attempt. If the login attempt succeeds, a notification (in the form of an event) is sent to a subscriber. I also added a timeout mechanism for the existing login server. The loginChatGpt code path uses a 10-minute timeout by default, so if the user fails to complete the login flow in that timeframe, the login server automatically shuts down. I tested the timeout code by manually setting the timeout to a much lower number and confirming that it works as expected when used e2e.	2025-08-17 10:03:52 -07:00
Michael Bolin	d262244725	fix: introduce codex-protocol crate (#2355 )	2025-08-15 12:44:40 -07:00
Michael Bolin	eda50d8372	feat: introduce ClientRequest::SendUserTurn (#2345 ) This adds a new request type, `SendUserTurn`, that makes it possible to submit a `Op::UserTurn` operation (introduced in #2329) to a conversation. This PR also adds a new integration test that verifies that changing from `AskForApproval::UnlessTrusted` to `AskForApproval::Never` mid-conversation ensures that an elicitation is no longer sent for running `python3 -c print(42)`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345). * __->__ #2345 * #2329 * #2343 * #2340 * #2338	2025-08-15 10:05:58 -07:00
Michael Bolin	265fd89e31	fix: try to fix flakiness in test_shell_command_approval_triggers_elicitation (#2344 ) I still see flakiness in `test_shell_command_approval_triggers_elicitation()` on occasion where `MockServer` claims it has not received all of its expected requests. I recently introduced a similar type of test in #2264, `test_codex_jsonrpc_conversation_flow()`, which I have not seen flake (yet!), so this PR pulls over two things I did in that test: - increased `worker_threads` from `2` to `4` - added an assertion to make sure the `task_complete` notification is received Honestly, I'm still not sure why `MockServer` claims it sometimes does not receive all its expected requests given that we assert that the final `JSONRPCResponse` is read on the stream, but let's give this a shot. Assuming this fixes things, my hypothesis is that the increase in `worker_threads` helps because perhaps there are async tasks in `MockServer` that do not reliably complete fully when there are not enough threads available? If that is correct, it seems like the test would still be flaky, though perhaps with lower frequency?	2025-08-15 09:17:20 -07:00
Dylan	6df8e35314	[tools] Add apply_patch tool (#2303 ) ## Summary We've been seeing a number of issues and reports with our synthetic `apply_patch` tool, e.g. #802. Let's make this a real tool - in my anecdotal testing, it's critical for GPT-OSS models, but I'd like to make it the standard across GPT-5 and codex models as well. ## Testing - [x] Tested locally - [x] Integration test	2025-08-15 11:55:53 -04:00
Parker Thompson	a075424437	Added `allow-expect-in-tests` / `allow-unwrap-in-tests` (#2328 ) This PR: * Added the clippy.toml to configure allowable expect / unwrap usage in tests * Removed as many expect/allow lines as possible from tests * moved a bunch of allows to expects where possible Note: in integration tests, non `#[test]` helper functions are not covered by this so we had to leave a few lingering `expect(expect_used` checks around	2025-08-14 17:59:01 -07:00
Michael Bolin	6a0f709cff	fix: add call_id to ApprovalParams in mcp-server/src/wire_format.rs (#2322 ) Clients still need this field.	2025-08-14 16:09:12 -07:00
Gabriel Peal	cdd33b2c04	Tag InputItem (#2304 ) Instead of: ``` { Text: { text: string } } ``` It is now: ``` { type: "text", data: { text: string } } ``` which makes for cleaner discriminated unions	2025-08-14 17:58:04 +00:00
Michael Bolin	f968a1327a	feat: add support for an InterruptConversation request (#2287 ) This adds `ClientRequest::InterruptConversation`, which effectively maps directly to `Op::Interrupt`. --- * __->__ #2287 * #2286 * #2285	2025-08-13 23:12:03 -07:00
Michael Bolin	539f4b290e	fix: add support for exec and apply_patch approvals in the new wire format (#2286 ) Now when `CodexMessageProcessor` receives either a `EventMsg::ApplyPatchApprovalRequest` or a `EventMsg::ExecApprovalRequest`, it sends the appropriate request from the server to the client. When it gets a response, it forwards it on to the `CodexConversation`. Note this takes a lot of code from: https://github.com/openai/codex/blob/main/codex-rs/mcp-server/src/conversation_loop.rs https://github.com/openai/codex/blob/main/codex-rs/mcp-server/src/exec_approval.rs https://github.com/openai/codex/blob/main/codex-rs/mcp-server/src/patch_approval.rs I am copy/pasting for now because I am trying to consolidate around the new `wire_format.rs`, so I plan to delete these other files soon. Now that we have requests going both from client-to-server and server-to-client, I renamed `CodexRequest` to `ClientRequest`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2286). * #2287 * __->__ #2286 * #2285	2025-08-13 23:00:50 -07:00
Michael Bolin	a62510e0ae	fix: verify notifications are sent with the conversationId set (#2278 ) This updates `CodexMessageProcessor` so that each notification it sends for a `EventMsg` from a `CodexConversation` such that: - The `params` always has an appropriate `conversationId` field. - The `method` is now includes the name of the `EventMsg` type rather than using `codex/event` as the `method` type for all notifications. (We currently prefix the method name with `codex/event/`, but I think that should go away once we formalize the notification schema in `wire_format.rs`.) As part of this, we update `test_codex_jsonrpc_conversation_flow()` to verify that the `task_finished` notification has made it through the system instead of sleeping for 5s and "hoping" the server finished processing the task. Note we have seen some flakiness in some of our other, similar integration tests, and I expect adding a similar check would help in those cases, as well.	2025-08-13 17:54:12 -07:00
Michael Bolin	e7bad650ff	feat: support traditional JSON-RPC request/response in MCP server (#2264 ) This introduces a new set of request types that our `codex mcp` supports. Note that these do not conform to MCP tool calls so that instead of having to send something like this: ```json { "jsonrpc": "2.0", "method": "tools/call", "id": 42, "params": { "name": "newConversation", "arguments": { "model": "gpt-5", "approvalPolicy": "on-request" } } } ``` we can send something like this: ```json { "jsonrpc": "2.0", "method": "newConversation", "id": 42, "params": { "model": "gpt-5", "approvalPolicy": "on-request" } } ``` Admittedly, this new format is not a valid MCP tool call, but we are OK with that right now. (That is, not everything we might want to request of `codex mcp` is something that is appropriate for an autonomous agent to do.) To start, this introduces four request types: - `newConversation` - `sendUserMessage` - `addConversationListener` - `removeConversationListener` The new `mcp-server/tests/codex_message_processor_flow.rs` shows how these can be used. The types are defined on the `CodexRequest` enum, so we introduce a new `CodexMessageProcessor` that is responsible for dealing with requests from this enum. The top-level `MessageProcessor` has been updated so that when `process_request()` is called, it first checks whether the request conforms to `CodexRequest` and dispatches it to `CodexMessageProcessor` if so. Note that I also decided to use `camelCase` for the on-the-wire format, as that seems to be the convention for MCP. For the moment, the new protocol is defined in `wire_format.rs` within the `mcp-server` crate, but in a subsequent PR, I will probably move it to its own crate to ensure the protocol has minimal dependencies and that we can codegen a schema from it. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2264). * #2278 * __->__ #2264	2025-08-13 17:36:29 -07:00
Michael Bolin	37fc4185ef	fix: update `OutgoingMessageSender::send_response()` to take `Serialize` (#2263 ) This makes `send_response()` easier to work with.	2025-08-13 14:29:13 -07:00
Michael Bolin	08ed618f72	chore: introduce ConversationManager as a clearinghouse for all conversations (#2240 ) This PR does two things because after I got deep into the first one I started pulling on the thread to the second: - Makes `ConversationManager` the place where all in-memory conversations are created and stored. Previously, `MessageProcessor` in the `codex-mcp-server` crate was doing this via its `session_map`, but this is something that should be done in `codex-core`. - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded throughout our code. I think this made sense at one time, but now that we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event, I don't think this was quite right, so I removed it. For `codex exec` and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we no longer make `Notify` a field of `Codex` or `CodexConversation`. Changes of note: - Adds the files `conversation_manager.rs` and `codex_conversation.rs` to `codex-core`. - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`: other crates must use `CodexConversation` instead (which is created via `ConversationManager`). - `core/src/codex_wrapper.rs` has been deleted in favor of `ConversationManager`. - `ConversationManager::new_conversation()` returns `NewConversation`, which is in line with the `new_conversation` tool we want to add to the MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so we eliminate checks in cases like `codex-rs/core/tests/client.rs` to verify `SessionConfiguredEvent` is the first event because that is now internal to `ConversationManager`. - Quite a bit of code was deleted from `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to manage multiple conversations itself: it goes through `ConversationManager` instead. - `core/tests/live_agent.rs` has been deleted because I had to update a bunch of tests and all the tests in here were ignored, and I don't think anyone ever ran them, so this was just technical debt, at this point. - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I hope to refactor the blandly-named `util.rs` into more descriptive files). - In general, I started replacing local variables named `codex` as `conversation`, where appropriate, though admittedly I didn't do it through all the integration tests because that would have added a lot of noise to this PR. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240). * #2264 * #2263 * __->__ #2240	2025-08-13 13:38:18 -07:00
easong-openai	6340acd885	Re-add markdown streaming (#2029 ) Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.	2025-08-12 17:37:28 -07:00
Dylan	c6b46fe220	[mcp-server] Support CodexToolCallApprovalPolicy::OnRequest (#2187 ) ## Summary #1865 added `AskForApproval::OnRequest`, but missed adding it to our custom struct in `mcp-server`. This adds the missing configuration ## Testing - [x] confirmed locally	2025-08-11 11:38:47 -07:00
Gabriel Peal	7f6408720b	[1/3] Parse exec commands and format them more nicely in the UI (#2095 ) # Note for reviewers The bulk of this PR is in in the new file, `parse_command.rs`. This file is designed to be written TDD and implemented with Codex. Do not worry about reviewing the code, just review the unit tests (if you want). If any cases are missing, we'll add more tests and have Codex fix them. I think the best approach will be to land and iterate. I have some follow-ups I want to do after this lands. The next PR after this will let us merge (and dedupe) multiple sequential cells of the same such as multiple read commands. The deduping will also be important because the model often reads the same file multiple times in a row in chunks === This PR formats common commands like reading, formatting, testing, etc more nicely: It tries to extract things like file names, tests and falls back to the cmd if it doesn't. It also only shows stdout/err if the command failed. <img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15" src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e" /> <img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32" src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9" /> <img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2" src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f" /> Part 2: https://github.com/openai/codex/pull/2097 Part 3: https://github.com/openai/codex/pull/2110	2025-08-11 14:26:15 -04:00
Dylan	dc468d563f	[env] Remove git config for now (#1884 ) ## Summary Forgot to remove this in #1869 last night! Too much of a performance hit on the main thread. We can bring it back via an async thread on startup.	2025-08-06 08:05:17 -07:00
Dylan	3e8bcf0247	[prompts] Add <environment_context> (#1869 ) ## Summary Includes a new user message in the api payload which provides useful environment context for the model, so it knows about things like the current working directory and the sandbox. ## Testing Updated unit tests	2025-08-06 01:13:31 -07:00
Dylan	cda39e417f	[tests] Investigate flakey mcp-server test (#1877 ) ## Summary Have seen these tests flaking over the course of today on different boxes. `wiremock` seems to be generally written with tokio/threads in mind but based on the weird panics from the tests, let's see if this helps.	2025-08-06 00:07:58 -07:00
Michael Bolin	42bd73e150	chore: remove unnecessary default_ prefix (#1854 ) This prefix is not inline with the other fields on the `ConfigOverrides` struct.	2025-08-05 14:42:49 -07:00
easong-openai	9285350842	Introduce `--oss` flag to use gpt-oss models (#1848 ) This adds support for easily running Codex backed by a local Ollama instance running our new open source models. See https://github.com/openai/gpt-oss for details. If you pass in `--oss` you'll be prompted to install/launch ollama, and it will automatically download the 20b model and attempt to use it. We'll likely want to expand this with some options later to make the experience smoother for users who can't run the 20b or want to run the 120b. Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-08-05 11:31:11 -07:00

1 2 3

113 Commits