Commit Graph

34 Commits

Author SHA1 Message Date
Michael Bolin
37fc4185ef fix: update OutgoingMessageSender::send_response() to take Serialize (#2263)
This makes `send_response()` easier to work with.
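A minimal sketch of what the new signature enables (the sender internals here are stand-ins, not the real implementation):

```rust
use serde::Serialize;
use serde_json::json;

/// Stand-in for the real sender; only the signature matters here.
struct OutgoingMessageSender;

impl OutgoingMessageSender {
    /// Taking `impl Serialize` lets callers pass any payload directly
    /// instead of converting it to a JSON value first.
    fn send_response(&self, id: i64, result: impl Serialize) {
        match serde_json::to_value(result) {
            Ok(value) => println!("response {id}: {value}"),
            Err(err) => eprintln!("failed to serialize response: {err}"),
        }
    }
}

fn main() {
    #[derive(Serialize)]
    struct ToolResult {
        ok: bool,
    }

    let sender = OutgoingMessageSender;
    sender.send_response(1, ToolResult { ok: true });
    sender.send_response(2, json!({ "raw": "values also work" }));
}
```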
2025-08-13 14:29:13 -07:00
Michael Bolin
08ed618f72 chore: introduce ConversationManager as a clearinghouse for all conversations (#2240)
This PR does two things because, after I got deep into the first one, I
started pulling on the thread that led to the second:

- Makes `ConversationManager` the place where all in-memory
conversations are created and stored. Previously, `MessageProcessor` in
the `codex-mcp-server` crate was doing this via its `session_map`, but
this is something that should be done in `codex-core`.
- It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
throughout our code. I think this made sense at one time, but now that
we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
I don't think this was quite right, so I removed it. For `codex exec`
and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
no longer make `Notify` a field of `Codex` or `CodexConversation`.

Changes of note:

- Adds the files `conversation_manager.rs` and `codex_conversation.rs`
to `codex-core`.
- `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
other crates must use `CodexConversation` instead (which is created via
`ConversationManager`).
- `core/src/codex_wrapper.rs` has been deleted in favor of
`ConversationManager`.
- `ConversationManager::new_conversation()` returns `NewConversation`,
which is in line with the `new_conversation` tool we want to add to the
MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
verify `SessionConfiguredEvent` is the first event because that is now
internal to `ConversationManager`.
- Quite a bit of code was deleted from
`codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
manage multiple conversations itself: it goes through
`ConversationManager` instead.
- `core/tests/live_agent.rs` has been deleted: I had to update a bunch of
tests anyway, all the tests in this file were ignored, and I don't think
anyone ever ran them, so at this point it was just technical debt.
- Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
hope to refactor the blandly-named `util.rs` into more descriptive
files).
- In general, I started renaming local variables from `codex` to
`conversation`, where appropriate, though admittedly I didn't do it
through all the integration tests because that would have added a lot of
noise to this PR.
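
A rough sketch of the resulting shape (method and field names beyond those mentioned above are assumptions):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-ins for the real codex-core types.
struct CodexConversation;
struct SessionConfiguredEvent;

/// What `new_conversation()` hands back: the conversation plus the
/// `SessionConfiguredEvent` that callers previously had to wait for.
struct NewConversation {
    conversation: Arc<CodexConversation>,
    session_configured: SessionConfiguredEvent,
}

/// The clearinghouse: every in-memory conversation is created and
/// stored here, replacing the ad-hoc `session_map` in the MCP server.
#[derive(Default)]
struct ConversationManager {
    conversations: HashMap<u64, Arc<CodexConversation>>,
    next_id: u64,
}

impl ConversationManager {
    fn new_conversation(&mut self) -> NewConversation {
        let conversation = Arc::new(CodexConversation);
        self.conversations.insert(self.next_id, Arc::clone(&conversation));
        self.next_id += 1;
        NewConversation {
            conversation,
            // The real manager consumes the first event internally, so
            // callers never have to check for it themselves.
            session_configured: SessionConfiguredEvent,
        }
    }
}

fn main() {
    let mut manager = ConversationManager::default();
    let NewConversation { conversation, .. } = manager.new_conversation();
    let _ = conversation;
}
```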

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
* #2264
* #2263
* __->__ #2240
2025-08-13 13:38:18 -07:00
easong-openai
6340acd885 Re-add markdown streaming (#2029)
Wait for newlines, then render markdown on a line-by-line basis. Word-wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.
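
A simplified sketch of the line-buffering idea (the real code renders markdown and word-wraps before emitting; this only shows the buffering):

```rust
/// Accumulates streamed text and releases only completed lines, so
/// markdown is rendered one full line at a time.
struct LineBuffer {
    pending: String,
}

impl LineBuffer {
    fn new() -> Self {
        Self { pending: String::new() }
    }

    /// Push a delta; return any lines that are now complete.
    fn push(&mut self, delta: &str) -> Vec<String> {
        self.pending.push_str(delta);
        let mut complete = Vec::new();
        while let Some(idx) = self.pending.find('\n') {
            let line: String = self.pending.drain(..=idx).collect();
            complete.push(line.trim_end().to_string());
        }
        complete
    }
}

fn main() {
    let mut buf = LineBuffer::new();
    for delta in ["# Ti", "tle\nbody ", "text\n"] {
        for line in buf.push(delta) {
            // The real code word-wraps and renders markdown here.
            println!("render: {line}");
        }
    }
}
```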
2025-08-12 17:37:28 -07:00
easong-openai
e0303dbac0 Rescue chat completion changes (#1846)
https://github.com/openai/codex/pull/1835 has some messed up history.

This adds support for streaming chat completions, which is useful for ollama. We should probably cast a very skeptical eye on the code introduced in this PR.
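
For context, streaming chat completions arrive as SSE `data:` lines carrying JSON deltas; a minimal parsing sketch (field paths follow the common chat-completions wire format, treat it as illustrative):

```rust
/// Extract the content delta from one SSE line of a streaming chat
/// completion, if present.
fn parse_delta(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?;
    if payload == "[DONE]" {
        return None;
    }
    let value: serde_json::Value = serde_json::from_str(payload).ok()?;
    value["choices"][0]["delta"]["content"]
        .as_str()
        .map(str::to_string)
}

fn main() {
    let line = r#"data: {"choices":[{"delta":{"content":"hel"}}]}"#;
    assert_eq!(parse_delta(line).as_deref(), Some("hel"));
}
```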

---------

Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
2025-08-05 08:56:13 +00:00
Ahmed Ibrahim
e38ce39c51 Revert to 3f13ebce10 without rewriting history. Wrong merge 2025-08-04 17:03:24 -07:00
Ahmed Ibrahim
bd171e5206 add raw reasoning 2025-08-04 16:49:42 -07:00
Gabriel Peal
1f3318c1c5 Add a TurnDiffTracker to create a unified diff for an entire turn (#1770)
This lets us show an accumulating diff across all patches in a turn.
Refer to the docs for TurnDiffTracker for implementation details.

There are multiple ways this could have been done and this felt like the
right tradeoff between reliability and completeness:
*Pros*
* It will pick up all changes to files that the model touched, including
when prettier or another command updates them.
* It will not pick up changes made by the user or other agents to files
it didn't modify.

*Cons*
* It will pick up changes that the user made to a file that the model
also touched
* It will not pick up changes to codegen or files that were not modified
with apply_patch
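
A sketch of the underlying idea: snapshot each file the first time the model touches it, then diff the snapshot against the file's current contents at the end of the turn (names and details are illustrative, not the real implementation):

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};

/// Keeps a per-file baseline so a unified diff for the whole turn can
/// be produced at any point.
#[derive(Default)]
struct TurnDiffTracker {
    baselines: HashMap<PathBuf, String>,
}

impl TurnDiffTracker {
    /// Called before apply_patch touches `path`. Only the first
    /// snapshot per turn is kept, which is why later edits by
    /// prettier or another command still land in the final diff.
    fn on_patch_begin(&mut self, path: &Path) {
        self.baselines
            .entry(path.to_path_buf())
            .or_insert_with(|| fs::read_to_string(path).unwrap_or_default());
    }

    /// Pairs each baseline with the file's current contents; the real
    /// implementation feeds these into unified-diff generation.
    fn turn_snapshots(&self) -> Vec<(PathBuf, String, String)> {
        self.baselines
            .iter()
            .map(|(path, before)| {
                let after = fs::read_to_string(path).unwrap_or_default();
                (path.clone(), before.clone(), after)
            })
            .collect()
    }
}

fn main() {
    let mut tracker = TurnDiffTracker::default();
    tracker.on_patch_begin(Path::new("Cargo.toml"));
    for (path, before, after) in tracker.turn_snapshots() {
        println!("{}: {} -> {} bytes", path.display(), before.len(), after.len());
    }
}
```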
2025-08-04 11:57:04 -04:00
aibrahim-oai
f20de21cb6 collapse stdout and stderr delta events into one (#1787) 2025-08-01 14:00:19 -07:00
aibrahim-oai
bc7beddaa2 feat: stream exec stdout events (#1786)
## Summary
- stream command stdout as `ExecCommandStdout` events
- forward streamed stdout to clients and ignore it in the human output
processor
- adjust call sites for new streaming API
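
A minimal sketch of the streaming loop (the event type is a stand-in, not the real protocol definition):

```rust
use tokio::io::AsyncReadExt;
use tokio::process::Command;

/// Stand-in for the protocol event; the real definition lives in core.
struct ExecCommandStdout {
    chunk: Vec<u8>,
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut child = Command::new("echo")
        .arg("hello")
        .stdout(std::process::Stdio::piped())
        .spawn()?;

    let mut stdout = child.stdout.take().expect("stdout was piped");
    let mut buf = [0u8; 4096];
    loop {
        let n = stdout.read(&mut buf).await?;
        if n == 0 {
            break;
        }
        // The server forwards each chunk to clients as a notification;
        // the human output processor ignores these deltas.
        let event = ExecCommandStdout { chunk: buf[..n].to_vec() };
        println!("streamed {} bytes", event.chunk.len());
    }
    child.wait().await?;
    Ok(())
}
```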
2025-08-01 13:04:34 -07:00
Gabriel Peal
8828f6f082 Add an experimental plan tool (#1726)
This adds a tool the model can call to update a plan. The tool doesn't
actually _do_ anything but it gives clients a chance to read and render
the structured plan. We will likely iterate on the prompt and tools
exposed for planning over time.
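
One plausible shape for the tool's payload (field names here are assumptions for illustration):

```rust
use serde::{Deserialize, Serialize};

/// The tool is a no-op on the server side; the model calls it with a
/// payload like this purely so clients can read and render the plan.
#[derive(Debug, Serialize, Deserialize)]
struct UpdatePlanArgs {
    plan: Vec<PlanItem>,
}

#[derive(Debug, Serialize, Deserialize)]
struct PlanItem {
    step: String,
    status: String, // e.g. "pending" | "in_progress" | "completed"
}

fn main() {
    let args: UpdatePlanArgs = serde_json::from_str(
        r#"{"plan":[{"step":"write tests","status":"pending"}]}"#,
    )
    .expect("valid plan JSON");
    println!("{} step(s)", args.plan.len());
}
```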
2025-07-29 14:22:02 -04:00
Dylan
094d7af8c3 [mcp-server] Populate notifications._meta with requestId (#1704)
## Summary
Per the [latest MCP
spec](https://modelcontextprotocol.io/specification/2025-06-18/basic#meta),
the `_meta` field is reserved for metadata. In the [Typescript
Schema](0695a497eb/schema/2025-06-18/schema.ts (L37-L40)),
`progressToken` is defined as a value to be attached to subsequent
notifications for that request.

The
[CallToolRequestParams](0695a497eb/schema/2025-06-18/schema.ts (L806-L817))
extends this definition but overwrites the params field. This ambiguity
makes our generated type definitions tricky, so I'm going to skip the
`progressToken` field for now and just send back the `requestId`
instead.
 
In a future PR, we can clarify, update our `generate_mcp_types.py`
script, and update our progressToken logic accordingly.
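
A sketch of the resulting notification shape (the method name is illustrative; `_meta.requestId` is the point):

```rust
use serde_json::json;

fn main() {
    // A notification tied back to its originating request.
    let notification = json!({
        "jsonrpc": "2.0",
        "method": "some/notification", // illustrative method name
        "params": {
            "_meta": { "requestId": 42 },
            "data": { "status": "running" }
        }
    });
    println!("{notification}");
}
```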

## Testing
- [x] Added unit tests
- [x] Manually tested with mcp client
2025-07-28 13:32:09 -07:00
Michael Bolin
2405c40026 chore: update Codex::spawn() to return a struct instead of a tuple (#1677)
Also update `init_codex()` to return a `struct` instead of a tuple.
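
The refactor pattern, sketched (field names are assumptions):

```rust
struct Codex;
struct SessionId(u64);

/// A named struct instead of a tuple: call sites read `.codex` and
/// `.session_id` rather than positional `.0` / `.1`.
struct CodexSpawnOk {
    codex: Codex,
    session_id: SessionId,
}

fn spawn() -> CodexSpawnOk {
    CodexSpawnOk { codex: Codex, session_id: SessionId(0) }
}

fn main() {
    let CodexSpawnOk { codex: _, session_id } = spawn();
    println!("session {}", session_id.0);
}
```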
2025-07-27 20:01:35 -07:00
aibrahim-oai
b4ab7c1b73 Flaky CI fix (#1647)
Flushing before sending `TaskCompleteEvent` and ending the submission
loop to avoid race conditions.
2025-07-23 15:03:26 -07:00
Gabriel Peal
084236f717 Add call_id to patch approvals and elicitations (#1660)
Builds on https://github.com/openai/codex/pull/1659 and adds call_id to
a few more places for the same reason.
2025-07-23 15:55:35 -04:00
Gabriel Peal
bc944e77f5 Improve messages emitted for exec failures (#1659)
1. Emit call_id to exec approval elicitations for mcp client convenience
2. Remove the `-retry` from the call id for the same reason as above but
upstream the reset behavior to the mcp client
2025-07-23 14:43:53 -04:00
aibrahim-oai
01c0896f0f Adding interrupt Support to MCP (#1646) 2025-07-22 20:33:49 +00:00
Gabriel Peal
710f728124 Add an elicitation for approve patch and refactor tool calls (#1642)
1. Added an elicitation for `approve-patch` which is very similar to
`approve-exec`.
2. Extracted both elicitations to their own files to prevent
`codex_tool_runner` from blowing up in size.
2025-07-22 02:58:41 -04:00
Dylan
18b2b30841 [mcp-server] Add reply tool call (#1643)
## Summary
Adds a new mcp tool call, `codex-reply`, so we can continue existing
sessions. This is a first draft and does not yet support sessions from
previous processes.

## Testing
- [x] tested with mcp client
2025-07-21 21:01:56 -07:00
Michael Bolin
d49d802b06 test: add integration test for MCP server (#1633)
This PR introduces a single integration test for `cargo mcp`, though it
also introduces a number of reusable components so that it should be
easier to introduce more integration tests going forward.

The new test is introduced in `codex-rs/mcp-server/tests/elicitation.rs`
and the reusable pieces are in `codex-rs/mcp-server/tests/common`.

The test itself verifies new functionality around elicitations
introduced in https://github.com/openai/codex/pull/1623 (and the fix
introduced in https://github.com/openai/codex/pull/1629) by doing the
following:

- starts a mock model provider with canned responses for
`/v1/chat/completions`
- starts the MCP server with a `config.toml` to use that model provider
(and `approval_policy = "untrusted"`)
- sends the `codex` tool call which causes the mock model provider to
request a shell call for `git init`
- the MCP server sends an elicitation to the client to approve the
request
- the client replies to the elicitation with `"approved"`
- the MCP server runs the command and re-samples the model, getting a
`"finish_reason": "stop"`
- in turn, the MCP server sends the final response to the original
`codex` tool call
- verifies that `git init` ran as expected

To test:

```
cargo test shell_command_approval_triggers_elicitation
```

In writing this test, I discovered that `ExecApprovalResponse` does not
conform to `ElicitResult`, so I added a TODO to fix that, since I think
that should be updated in a separate PR. As it stands, this PR does not
update any business logic, though it does make a number of members of
the `mcp-server` crate `pub` so they can be used in the test.

One additional learning from this PR is that
`std::process::Command::cargo_bin()` from the `assert_cmd` trait is only
available for `std::process::Command`, but we really want to use
`tokio::process::Command` so that everything is async and we can
leverage utilities like `tokio::time::timeout()`. The trick I came up
with was to use `cargo_bin()` to locate the program, and then to use
`std::process::Command::get_program()` when constructing the
`tokio::process::Command`.
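
The trick looks roughly like this (the binary name is assumed for illustration):

```rust
use assert_cmd::cargo::CommandCargoExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // assert_cmd can only build a std::process::Command, so use it
    // purely to resolve the binary's path...
    let std_cmd = std::process::Command::cargo_bin("codex-mcp-server")?;
    let program = std_cmd.get_program().to_owned();

    // ...then hand that path to tokio's Command so the test can stay
    // async and use helpers like tokio::time::timeout().
    let mut child = tokio::process::Command::new(program).spawn()?;
    // A real test would now drive the server over stdin/stdout.
    child.kill().await?;
    Ok(())
}
```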
2025-07-21 10:27:07 -07:00
Michael Bolin
8a6c6cee88 fix: address review feedback on #1621 and #1623 (#1631)
- formalizes `ExecApprovalElicitRequestParams`
- adds some defensive logic when messages fail to parse
- fixes a typo in a comment
2025-07-20 14:42:11 -07:00
Gabriel Peal
8b590105de Don't drop sessions on elicitation responses (#1629) 2025-07-20 13:31:19 -04:00
Michael Bolin
018003e52f feat: leverage elicitations in the MCP server (#1623)
This updates the MCP server so that if it receives an
`ExecApprovalRequest` from the `Codex` session, it in turn sends an [MCP
elicitation](https://modelcontextprotocol.io/specification/draft/client/elicitation)
to the client to ask for the approval decision. Upon getting a response,
it forwards the client's decision via `Op::ExecApproval`.

Admittedly, we should be doing the same thing for
`ApplyPatchApprovalRequest`, but this is our first time experimenting
with elicitations, so I'm inclined to defer wiring that code path up
until we feel good about how this one works.
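
A condensed sketch of that flow with stand-in types (the real code sends an actual MCP elicitation and awaits the client's answer):

```rust
// Stand-in types; the real ones live in the protocol and mcp-types.
enum CodexEvent {
    ExecApprovalRequest { call_id: String, command: Vec<String> },
}

enum Op {
    ExecApproval { id: String, decision: String },
}

/// On ExecApprovalRequest: send an elicitation to the MCP client and
/// forward its decision back into the session as Op::ExecApproval.
async fn handle_event(event: CodexEvent) -> Op {
    match event {
        CodexEvent::ExecApprovalRequest { call_id, command } => {
            let decision = elicit_approval(&command).await;
            Op::ExecApproval { id: call_id, decision }
        }
    }
}

/// Placeholder for the round trip to the client.
async fn elicit_approval(_command: &[String]) -> String {
    "approved".to_string()
}

#[tokio::main]
async fn main() {
    let op = handle_event(CodexEvent::ExecApprovalRequest {
        call_id: "call-1".into(),
        command: vec!["git".into(), "init".into()],
    })
    .await;
    let Op::ExecApproval { id, decision } = op;
    println!("{id}: {decision}");
}
```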

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1623).
* __->__ #1623
* #1622
* #1621
* #1620
2025-07-19 01:32:03 -04:00
Michael Bolin
11fd3123be chore: introduce OutgoingMessageSender (#1622)
Prior to this change, `MessageProcessor` had a
`tokio::sync::mpsc::Sender<JSONRPCMessage>` as an abstraction for server
code to send a message down to the MCP client. Because `Sender` is cheap
to `clone()`, it was straightforward to make it available to tasks
scheduled with `tokio::task::spawn()`.

This worked well when we were only sending notifications or responses
back down to the client, but we want to add support for sending
elicitations in #1623, which means that we need to be able to send
_requests_ to the client, and now we need a bit of centralization to
ensure all request ids are unique.

To that end, this PR introduces `OutgoingMessageSender`, which houses
the existing `Sender<OutgoingMessage>` as well as an `AtomicI64` to mint
out new, unique request ids. It has methods like `send_request()` and
`send_response()` so that callers do not have to deal with
`JSONRPCMessage` directly, as having to set the `jsonrpc` for each
message was a bit tedious (this cleans up `codex_tool_runner.rs` quite a
bit).

We do not have `OutgoingMessageSender` implement `Clone` because it is
important that the `AtomicI64` is shared across all users of
`OutgoingMessageSender`. As such, `Arc<OutgoingMessageSender>` must be
used instead, as it is frequently shared with new tokio tasks.

As part of this change, we update `message_processor.rs` to embrace
`await`, though we must be careful that no individual handler blocks the
main loop and prevents other messages from being handled.
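
A condensed sketch of that design (message payloads simplified to strings):

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use tokio::sync::mpsc::{unbounded_channel, UnboundedSender};

/// Wraps the raw channel and mints unique request ids. Deliberately
/// not Clone: share it via Arc so every user draws ids from the same
/// counter.
struct OutgoingMessageSender {
    next_request_id: AtomicI64,
    sender: UnboundedSender<String>, // stand-in for Sender<OutgoingMessage>
}

impl OutgoingMessageSender {
    fn new(sender: UnboundedSender<String>) -> Self {
        Self { next_request_id: AtomicI64::new(0), sender }
    }

    /// Mint a fresh id and send the request; callers never touch
    /// JSONRPCMessage or the `jsonrpc` field directly.
    fn send_request(&self, method: &str) -> i64 {
        let id = self.next_request_id.fetch_add(1, Ordering::SeqCst);
        let _ = self.sender.send(format!("request {id}: {method}"));
        id
    }
}

fn main() {
    let (tx, mut rx) = unbounded_channel();
    let sender = Arc::new(OutgoingMessageSender::new(tx));
    sender.send_request("elicitation/create");
    sender.send_request("elicitation/create");
    while let Ok(msg) = rx.try_recv() {
        println!("{msg}");
    }
}
```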

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1622).
* #1623
* __->__ #1622
* #1621
* #1620
2025-07-19 00:30:56 -04:00
Michael Bolin
e78ec00e73 chore: support MCP schema 2025-06-18 (#1621)
This updates the schema in `generate_mcp_types.py` from `2025-03-26` to
`2025-06-18`, regenerates `mcp-types/src/lib.rs`, and then updates all
the code that uses `mcp-types` to honor the changes.

Ran

```
npx @modelcontextprotocol/inspector just codex mcp
```

and verified that I was able to invoke the `codex` tool, as expected.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1621).
* #1623
* #1622
* __->__ #1621
2025-07-19 00:09:34 -04:00
aibrahim-oai
2bd3314886 support deltas in core (#1587)
- Added support for message and reasoning deltas
- Skipped adding the support in the cli and tui for later
- Commented out a failing test (wrong merge) that needs a fix in a
separate PR.

Side note: I think we need to disable merging when CI doesn't pass.
2025-07-16 15:11:18 -07:00
Michael Bolin
94f5cad895 fix: when invoking Codex via MCP, use the request id as the Submission id (#1554)
Small quality-of-life improvement when using `codex mcp`.
2025-07-12 16:22:02 -07:00
Michael Bolin
fcfe43c7df feat: show number of tokens remaining in UI (#1388)
When using the OpenAI Responses API, we now record the `usage` field for
a `"response.completed"` event, which includes metrics about the number
of tokens consumed. We also introduce `openai_model_info.rs`, which
includes current data about the most common OpenAI models available via
the API (specifically `context_window` and `max_output_tokens`). If
Codex does not recognize the model, you can set `model_context_window`
and `model_max_output_tokens` explicitly in `config.toml`.

We then introduce a new event type to `protocol.rs`, `TokenCount`,
which includes the `TokenUsage` for the most recent turn.

Finally, we update the TUI to record the running sum of tokens used so
the percentage of available context window remaining can be reported via
the placeholder text for the composer:

![Screenshot 2025-06-25 at 11 20
55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49)
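
The arithmetic behind that placeholder is simple; a hedged sketch (names are illustrative, not the real TUI types):

```rust
/// Mirrors the idea behind TokenCount: keep a running sum of tokens
/// used and report how much of the context window remains.
struct TokenUsageTracker {
    context_window: u64, // from openai_model_info or config.toml; nonzero
    total_tokens_used: u64,
}

impl TokenUsageTracker {
    fn record_turn(&mut self, turn_tokens: u64) {
        self.total_tokens_used += turn_tokens;
    }

    fn percent_remaining(&self) -> u64 {
        let remaining = self.context_window.saturating_sub(self.total_tokens_used);
        remaining * 100 / self.context_window
    }
}

fn main() {
    let mut tracker = TokenUsageTracker { context_window: 128_000, total_tokens_used: 0 };
    tracker.record_turn(32_000);
    println!("{}% context left", tracker.percent_remaining()); // 75%
}
```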

We could certainly get much fancier with this (such as reporting the
estimated cost of the conversation), but for now, we are just trying to
achieve feature parity with the TypeScript CLI.

Though arguably this improves upon the TypeScript CLI, as the TypeScript
CLI uses heuristics to estimate the number of tokens used rather than
using the `usage` information directly:


296996d74e/codex-cli/src/utils/approximate-tokens-used.ts (L3-L16)

Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
Michael Bolin
d766e845b3 feat: experimental --output-last-message flag to exec subcommand (#1037)
This introduces an experimental `--output-last-message` flag that can be
used to identify a file where the final message from the agent will be
written. Two use cases:

- Ultimately, we will likely add a `--quiet` option to `exec`, but even
if the user does not want any output written to the terminal, they
probably want to know what the agent did. Writing the output to a file
makes it possible to get that information in a clean way.
- Relatedly, when using `exec` in CI, it is easier to review the
transcript written "normally," (i.e., not as JSON or something with
extra escapes), but getting programmatic access to the last message is
likely helpful, so writing the last message to a file gets the best of
both worlds.

I am calling this "experimental" because it is possible that we are
overfitting and will want a more general solution to this problem that
would justify removing this flag.
2025-05-19 16:08:18 -07:00
Michael Bolin
ce2ecbe72f feat: record messages from user in ~/.codex/history.jsonl (#939)
This is a large change to support a "history" feature like you would
expect in a shell like Bash.

History events are recorded in `$CODEX_HOME/history.jsonl`. Because it
is a JSONL file, it is straightforward to append new entries (as opposed
to the TypeScript CLI, which uses `$CODEX_HOME/history.json` and, to stay
valid JSON, must rewrite the entire file for each new entry). Because
it is possible for there to be multiple instances of Codex CLI writing
to `history.jsonl` at once, we use advisory file locking when working
with `history.jsonl` in `codex-rs/core/src/message_history.rs`.
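
A minimal sketch of the append path (using the `fs2` crate for advisory locks here as an illustration; the real implementation may differ):

```rust
use std::fs::OpenOptions;
use std::io::Write;

use fs2::FileExt; // advisory file locks on Unix and Windows

fn append_history_line(line: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("history.jsonl")?;
    // Advisory exclusive lock so concurrent Codex CLI instances don't
    // interleave partial lines.
    file.lock_exclusive()?;
    writeln!(file, "{line}")?;
    file.unlock()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    append_history_line(r#"{"text":"git status"}"#)
}
```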

Because we believe history is a sufficiently useful feature, we enable
it by default. Though to provide some safety, we set the file
permissions of `history.jsonl` to `0o600` so that other users on the
system cannot read the user's history. We do not yet support a default
list of `SENSITIVE_PATTERNS` as the TypeScript CLI does:


3fdf9df133/codex-cli/src/utils/storage/command-history.ts (L10-L17)

We are going to take a more conservative approach to this list in the
Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude
sensitive information like API tokens, it would also exclude valuable
information such as references to Git commits.

As noted in the updated documentation, users can opt-out of history by
adding the following to `config.toml`:

```toml
[history]
persistence = "none" 
```

Because `history.jsonl` could, in theory, be quite large, we take a[n
arguably overly pedantic] approach in reading history entries into
memory. Specifically, we start by telling the client the current number
of entries in the history file (`history_entry_count`) as well as the
inode (`history_log_id`) of `history.jsonl` (see the new fields on
`SessionConfiguredEvent`).

The client is responsible for keeping new entries in memory to create a
"local history," but if the user hits up enough times to go "past" the
end of local history, then the client should use the new
`GetHistoryEntryRequest` in the protocol to fetch older entries.
Specifically, it should pass the `history_log_id` it was given
originally and work backwards from `history_entry_count`. (It should
really fetch history in batches rather than one-at-a-time, but that is
something we can improve upon in subsequent PRs.)

The motivation behind this crazy scheme is to defend against:

* The `history.jsonl` being truncated during the session such that the
index into the history is no longer consistent with what had been read
up to that point. We do not yet have logic to enforce a `max_bytes` for
`history.jsonl`, but once we do, we will aspire to implement it in a way
that should result in a new inode for the file on most systems.
* New items from concurrent Codex CLI sessions appending to the history.
Because, in the absence of truncation, `history.jsonl` is an append-only
log, so long as the client reads backwards from `history_entry_count`,
it should always get a consistent view of history. (That said, it will
not be able to read _new_ commands from concurrent sessions, but perhaps
we will introduce a `/` command to reload latest history or something
down the road.)

Admittedly, my testing of this feature thus far has been fairly light. I
expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
Michael Bolin
34aa1991f1 chore: handle all cases for EventMsg (#936)
For now, this removes the `#[non_exhaustive]` attribute on `EventMsg` so
that we are forced to handle all `EventMsg` variants by default.
this if/when we publish `core/` as a `lib` crate.) For now, it is
helpful to have this as a forcing function because we have effectively
two UIs (`tui` and `exec`) and usually when we add a new variant to
`EventMsg`, we want to be sure that we update both.
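
The forcing function at work, in miniature: with no `#[non_exhaustive]` and no wildcard arm, adding a variant breaks the build until every UI handles it:

```rust
enum EventMsg {
    AgentMessage,
    TaskComplete,
    // Adding a variant here breaks every exhaustive match below,
    // which is exactly the reminder to update both UIs.
}

fn handle(msg: EventMsg) {
    match msg {
        EventMsg::AgentMessage => println!("agent message"),
        EventMsg::TaskComplete => println!("task complete"),
        // no `_ =>` arm on purpose
    }
}

fn main() {
    handle(EventMsg::TaskComplete);
}
```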
2025-05-14 13:36:43 -07:00
Michael Bolin
a5f3a34827 fix: change EventMsg enum so every variant takes a single struct (#925)
https://github.com/openai/codex/pull/922 did this for the
`SessionConfigured` enum variant, and I think it is generally helpful to
be able to work with each enum variant's payload as its own type,
so this converts the remaining variants and updates all of the
callsites.

Added a simple unit test to verify that the JSON-serialized version of
`Event` does not have any unexpected nesting.
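
The pattern, sketched with a single variant (payload fields assumed):

```rust
use serde::Serialize;

#[derive(Serialize)]
struct AgentMessageEvent {
    message: String,
}

/// Every variant carries exactly one payload struct, so each payload
/// can be passed around and matched on as its own type.
#[derive(Serialize)]
#[serde(tag = "type", rename_all = "snake_case")]
enum EventMsg {
    AgentMessage(AgentMessageEvent),
}

fn main() {
    let event = EventMsg::AgentMessage(AgentMessageEvent { message: "hi".into() });
    // Internally-tagged serde layout keeps the JSON flat, avoiding the
    // extra nesting the unit test mentioned above guards against.
    println!("{}", serde_json::to_string(&event).expect("serializes"));
}
```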
2025-05-13 20:44:42 -07:00
jcoens-openai
f3bd143867 Disallow expect via lints (#865)
Adds `expect()` as a denied lint. The same deal as with `unwrap()`
applies: we now need to put `#[expect(...)]` on the calls we legitimately
want. Took care to allow `expect()` in test contexts.
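
The mechanics look roughly like this (assuming clippy's `expect_used` lint; the `reason` string is illustrative):

```rust
// In the workspace lint config, something like:
//   [workspace.lints.clippy]
//   expect_used = "deny"
//   unwrap_used = "deny"

/// Opting a specific call back in requires an explicit marker.
#[expect(clippy::expect_used, reason = "port literal is hardcoded and valid")]
fn parse_port() -> u16 {
    "8080".parse().expect("valid port literal")
}

fn main() {
    println!("{}", parse_port());
}
```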

# Tests

```
cargo fmt
cargo clippy --all-features --all-targets --no-deps -- -D warnings
cargo test
```
2025-05-12 08:45:46 -07:00
jcoens-openai
8a89d3aeda Update cargo to 2024 edition (#842)
Some effects of this change:
- New formatting changes across many files. No functionality changes
should occur from that.
- Calls to `set_env` are now considered unsafe; since these only happen in
tests, we wrap them in `unsafe` blocks
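
A hedged sketch of the wrapped call (the standard library API in question is `std::env::set_var`):

```rust
fn main() {
    // Edition 2024 marks set_var/remove_var as unsafe because mutating
    // the environment can race with concurrent reads on other threads.
    unsafe {
        std::env::set_var("CODEX_TEST_FLAG", "1");
    }
    assert_eq!(std::env::var("CODEX_TEST_FLAG").as_deref(), Ok("1"));
}
```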
2025-05-07 08:37:48 -07:00
Michael Bolin
2b72d05c5e feat: make Codex available as a tool when running it as an MCP server (#811)
This PR replaces the placeholder `"echo"` tool call in the MCP server
with a `"codex"` tool that calls Codex. Events such as
`ExecApprovalRequest` and `ApplyPatchApprovalRequest` are not handled
properly yet, but I have `approval_policy = "never"` set in my
`~/.codex/config.toml` such that those codepaths are not exercised.

The schema for this MCP tool is defined by a new `CodexToolCallParam`
struct introduced in this PR. It is fairly similar to `ConfigOverrides`,
as the param is used to help create the `Config` used to start the Codex
session, though it also includes the `prompt` used to kick off the
session.

This PR also introduces the use of the third-party `schemars` crate to
generate the JSON schema, which is verified in the
`verify_codex_tool_json_schema()` unit test.
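
The generation step looks roughly like this (a trimmed-down stand-in; the real `CodexToolCallParam` has more fields):

```rust
use schemars::{schema_for, JsonSchema};
use serde::Deserialize;

/// Trimmed-down stand-in for the real CodexToolCallParam.
#[derive(Deserialize, JsonSchema)]
struct CodexToolCallParam {
    /// The prompt used to kick off the Codex session.
    prompt: String,
    /// Optional model override; unset fields fall back to
    /// ~/.codex/config.toml.
    model: Option<String>,
}

fn main() {
    let schema = schema_for!(CodexToolCallParam);
    println!("{}", serde_json::to_string_pretty(&schema).expect("schema serializes"));
}
```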

Events that are dispatched during the Codex session are sent back to the
MCP client as MCP notifications. This gives the client a way to monitor
progress as the tool call itself may take minutes to complete depending
on the complexity of the task requested by the user.

In the video below, I launched the server via:

```shell
mcp-server$ RUST_LOG=debug npx @modelcontextprotocol/inspector cargo run --
```

In the video, you can see the flow of:

* requesting the list of tools
* choosing the **codex** tool
* entering a value for **prompt** and then making the tool call

Note that I left the other fields blank because when unspecified, the
values in my `~/.codex/config.toml` were used:


https://github.com/user-attachments/assets/1975058c-b004-43ef-8c8d-800a953b8192

Note that while using the inspector, I did run into
https://github.com/modelcontextprotocol/inspector/issues/293, though the
tip about ensuring I had only one instance of the **MCP Inspector** tab
open in my browser seemed to fix things.
2025-05-05 07:16:19 -07:00