valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	d9dbf48828	fix: separate `codex mcp` into `codex mcp-server` and `codex app-server` (#4471 ) This is a very large PR with some non-backwards-compatible changes. Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish server that had two overlapping responsibilities: - Running an MCP server, providing some basic tool calls. - Running the app server used to power experiences such as the VS Code extension. This PR aims to separate these into distinct concepts: - `codex mcp-server` for the MCP server - `codex app-server` for the "application server" Note `codex mcp` still exists because it already has its own subcommands for MCP management (`list`, `add`, etc.) The MCP logic continues to live in `codex-rs/mcp-server` whereas the refactored app server logic is in the new `codex-rs/app-server` folder. Note that most of the existing integration tests in `codex-rs/mcp-server/tests/suite` were actually for the app server, so all the tests have been moved with the exception of `codex-rs/mcp-server/tests/suite/mod.rs`. Because this is already a large diff, I tried not to change more than I had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses the name `McpProcess` for now, but I will do some mechanical renamings to things like `AppServer` in subsequent PRs. While `mcp-server` and `app-server` share some overlapping functionality (like reading streams of JSONL and dispatching based on message types) and some differences (completely different message types), I ended up doing a bit of copypasta between the two crates, as both have somewhat similar `message_processor.rs` and `outgoing_message.rs` files for now, though I expect them to diverge more in the near future. One material change is that of the initialize handshake for `codex app-server`, as we no longer use the MCP types for that handshake. 
Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an `Initialize` variant to `ClientRequest`, which takes the `ClientInfo` object we need to update the `USER_AGENT_SUFFIX` in `codex-rs/app-server/src/message_processor.rs`. One other material change is in `codex-rs/app-server/src/codex_message_processor.rs` where I eliminated a use of the `send_event_as_notification()` method I am generally trying to deprecate (because it blindly maps an `EventMsg` into a `JSONNotification`) in favor of `send_server_notification()`, which takes a `ServerNotification`, as that is intended to be a custom enum of all notification types supported by the app server. So to make this update, I had to introduce a new variant of `ServerNotification`, `SessionConfigured`, which is a non-backwards compatible change with the old `codex mcp`, and clients will have to be updated after the next release that contains this PR. Note that `codex-rs/app-server/tests/suite/list_resume.rs` also had to be updated to reflect this change. I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility crate to avoid some of the copying between `mcp-server` and `app-server`.	2025-09-30 07:06:18 +00:00
Dylan	197f45a3be	[mcp-server] Expose fuzzy file search in MCP (#2677 ) ## Summary Expose a simple fuzzy file search implementation for mcp clients to work with ## Testing - [x] Tested locally	2025-09-29 12:19:09 -07:00
Michael Bolin	c172e8e997	feat: added SetDefaultModel to JSON-RPC server (#3512 ) This adds `SetDefaultModel`, which takes `model` and `reasoning_effort` as optional fields. If set, the field will overwrite what is in the user's `config.toml`. This reuses logic that was added to support the `/model` command in the TUI: https://github.com/openai/codex/pull/2799.	2025-09-11 23:44:17 -07:00
Eric Traut	e13b35ecb0	Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189 ) This PR does the following: * Adds the ability to paste or type an API key. * Removes the `preferred_auth_method` config option. The last login method is always persisted in auth.json, so this isn't needed. * If OPENAI_API_KEY env variable is defined, the value is used to prepopulate the new UI. The env variable is otherwise ignored by the CLI. * Adds a new MCP server entry point "login_api_key" so we can implement this same API key behavior for the VS Code extension.	2025-09-11 09:16:34 -07:00
Michael Bolin	65f3528cad	feat: add UserInfo request to JSON-RPC server (#3428 ) This adds a simple endpoint that provides the email address encoded in `$CODEX_HOME/auth.json`. As noted, for now, we do not hit the server to verify this is the user's true email address.	2025-09-10 17:03:35 -07:00
Michael Bolin	44262d8fd8	fix: ensure output of codex-rs/mcp-types/generate_mcp_types.py matches codex-rs/mcp-types/src/lib.rs (#3439 ) https://github.com/openai/codex/pull/3395 updated `mcp-types/src/lib.rs` by hand, but that file is generated code that is produced by `mcp-types/generate_mcp_types.py`. Unfortunately, we do not have anything in CI to verify this right now, but I will address that in a subsequent PR. #3395 ended up introducing a change that added a required field when deserializing `InitializeResult`, breaking Codex when used as an MCP client, so the quick fix in #3436 was to make the new field an `Option` with `skip_serializing_if = "Option::is_none"`, but that did not address the problem that `mcp-types/generate_mcp_types.py` and `mcp-types/src/lib.rs` are out of sync. This PR gets things back to where they are in sync. It removes the custom `mcp_types::McpClientInfo` type that was added to `mcp-types/src/lib.rs` and forces us to use the generated `mcp_types::Implementation` type. This PR also updates `generate_mcp_types.py` to generate the additional `user_agent: Option<String>` field on `Implementation` so that we can continue to specify it when Codex operates as an MCP server. However, this also requires us to specify `user_agent: None` when Codex operates as an MCP client. We may want to introduce our own `InitializeResult` type that is specific to when we run as a server to avoid this in the future, but my immediate goal is just to get things back in sync.	2025-09-10 16:14:41 -07:00
Eric Traut	acb28bf914	Improved resiliency of two auth-related tests (#3427 ) This PR improves two existing auth-related tests. They were failing when run in an environment where an `OPENAI_API_KEY` env variable was defined. The change makes them more resilient.	2025-09-10 11:46:02 -07:00
Gabriel Peal	8636bff46d	Set a user agent suffix when used as an MCP server (#3395 ) This automatically adds a user agent suffix whenever the CLI is used as an MCP server	2025-09-10 02:32:57 +00:00
Michael Bolin	ace14e8d36	feat: add ArchiveConversation to ClientRequest (#3353 ) Adds support for `ArchiveConversation` in the JSON-RPC server that takes a `(ConversationId, PathBuf)` pair and: - verifies the `ConversationId` corresponds to the rollout id at the `PathBuf` - if so, invokes `ConversationManager.remove_conversation(ConversationId)` - if the `CodexConversation` was in memory, send `Shutdown` and wait for `ShutdownComplete` with a timeout - moves the `.jsonl` file to `$CODEX_HOME/archived_sessions` --------- Co-authored-by: Gabriel Peal <gabriel@openai.com>	2025-09-09 11:39:00 -04:00
Gabriel Peal	5c1416d99b	Add a getUserAgent MCP method (#3320 ) This will allow the extension to pass this user agent + a suffix for its requests	2025-09-08 13:30:13 -04:00
Ahmed Ibrahim	907d3dd348	MCP: add session resume + history listing; (#3185 )	2025-09-04 23:44:18 +00:00
Dylan	82ed7bd285	[mcp-server] Update read config interface (#3093 ) ## Summary Follow-up to #3056 This PR updates the mcp-server interface for reading the config settings saved by the user. At risk of introducing _another_ Config struct, I think it makes sense to avoid tying our protocol to ConfigToml, as it's become a bit unwieldy. GetConfigTomlResponse was a de-facto struct for this already - better to make it explicit, in my opinion. This is technically a breaking change of the mcp-server protocol, but given the previous interface was introduced so recently in #2725, and we have not yet even started to call it, I propose proceeding with the breaking change - but am open to preserving the old endpoint. ## Testing - [x] Added additional integration test coverage	2025-09-04 16:26:41 -07:00
Michael Bolin	f09170b574	chore: print stderr from MCP server to test output using eprintln! (#2849 ) Related to https://github.com/openai/codex/pull/2848, I don't see the stderr from `codex mcp` colocated with the other stderr from `test_shell_command_approval_triggers_elicitation()` when it fails even though we have `RUST_LOG=debug` set when we spawn `codex mcp`: `1e9e703b96/codex-rs/mcp-server/tests/common/mcp_process.rs (L65)` Let's try this new logic which should be more explicit.	2025-08-28 12:43:13 -07:00
Michael Bolin	1e9e703b96	chore: try to make it easier to debug the flakiness of test_shell_command_approval_triggers_elicitation (#2848 ) `test_shell_command_approval_triggers_elicitation()` is one of a number of integration tests that we have observed to be flaky on GitHub CI, so this PR tries to reduce the flakiness _and_ to provide us with more information when it flakes. Specifically: - Changed the command that we use to trigger the elicitation from `git init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'` because running `git` seems more likely to invite variance. - Increased the timeout to wait for the task response from 10s to 20s. - Added more logging.	2025-08-28 12:33:33 -07:00
Dylan	0cec0770e2	[mcp-server] Add GetConfig endpoint (#2725 ) ## Summary Adds a GetConfig request to the MCP Protocol, so MCP clients can evaluate the resolved config.toml settings which the harness is using. ## Testing - [x] Added an end to end test of the endpoint	2025-08-27 09:59:03 -07:00
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
Michael Bolin	712bfa04ac	chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423 ) The existing `wire_format.rs` should share more types with the `codex-protocol` crate (like `AskForApproval` instead of maintaining a parallel `CodexToolCallApprovalPolicy` enum), so this PR moves `wire_format.rs` into `codex-protocol`, renaming it as `mcp_protocol.rs`. We also de-dupe types, where appropriate. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423). * #2424 * __->__ #2423	2025-08-18 09:36:57 -07:00
Michael Bolin	a269754668	remove mcp-server/src/mcp_protocol.rs and the code that depends on it (#2360 )	2025-08-18 00:29:18 -07:00
Michael Bolin	eda50d8372	feat: introduce ClientRequest::SendUserTurn (#2345 ) This adds a new request type, `SendUserTurn`, that makes it possible to submit a `Op::UserTurn` operation (introduced in #2329) to a conversation. This PR also adds a new integration test that verifies that changing from `AskForApproval::UnlessTrusted` to `AskForApproval::Never` mid-conversation ensures that an elicitation is no longer sent for running `python3 -c print(42)`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345). * __->__ #2345 * #2329 * #2343 * #2340 * #2338	2025-08-15 10:05:58 -07:00
Michael Bolin	265fd89e31	fix: try to fix flakiness in test_shell_command_approval_triggers_elicitation (#2344 ) I still see flakiness in `test_shell_command_approval_triggers_elicitation()` on occasion where `MockServer` claims it has not received all of its expected requests. I recently introduced a similar type of test in #2264, `test_codex_jsonrpc_conversation_flow()`, which I have not seen flake (yet!), so this PR pulls over two things I did in that test: - increased `worker_threads` from `2` to `4` - added an assertion to make sure the `task_complete` notification is received Honestly, I'm still not sure why `MockServer` claims it sometimes does not receive all its expected requests given that we assert that the final `JSONRPCResponse` is read on the stream, but let's give this a shot. Assuming this fixes things, my hypothesis is that the increase in `worker_threads` helps because perhaps there are async tasks in `MockServer` that do not reliably complete fully when there are not enough threads available? If that is correct, it seems like the test would still be flaky, though perhaps with lower frequency?	2025-08-15 09:17:20 -07:00
Michael Bolin	a62510e0ae	fix: verify notifications are sent with the conversationId set (#2278 ) This updates `CodexMessageProcessor` so that each notification it sends for an `EventMsg` from a `CodexConversation` satisfies the following: - The `params` always has an appropriate `conversationId` field. - The `method` now includes the name of the `EventMsg` type rather than using `codex/event` as the `method` type for all notifications. (We currently prefix the method name with `codex/event/`, but I think that should go away once we formalize the notification schema in `wire_format.rs`.) As part of this, we update `test_codex_jsonrpc_conversation_flow()` to verify that the `task_finished` notification has made it through the system instead of sleeping for 5s and "hoping" the server finished processing the task. Note we have seen some flakiness in some of our other, similar integration tests, and I expect adding a similar check would help in those cases, as well.	2025-08-13 17:54:12 -07:00
Michael Bolin	e7bad650ff	feat: support traditional JSON-RPC request/response in MCP server (#2264 ) This introduces a new set of request types that our `codex mcp` supports. Note that these do not conform to MCP tool calls so that instead of having to send something like this: ```json { "jsonrpc": "2.0", "method": "tools/call", "id": 42, "params": { "name": "newConversation", "arguments": { "model": "gpt-5", "approvalPolicy": "on-request" } } } ``` we can send something like this: ```json { "jsonrpc": "2.0", "method": "newConversation", "id": 42, "params": { "model": "gpt-5", "approvalPolicy": "on-request" } } ``` Admittedly, this new format is not a valid MCP tool call, but we are OK with that right now. (That is, not everything we might want to request of `codex mcp` is something that is appropriate for an autonomous agent to do.) To start, this introduces four request types: - `newConversation` - `sendUserMessage` - `addConversationListener` - `removeConversationListener` The new `mcp-server/tests/codex_message_processor_flow.rs` shows how these can be used. The types are defined on the `CodexRequest` enum, so we introduce a new `CodexMessageProcessor` that is responsible for dealing with requests from this enum. The top-level `MessageProcessor` has been updated so that when `process_request()` is called, it first checks whether the request conforms to `CodexRequest` and dispatches it to `CodexMessageProcessor` if so. Note that I also decided to use `camelCase` for the on-the-wire format, as that seems to be the convention for MCP. For the moment, the new protocol is defined in `wire_format.rs` within the `mcp-server` crate, but in a subsequent PR, I will probably move it to its own crate to ensure the protocol has minimal dependencies and that we can codegen a schema from it. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2264). 
* #2278 * __->__ #2264	2025-08-13 17:36:29 -07:00
aibrahim-oai	97ab8fb610	MCP: add conversation.create tool [Stack 2/2] (#1783 ) Introduce conversation.create handler (handle_create_conversation) and wire it in MessageProcessor. Stack: Top: #1783 Bottom: #1784 --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-01 22:18:36 +00:00
aibrahim-oai	ad0295b893	MCP server: route structured tool-call requests and expose mcp_protocol [Stack 2/3] (#1751 ) - Expose mcp_protocol from mcp-server for reuse in tests and callers. - In MessageProcessor, detect structured ToolCallRequestParams in tools/call and forward to a new handler. - Add handle_new_tool_calls scaffold (returns error for now). - Test helper: add send_send_user_message_tool_call to McpProcess to send ConversationSendMessage requests; This is the second PR in a stack. Stack: Final: #1686 Intermediate: #1751 First: #1750	2025-08-01 02:46:04 +00:00
aibrahim-oai	19bef7659f	Serializing the `eventmsg` type to snake_case (#1709 ) This was an abrupt change on our clients. We need to serialize as snake_case.	2025-07-28 10:26:27 -07:00
aibrahim-oai	5a0079fea2	Changing method in MCP notifications (#1684 ) - Changing the codex/event type	2025-07-26 10:35:49 -07:00
Michael Bolin	7af9cedbd7	fix: create separate test_support crates to eliminate #[allow(dead_code)] (#1667 ) Because of a quirk of how integration tests work in Rust, we had a number of `#[allow(dead_code)]` annotations that were misleading because the functions _were_ being used, just not by all integration tests in a `tests/` folder, so when compiling the test that did not use the function, clippy would complain that it was unused. This fixes things by creating a "test_support" crate under the `tests/` folder that is imported as a dev dependency for the respective crate.	2025-07-24 12:19:46 -07:00
aibrahim-oai	01c0896f0f	Adding interrupt Support to MCP (#1646 )	2025-07-22 20:33:49 +00:00
pakrym-oai	6d82907082	Add support for custom base instructions (#1645 ) Allows providing a custom instructions file as a config parameter and custom instruction text via an MCP tool call.	2025-07-22 09:42:22 -07:00
Gabriel Peal	710f728124	Add an elicitation for approve patch and refactor tool calls (#1642 ) 1. Added an elicitation for `approve-patch` which is very similar to `approve-exec`. 2. Extracted both elicitations to their own files to prevent `codex_tool_runner` from blowing up in size.	2025-07-22 02:58:41 -04:00
Michael Bolin	d49d802b06	test: add integration test for MCP server (#1633 ) This PR introduces a single integration test for `codex mcp`, though it also introduces a number of reusable components so that it should be easier to introduce more integration tests going forward. The new test is introduced in `codex-rs/mcp-server/tests/elicitation.rs` and the reusable pieces are in `codex-rs/mcp-server/tests/common`. The test itself verifies new functionality around elicitations introduced in https://github.com/openai/codex/pull/1623 (and the fix introduced in https://github.com/openai/codex/pull/1629) by doing the following: - starts a mock model provider with canned responses for `/v1/chat/completions` - starts the MCP server with a `config.toml` to use that model provider (and `approval_policy = "untrusted"`) - sends the `codex` tool call which causes the mock model provider to request a shell call for `git init` - the MCP server sends an elicitation to the client to approve the request - the client replies to the elicitation with `"approved"` - the MCP server runs the command and re-samples the model, getting a `"finish_reason": "stop"` - in turn, the MCP server sends the final response to the original `codex` tool call - verifies that `git init` ran as expected To test: ``` cargo test shell_command_approval_triggers_elicitation ``` In writing this test, I discovered that `ExecApprovalResponse` does not conform to `ElicitResult`, so I added a TODO to fix that, since I think that should be updated in a separate PR. As it stands, this PR does not update any business logic, though it does make a number of members of the `mcp-server` crate `pub` so they can be used in the test. One additional learning from this PR is that `std::process::Command::cargo_bin()` from the `assert_cmd` crate is only available for `std::process::Command`, but we really want to use `tokio::process::Command` so that everything is async and we can leverage utilities like `tokio::time::timeout()`. 
The trick I came up with was to use `cargo_bin()` to locate the program, and then to use `std::process::Command::get_program()` when constructing the `tokio::process::Command`.	2025-07-21 10:27:07 -07:00
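
Commit e7bad650ff (#2264) above contrasts two request shapes: the same `newConversation` call wrapped as an MCP `tools/call`, and sent directly as a JSON-RPC method. The following is a minimal sketch of that difference using only the standard library. The helper names `tools_call_request` and `direct_request` are hypothetical and do not come from the `codex-rs` crates, which use typed request enums rather than string formatting.

```rust
// Sketch only: plain string formatting, no JSON library and no escaping.
// Both helper names are hypothetical, chosen for illustration.

/// Wraps a request as an MCP tool call: the real method name moves into
/// `params.name` and its parameters into `params.arguments`.
fn tools_call_request(id: u64, name: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","method":"tools/call","id":{id},"params":{{"name":"{name}","arguments":{arguments_json}}}}}"#
    )
}

/// Sends the same request directly: the method name is the JSON-RPC `method`
/// and the parameters sit at the top level of `params`.
fn direct_request(id: u64, method: &str, params_json: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","method":"{method}","id":{id},"params":{params_json}}}"#)
}

fn main() {
    // Parameter values taken from the example in the commit message.
    let params = r#"{"model":"gpt-5","approvalPolicy":"on-request"}"#;
    println!("{}", tools_call_request(42, "newConversation", params));
    println!("{}", direct_request(42, "newConversation", params));
}
```

The direct form drops one level of nesting, which is the trade-off the commit message describes: it is no longer a valid MCP tool call.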

31 Commits
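
Commit ace14e8d36 (#3353) in the list above ends its `ArchiveConversation` steps by moving the rollout `.jsonl` file to `$CODEX_HOME/archived_sessions`. A rough sketch of only that final file move, using the standard library, might look like the following. The function name `archive_rollout`, the `sessions` directory, and the file name are assumptions made for illustration. The id verification and the `Shutdown` handshake described in the commit are not modeled.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Moves a rollout `.jsonl` file into `<codex_home>/archived_sessions`,
/// creating the directory if needed, and returns the new path.
/// Hypothetical helper: not the function used in codex-rs.
fn archive_rollout(codex_home: &Path, rollout: &Path) -> io::Result<PathBuf> {
    let file_name = rollout.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "rollout path has no file name")
    })?;
    let archive_dir = codex_home.join("archived_sessions");
    fs::create_dir_all(&archive_dir)?;
    let dest = archive_dir.join(file_name);
    // `rename` only works within a single filesystem, which is enough here
    // because source and destination share the same home directory.
    fs::rename(rollout, &dest)?;
    Ok(dest)
}

fn main() -> io::Result<()> {
    // Use a throwaway directory in place of a real $CODEX_HOME.
    let home = std::env::temp_dir().join(format!("codex-home-sketch-{}", std::process::id()));
    let sessions = home.join("sessions"); // assumed layout, for illustration only
    fs::create_dir_all(&sessions)?;
    let rollout = sessions.join("rollout-example.jsonl");
    fs::write(&rollout, "{}\n")?;

    let archived = archive_rollout(&home, &rollout)?;
    println!("archived to {}", archived.display());

    fs::remove_dir_all(&home)?;
    Ok(())
}
```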