This updates `CodexMessageProcessor` so that each notification it sends
for a `EventMsg` from a `CodexConversation` such that:
- The `params` always has an appropriate `conversationId` field.
- The `method` is now includes the name of the `EventMsg` type rather
than using `codex/event` as the `method` type for all notifications. (We
currently prefix the method name with `codex/event/`, but I think that
should go away once we formalize the notification schema in
`wire_format.rs`.)
As part of this, we update `test_codex_jsonrpc_conversation_flow()` to
verify that the `task_finished` notification has made it through the
system instead of sleeping for 5s and "hoping" the server finished
processing the task. Note we have seen some flakiness in some of our
other, similar integration tests, and I expect adding a similar check
would help in those cases, as well.
This introduces a new set of request types that our `codex mcp`
supports. Note that these do not conform to MCP tool calls so that
instead of having to send something like this:
```json
{
"jsonrpc": "2.0",
"method": "tools/call",
"id": 42,
"params": {
"name": "newConversation",
"arguments": {
"model": "gpt-5",
"approvalPolicy": "on-request"
}
}
}
```
we can send something like this:
```json
{
"jsonrpc": "2.0",
"method": "newConversation",
"id": 42,
"params": {
"model": "gpt-5",
"approvalPolicy": "on-request"
}
}
```
Admittedly, this new format is not a valid MCP tool call, but we are OK
with that right now. (That is, not everything we might want to request
of `codex mcp` is something that is appropriate for an autonomous agent
to do.)
To start, this introduces four request types:
- `newConversation`
- `sendUserMessage`
- `addConversationListener`
- `removeConversationListener`
The new `mcp-server/tests/codex_message_processor_flow.rs` shows how
these can be used.
The types are defined on the `CodexRequest` enum, so we introduce a new
`CodexMessageProcessor` that is responsible for dealing with requests
from this enum. The top-level `MessageProcessor` has been updated so
that when `process_request()` is called, it first checks whether the
request conforms to `CodexRequest` and dispatches it to
`CodexMessageProcessor` if so.
Note that I also decided to use `camelCase` for the on-the-wire format,
as that seems to be the convention for MCP.
For the moment, the new protocol is defined in `wire_format.rs` within
the `mcp-server` crate, but in a subsequent PR, I will probably move it
to its own crate to ensure the protocol has minimal dependencies and
that we can codegen a schema from it.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2264).
* #2278
* __->__ #2264
This PR does two things because after I got deep into the first one I
started pulling on the thread to the second:
- Makes `ConversationManager` the place where all in-memory
conversations are created and stored. Previously, `MessageProcessor` in
the `codex-mcp-server` crate was doing this via its `session_map`, but
this is something that should be done in `codex-core`.
- It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
throughout our code. I think this made sense at one time, but now that
we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
I don't think this was quite right, so I removed it. For `codex exec`
and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
no longer make `Notify` a field of `Codex` or `CodexConversation`.
Changes of note:
- Adds the files `conversation_manager.rs` and `codex_conversation.rs`
to `codex-core`.
- `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
other crates must use `CodexConversation` instead (which is created via
`ConversationManager`).
- `core/src/codex_wrapper.rs` has been deleted in favor of
`ConversationManager`.
- `ConversationManager::new_conversation()` returns `NewConversation`,
which is in line with the `new_conversation` tool we want to add to the
MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
verify `SessionConfiguredEvent` is the first event because that is now
internal to `ConversationManager`.
- Quite a bit of code was deleted from
`codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
manage multiple conversations itself: it goes through
`ConversationManager` instead.
- `core/tests/live_agent.rs` has been deleted because I had to update a
bunch of tests and all the tests in here were ignored, and I don't think
anyone ever ran them, so this was just technical debt, at this point.
- Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
hope to refactor the blandly-named `util.rs` into more descriptive
files).
- In general, I started replacing local variables named `codex` as
`conversation`, where appropriate, though admittedly I didn't do it
through all the integration tests because that would have added a lot of
noise to this PR.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
* #2264
* #2263
* __->__ #2240
Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.
## Summary
#1865 added `AskForApproval::OnRequest`, but missed adding it to our
custom struct in `mcp-server`. This adds the missing configuration
## Testing
- [x] confirmed locally
# Note for reviewers
The bulk of this PR is in in the new file, `parse_command.rs`. This file
is designed to be written TDD and implemented with Codex. Do not worry
about reviewing the code, just review the unit tests (if you want). If
any cases are missing, we'll add more tests and have Codex fix them.
I think the best approach will be to land and iterate. I have some
follow-ups I want to do after this lands. The next PR after this will
let us merge (and dedupe) multiple sequential cells of the same such as
multiple read commands. The deduping will also be important because the
model often reads the same file multiple times in a row in chunks
===
This PR formats common commands like reading, formatting, testing, etc
more nicely:
It tries to extract things like file names, tests and falls back to the
cmd if it doesn't. It also only shows stdout/err if the command failed.
<img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15"
src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e"
/>
<img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32"
src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9"
/>
<img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2"
src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f"
/>
Part 2: https://github.com/openai/codex/pull/2097
Part 3: https://github.com/openai/codex/pull/2110
## Summary
Forgot to remove this in #1869 last night! Too much of a performance hit
on the main thread. We can bring it back via an async thread on startup.
## Summary
Includes a new user message in the api payload which provides useful
environment context for the model, so it knows about things like the
current working directory and the sandbox.
## Testing
Updated unit tests
## Summary
Have seen these tests flaking over the course of today on different
boxes. `wiremock` seems to be generally written with tokio/threads in
mind but based on the weird panics from the tests, let's see if this
helps.
This adds support for easily running Codex backed by a local Ollama
instance running our new open source models. See
https://github.com/openai/gpt-oss for details.
If you pass in `--oss` you'll be prompted to install/launch ollama, and
it will automatically download the 20b model and attempt to use it.
We'll likely want to expand this with some options later to make the
experience smoother for users who can't run the 20b or want to run the
120b.
Co-authored-by: Michael Bolin <mbolin@openai.com>
https://github.com/openai/codex/pull/1835 has some messed up history.
This adds support for streaming chat completions, which is useful for ollama. We should probably take a very skeptical eye to the code introduced in this PR.
---------
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
This lets us show an accumulating diff across all patches in a turn.
Refer to the docs for TurnDiffTracker for implementation details.
There are multiple ways this could have been done and this felt like the
right tradeoff between reliability and completeness:
*Pros*
* It will pick up all changes to files that the model touched including
if they prettier or another command that updates them.
* It will not pick up changes made by the user or other agents to files
it didn't modify.
*Cons*
* It will pick up changes that the user made to a file that the model
also touched
* It will not pick up changes to codegen or files that were not modified
with apply_patch
Introduce conversation.create handler (handle_create_conversation) and
wire it in MessageProcessor.
Stack:
Top: #1783
Bottom: #1784
---------
Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>
## Summary
- stream command stdout as `ExecCommandStdout` events
- forward streamed stdout to clients and ignore in human output
processor
- adjust call sites for new streaming API
- MCP server: add send-user-message tool to send user input to a running
Codex session
- Added an integration tests for the happy and sad paths
Changes:
• Add tool definition and schema.
• Expose tool in capabilities.
• Route and handle tool requests with validation.
• Tests for success, bad UUID, and missing session.
follow‑ups
• Listen path not implemented yet; the tool is present but marked “don’t
use yet” in code comments.
• Session run flag reset: clear running_session_id_set appropriately
after turn completion/errors.
This is the third PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
- Expose mcp_protocol from mcp-server for reuse in tests and callers.
- In MessageProcessor, detect structured ToolCallRequestParams in
tools/call and forward to a new handler.
- Add handle_new_tool_calls scaffold (returns error for now).
- Test helper: add send_send_user_message_tool_call to McpProcess to
send ConversationSendMessage requests;
This is the second PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
# Summary
- Align MCP server responses with mcp_types by emitting [CallToolResult,
RequestId] instead of an object.
Update send-message result to a tagged enum: Ok or Error { message }.
# Why
Protocol compliance with current MCP schema.
# Tests
- Updated assertions in mcp_protocol.rs for create/stream/send/list and
error cases.
This is the first PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
At 550 lines, `exec.rs` was a bit large. In particular, I found it hard
to locate the Seatbelt-related code quickly without a file with
`seatbelt` in the name, so this refactors things so:
- `spawn_command_under_seatbelt()` and dependent code moves to a new
`seatbelt.rs` file
- `spawn_child_async()` and dependent code moves to a new `spawn.rs`
file
This adds a tool the model can call to update a plan. The tool doesn't
actually _do_ anything but it gives clients a chance to read and render
the structured plan. We will likely iterate on the prompt and tools
exposed for planning over time.
## Summary
Per the [latest MCP
spec](https://modelcontextprotocol.io/specification/2025-06-18/basic#meta),
the `_meta` field is reserved for metadata. In the [Typescript
Schema](0695a497eb/schema/2025-06-18/schema.ts (L37-L40)),
`progressToken` is defined as a value to be attached to subsequent
notifications for that request.
The
[CallToolRequestParams](0695a497eb/schema/2025-06-18/schema.ts (L806-L817))
extends this definition but overwrites the params field. This ambiguity
makes our generated type definitions tricky, so I'm going to skip
`progressToken` field for now and just send back the `requestId`
instead.
In a future PR, we can clarify, update our `generate_mcp_types.py`
script, and update our progressToken logic accordingly.
## Testing
- [x] Added unit tests
- [x] Manually tested with mcp client
Because of a quirk of how implementation tests work in Rust, we had a
number of `#[allow(dead_code)]` annotations that were misleading because
the functions _were_ being used, just not by all integration tests in a
`tests/` folder, so when compiling the test that did not use the
function, clippy would complain that it was unused.
This fixes things by create a "test_support" crate under the `tests/`
folder that is imported as a dev dependency for the respective crate.
1. Emit call_id to exec approval elicitations for mcp client convenience
2. Remove the `-retry` from the call id for the same reason as above but
upstream the reset behavior to the mcp client
1. Added an elicitation for `approve-patch` which is very similar to
`approve-exec`.
2. Extracted both elicitations to their own files to prevent
`codex_tool_runner` from blowing up in size.
## Summary
Adds a new mcp tool call, `codex-reply`, so we can continue existing
sessions. This is a first draft and does not yet support sessions from
previous processes.
## Testing
- [x] tested with mcp client
This PR introduces a single integration test for `cargo mcp`, though it
also introduces a number of reusable components so that it should be
easier to introduce more integration tests going forward.
The new test is introduced in `codex-rs/mcp-server/tests/elicitation.rs`
and the reusable pieces are in `codex-rs/mcp-server/tests/common`.
The test itself verifies new functionality around elicitations
introduced in https://github.com/openai/codex/pull/1623 (and the fix
introduced in https://github.com/openai/codex/pull/1629) by doing the
following:
- starts a mock model provider with canned responses for
`/v1/chat/completions`
- starts the MCP server with a `config.toml` to use that model provider
(and `approval_policy = "untrusted"`)
- sends the `codex` tool call which causes the mock model provider to
request a shell call for `git init`
- the MCP server sends an elicitation to the client to approve the
request
- the client replies to the elicitation with `"approved"`
- the MCP server runs the command and re-samples the model, getting a
`"finish_reason": "stop"`
- in turn, the MCP server sends the final response to the original
`codex` tool call
- verifies that `git init` ran as expected
To test:
```
cargo test shell_command_approval_triggers_elicitation
```
In writing this test, I discovered that `ExecApprovalResponse` does not
conform to `ElicitResult`, so I added a TODO to fix that, since I think
that should be updated in a separate PR. As it stands, this PR does not
update any business logic, though it does make a number of members of
the `mcp-server` crate `pub` so they can be used in the test.
One additional learning from this PR is that
`std::process::Command::cargo_bin()` from the `assert_cmd` trait is only
available for `std::process::Command`, but we really want to use
`tokio::process::Command` so that everything is async and we can
leverage utilities like `tokio::time::timeout()`. The trick I came up
with was to use `cargo_bin()` to locate the program, and then to use
`std::process::Command::get_program()` when constructing the
`tokio::process::Command`.