This is a very large PR with some non-backwards-compatible changes.
Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish
server that had two overlapping responsibilities:
- Running an MCP server, providing some basic tool calls.
- Running the app server used to power experiences such as the VS Code
extension.
This PR aims to separate these into distinct concepts:
- `codex mcp-server` for the MCP server
- `codex app-server` for the "application server"
Note `codex mcp` still exists because it already has its own subcommands
for MCP management (`list`, `add`, etc.)
The MCP logic continues to live in `codex-rs/mcp-server` whereas the
refactored app server logic is in the new `codex-rs/app-server` folder.
Note that most of the existing integration tests in
`codex-rs/mcp-server/tests/suite` were actually for the app server, so
all the tests have been moved with the exception of
`codex-rs/mcp-server/tests/suite/mod.rs`.
Because this is already a large diff, I tried not to change more than I
had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses
the name `McpProcess` for now, but I will do some mechanical renamings
to things like `AppServer` in subsequent PRs.
While `mcp-server` and `app-server` share some overlapping functionality
(like reading streams of JSONL and dispatching based on message types)
and some differences (completely different message types), I ended up
doing a bit of copypasta between the two crates, as both have somewhat
similar `message_processor.rs` and `outgoing_message.rs` files for now,
though I expect them to diverge more in the near future.
One material change is that of the initialize handshake for `codex
app-server`, as we no longer use the MCP types for that handshake.
Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an
`Initialize` variant to `ClientRequest`, which takes the `ClientInfo`
object we need to update the `USER_AGENT_SUFFIX` in
`codex-rs/app-server/src/message_processor.rs`.
One other material change is in
`codex-rs/app-server/src/codex_message_processor.rs` where I eliminated
a use of the `send_event_as_notification()` method I am generally trying
to deprecate (because it blindly maps an `EventMsg` into a
`JSONNotification`) in favor of `send_server_notification()`, which
takes a `ServerNotification`, as that is intended to be a custom enum of
all notification types supported by the app server. So to make this
update, I had to introduce a new variant of `ServerNotification`,
`SessionConfigured`, which is a non-backwards compatible change with the
old `codex mcp`, and clients will have to be updated after the next
release that contains this PR. Note that
`codex-rs/app-server/tests/suite/list_resume.rs` also had to be update
to reflect this change.
I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility
crate to avoid some of the copying between `mcp-server` and
`app-server`.
This changes the reqwest client used in tests to be sandbox-friendly,
and skips a bunch of other tests that don't work inside the
sandbox/without network.
This PR addresses an edge-case bug that appears in the VS Code extension
in the following situation:
1. Log in using ChatGPT (using either the CLI or extension). This will
create an `auth.json` file.
2. Manually modify `config.toml` to specify a custom provider.
3. Start a fresh copy of the VS Code extension.
The profile menu in the VS Code extension will indicate that you are
logged in using ChatGPT even though you're not.
This is caused by the `get_auth_status` method returning an
`auth_method: 'chatgpt'` when a custom provider is configured and it
doesn't use OpenAI auth (i.e. `requires_openai_auth` is false). The
method should always return `auth_method: None` if
`requires_openai_auth` is false.
The same bug also causes the NUX (new user experience) screen to be
displayed in the VSCE in this situation.
It turns out that we want slightly different behavior for the
`SetDefaultModel` RPC because some models do not work with reasoning
(like GPT-4.1), so we should be able to explicitly clear this value.
Verified in `codex-rs/mcp-server/tests/suite/set_default_model.rs`.
This adds `SetDefaultModel`, which takes `model` and `reasoning_effort`
as optional fields. If set, the field will overwrite what is in the
user's `config.toml`.
This reuses logic that was added to support the `/model` command in the
TUI: https://github.com/openai/codex/pull/2799.
`ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
This PR does the following:
* Adds the ability to paste or type an API key.
* Removes the `preferred_auth_method` config option. The last login
method is always persisted in auth.json, so this isn't needed.
* If OPENAI_API_KEY env variable is defined, the value is used to
prepopulate the new UI. The env variable is otherwise ignored by the
CLI.
* Adds a new MCP server entry point "login_api_key" so we can implement
this same API key behavior for the VS Code extension.
<img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
/>
<img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
/>
This adds a simple endpoint that provides the email address encoded in
`$CODEX_HOME/auth.json`.
As noted, for now, we do not hit the server to verify this is the user's
true email address.
https://github.com/openai/codex/pull/3395 updated `mcp-types/src/lib.rs`
by hand, but that file is generated code that is produced by
`mcp-types/generate_mcp_types.py`. Unfortunately, we do not have
anything in CI to verify this right now, but I will address that in a
subsequent PR.
#3395 ended up introducing a change that added a required field when
deserializing `InitializeResult`, breaking Codex when used as an MCP
client, so the quick fix in #3436 was to make the new field `Optional`
with `skip_serializing_if = "Option::is_none"`, but that did not address
the problem that `mcp-types/generate_mcp_types.py` and
`mcp-types/src/lib.rs` are out of sync.
This PR gets things back to where they are in sync. It removes the
custom `mcp_types::McpClientInfo` type that was added to
`mcp-types/src/lib.rs` and forces us to use the generated
`mcp_types::Implementation` type. Though this PR also updates
`generate_mcp_types.py` to generate the additional `user_agent:
Optional<String>` field on `Implementation` so that we can continue to
specify it when Codex operates as an MCP server.
However, this also requires us to specify `user_agent: None` when Codex
operates as an MCP client.
We may want to introduce our own `InitializeResult` type that is
specific to when we run as a server to avoid this in the future, but my
immediate goal is just to get things back in sync.
This PR improves two existing auth-related tests. They were failing when
run in an environment where an `OPENAI_API_KEY` env variable was
defined. The change makes them more resilient.
The previous config approach had a few issues:
1. It is part of the config but not designed to be used externally
2. It had to be wired through many places (look at the +/- on this PR
3. It wasn't guaranteed to be set consistently everywhere because we
don't have a super well defined way that configs stack. For example, the
extension would configure during newConversation but anything that
happened outside of that (like login) wouldn't get it.
This env var approach is cleaner and also creates one less thing we have
to deal with when coming up with a better holistic story around configs.
One downside is that I removed the unit test testing for the override
because I don't want to deal with setting the global env or spawning
child processes and figuring out how to introspect their originator
header. The new code is sufficiently simple and I tested it e2e that I
feel as if this is still worth it.
Adds support for `ArchiveConversation` in the JSON-RPC server that takes
a `(ConversationId, PathBuf)` pair and:
- verifies the `ConversationId` corresponds to the rollout id at the
`PathBuf`
- if so, invokes
`ConversationManager.remove_conversation(ConversationId)`
- if the `CodexConversation` was in memory, send `Shutdown` and wait for
`ShutdownComplete` with a timeout
- moves the `.jsonl` file to `$CODEX_HOME/archived_sessions`
---------
Co-authored-by: Gabriel Peal <gabriel@openai.com>
Adding the `rollout_path` to the `NewConversationResponse` makes it so a
client can perform subsequent operations on a `(ConversationId,
PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352).
* #3353
* __->__ #3352
This PR does multiple things that are necessary for conversation resume
to work from the extension. I wanted to make sure everything worked so
these changes wound up in one PR:
1. Generate more ts types
2. Resume rollout history files rather than create a new one every time
it is resumed so you don't see a duplicate conversation in history for
every resume. Chatted with @aibrahim-oai to verify this
3. Return conversation_id in conversation summaries
4. [Cleanup] Use serde and strong types for a lot of the rollout file
parsing
We're trying to migrate from `session_id: Uuid` to `conversation_id:
ConversationId`. Not only does this give us more type safety but it
unifies our terminology across Codex and with the implementation of
session resuming, a conversation (which can span multiple sessions) is
more appropriate.
I started this impl on https://github.com/openai/codex/pull/3219 as part
of getting resume working in the extension but it's big enough that it
should be broken out.
When item ids are sent to Responses API it will load them from the
database ignoring the provided values. This adds extra latency.
Not having the mode to store requests also allows us to simplify the
code.
## Breaking change
The `disable_response_storage` configuration option is removed.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
## Summary
Follow-up to #3056
This PR updates the mcp-server interface for reading the config settings
saved by the user. At risk of introducing _another_ Config struct, I
think it makes sense to avoid tying our protocol to ConfigToml, as its
become a bit unwieldy. GetConfigTomlResponse was a de-facto struct for
this already - better to make it explicit, in my opinion.
This is technically a breaking change of the mcp-server protocol, but
given the previous interface was introduced so recently in #2725, and we
have not yet even started to call it, I propose proceeding with the
breaking change - but am open to preserving the old endpoint.
## Testing
- [x] Added additional integration test coverage
`test_shell_command_approval_triggers_elicitation()` is one of a number
of integration tests that we have observed to be flaky on GitHub CI, so
this PR tries to reduce the flakiness _and_ to provide us with more
information when it flakes. Specifically:
- Changed the command that we use to trigger the elicitation from `git
init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'`
because running `git` seems more likely to invite variance.
- Increased the timeout to wait for the task response from 10s to 20s.
- Added more logging.
## Summary
Adds a GetConfig request to the MCP Protocol, so MCP clients can
evaluate the resolved config.toml settings which the harness is using.
## Testing
- [x] Added an end to end test of the endpoint
this dramatically improves time to run `cargo test -p codex-core` (~25x
speedup).
before:
```
cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total
```
after:
```
cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total
```
both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
to exclude compile times.
approach inspired by [Delete Cargo Integration
Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
we move all test cases in tests/ into a single suite in order to have a
single binary, as there is significant overhead for each test binary
executed, and because test execution is only parallelized with a single
binary.
## Summary
- read the shell exec approval request's actual id instead of assuming
it is always 0
- use that id when validating and responding in the test
## Testing
- `cargo test -p codex-mcp-server
test_shell_command_approval_triggers_elicitation`
------
https://chatgpt.com/codex/tasks/task_i_68a6ab9c732c832c81522cbf11812be0
This PR adds a central `AuthManager` struct that manages the auth
information used across conversations and the MCP server. Prior to this,
each conversation and the MCP server got their own private snapshots of
the auth information, and changes to one (such as a logout or token
refresh) were not seen by others.
This is especially problematic when multiple instances of the CLI are
run. For example, consider the case where you start CLI 1 and log in to
ChatGPT account X and then start CLI 2 and log out and then log in to
ChatGPT account Y. The conversation in CLI 1 is still using account X,
but if you create a new conversation, it will suddenly (and
unexpectedly) switch to account Y.
With the `AuthManager`, auth information is read from disk at the time
the `ConversationManager` is constructed, and it is cached in memory.
All new conversations use this same auth information, as do any token
refreshes.
The `AuthManager` is also used by the MCP server's GetAuthStatus
command, which now returns the auth method currently used by the MCP
server.
This PR also includes an enhancement to the GetAuthStatus command. It
now accepts two new (optional) input parameters: `include_token` and
`refresh_token`. Callers can use this to request the in-use auth token
and can optionally request to refresh the token.
The PR also adds tests for the login and auth APIs that I recently added
to the MCP server.
The existing `wire_format.rs` should share more types with the
`codex-protocol` crate (like `AskForApproval` instead of maintaining a
parallel `CodexToolCallApprovalPolicy` enum), so this PR moves
`wire_format.rs` into `codex-protocol`, renaming it as
`mcp-protocol.rs`. We also de-dupe types, where appropriate.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423).
* #2424
* __->__ #2423
Introduces `EventMsg::TurnAborted` that should be sent in response to
`Op::Interrupt`.
In the MCP server, updates the handling of a
`ClientRequest::InterruptConversation` request such that it sends the
`Op::Interrupt` but does not respond to the request until it sees an
`EventMsg::TurnAborted`.
This adds a new request type, `SendUserTurn`, that makes it possible to
submit a `Op::UserTurn` operation (introduced in #2329) to a
conversation. This PR also adds a new integration test that verifies
that changing from `AskForApproval::UnlessTrusted` to
`AskForApproval::Never` mid-conversation ensures that an elicitation is
no longer sent for running `python3 -c print(42)`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345).
* __->__ #2345
* #2329
* #2343
* #2340
* #2338
I still see flakiness in
`test_shell_command_approval_triggers_elicitation()` on occasion where
`MockServer` claims it has not received all of its expected requests.
I recently introduced a similar type of test in #2264,
`test_codex_jsonrpc_conversation_flow()`, which I have not seen flake
(yet!), so this PR pulls over two things I did in that test:
- increased `worker_threads` from `2` to `4`
- added an assertion to make sure the `task_complete` notification is
received
Honestly, I'm still not sure why `MockServer` claims it sometimes does
not receive all its expected requests given that we assert that the
final `JSONRPCResponse` is read on the stream, but let's give this a
shot.
Assuming this fixes things, my hypothesis is that the increase in
`worker_threads` helps because perhaps there are async tasks in
`MockServer` that do not reliably complete fully when there are not
enough threads available? If that is correct, it seems like the test
would still be flaky, though perhaps with lower frequency?
This PR:
* Added the clippy.toml to configure allowable expect / unwrap usage in
tests
* Removed as many expect/allow lines as possible from tests
* moved a bunch of allows to expects where possible
Note: in integration tests, non `#[test]` helper functions are not
covered by this so we had to leave a few lingering `expect(expect_used`
checks around
This updates `CodexMessageProcessor` so that each notification it sends
for a `EventMsg` from a `CodexConversation` such that:
- The `params` always has an appropriate `conversationId` field.
- The `method` is now includes the name of the `EventMsg` type rather
than using `codex/event` as the `method` type for all notifications. (We
currently prefix the method name with `codex/event/`, but I think that
should go away once we formalize the notification schema in
`wire_format.rs`.)
As part of this, we update `test_codex_jsonrpc_conversation_flow()` to
verify that the `task_finished` notification has made it through the
system instead of sleeping for 5s and "hoping" the server finished
processing the task. Note we have seen some flakiness in some of our
other, similar integration tests, and I expect adding a similar check
would help in those cases, as well.
This introduces a new set of request types that our `codex mcp`
supports. Note that these do not conform to MCP tool calls so that
instead of having to send something like this:
```json
{
"jsonrpc": "2.0",
"method": "tools/call",
"id": 42,
"params": {
"name": "newConversation",
"arguments": {
"model": "gpt-5",
"approvalPolicy": "on-request"
}
}
}
```
we can send something like this:
```json
{
"jsonrpc": "2.0",
"method": "newConversation",
"id": 42,
"params": {
"model": "gpt-5",
"approvalPolicy": "on-request"
}
}
```
Admittedly, this new format is not a valid MCP tool call, but we are OK
with that right now. (That is, not everything we might want to request
of `codex mcp` is something that is appropriate for an autonomous agent
to do.)
To start, this introduces four request types:
- `newConversation`
- `sendUserMessage`
- `addConversationListener`
- `removeConversationListener`
The new `mcp-server/tests/codex_message_processor_flow.rs` shows how
these can be used.
The types are defined on the `CodexRequest` enum, so we introduce a new
`CodexMessageProcessor` that is responsible for dealing with requests
from this enum. The top-level `MessageProcessor` has been updated so
that when `process_request()` is called, it first checks whether the
request conforms to `CodexRequest` and dispatches it to
`CodexMessageProcessor` if so.
Note that I also decided to use `camelCase` for the on-the-wire format,
as that seems to be the convention for MCP.
For the moment, the new protocol is defined in `wire_format.rs` within
the `mcp-server` crate, but in a subsequent PR, I will probably move it
to its own crate to ensure the protocol has minimal dependencies and
that we can codegen a schema from it.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2264).
* #2278
* __->__ #2264
## Summary
Forgot to remove this in #1869 last night! Too much of a performance hit
on the main thread. We can bring it back via an async thread on startup.
## Summary
Includes a new user message in the api payload which provides useful
environment context for the model, so it knows about things like the
current working directory and the sandbox.
## Testing
Updated unit tests
## Summary
Have seen these tests flaking over the course of today on different
boxes. `wiremock` seems to be generally written with tokio/threads in
mind but based on the weird panics from the tests, let's see if this
helps.
Introduce conversation.create handler (handle_create_conversation) and
wire it in MessageProcessor.
Stack:
Top: #1783
Bottom: #1784
---------
Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>