Commit Graph

68 Commits

Author SHA1 Message Date
Dylan
dc468d563f [env] Remove git config for now (#1884)
## Summary
Forgot to remove this in #1869 last night! Too much of a performance hit
on the main thread. We can bring it back via an async thread on startup.
2025-08-06 08:05:17 -07:00
Dylan
3e8bcf0247 [prompts] Add <environment_context> (#1869)
## Summary
Includes a new user message in the api payload which provides useful
environment context for the model, so it knows about things like the
current working directory and the sandbox.

## Testing
Updated unit tests
2025-08-06 01:13:31 -07:00
Dylan
cda39e417f [tests] Investigate flakey mcp-server test (#1877)
## Summary
Have seen these tests flaking over the course of today on different
boxes. `wiremock` seems to be generally written with tokio/threads in
mind but based on the weird panics from the tests, let's see if this
helps.
2025-08-06 00:07:58 -07:00
Michael Bolin
42bd73e150 chore: remove unnecessary default_ prefix (#1854)
This prefix is not inline with the other fields on the `ConfigOverrides`
struct.
2025-08-05 14:42:49 -07:00
easong-openai
9285350842 Introduce --oss flag to use gpt-oss models (#1848)
This adds support for easily running Codex backed by a local Ollama
instance running our new open source models. See
https://github.com/openai/gpt-oss for details.

If you pass in `--oss` you'll be prompted to install/launch ollama, and
it will automatically download the 20b model and attempt to use it.

We'll likely want to expand this with some options later to make the
experience smoother for users who can't run the 20b or want to run the
120b.

Co-authored-by: Michael Bolin <mbolin@openai.com>
2025-08-05 11:31:11 -07:00
easong-openai
e0303dbac0 Rescue chat completion changes (#1846)
https://github.com/openai/codex/pull/1835 has some messed up history.

This adds support for streaming chat completions, which is useful for ollama. We should probably take a very skeptical eye to the code introduced in this PR.

---------

Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
2025-08-05 08:56:13 +00:00
Ahmed Ibrahim
e38ce39c51 Revert to 3f13ebce10 without rewriting history. Wrong merge 2025-08-04 17:03:24 -07:00
Ahmed Ibrahim
bd171e5206 add raw reasoning 2025-08-04 16:49:42 -07:00
Gabriel Peal
1f3318c1c5 Add a TurnDiffTracker to create a unified diff for an entire turn (#1770)
This lets us show an accumulating diff across all patches in a turn.
Refer to the docs for TurnDiffTracker for implementation details.

There are multiple ways this could have been done and this felt like the
right tradeoff between reliability and completeness:
*Pros*
* It will pick up all changes to files that the model touched including
if they prettier or another command that updates them.
* It will not pick up changes made by the user or other agents to files
it didn't modify.

*Cons*
* It will pick up changes that the user made to a file that the model
also touched
* It will not pick up changes to codegen or files that were not modified
with apply_patch
2025-08-04 11:57:04 -04:00
Gabriel Peal
4c9f7b6bcc Fix flaky test_shell_command_approval_triggers_elicitation test (#1802)
This doesn't flake very often but this should fix it.
2025-08-03 10:19:12 -04:00
aibrahim-oai
97ab8fb610 MCP: add conversation.create tool [Stack 2/2] (#1783)
Introduce conversation.create handler (handle_create_conversation) and
wire it in MessageProcessor.

Stack:
Top: #1783 
Bottom: #1784

---------

Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>
2025-08-01 22:18:36 +00:00
aibrahim-oai
fe62f859a6 Add Error variant to ConversationCreateResult [Stack 1/2] (#1784)
Switch ConversationCreateResult from a struct to a tagged enum (Ok |
Error)

Stack:
Top: #1783 
Bottom: #1784
2025-08-01 15:13:53 -07:00
aibrahim-oai
f20de21cb6 collabse stdout and stderr delta events into one (#1787) 2025-08-01 14:00:19 -07:00
aibrahim-oai
bc7beddaa2 feat: stream exec stdout events (#1786)
## Summary
- stream command stdout as `ExecCommandStdout` events
- forward streamed stdout to clients and ignore in human output
processor
- adjust call sites for new streaming API
2025-08-01 13:04:34 -07:00
aibrahim-oai
f918198bbb Introduce a new function to just send user message [Stack 3/3] (#1686)
- MCP server: add send-user-message tool to send user input to a running
Codex session
- Added an integration tests for the happy and sad paths

Changes:
•	Add tool definition and schema.
•	Expose tool in capabilities.
•	Route and handle tool requests with validation.
•	Tests for success, bad UUID, and missing session.


follow‑ups
• Listen path not implemented yet; the tool is present but marked “don’t
use yet” in code comments.
• Session run flag reset: clear running_session_id_set appropriately
after turn completion/errors.

This is the third PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
2025-08-01 17:04:12 +00:00
aibrahim-oai
ad0295b893 MCP server: route structured tool-call requests and expose mcp_protocol [Stack 2/3] (#1751)
- Expose mcp_protocol from mcp-server for reuse in tests and callers.
- In MessageProcessor, detect structured ToolCallRequestParams in
tools/call and forward to a new handler.
- Add handle_new_tool_calls scaffold (returns error for now).
- Test helper: add send_send_user_message_tool_call to McpProcess to
send ConversationSendMessage requests;

This is the second PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
2025-08-01 02:46:04 +00:00
aibrahim-oai
d3aa5f46b7 MCP Protocol: Align tool-call response with CallToolResult [Stack 1/3] (#1750)
# Summary
- Align MCP server responses with mcp_types by emitting [CallToolResult,
RequestId] instead of an object.
Update send-message result to a tagged enum: Ok or Error { message }.

# Why
Protocol compliance with current MCP schema.

# Tests
- Updated assertions in mcp_protocol.rs for create/stream/send/list and
error cases.

This is the first PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
2025-08-01 02:30:03 +00:00
Michael Bolin
5a0ad5ab8f chore: refactor exec.rs: create separate seatbelt.rs and spawn.rs files (#1762)
At 550 lines, `exec.rs` was a bit large. In particular, I found it hard
to locate the Seatbelt-related code quickly without a file with
`seatbelt` in the name, so this refactors things so:

- `spawn_command_under_seatbelt()` and dependent code moves to a new
`seatbelt.rs` file
- `spawn_child_async()` and dependent code moves to a new `spawn.rs`
file
2025-07-31 13:11:47 -07:00
pakrym-oai
51b6bdefbe Auto format toml (#1745)
Add recommended extension and configure it to auto format prompt.
2025-07-30 18:37:00 -07:00
aibrahim-oai
2f5557056d moving input item from MCP Protocol back to core Protocol (#1740)
- Currently we have duplicate input item. Let's have one source of truth
in the core.
- Used Requestid type
2025-07-30 13:43:08 -07:00
aibrahim-oai
93341797c4 fix ci (#1739)
I think this commit broke the CI because it changed the
`McpToolCallBeginEvent` type:
347c81ad00
2025-07-30 11:32:38 -07:00
aibrahim-oai
3823b32b7a Mcp protocol (#1715)
- Add typed MCP protocol surface in
`codex-rs/mcp-server/src/mcp_protocol.rs` for `requests`, `responses`,
and `notifications`
- Requests: `NewConversation`, `Connect`, `SendUserMessage`,
`GetConversations`
- Message content parts: `Text`, `Image` (`ImageUrl`/`FileId`, optional
`ImageDetail`), File (`Url`/`Id`/`inline Data`)
- Responses: `ToolCallResponseEnvelope` with optional `isError` and
`structuredContent` variants (`NewConversation`, `Connect`,
`SendUserMessageAccepted`, `GetConversations`)
- Notifications: `InitialState`, `ConnectionRevoked`, `CodexEvent`,
`Cancelled`
- Uniform `_meta` on `notifications` via `NotificationMeta`
(`conversationId`, `requestId`)
- Unit tests validate JSON wire shapes for key
`requests`/`responses`/`notifications`
2025-07-29 20:14:41 -07:00
Gabriel Peal
8828f6f082 Add an experimental plan tool (#1726)
This adds a tool the model can call to update a plan. The tool doesn't
actually _do_ anything but it gives clients a chance to read and render
the structured plan. We will likely iterate on the prompt and tools
exposed for planning over time.
2025-07-29 14:22:02 -04:00
Dylan
094d7af8c3 [mcp-server] Populate notifications._meta with requestId (#1704)
## Summary
Per the [latest MCP
spec](https://modelcontextprotocol.io/specification/2025-06-18/basic#meta),
the `_meta` field is reserved for metadata. In the [Typescript
Schema](0695a497eb/schema/2025-06-18/schema.ts (L37-L40)),
`progressToken` is defined as a value to be attached to subsequent
notifications for that request.

The
[CallToolRequestParams](0695a497eb/schema/2025-06-18/schema.ts (L806-L817))
extends this definition but overwrites the params field. This ambiguity
makes our generated type definitions tricky, so I'm going to skip
`progressToken` field for now and just send back the `requestId`
instead.
 
In a future PR, we can clarify, update our `generate_mcp_types.py`
script, and update our progressToken logic accordingly.

## Testing
- [x] Added unit tests
- [x] Manually tested with mcp client
2025-07-28 13:32:09 -07:00
aibrahim-oai
19bef7659f Serializing the eventmsg type to snake_case (#1709)
This was an abrupt change on our clients. We need to serialize as
snake_case.
2025-07-28 10:26:27 -07:00
Michael Bolin
9102255854 fix: move arg0 handling out of codex-linux-sandbox and into its own crate (#1697) 2025-07-28 08:31:24 -07:00
Michael Bolin
2405c40026 chore: update Codex::spawn() to return a struct instead of a tuple (#1677)
Also update `init_codex()` to return a `struct` instead of a tuple, as well.
2025-07-27 20:01:35 -07:00
aibrahim-oai
5a0079fea2 Changing method in MCP notifications (#1684)
- Changing the codex/event type
2025-07-26 10:35:49 -07:00
pakrym-oai
7ee87123a6 Optionally run using user profile (#1678) 2025-07-25 11:45:23 -07:00
Michael Bolin
7af9cedbd7 fix: create separate test_support crates to eliminate #[allow(dead_code)] (#1667)
Because of a quirk of how implementation tests work in Rust, we had a
number of `#[allow(dead_code)]` annotations that were misleading because
the functions _were_ being used, just not by all integration tests in a
`tests/` folder, so when compiling the test that did not use the
function, clippy would complain that it was unused.

This fixes things by create a "test_support" crate under the `tests/`
folder that is imported as a dev dependency for the respective crate.
2025-07-24 12:19:46 -07:00
aibrahim-oai
b4ab7c1b73 Flaky CI fix (#1647)
Flushing before sending `TaskCompleteEvent` and ending the submission
loop to avoid race conditions.
2025-07-23 15:03:26 -07:00
Gabriel Peal
084236f717 Add call_id to patch approvals and elicitations (#1660)
Builds on https://github.com/openai/codex/pull/1659 and adds call_id to
a few more places for the same reason.
2025-07-23 15:55:35 -04:00
Gabriel Peal
bc944e77f5 Improve messages emitted for exec failures (#1659)
1. Emit call_id to exec approval elicitations for mcp client convenience
2. Remove the `-retry` from the call id for the same reason as above but
upstream the reset behavior to the mcp client
2025-07-23 14:43:53 -04:00
aibrahim-oai
01c0896f0f Adding interrupt Support to MCP (#1646) 2025-07-22 20:33:49 +00:00
pakrym-oai
6d82907082 Add support for custom base instructions (#1645)
Allows providing custom instructions file as a config parameter and
custom instruction text via MCP tool call.
2025-07-22 09:42:22 -07:00
Gabriel Peal
710f728124 Add an elicitation for approve patch and refactor tool calls (#1642)
1. Added an elicitation for `approve-patch` which is very similar to
`approve-exec`.
2. Extracted both elicitations to their own files to prevent
`codex_tool_runner` from blowing up in size.
2025-07-22 02:58:41 -04:00
Dylan
18b2b30841 [mcp-server] Add reply tool call (#1643)
## Summary
Adds a new mcp tool call, `codex-reply`, so we can continue existing
sessions. This is a first draft and does not yet support sessions from
previous processes.

## Testing
- [x] tested with mcp client
2025-07-21 21:01:56 -07:00
Michael Bolin
d49d802b06 test: add integration test for MCP server (#1633)
This PR introduces a single integration test for `cargo mcp`, though it
also introduces a number of reusable components so that it should be
easier to introduce more integration tests going forward.

The new test is introduced in `codex-rs/mcp-server/tests/elicitation.rs`
and the reusable pieces are in `codex-rs/mcp-server/tests/common`.

The test itself verifies new functionality around elicitations
introduced in https://github.com/openai/codex/pull/1623 (and the fix
introduced in https://github.com/openai/codex/pull/1629) by doing the
following:

- starts a mock model provider with canned responses for
`/v1/chat/completions`
- starts the MCP server with a `config.toml` to use that model provider
(and `approval_policy = "untrusted"`)
- sends the `codex` tool call which causes the mock model provider to
request a shell call for `git init`
- the MCP server sends an elicitation to the client to approve the
request
- the client replies to the elicitation with `"approved"`
- the MCP server runs the command and re-samples the model, getting a
`"finish_reason": "stop"`
- in turn, the MCP server sends the final response to the original
`codex` tool call
- verifies that `git init` ran as expected

To test:

```
cargo test shell_command_approval_triggers_elicitation
```

In writing this test, I discovered that `ExecApprovalResponse` does not
conform to `ElicitResult`, so I added a TODO to fix that, since I think
that should be updated in a separate PR. As it stands, this PR does not
update any business logic, though it does make a number of members of
the `mcp-server` crate `pub` so they can be used in the test.

One additional learning from this PR is that
`std::process::Command::cargo_bin()` from the `assert_cmd` trait is only
available for `std::process::Command`, but we really want to use
`tokio::process::Command` so that everything is async and we can
leverage utilities like `tokio::time::timeout()`. The trick I came up
with was to use `cargo_bin()` to locate the program, and then to use
`std::process::Command::get_program()` when constructing the
`tokio::process::Command`.
2025-07-21 10:27:07 -07:00
Michael Bolin
8a6c6cee88 fix: address review feedback on #1621 and #1623 (#1631)
- formalizes `ExecApprovalElicitRequestParams`
- adds some defensive logic when messages fail to parse
- fixes a typo in a comment
2025-07-20 14:42:11 -07:00
Gabriel Peal
8b590105de Don't drop sessions on elicitation responses (#1629) 2025-07-20 13:31:19 -04:00
Michael Bolin
018003e52f feat: leverage elicitations in the MCP server (#1623)
This updates the MCP server so that if it receives an
`ExecApprovalRequest` from the `Codex` session, it in turn sends an [MCP
elicitation](https://modelcontextprotocol.io/specification/draft/client/elicitation)
to the client to ask for the approval decision. Upon getting a response,
it forwards the client's decision via `Op::ExecApproval`.

Admittedly, we should be doing the same thing for
`ApplyPatchApprovalRequest`, but this is our first time experimenting
with elicitations, so I'm inclined to defer wiring that code path up
until we feel good about how this one works.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1623).
* __->__ #1623
* #1622
* #1621
* #1620
2025-07-19 01:32:03 -04:00
Michael Bolin
11fd3123be chore: introduce OutgoingMessageSender (#1622)
Previous to this change, `MessageProcessor` had a
`tokio::sync::mpsc::Sender<JSONRPCMessage>` as an abstraction for server
code to send a message down to the MCP client. Because `Sender` is cheap
to `clone()`, it was straightforward to make it available to tasks
scheduled with `tokio::task::spawn()`.

This worked well when we were only sending notifications or responses
back down to the client, but we want to add support for sending
elicitations in #1623, which means that we need to be able to send
_requests_ to the client, and now we need a bit of centralization to
ensure all request ids are unique.

To that end, this PR introduces `OutgoingMessageSender`, which houses
the existing `Sender<OutgoingMessage>` as well as an `AtomicI64` to mint
out new, unique request ids. It has methods like `send_request()` and
`send_response()` so that callers do not have to deal with
`JSONRPCMessage` directly, as having to set the `jsonrpc` for each
message was a bit tedious (this cleans up `codex_tool_runner.rs` quite a
bit).

We do not have `OutgoingMessageSender` implement `Clone` because it is
important that the `AtomicI64` is shared across all users of
`OutgoingMessageSender`. As such, `Arc<OutgoingMessageSender>` must be
used instead, as it is frequently shared with new tokio tasks.

As part of this change, we update `message_processor.rs` to embrace
`await`, though we must be careful that no individual handler blocks the
main loop and prevents other messages from being handled.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1622).
* #1623
* __->__ #1622
* #1621
* #1620
2025-07-19 00:30:56 -04:00
Michael Bolin
e78ec00e73 chore: support MCP schema 2025-06-18 (#1621)
This updates the schema in `generate_mcp_types.py` from `2025-03-26` to
`2025-06-18`, regenerates `mcp-types/src/lib.rs`, and then updates all
the code that uses `mcp-types` to honor the changes.

Ran

```
npx @modelcontextprotocol/inspector just codex mcp
```

and verified that I was able to invoke the `codex` tool, as expected.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1621).
* #1623
* #1622
* __->__ #1621
2025-07-19 00:09:34 -04:00
aibrahim-oai
2bd3314886 support deltas in core (#1587)
- Added support for message and reasoning deltas
- Skipped adding the support in the cli and tui for later
- Commented a failing test (wrong merge) that needs fix in a separate
PR.

Side note: I think we need to disable merge when the CI don't pass.
2025-07-16 15:11:18 -07:00
Michael Bolin
94f5cad895 fix: when invoking Codex via MCP, use the request id as the Submission id (#1554)
Small quality-of-life improvement when using `codex mcp`.
2025-07-12 16:22:02 -07:00
Rene Leonhardt
82b0cebe8b chore(rs): update dependencies (#1494)
### Chores
- Update cargo dependencies
- Remove unused cargo dependencies
- Fix clippy warnings
- Update Dockerfile (package.json requires node 22)
- Let Dependabot update bun, cargo, devcontainers, docker,
github-actions, npm (nix still not supported)

### TODO
- Upgrade dependencies with breaking changes

```shell
$ cargo update --verbose
   Unchanged crossterm v0.28.1 (available: v0.29.0)
   Unchanged schemars v0.8.22 (available: v1.0.4)
```
2025-07-10 11:08:16 -07:00
Michael Bolin
e0c08cea4f feat: add support for --sandbox flag (#1476)
On a high-level, we try to design `config.toml` so that you don't have
to "comment out a lot of stuff" when testing different options.

Previously, defining a sandbox policy was somewhat at odds with this
principle because you would define the policy as attributes of
`[sandbox]` like so:

```toml
[sandbox]
mode = "workspace-write"
writable_roots = [ "/tmp" ]
```

but if you wanted to temporarily change to a read-only sandbox, you
might feel compelled to modify your file to be:

```toml
[sandbox]
mode = "read-only"
# mode = "workspace-write"
# writable_roots = [ "/tmp" ]
```

Technically, commenting out `writable_roots` would not be strictly
necessary, as `mode = "read-only"` would ignore `writable_roots`, but
it's still a reasonable thing to do to keep things tidy.

Currently, the various values for `mode` do not support that many
attributes, so this is not that hard to maintain, but one could imagine
this becoming more complex in the future.

In this PR, we change Codex CLI so that it no longer recognizes
`[sandbox]`. Instead, it introduces a top-level option, `sandbox_mode`,
and `[sandbox_workspace_write]` is used to further configure the sandbox
when when `sandbox_mode = "workspace-write"` is used:

```toml
sandbox_mode = "workspace-write"

[sandbox_workspace_write]
writable_roots = [ "/tmp" ]
```

This feels a bit more future-proof in that it is less tedious to
configure different sandboxes:

```toml
sandbox_mode = "workspace-write"

[sandbox_read_only]
# read-only options here...

[sandbox_workspace_write]
writable_roots = [ "/tmp" ]

[sandbox_danger_full_access]
# danger-full-access options here...
```

In this scheme, you never need to comment out the configuration for an
individual sandbox type: you only need to redefine `sandbox_mode`.

Relatedly, previous to this change, a user had to do `-c
sandbox.mode=read-only` to change the mode on the command line. With
this change, things are arguably a bit cleaner because the equivalent
option is `-c sandbox_mode=read-only` (and now `-c
sandbox_workspace_write=...` can be set separately).

Though more importantly, we introduce the `-s/--sandbox` option to the
CLI, which maps directly to `sandbox_mode` in `config.toml`, making
config override behavior easier to reason about. Moreover, as you can
see in the updates to the various Markdown files, it is much easier to
explain how to configure sandboxing when things like `--sandbox
read-only` can be used as an example.

Relatedly, this cleanup also made it straightforward to add support for
a `sandbox` option for Codex when used as an MCP server (see the changes
to `mcp-server/src/codex_tool_config.rs`).

Fixes https://github.com/openai/codex/issues/1248.
2025-07-07 22:31:30 -07:00
Michael Bolin
fcfe43c7df feat: show number of tokens remaining in UI (#1388)
When using the OpenAI Responses API, we now record the `usage` field for
a `"response.completed"` event, which includes metrics about the number
of tokens consumed. We also introduce `openai_model_info.rs`, which
includes current data about the most common OpenAI models available via
the API (specifically `context_window` and `max_output_tokens`). If
Codex does not recognize the model, you can set `model_context_window`
and `model_max_output_tokens` explicitly in `config.toml`.

When then introduce a new event type to `protocol.rs`, `TokenCount`,
which includes the `TokenUsage` for the most recent turn.

Finally, we update the TUI to record the running sum of tokens used so
the percentage of available context window remaining can be reported via
the placeholder text for the composer:

![Screenshot 2025-06-25 at 11 20
55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49)

We could certainly get much fancier with this (such as reporting the
estimated cost of the conversation), but for now, we are just trying to
achieve feature parity with the TypeScript CLI.

Though arguably this improves upon the TypeScript CLI, as the TypeScript
CLI uses heuristics to estimate the number of tokens used rather than
using the `usage` information directly:


296996d74e/codex-cli/src/utils/approximate-tokens-used.ts (L3-L16)

Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
Michael Bolin
72082164c1 chore: rename AskForApproval::UnlessAllowListed to AskForApproval::UnlessTrusted (#1385)
We could just rename to `Untrusted` instead of `UnlessTrusted`, but I
think `AskForApproval::UnlessTrusted` reads a bit better.
2025-06-25 12:26:13 -07:00
Michael Bolin
86d5a9d80d chore: rename unless-allow-listed to untrusted (#1378)
For the `approval_policy` config option, renames `unless-allow-listed`
to `untrusted`. In general, when it comes to exec'ing commands, I think
"trusted" is a more accurate term than "safe."

Also drops the `AskForApproval::AutoEdit` variant, as we were not really
making use of it, anyway.

Fixes https://github.com/openai/codex/issues/1250.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1378).
* #1379
* __->__ #1378
2025-06-24 22:19:21 -07:00