- MCP server: add send-user-message tool to send user input to a running
Codex session
- Added an integration tests for the happy and sad paths
Changes:
• Add tool definition and schema.
• Expose tool in capabilities.
• Route and handle tool requests with validation.
• Tests for success, bad UUID, and missing session.
follow‑ups
• Listen path not implemented yet; the tool is present but marked “don’t
use yet” in code comments.
• Session run flag reset: clear running_session_id_set appropriately
after turn completion/errors.
This is the third PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
- Add operation to summarize the context so far.
- The operation runs a compact task that summarizes the context.
- The operation clear the previous context to free the context window
- The operation didn't use `run_task` to avoid corrupting the session
- Add /compact in the tui
https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c
- Expose mcp_protocol from mcp-server for reuse in tests and callers.
- In MessageProcessor, detect structured ToolCallRequestParams in
tools/call and forward to a new handler.
- Add handle_new_tool_calls scaffold (returns error for now).
- Test helper: add send_send_user_message_tool_call to McpProcess to
send ConversationSendMessage requests;
This is the second PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
# Summary
- Align MCP server responses with mcp_types by emitting [CallToolResult,
RequestId] instead of an object.
Update send-message result to a tagged enum: Ok or Error { message }.
# Why
Protocol compliance with current MCP schema.
# Tests
- Updated assertions in mcp_protocol.rs for create/stream/send/list and
error cases.
This is the first PR in a stack.
Stack:
Final: #1686
Intermediate: #1751
First: #1750
This delays the call to insert_history_lines until a redraw is
happening. Crucially, the new lines are inserted _after the viewport is
resized_. This results in fewer stray blank lines below the viewport
when modals (e.g. user approval) are closed.
when we enabled KKP in https://github.com/openai/codex/pull/1743, we
started receiving keyup events, but didn't expect them anywhere in our
code. for now, just don't dispatch them at all.
At 550 lines, `exec.rs` was a bit large. In particular, I found it hard
to locate the Seatbelt-related code quickly without a file with
`seatbelt` in the name, so this refactors things so:
- `spawn_command_under_seatbelt()` and dependent code moves to a new
`seatbelt.rs` file
- `spawn_child_async()` and dependent code moves to a new `spawn.rs`
file
This is a follow-up to https://github.com/openai/codex/pull/1705, as
that PR inadvertently lost the logic where `PatchApplyBeginEvent` and
`PatchApplyEndEvent` events were sent when patches were auto-approved.
Though as part of this fix, I believe this also makes an important
safety fix to `assess_patch_safety()`, as there was a case that returned
`SandboxType::None`, which arguably is the thing we were trying to avoid
in #1705.
On a high level, we want there to be only one codepath where
`apply_patch` happens, which should be unified with the patch to run
`exec`, in general, so that sandboxing is applied consistently for both
cases.
Prior to this change, `apply_patch()` in `core` would either:
* exit early, delegating to `exec()` to shell out to `apply_patch` using
the appropriate sandbox
* proceed to run the logic for `apply_patch` in memory
549846b29a/codex-rs/core/src/apply_patch.rs (L61-L63)
In this implementation, only the latter would dispatch
`PatchApplyBeginEvent` and `PatchApplyEndEvent`, though the former would
dispatch `ExecCommandBeginEvent` and `ExecCommandEndEvent` for the
`apply_patch` call (or, more specifically, the `codex
--codex-run-as-apply-patch PATCH` call).
To unify things in this PR, we:
* Eliminate the back half of the `apply_patch()` function, and instead
have it also return with `DelegateToExec`, though we add an extra field
to the return value, `user_explicitly_approved_this_action`.
* In `codex.rs` where we process `DelegateToExec`, we use
`SandboxType::None` when `user_explicitly_approved_this_action` is
`true`. This means **we no longer run the apply_patch logic in memory**,
as we always `exec()`. (Note this is what allowed us to delete so much
code in `apply_patch.rs`.)
* In `codex.rs`, we further update `notify_exec_command_begin()` and
`notify_exec_command_end()` to take additional fields to determine what
type of notification to send: `ExecCommand` or `PatchApply`.
Admittedly, this PR also drops some of the functionality about giving
the user the opportunity to expand the set of writable roots as part of
approving the `apply_patch` command. I'm not sure how much that was
used, and we should probably rethink how that works as we are currently
tidying up the protocol to the TUI, in general.
this fixes a couple of panics that would happen when trying to render
something larger than the terminal, or insert history lines when the top
of the viewport is at y=0.
the git tests were failing on my local machine due to gpg signing config
in my ~/.gitconfig. tests should not be affected by ~/.gitconfig, so
configure them to ignore it.
Simplify and improve many UI elements.
* Remove all-around borders in most places. These interact badly with
terminal resizing and look heavy. Prefer left-side-only borders.
* Make the viewport adjust to the size of its contents.
* <kbd>/</kbd> and <kbd>@</kbd> autocomplete boxes appear below the
prompt, instead of above it.
* Restyle the keyboard shortcut hints & move them to the left.
* Restyle the approval dialog.
* Use synchronized rendering to avoid flashing during rerenders.
https://github.com/user-attachments/assets/96f044af-283b-411c-b7fc-5e6b8a433c20
<img width="1117" height="858" alt="Screenshot 2025-07-30 at 5 29 20 PM"
src="https://github.com/user-attachments/assets/0cc0af77-8396-429b-b6ee-9feaaccdbee7"
/>
The goal of this change is to try an experiment where we try to get AI
to take on more of the code review load. The idea is that once you
believe your PR is ready for review, please add the `codex-rust-review`
label (as opposed to the `codex-review` label).
Admittedly the corresponding prompt currently represents my personal
biases in terms of code review, but we should massage it over time to
represent the team's preferences.
Proof of concept for a resizable viewport.
The general approach here is to duplicate the `Terminal` struct from
ratatui, but with our own logic. This is a "light fork" in that we are
still using all the base ratatui functions (`Buffer`, `Widget` and so
on), but we're doing our own bookkeeping at the top level to determine
where to draw everything.
This approach could use improvement—e.g, when the window is resized to a
smaller size, if the UI wraps, we don't correctly clear out the
artifacts from wrapping. This is possible with a little work (i.e.
tracking what parts of our UI would have been wrapped), but this
behavior is at least at par with the existing behavior.
https://github.com/user-attachments/assets/4eb17689-09fd-4daa-8315-c7ebc654986d
cc @joshka who might have Thoughts™
Building on the work of https://github.com/openai/codex/pull/1702, this
changes how a shell call to `apply_patch` is handled.
Previously, a shell call to `apply_patch` was always handled in-process,
never leveraging a sandbox. To determine whether the `apply_patch`
operation could be auto-approved, the
`is_write_patch_constrained_to_writable_paths()` function would check if
all the paths listed in the paths were writable. If so, the agent would
apply the changes listed in the patch.
Unfortunately, this approach afforded a loophole: symlinks!
* For a soft link, we could fix this issue by tracing the link and
checking whether the target is in the set of writable paths, however...
* ...For a hard link, things are not as simple. We can run `stat FILE`
to see if the number of links is greater than 1, but then we would have
to do something potentially expensive like `find . -inum <inode_number>`
to find the other paths for `FILE`. Further, even if this worked, this
approach runs the risk of a
[TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)
race condition, so it is not robust.
The solution, implemented in this PR, is to take the virtual execution
of the `apply_patch` CLI into an _actual_ execution using `codex
--codex-run-as-apply-patch PATCH`, which we can run under the sandbox
the user specified, just like any other `shell` call.
This, of course, assumes that the sandbox prevents writing through
symlinks as a mechanism to write to folders that are not in the writable
set configured by the sandbox. I verified this by testing the following
on both Mac and Linux:
```shell
#!/usr/bin/env bash
set -euo pipefail
# Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR?
# Codex is run in SANDBOX_DIR, so writes should be constrianed to this directory.
SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
# EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it.
EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
echo "SANDBOX_DIR: $SANDBOX_DIR"
echo "EXPLOIT_DIR: $EXPLOIT_DIR"
cleanup() {
# Only remove if it looks sane and still exists
[[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR"
[[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR"
}
trap cleanup EXIT
echo "I am the original content" > "${EXPLOIT_DIR}/original.txt"
# Drop the -s to test hard links.
ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt"
cat "${SANDBOX_DIR}/link-to-original.txt"
if [[ "$(uname)" == "Linux" ]]; then
SANDBOX_SUBCOMMAND=landlock
else
SANDBOX_SUBCOMMAND=seatbelt
fi
# Attempt the exploit
cd "${SANDBOX_DIR}"
codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true
cat "${EXPLOIT_DIR}/original.txt"
```
Admittedly, this change merits a proper integration test, but I think I
will have to do that in a follow-up PR.
Adds a `CodexAuth` type that encapsulates information about available
auth modes and logic for refreshing the token.
Changes `Responses` API to send requests to different endpoints based on
the auth type.
Updates login_with_chatgpt to support API-less mode and skip the key
exchange.
This adds a tool the model can call to update a plan. The tool doesn't
actually _do_ anything but it gives clients a chance to read and render
the structured plan. We will likely iterate on the prompt and tools
exposed for planning over time.
see
[discussion](https://github.com/rhysd/tui-textarea/issues/51#issuecomment-3021191712),
it's surprising that ^U behaves this way. IMO the undo/redo
functionality in tui-textarea isn't good enough to be worth preserving,
but if we do bring it back it should probably be on C-z / C-S-z / C-y.
Perhaps there was an intention to make the login screen prettier, but it
feels quite silly right now to just have a screen that says "press q",
so replace it with something that lets the user directly login without
having to quit the app.
<img width="1283" height="635" alt="Screenshot 2025-07-28 at 2 54 05 PM"
src="https://github.com/user-attachments/assets/f19e5595-6ef9-4a2d-b409-aa61b30d3628"
/>
## Summary
Per the [latest MCP
spec](https://modelcontextprotocol.io/specification/2025-06-18/basic#meta),
the `_meta` field is reserved for metadata. In the [Typescript
Schema](0695a497eb/schema/2025-06-18/schema.ts (L37-L40)),
`progressToken` is defined as a value to be attached to subsequent
notifications for that request.
The
[CallToolRequestParams](0695a497eb/schema/2025-06-18/schema.ts (L806-L817))
extends this definition but overwrites the params field. This ambiguity
makes our generated type definitions tricky, so I'm going to skip
`progressToken` field for now and just send back the `requestId`
instead.
In a future PR, we can clarify, update our `generate_mcp_types.py`
script, and update our progressToken logic accordingly.
## Testing
- [x] Added unit tests
- [x] Manually tested with mcp client
(Hopefully) temporary solution to the invisible approvals problem -
prints commands to history when they need approval and then also prints
the result of the approval. In the near future we should be able to do
some fancy stuff with updating commands before writing them to permanent
history.
Also, ctr-c while in the approval modal now acts as esc (aborts command)
and puts the TUI in the state where one additional ctr-c will exit.
This is a straight refactor, moving apply-patch-related code from
`codex.rs` and into the new `apply_patch.rs` file. The only "logical"
change is inlining `#[allow(clippy::unwrap_used)]` instead of declaring
`#![allow(clippy::unwrap_used)]` at the top of the file (which is
currently the case in `codex.rs`).
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1703).
* #1705
* __->__ #1703
* #1702
* #1698
* #1697
This introduces some special behavior to the CLIs that are using the
`codex-arg0` crate where if `arg1` is `--codex-run-as-apply-patch`, then
it will run as if `apply_patch arg2` were invoked. This is important
because it means we can do things like:
```
SANDBOX_TYPE=landlock # or seatbelt for macOS
codex debug "${SANDBOX_TYPE}" -- codex --codex-run-as-apply-patch PATCH
```
which gives us a way to run `apply_patch` while ensuring it adheres to
the sandbox the user specified.
While it would be nice to use the `arg0` trick like we are currently
doing for `codex-linux-sandbox`, there is no way to specify the `arg0`
for the underlying command when running under `/usr/bin/sandbox-exec`,
so it will not work for us in this case.
Admittedly, we could have also supported this via a custom environment
variable (e.g., `CODEX_ARG0`), but since environment variables are
inherited by child processes, that seemed like a potentially leakier
abstraction.
This change, as well as our existing reliance on checking `arg0`, place
additional requirements on those who include `codex-core`. Its
`README.md` has been updated to reflect this.
While we could have just added an `apply-patch` subcommand to the
`codex` multitool CLI, that would not be sufficient for the standalone
`codex-exec` CLI, which is something that we distribute as part of our
GitHub releases for those who know they will not be using the TUI and
therefore prefer to use a slightly smaller executable:
https://github.com/openai/codex/releases/tag/rust-v0.10.0
To that end, this PR adds an integration test to ensure that the
`--codex-run-as-apply-patch` option works with the standalone
`codex-exec` CLI.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1702).
* #1705
* #1703
* __->__ #1702
* #1698
* #1697
The overall idea here is: skip ratatui for writing into scrollback,
because its primitives are wrong. We want to render full lines of text,
that will be wrapped natively by the terminal, and which we never plan
to update using ratatui (so the `Buffer` struct is overhead and in fact
an inhibition).
Instead, we use ANSI scrolling regions (link reference doc to come).
Essentially, we:
1. Define a scrolling region that extends from the top of the prompt
area all the way to the top of scrollback
2. Scroll that region up by N < (screen_height - viewport_height) lines,
in this PR N=1
3. Put our cursor at the top of the newly empty region
4. Print out our new text like normal
The terminal interactions here (write_spans and its dependencies) are
mostly extracted from ratatui.