This fixes an issue where messages sent during the final response stream
would seem to disappear, because the "queued messages" UI wasn't shown
during streaming.
## Summary
- Coerce Windows `workspace-write` configs back to read-only, surface
the forced downgrade in the approvals popup,
and funnel users toward WSL or Full Access.
- Add WSL installation instructions to the Auto preset on Windows while
keeping the preset available for other
platforms.
- Skip the trust-on-first-run prompt on native Windows so new folders
remain read-only without additional
confirmation.
- Expose a structured sandbox policy resolution from config to flag
Windows downgrades and adjust tests (core,
exec, TUI) to reflect the new behavior; provide a Windows-only approvals
snapshot.
## Testing
- cargo fmt
- cargo test -p codex-core
config::tests::add_dir_override_extends_workspace_writable_roots
- cargo test -p codex-exec
suite::resume::exec_resume_preserves_cli_configuration_overrides
- cargo test -p codex-tui
chatwidget::tests::approvals_selection_popup_snapshot
- cargo test -p codex-tui
approvals_popup_includes_wsl_note_for_auto_mode
- cargo test -p codex-tui windows_skips_trust_prompt
- just fix -p codex-core
- just fix -p codex-tui
This PR does the following:
1. Changes `try_refresh_token` to handle the case where the endpoint
returns a response without an `id_token`. The OpenID spec indicates that
this field is optional and clients should not assume it's present.
2. Changes the `attempt_stream_responses` to propagate token refresh
errors rather than silently ignoring them.
3. Fixes a typo in a couple of error messages (unrelated to the above,
but something I noticed in passing) - "reconnect" should be spelled
without a hyphen.
This PR does not implement the additional suggestion from @pakrym-oai
that we should sign out when receiving `refresh_token_expired` from the
refresh endpoint. Leaving this as a follow-on because I'm undecided on
whether this should be implemented in `try_refresh_token` or its
callers.
This PR adds support for a model-based summary and risk assessment for
commands that violate the sandbox policy and require user approval. This
aids the user in evaluating whether the command should be approved.
The feature works by taking a failed command and passing it back to the
model and asking it to summarize the command, give it a risk level (low,
medium, high) and a risk category (e.g. "data deletion" or "data
exfiltration"). It uses a new conversation thread so the context in the
existing thread doesn't influence the answer. If the call to the model
fails or takes longer than 5 seconds, it falls back to the current
behavior.
For now, this is an experimental feature and is gated by a config key
`experimental_sandbox_command_assessment`.
Here is a screen shot of the approval prompt showing the risk assessment
and summary.
<img width="723" height="282" alt="image"
src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
/>
This shows the aggregated (stdout + stderr) buffer regardless of exit
code.
Many commands output useful / relevant info on stdout when returning a
non-zero exit code, or the same on stderr when returning an exit code of
0. Often, useful info is present on both stdout AND stderr. Also, the
model sees both. So it is confusing to see commands listed as "(no
output)" that in fact do have output, just on the stream that doesn't
match the exit status, or to see some sort of trivial output like "Tests
failed" but lacking any information about the actual failure.
As such, always display the aggregated output in the display. Transcript
mode remains unchanged as it was already displaying the text that the
model sees, which seems correct for transcript mode.
1. Adds AgentMessage, Reasoning, WebSearch items.
2. Switches the ResponseItem parsing to use new items and then also emit
3. Removes user-item kind and filters out "special" (environment) user
items when returning to clients.
`ParsedCommand::Read` has a `name` field that attempts to identify the
name of the file being read, but the file may not be in the `cwd` in
which the command is invoked as demonstrated by this existing unit test:
0139f6780c/codex-rs/core/src/parse_command.rs (L250-L260)
As you can see, `tui/Cargo.toml` is the relative path to the file being
read.
This PR introduces a new `path: PathBuf` field to `ParsedCommand::Read`
that attempts to capture this information. When possible, this is an
absolute path, though when relative, it should be resolved against the
`cwd` that will be used to run the command to derive the absolute path.
This should make it easier for clients to provide UI for a "read file"
event that corresponds to the command execution.
This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent`
in the core protocol (`protocol/src/protocol.rs`), which is also what
this field is named on `ExecCommandBeginEvent`. Honestly, I don't love
the name (it sounds like a single command, but it is actually a list of
them), but I don't want to get distracted by a naming discussion right
now.
This also adds `parsed_cmd` to `ExecCommandApprovalParams` in
`codex-rs/app-server-protocol/src/protocol.rs`, so it will be available
via `codex app-server`, as well.
For consistency, I also updated `ExecApprovalElicitRequestParams` in
`codex-rs/mcp-server/src/exec_approval.rs` to include this field under
the name `codex_parsed_cmd`, as that struct already has a number of
special `codex_*` fields. Note this is the code for when Codex is used
as an MCP _server_ and therefore has to conform to the official spec for
an MCP elicitation type.
Note these two types were identical, so it seems clear to standardize on the one in `codex_protocol` and eliminate the `Into` stuff.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5218).
* #5222
* __->__ #5218
Clear the history cursor before checking for duplicate submissions so
sending the same message twice exits history mode. This prevents Up/Down
from staying stuck in history browsing after duplicate sends.
Codex isn’t great yet on Windows outside of WSL, and while we’ve merged
https://github.com/openai/codex/pull/4269 to reduce the repetitive
manual approvals on readonly commands, we’ve noticed that users seem to
have more issues with GPT-5-Codex than with GPT-5 on Windows.
This change makes GPT-5 the default for Windows users while we continue
to improve the CLI harness and model for GPT-5-Codex on Windows.
## Summary
- Factor `load_config_as_toml` into `core::config_loader` so config
loading is reusable across callers.
- Layer `~/.codex/config.toml`, optional `~/.codex/managed_config.toml`,
and macOS managed preferences (base64) with recursive table merging and
scoped threads per source.
## Config Flow
```
Managed prefs (macOS profile: com.openai.codex/config_toml_base64)
▲
│
~/.codex/managed_config.toml │ (optional file-based override)
▲
│
~/.codex/config.toml (user-defined settings)
```
- The loader searches under the resolved `CODEX_HOME` directory
(defaults to `~/.codex`).
- Managed configs let administrators ship fleet-wide overrides via
device profiles which is useful for enforcing certain settings like
sandbox or approval defaults.
- For nested hash tables: overlays merge recursively. Child tables are
merged key-by-key, while scalar or array values replace the prior layer
entirely. This lets admins add or tweak individual fields without
clobbering unrelated user settings.
# Tool System Refactor
- Centralizes tool definitions and execution in `core/src/tools/*`:
specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
registry/dispatch (`registry.rs`), and shared context (`context.rs`).
One registry now builds the model-visible tool list and binds handlers.
- Router converts model responses to tool calls; Registry dispatches
with consistent telemetry via `codex-rs/otel` and unified error
handling. Function, Local Shell, MCP, and experimental `unified_exec`
all flow through this path; legacy shell aliases still work.
- Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
make adding tools predictable and testable.
Example: `read_file`
- Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
registered by `build_specs`).
- Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
- E2E test: `core/tests/suite/read_file.rs` validates the tool returns
the requested lines.
## Next steps:
- Decompose `handle_container_exec_with_params`
- Add parallel tool calls
- prefix command approval reasons with "Reason:"
- show keyboard shortcuts for some ListSelectionItems
- remove "description" lines for approval options, and make the labels
more verbose
- add a spacer line in diff display after the path
and some other minor refactors that go along with the above.
<img width="859" height="508" alt="Screenshot 2025-10-02 at 1 24 50 PM"
src="https://github.com/user-attachments/assets/4fa7ecaf-3d3a-406a-bb4d-23e30ce3e5cf"
/>
We continue the separation between `codex app-server` and `codex
mcp-server`.
In particular, we introduce a new crate, `codex-app-server-protocol`,
and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
`codex-rs/app-server-protocol/src/protocol.rs`.
Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
because it is referenced in a ton of places, we have to touch a lot of
files as part of this PR.
We also decide to get away from proper JSON-RPC 2.0 semantics, so we
also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
is basically the same `JSONRPCMessage` type defined in `mcp-types`
except with all of the `"jsonrpc": "2.0"` removed.
Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
considerably simpler, as we can lean heavier on serde to serialize
directly into the wire format that we use now.
Instead of overwriting the contents of the composer when pressing
<kbd>Esc</kbd> when there's a queued message, prepend the queued
message(s) to the composer draft.
## Summary
Introduces a “ghost commit” workflow that snapshots the tree without
touching refs.
1. git commit-tree writes an unreferenced commit object from the current
index, optionally pointing to the current HEAD as its parent.
2. We then stash that commit id and use git restore --source <ghost> to
roll the worktree (and index) back to the recorded snapshot later on.
## Details
- Ghost commits live only as loose objects—we never update branches or
tags—so the repo history stays untouched while still giving us a full
tree snapshot.
- Force-included paths let us stage otherwise ignored files before
capturing the tree.
- Restoration rehydrates both tracked and force-included files while
leaving untracked/ignored files alone.
Adds a "View Stack" to the bottom pane to allow for pushing/popping
bottom panels.
`esc` will go back instead of dismissing.
Benefit: We retain the "selection state" of a parent panel (e.g. the
review panel).
Adds the following options:
1. Review current changes
2. Review a specific commit
3. Review against a base branch (PR style)
4. Custom instructions
<img width="487" height="330" alt="Screenshot 2025-09-20 at 2 11 36 PM"
src="https://github.com/user-attachments/assets/edb0aaa5-5747-47fa-881f-cc4c4f7fe8bc"
/>
---
\+ Adds the following UI helpers:
1. Makes list selection searchable
2. Adds navigation to the bottom pane, so you could add a stack of
popups
3. Basic custom prompt view