Commit Graph

185 Commits

Author SHA1 Message Date
pakrym-oai
4c566d484a Separate interactive and non-interactive sessions (#4612)
Do not show exec session in VSCode/TUI selector.
2025-10-02 13:06:21 -07:00
Jeremy Rose
45936f8fbd show "Viewed Image" when the model views an image (#4475)
<img width="1022" height="339" alt="Screenshot 2025-09-29 at 4 22 00 PM"
src="https://github.com/user-attachments/assets/12da7358-19be-4010-a71b-496ede6dfbbf"
/>
2025-10-02 18:36:03 +00:00
jif-oai
b8195a17e5 chore: sanbox extraction (#4286)
# Extract and Centralize Sandboxing
- Goal: Improve safety and clarity by centralizing sandbox planning and
execution.
  - Approach:
- Add planner (ExecPlan) and backend registry (Direct/Seatbelt/Linux)
with run_with_plan.
- Refactor codex.rs to plan-then-execute; handle failures/escalation via
the plan.
- Delegate apply_patch to the codex binary and run it with an empty env
for determinism.
2025-10-01 12:05:12 +01:00
Michael Bolin
5881c0d6d4 fix: remove mcp-types from app server protocol (#4537)
We continue the separation between `codex app-server` and `codex
mcp-server`.

In particular, we introduce a new crate, `codex-app-server-protocol`,
and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
`codex-rs/app-server-protocol/src/protocol.rs`.

Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
because it is referenced in a ton of places, we have to touch a lot of
files as part of this PR.

We also decide to get away from proper JSON-RPC 2.0 semantics, so we
also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
is basically the same `JSONRPCMessage` type defined in `mcp-types`
except with all of the `"jsonrpc": "2.0"` removed.

Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
considerably simpler, as we can lean heavier on serde to serialize
directly into the wire format that we use now.
2025-10-01 02:16:26 +00:00
vishnu-oai
04c1782e52 OpenTelemetry events (#2103)
### Title

## otel

Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
that
describe each run: outbound API requests, streamed responses, user
input,
tool-approval decisions, and the result of every tool invocation. Export
is
**disabled by default** so local runs remain self-contained. Opt in by
adding an
`[otel]` table and choosing an exporter.

```toml
[otel]
environment = "staging"   # defaults to "dev"
exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
```

Codex tags every exported event with `service.name = "codex-cli"`, the
CLI
version, and an `env` attribute so downstream collectors can distinguish
dev/staging/prod traffic. Only telemetry produced inside the
`codex_otel`
crate—the events listed below—is forwarded to the exporter.

### Event catalog

Every event shares a common set of metadata fields: `event.timestamp`,
`conversation.id`, `app.version`, `auth_mode` (when available),
`user.account_id` (when available), `terminal.type`, `model`, and
`slug`.

With OTEL enabled Codex emits the following event types (in addition to
the
metadata above):

- `codex.api_request`
  - `cf_ray` (optional)
  - `attempt`
  - `duration_ms`
  - `http.response.status_code` (optional)
  - `error.message` (failures)
- `codex.sse_event`
  - `event.kind`
  - `duration_ms`
  - `error.message` (failures)
  - `input_token_count` (completion only)
  - `output_token_count` (completion only)
  - `cached_token_count` (completion only, optional)
  - `reasoning_token_count` (completion only, optional)
  - `tool_token_count` (completion only)
- `codex.user_prompt`
  - `prompt_length`
  - `prompt` (redacted unless `log_user_prompt = true`)
- `codex.tool_decision`
  - `tool_name`
  - `call_id`
- `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
  - `source` (`config` or `user`)
- `codex.tool_result`
  - `tool_name`
  - `call_id`
  - `arguments`
  - `duration_ms` (execution time for the tool)
  - `success` (`"true"` or `"false"`)
  - `output`

### Choosing an exporter

Set `otel.exporter` to control where events go:

- `none` – leaves instrumentation active but skips exporting. This is
the
  default.
- `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
Specify the
  endpoint, protocol, and headers your collector expects:

  ```toml
  [otel]
  exporter = { otlp-http = {
    endpoint = "https://otel.example.com/v1/logs",
    protocol = "binary",
    headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
  }}
  ```

- `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
and any
  metadata headers:

  ```toml
  [otel]
  exporter = { otlp-grpc = {
    endpoint = "https://otel.example.com:4317",
    headers = { "x-otlp-meta" = "abc123" }
  }}
  ```

If the exporter is `none` nothing is written anywhere; otherwise you
must run or point to your
own collector. All exporters run on a background batch worker that is
flushed on
shutdown.

If you build Codex from source the OTEL crate is still behind an `otel`
feature
flag; the official prebuilt binaries ship with the feature enabled. When
the
feature is disabled the telemetry hooks become no-ops so the CLI
continues to
function without the extra dependencies.

---------

Co-authored-by: Anton Panasenko <apanasenko@openai.com>
2025-09-29 11:30:55 -07:00
Gabriel Peal
e555a36c6a [MCP] Introduce an experimental official rust sdk based mcp client (#4252)
The [official Rust
SDK](57fc428c57)
has come a long way since we first started our mcp client implementation
5 months ago and, today, it is much more complete than our own
stdio-only implementation.

This PR introduces a new config flag `experimental_use_rmcp_client`
which will use a new mcp client powered by the sdk instead of our own.

To keep this PR simple, I've only implemented the same stdio MCP
functionality that we had but will expand on it with future PRs.

---------

Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-09-26 13:13:37 -04:00
jif-oai
1fc3413a46 ref: state - 2 (#4229)
Extracting tasks in a module and start abstraction behind a Trait (more
to come on this but each task will be tackled in a dedicated PR)
The goal was to drop the ActiveTask and to have a (potentially) set of
tasks during each turn
2025-09-26 13:49:08 +00:00
jif-oai
250b244ab4 ref: full state refactor (#4174)
## Current State Observations
- `Session` currently holds many unrelated responsibilities (history,
approval queues, task handles, rollout recorder, shell discovery, token
tracking, etc.), making it hard to reason about ownership and lifetimes.
- The anonymous `State` struct inside `codex.rs` mixes session-long data
with turn-scoped queues and approval bookkeeping.
- Turn execution (`run_task`) relies on ad-hoc local variables that
should conceptually belong to a per-turn state object.
- External modules (`codex::compact`, tests) frequently poke the raw
`Session.state` mutex, which couples them to implementation details.
- Interrupts, approvals, and rollout persistence all have bespoke
cleanup paths, contributing to subtle bugs when a turn is aborted
mid-flight.

## Desired End State
- Keep a slim `Session` object that acts as the orchestrator and façade.
It should expose a focused API (submit, approvals, interrupts, event
emission) without storing unrelated fields directly.
- Introduce a `state` module that encapsulates all mutable data
structures:
- `SessionState`: session-persistent data (history, approved commands,
token/rate-limit info, maybe user preferences).
- `ActiveTurn`: metadata for the currently running turn (sub-id, task
kind, abort handle) and an `Arc<TurnState>`.
- `TurnState`: all turn-scoped pieces (pending inputs, approval waiters,
diff tracker, review history, auto-compact flags, last agent message,
outstanding tool call bookkeeping).
- Group long-lived helpers/managers into a dedicated `SessionServices`
struct so `Session` does not accumulate "random" fields.
- Provide clear, lock-safe APIs so other modules never touch raw
mutexes.
- Ensure every turn creates/drops a `TurnState` and that
interrupts/finishes delegate cleanup to it.
2025-09-25 12:16:06 +02:00
pakrym-oai
addc946d13 Simplify tool implemetations (#4160)
Use Result<String, FunctionCallError> for all tool handling code and
rely on error propagation instead of creating failed items everywhere.
2025-09-24 17:27:35 +00:00
Ahmed Ibrahim
8227a5ba1b Send limits when getting rate limited (#4102)
Users need visibility on rate limits when they are rate limited.
2025-09-23 22:56:34 +00:00
pakrym-oai
fdb8dadcae Add exec output-schema parameter (#4079)
Adds structured output to `exec` via the `--structured-output`
parameter.
2025-09-23 13:59:16 -07:00
jif-oai
b84a920067 chore: compact do not modify instructions (#4088)
Keep the developer instruction and insert the summarisation message as a
user message instead
2025-09-23 17:59:17 +01:00
pakrym-oai
5c7d9e27b1 Add notifier tests (#4064)
Proposal:
1. Use anyhow for tests and avoid unwrap
2. Extract a helper for starting a test instance of codex
2025-09-23 14:25:46 +00:00
jif-oai
be366a31ab chore: clippy on redundant closure (#4058)
Add redundant closure clippy rules and let Codex fix it by minimising
FQP
2025-09-22 19:30:16 +00:00
Jeremy Rose
19f46439ae timeouts for mcp tool calls (#3959)
defaults to 60sec, overridable with MCP_TOOL_TIMEOUT or on a per-server
basis in the config.
2025-09-22 10:30:59 -07:00
Ahmed Ibrahim
04504d8218 Forward Rate limits to the UI (#3965)
We currently get information about rate limits in the response headers.
We want to forward them to the clients to have better transparency.
UI/UX plans have been discussed and this information is needed.
2025-09-20 21:26:16 -07:00
Ahmed Ibrahim
a7fda70053 Use a unified shell tell to not break cache (#3814)
Currently, we change the tool description according to the sandbox
policy and approval policy. This breaks the cache when the user hits
`/approvals`. This PR does the following:
- Always use the shell with escalation parameter:
- removes `create_shell_tool_for_sandbox` and always uses unified tool
via `create_shell_tool`
- Reject the func call when the model uses escalation parameter when it
cannot.
2025-09-19 00:08:28 +00:00
Michael Bolin
8595237505 fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
Previous to this PR, both of these functions take a single `cwd`:


71038381aa/codex-rs/core/src/seatbelt.rs (L19-L25)


71038381aa/codex-rs/core/src/landlock.rs (L16-L23)

whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
this PR).

Added `sandbox_distinguishes_command_and_policy_cwds()` to
`codex-rs/exec/tests/suite/sandbox.rs` to verify this.
2025-09-18 14:37:06 -07:00
dedrisian-oai
62258df92f feat: /review (#3774)
Adds `/review` action in TUI

<img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM"
src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d"
/>
2025-09-18 14:14:16 -07:00
Jeremy Rose
b34e906396 Reland "refactor transcript view to handle HistoryCells" (#3753)
Reland of #3538
2025-09-18 20:55:53 +00:00
jif-oai
277fc6254e chore: use tokio mutex and async function to prevent blocking a worker (#3850)
### Why Use `tokio::sync::Mutex`

`std::sync::Mutex` are not _async-aware_. As a result, they will block
the entire thread instead of just yielding the task. Furthermore they
can be poisoned which is not the case of `tokio` Mutex.
This allows the Tokio runtime to continue running other tasks while
waiting for the lock, preventing deadlocks and performance bottlenecks.

In general, this is preferred in async environment
2025-09-18 18:21:52 +01:00
dedrisian-oai
72733e34c4 Add dev message upon review out (#3758)
Proposal: We want to record a dev message like so:

```
{
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "<user_action>
  <context>User initiated a review task. Here's the full review output from reviewer model. User may select one or more comments to resolve.</context>
  <action>review</action>
  <results>
  {findings_str}
  </results>
</user_action>"
        }
      ]
    },
```

Without showing in the chat transcript.

Rough idea, but it fixes issue where the user finishes a review thread,
and asks the parent "fix the rest of the review issues" thinking that
the parent knows about it.

### Question: Why not a tool call?

Because the agent didn't make the call, it was a human. + we haven't
implemented sub-agents yet, and we'll need to think about the way we
represent these human-led tool calls for the agent.
2025-09-16 18:43:32 -07:00
dedrisian-oai
7fe4021f95 Review mode core updates (#3701)
1. Adds the environment prompt (including cwd) to review thread
2. Prepends the review prompt as a user message (temporary fix so the
instructions are not replaced on backend)
3. Sets reasoning to low
4. Sets default review model to `gpt-5-codex`
2025-09-16 13:36:51 -07:00
Dylan
11285655c4 fix: Record EnvironmentContext in SendUserTurn (#3678)
## Summary
SendUserTurn has not been correctly handling updates to policies. While
the tui protocol handles this in `Op::OverrideTurnContext`, the
SendUserTurn should be appending `EnvironmentContext` messages when the
sandbox settings change. MCP client behavior should match the cli
behavior, so we update `SendUserTurn` message to match.

## Testing
- [x] Added prompt caching tests
2025-09-16 11:32:20 -07:00
dedrisian-oai
2aa84b8891 Fix EventMsg Optional (#3604) 2025-09-15 00:34:33 +00:00
pakrym-oai
863d9c237e Include command output when sending timeout to model (#3576)
Being able to see the output helps the model decide how to handle the
timeout.
2025-09-14 14:38:26 -07:00
Ahmed Ibrahim
bbea6bbf7e Handle resuming/forking after compact (#3533)
We need to construct the history different when compact happens. For
this, we need to just consider the history after compact and convert
compact to a response item.

This needs to change and use `build_compact_history` when this #3446 is
merged.
2025-09-14 13:23:31 +00:00
dedrisian-oai
90a0fd342f Review Mode (Core) (#3401)
## 📝 Review Mode -- Core

This PR introduces the Core implementation for Review mode:

- New op `Op::Review { prompt: String }:` spawns a child review task
with isolated context, a review‑specific system prompt, and a
`Config.review_model`.
- `EnteredReviewMode`: emitted when the child review session starts.
Every event from this point onwards reflects the review session.
- `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
finishes or is interrupted, with optional structured findings:

```json
{
  "findings": [
    {
      "title": "<≤ 80 chars, imperative>",
      "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
      "confidence_score": <float 0.0-1.0>,
      "priority": <int 0-3>,
      "code_location": {
        "absolute_file_path": "<file path>",
        "line_range": {"start": <int>, "end": <int>}
      }
    }
  ],
  "overall_correctness": "patch is correct" | "patch is incorrect",
  "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
  "overall_confidence_score": <float 0.0-1.0>
}
```

## Questions

### Why separate out its own message history?

We want the review thread to match the training of our review models as
much as possible -- that means using a custom prompt, removing user
instructions, and starting a clean chat history.

We also want to make sure the review thread doesn't leak into the parent
thread.

### Why do this as a mode, vs. sub-agents?

1. We want review to be a synchronous task, so it's fine for now to do a
bespoke implementation.
2. We're still unclear about the final structure for sub-agents. We'd
prefer to land this quickly and then refactor into sub-agents without
rushing that implementation.
2025-09-12 23:25:10 +00:00
jif-oai
ea225df22e feat: context compaction (#3446)
## Compact feature:
1. Stops the model when the context window become too large
2. Add a user turn, asking for the model to summarize
3. Build a bridge that contains all the previous user message + the
summary. Rendered from a template
4. Start sampling again from a clean conversation with only that bridge
2025-09-12 13:07:10 -07:00
jif-oai
c6fd056aa6 feat: reasoning effort as optional (#3527)
Allow the reasoning effort to be optional
2025-09-12 12:06:33 -07:00
jif-oai
bba567cee9 bug: fix model save (#3525)
Fix those 2 behaviors:
1. The model does not get saved if we don't CTRL + S
2. The reasoning effort get saved
2025-09-12 10:38:12 -07:00
Michael Bolin
9bbeb75361 feat: include reasoning_effort in NewConversationResponse (#3506)
`ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
2025-09-11 21:04:40 -07:00
Dylan
027944c64e fix: improve handle_sandbox_error timeouts (#3435)
## Summary
Handle timeouts the same way, regardless of approval mode. There's more
to do here, but this is simple and should be zero-regret

## Testing
- [x] existing tests pass
- [x] test locally and verify rollout
2025-09-11 12:09:20 -07:00
Michael Bolin
bec51f6c05 chore: enable clippy::redundant_clone (#3489)
Created this PR by:

- adding `redundant_clone` to `[workspace.lints.clippy]` in
`cargo-rs/Cargol.toml`
- running `cargo clippy --tests --fix`
- running `just fmt`

Though I had to clean up one instance of the following that resulted:

```rust
let codex = codex;
```
2025-09-11 11:59:37 -07:00
Ahmed Ibrahim
674e3d3c90 Add Compact and Turn Context to the rollout items (#3444)
Adding compact and turn context to the rollout items

based on #3440
2025-09-11 18:08:51 +00:00
Ahmed Ibrahim
162e1235a8 Change forking to read the rollout from file (#3440)
This PR changes get history op to get path. Then, forking will use a
path. This will help us have one unified codepath for resuming/forking
conversations. Will also help in having rollout history in order. It
also fixes a bug where you won't see the UI when resuming after forking.
2025-09-10 17:42:54 -07:00
jif-oai
c09ed74a16 Unified execution (#3288)
## Unified PTY-Based Exec Tool

Note: this requires to have this flag in the config:
`use_experimental_unified_exec_tool=true`

- Adds a PTY-backed interactive exec feature (“unified_exec”) with
session reuse via
  session_id, bounded output (128 KiB), and timeout clamping (≤ 60 s).
- Protocol: introduces ResponseItem::UnifiedExec { session_id,
arguments, timeout_ms }.
- Tools: exposes unified_exec as a function tool (Responses API);
excluded from Chat
  Completions payload while still supported in tool lists.
- Path handling: resolves commands via PATH (or explicit paths), with
UTF‑8/newline‑aware
  truncation (truncate_middle).
- Tests: cover command parsing, path resolution, session
persistence/cleanup, multi‑session
  isolation, timeouts, and truncation behavior.
2025-09-10 17:38:11 -07:00
dedrisian-oai
87654ec0b7 Persist model & reasoning changes (#2799)
Persists `/model` changes across both general and profile-specific
sessions.
2025-09-10 20:53:46 +00:00
Michael Bolin
51d9e05de7 Back out "feat: POSIX unification and snapshot sessions (#3179)" (#3430)
This reverts https://github.com/openai/codex/pull/3179.

#3179 appears to introduce a regression where sourcing dotfiles causes a
bunch of activity in the title bar (and potentially slows things down?)


https://github.com/user-attachments/assets/a68f7fb3-0749-4e0e-a321-2aa6993e01da

Verified this no longer happens after backing out #3179.

Original commit changeset: 62bd0e3d9d
2025-09-10 12:40:24 -07:00
Ahmed Ibrahim
45bd5ca4b9 Move initial history to protocol (#3422)
To fix an edge case of forking then resuming

#3419
2025-09-10 10:17:24 -07:00
Ahmed Ibrahim
43809a454e Introduce rollout items (#3380)
This PR introduces Rollout items. This enable us to rollout eventmsgs
and session meta.

This is mostly #3214 with rebase on main
2025-09-09 23:52:33 +00:00
Michael Bolin
2a76a08a9e fix: include rollout_path in NewConversationResponse (#3352)
Adding the `rollout_path` to the `NewConversationResponse` makes it so a
client can perform subsequent operations on a `(ConversationId,
PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352).
* #3353
* __->__ #3352
2025-09-09 00:11:48 -07:00
jif-oai
62bd0e3d9d feat: POSIX unification and snapshot sessions (#3179)
## Session snapshot
For POSIX shell, the goal is to take a snapshot of the interactive shell
environment, store it in a session file located in `.codex/` and only
source this file for every command that is run.
As a result, if a snapshot files exist, `bash -lc <CALL>` get replaced
by `bash -c <CALL>`.

This also fixes the issue that `bash -lc` does not source `.bashrc`,
resulting in missing env variables and aliases in the codex session.
## POSIX unification
Unify `bash` and `zsh` shell into a POSIX shell. The rational is that
the tool will not use any `zsh` specific capabilities.

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
2025-09-08 18:09:45 -07:00
Gabriel Peal
5eaaf307e1 Generate more typescript types and return conversation id with ConversationSummary (#3219)
This PR does multiple things that are necessary for conversation resume
to work from the extension. I wanted to make sure everything worked so
these changes wound up in one PR:
1. Generate more ts types
2. Resume rollout history files rather than create a new one every time
it is resumed so you don't see a duplicate conversation in history for
every resume. Chatted with @aibrahim-oai to verify this
3. Return conversation_id in conversation summaries
4. [Cleanup] Use serde and strong types for a lot of the rollout file
parsing
2025-09-08 17:54:47 -04:00
Gabriel Peal
c8fab51372 Use ConversationId instead of raw Uuids (#3282)
We're trying to migrate from `session_id: Uuid` to `conversation_id:
ConversationId`. Not only does this give us more type safety but it
unifies our terminology across Codex and with the implementation of
session resuming, a conversation (which can span multiple sessions) is
more appropriate.

I started this impl on https://github.com/openai/codex/pull/3219 as part
of getting resume working in the extension but it's big enough that it
should be broken out.
2025-09-07 23:22:25 -04:00
pakrym-oai
0269096229 Move token usage/context information to session level (#3221)
Move context information into the main loop so it can be used to
interrupt the loop or start auto-compaction.
2025-09-06 15:19:23 +00:00
pakrym-oai
5775174ec2 Never store requests (#3212)
When item ids are sent to Responses API it will load them from the
database ignoring the provided values. This adds extra latency.

Not having the mode to store requests also allows us to simplify the
code.

## Breaking change

The `disable_response_storage` configuration option is removed.
2025-09-05 10:41:47 -07:00
Michael Bolin
bd4fa85507 fix: add callback to map before sending request to fix race condition (#3146)
Last week, I thought I found the smoking gun in our flaky integration
tests where holding these locks could have led to potential deadlock:

- https://github.com/openai/codex/pull/2876
- https://github.com/openai/codex/pull/2878

Yet even after those PRs went in, we continued to see flakinees in our
integration tests! Though with the additional logging added as part of
debugging those tests, I now saw things like:

```
read message from stdout: Notification(JSONRPCNotification { jsonrpc: "2.0", method: "codex/event/exec_approval_request", params: Some(Object {"id": String("0"), "msg": Object {"type": String("exec_approval_request"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}, "conversationId": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6")}) })
notification: Notification(JSONRPCNotification { jsonrpc: "2.0", method: "codex/event/exec_approval_request", params: Some(Object {"id": String("0"), "msg": Object {"type": String("exec_approval_request"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}, "conversationId": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6")}) })
read message from stdout: Request(JSONRPCRequest { id: Integer(0), jsonrpc: "2.0", method: "execCommandApproval", params: Some(Object {"conversation_id": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}) })
writing message to stdin: Response(JSONRPCResponse { id: Integer(0), jsonrpc: "2.0", result: Object {"decision": String("approved")} })
in read_stream_until_notification_message(codex/event/task_complete)
[mcp stderr] 2025-09-04T00:00:59.738585Z  INFO codex_mcp_server::message_processor: <- response: JSONRPCResponse { id: Integer(0), jsonrpc: "2.0", result: Object {"decision": String("approved")} }
[mcp stderr] 2025-09-04T00:00:59.738740Z DEBUG codex_core::codex: Submission sub=Submission { id: "1", op: ExecApproval { id: "0", decision: Approved } }
[mcp stderr] 2025-09-04T00:00:59.738832Z  WARN codex_core::codex: No pending approval found for sub_id: 0
```

That is, a response was sent for a request, but no callback was in place
to handle the response!

This time, I think I may have found the underlying issue (though the
fixes for holding locks for too long may have also been part of it),
which is I found cases where we were sending the request:


234c0a0469/codex-rs/core/src/codex.rs (L597)

before inserting the `Sender` into the `pending_approvals` map (which
has to wait on acquiring a mutex):


234c0a0469/codex-rs/core/src/codex.rs (L598-L601)

so it is possible the request could go out and the client could respond
before `pending_approvals` was updated!

Note this was happening in both `request_command_approval()` and
`request_patch_approval()`, which maps to the sorts of errors we have
been seeing when these integration tests have been flaking on us.

While here, I am also adding some extra logging that prints if inserting
into `pending_approvals` overwrites an entry as opposed to purely
inserting one. Today, a conversation can have only one pending request
at a time, but as we are planning to support parallel tool calls, this
invariant may not continue to hold, in which case we need to revisit
this abstraction.
2025-09-04 07:38:28 -07:00
Ahmed Ibrahim
2b96f9f569 Dividing UserMsgs into categories to send it back to the tui (#3127)
This PR does the following:

- divides user msgs into 3 categories: plain, user instructions, and
environment context
- Centralizes adding user instructions and environment context to a
degree
- Improve the integration testing

Building on top of #3123

Specifically this
[comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
We need to send the user message while ignoring the User Instructions
and Environment Context we attach.
2025-09-04 05:34:50 +00:00
Ahmed Ibrahim
f2036572b6 Replay EventMsgs from Response Items when resuming a session with history. (#3123)
### Overview

This PR introduces the following changes:
	1.	Adds a unified mechanism to convert ResponseItem into EventMsg.
2. Ensures that when a session is initialized with initial history, a
vector of EventMsg is sent along with the session configuration. This
allows clients to re-render the UI accordingly.
	3. 	Added integration testing

### Caveats

This implementation does not send every EventMsg that was previously
dispatched to clients. The excluded events fall into two categories:
	•	“Arguably” rolled-out events
Examples include tool calls and apply-patch calls. While these events
are conceptually rolled out, we currently only roll out ResponseItems.
These events are already being handled elsewhere and transformed into
EventMsg before being sent.
	•	Non-rolled-out events
Certain events such as TurnDiff, Error, and TokenCount are not rolled
out at all.

### Future Directions

At present, resuming a session involves maintaining two states:
	•	UI State
Clients can replay most of the important UI from the provided EventMsg
history.
	•	Model State
The model receives the complete session history to reconstruct its
internal state.

This design provides a solid foundation. If, in the future, more precise
UI reconstruction is needed, we have two potential paths:
1. Introduce a third data structure that allows us to derive both
ResponseItems and EventMsgs.
2. Clearly divide responsibilities: the core system ensures the
integrity of the model state, while clients are responsible for
reconstructing the UI.
2025-09-04 04:47:00 +00:00