Commit Graph

181 Commits

Author SHA1 Message Date
jif-oai
3ddd4d47d0 fix: lagged output in unified_exec buffer (#4992)
Handle the `Lagged` error when the broadcast buffer of `unified_exec`
is full
2025-10-09 16:06:07 +00:00
jif-oai
ca6a0358de bug: sandbox denied error logs (#4874)
Check STDOUT/STDERR or the aggregated output for certain logs when the
sandbox denies a command
2025-10-09 16:01:01 +00:00
Gabriel Peal
d3820f4782 [MCP] Add an enabled config field (#4917)
This lets users more easily toggle MCP servers.
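A minimal sketch of the new field (the server name and `command` value are placeholders; only `enabled` comes from this change):

```toml
[mcp_servers.docs]
command = "docs-mcp-server"  # placeholder stdio server
enabled = false              # new field: toggle the server off without deleting its config
```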
2025-10-08 16:24:51 -04:00
jif-oai
687a13bbe5 feat: truncate on compact (#4942)
Truncate the message during compaction if it is simply too large.
Do it iteratively, since tokenization is essentially free on the server side
2025-10-08 18:11:08 +01:00
jif-oai
f52320be86 feat: grep_files as a tool (#4820)
Add `grep_files` to be able to perform more actions in parallel
2025-10-08 11:02:50 +01:00
Gabriel Peal
a43ae86b6c [MCP] Add support for streamable http servers with codex mcp add and replace bearer token handling (#4904)
1. You can now add streamable HTTP servers via the CLI
2. As part of this, I'm also replacing the existing `bearer_token` plain
text config field with an env var

```
mcp add github --url https://api.githubcopilot.com/mcp/ --bearer-token-env-var=GITHUB_PAT
```
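The config-side counterpart presumably looks like this (the field name is inferred from the CLI flag and not confirmed by this commit):

```toml
[mcp_servers.github]
url = "https://api.githubcopilot.com/mcp/"
bearer_token_env_var = "GITHUB_PAT"  # token is read from this env var; it never lives in config
```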
2025-10-07 23:21:37 -04:00
dedrisian-oai
b016a3e7d8 Remove instruction hack for /review (#4896)
We used to put the review prompt in the first user message as well to
bypass statsig overrides, but now that's been resolved and instructions
are being respected, so we're duplicating the review instructions.
2025-10-07 12:47:00 -07:00
jif-oai
226215f36d feat: list_dir tool (#4817)
Add a `list_dir` tool. It is useful because we can mark it as
non-mutating and therefore run it in parallel
2025-10-07 19:33:19 +01:00
jif-oai
338c2c873c bug: fix flaky test (#4878)
Fix flaky test by warming up the tools
2025-10-07 19:32:49 +01:00
pakrym-oai
35a770e871 Simplify request body assertions (#4845)
We'll have a lot more tests like these
2025-10-07 09:56:39 +01:00
pakrym-oai
a90a58f7a1 Trim double Total output lines (#4787) 2025-10-05 16:41:55 -07:00
pakrym-oai
b2d81a7cac Make output assertions more explicit (#4784)
Match using precise regexes.
2025-10-05 16:01:38 -07:00
pakrym-oai
191d620707 Use response helpers when mounting SSE test responses (#4783)
## Summary
- replace manual wiremock SSE mounts in the compact suite with the
shared response helpers
- simplify the exec auth_env integration test by using the
mount_sse_once_match helper
- rely on mount_sse_sequence plus server request collection to replace
the bespoke SeqResponder utility in tests

## Testing
- just fmt

------
https://chatgpt.com/codex/tasks/task_i_68e2e238f2a88320a337f0b9e4098093
2025-10-05 21:58:16 +00:00
pakrym-oai
5c42419b02 Use assert_matches (#4756)
assert_matches is soon to be in std but is experimental for now.
2025-10-05 21:12:31 +00:00
pakrym-oai
aecbe0f333 Add helper for response created SSE events in tests (#4758)
## Summary
- add a reusable `ev_response_created` helper that builds
`response.created` SSE events for integration tests
- update the exec and core integration suites to use the new helper
instead of repeating manual JSON literals
- keep the streaming fixtures consistent by relying on the shared helper
in every touched test

## Testing
- `just fmt`


------
https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61
2025-10-05 21:11:43 +00:00
jif-oai
dc3c6bf62a feat: parallel tool calls (#4663)
Add parallel tool calls. This is configurable at the model level and
the tool level
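As a hedged sketch only — this commit doesn't show config syntax, and the table and field names here are hypothetical:

```toml
# Hypothetical model-level switch; actual field name/location not shown in this commit
[model_families."gpt-5-codex"]
parallel_tool_calls = true
```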
2025-10-05 16:10:49 +00:00
Dylan
3203862167 chore: update tool config (#4755)
## Summary
Updates tool config for gpt-5-codex

## Test Plan
- [x] Ran locally
- [x] Updated unit tests
2025-10-04 22:47:26 -07:00
pakrym-oai
06853d94f0 Use wait_for_event helpers in tests (#4753)
## Summary
- replace manual event polling loops in several core test suites with
the shared wait_for_event helpers
- keep prior assertions intact by using closure captures for stateful
expectations, including plan updates, patch lifecycles, and review flow
checks
- rely on wait_for_event_with_timeout where longer waits are required,
simplifying timeout handling

## Testing
- just fmt


------
https://chatgpt.com/codex/tasks/task_i_68e1d58582d483208febadc5f90dd95e
2025-10-04 22:04:05 -07:00
Ahmed Ibrahim
cc2f4aafd7 Add truncation hint on truncated exec output. (#4740)
When truncating output, add a hint showing the total number of lines
Dylan
4764fc1ee7 feat: Freeform apply_patch with simple shell output (#4718)
## Summary
This PR is an alternative approach to #4711, but instead of changing our
storage, parses out shell calls in the client and reserializes them on
the fly before we send them out as part of the request.

What this changes:
1. Adds additional serialization logic when the
ApplyPatchToolType::Freeform is in use.
2. Adds a --custom-apply-patch flag to enable this setting on a
session-by-session basis.

This change is delicate, but is not meant to be permanent. It is meant
to be the first step in a migration:
1. (This PR) Add in-flight serialization with config
2. Update model_family default
3. Update serialization logic to store turn outputs in a structured
format, with logic to serialize based on model_family setting.
4. Remove this rewrite in-flight logic.

## Test Plan
- [x] Additional unit tests added
- [x] Integration tests added
- [x] Tested locally
2025-10-04 19:16:36 -07:00
Ahmed Ibrahim
90ef94d3b3 Surface context window error to the client (#4675)
In the past, we were treating `input exceeded context window` as a
streaming error and retrying on it. Retrying is pointless because it
won't change the behavior. In this PR, we surface the error to the
client without retry and also send a token count event to indicate that
the context window is full.

<img width="650" height="125" alt="image"
src="https://github.com/user-attachments/assets/c26b1213-4c27-4bfc-90f4-51a270a3efd5"
/>
2025-10-05 01:40:06 +00:00
Ahmed Ibrahim
d7acd146fb fix: exec commands that blow up the context window. (#4706)
We truncate the output of exec commands so they don't blow up the
context window. However, in some cases we weren't doing that, which led
to reports of users with 76% of the context window left hitting
`input exceeded context window`, which is weird.
2025-10-04 11:49:56 -07:00
iceweasel-oai
de8d77274a set gpt-5 as default model for Windows users (#4676)
Codex isn’t great yet on Windows outside of WSL, and while we’ve merged
https://github.com/openai/codex/pull/4269 to reduce the repetitive
manual approvals on readonly commands, we’ve noticed that users seem to
have more issues with GPT-5-Codex than with GPT-5 on Windows.

This change makes GPT-5 the default for Windows users while we continue
to improve the CLI harness and model for GPT-5-Codex on Windows.
2025-10-03 14:00:03 -07:00
Gabriel Peal
1d17ca1fa3 [MCP] Add support for MCP Oauth credentials (#4517)
This PR adds OAuth login support for streamable HTTP servers when
`experimental_use_rmcp_client` is enabled.

This PR is large but represents the minimal amount of work required for
this to work. To keep this PR smaller, login can only be done with
`codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp`
or `codex mcp list` yet. Fingers crossed that this is the last large MCP
PR and that subsequent PRs can be smaller.
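For example (the server name is a placeholder):

```
codex mcp login figma
codex mcp logout figma
```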

Under the hood, credentials are stored in platform credential managers
via the [keyring crate](https://crates.io/crates/keyring).
When the keyring isn't available, it falls back to storing credentials
in `CODEX_HOME/.credentials.json` which is consistent with how other
coding agents handle authentication.

I tested this on macOS, Windows, WSL (ubuntu), and Linux. I wasn't able
to test the dbus store on linux but did verify that the fallback works.

One quirk: if you have stored credentials, every development build has
its own ad-hoc binary, so the keyring won't recognize the reader as the
same program as the writer and may ask for the user's password. I may
add an override to disable this or allow users/enterprises to opt out
of the keyring storage if it causes issues.

<img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40"
src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d"
/>
<img width="745" height="486" alt="image"
src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e"
/>
2025-10-03 13:43:12 -04:00
jif-oai
bfe3328129 Fix flaky test (#4672)
This issue was due to the timeout not always being sufficient to
produce enough characters for truncation, plus a race between the
synthetic timeout and the process kill
2025-10-03 18:09:41 +01:00
jif-oai
e0b38bd7a2 feat: add beta_supported_tools (#4669)
Gate the new read_file tool behind a new `beta_supported_tools` flag and
only enable it for `gpt-5-codex`
2025-10-03 16:58:03 +00:00
jif-oai
33d3ecbccc chore: refactor tool handling (#4510)
# Tool System Refactor

- Centralizes tool definitions and execution in `core/src/tools/*`:
specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
registry/dispatch (`registry.rs`), and shared context (`context.rs`).
One registry now builds the model-visible tool list and binds handlers.
- Router converts model responses to tool calls; Registry dispatches
with consistent telemetry via `codex-rs/otel` and unified error
handling. Function, Local Shell, MCP, and experimental `unified_exec`
all flow through this path; legacy shell aliases still work.
- Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
make adding tools predictable and testable.

Example: `read_file`
- Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
registered by `build_specs`).
- Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
- E2E test: `core/tests/suite/read_file.rs` validates the tool returns
the requested lines.
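A standalone sketch of the described behavior (not the actual handler code; the function signature is hypothetical, and error handling and truncation are simplified):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sketch of read_file semantics: absolute `file_path`, 1-indexed
/// `offset`, `limit`, and `L#: ` line prefixes.
fn read_file_slice(file_path: &str, offset: usize, limit: usize) -> io::Result<String> {
    let path = Path::new(file_path);
    if !path.is_absolute() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "file_path must be absolute",
        ));
    }
    let contents = fs::read_to_string(path)?;
    let lines: Vec<String> = contents
        .lines()
        .enumerate()
        .skip(offset.saturating_sub(1)) // offset is 1-indexed
        .take(limit)
        .map(|(i, line)| format!("L{}: {}", i + 1, line))
        .collect();
    Ok(lines.join("\n"))
}
```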

## Next steps:
- Decompose `handle_container_exec_with_params` 
- Add parallel tool calls
2025-10-03 13:21:06 +01:00
jif-oai
69cb72f842 chore: sandbox refactor 2 (#4653)
Revert the revert and fix the UI issue
2025-10-03 11:17:39 +01:00
Ahmed Ibrahim
ed5d656fa8 Revert "chore: sandbox extraction" (#4626)
Reverts openai/codex#4286
2025-10-02 21:09:21 +00:00
pakrym-oai
4c566d484a Separate interactive and non-interactive sessions (#4612)
Do not show exec session in VSCode/TUI selector.
2025-10-02 13:06:21 -07:00
pakrym-oai
2f6fb37d72 Support CODEX_API_KEY for codex exec (#4615)
Allows setting an API key per invocation of `codex exec`
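For example (the key value and prompt are placeholders):

```
CODEX_API_KEY="sk-..." codex exec "run the test suite and summarize failures"
```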
2025-10-02 09:59:45 -07:00
jif-oai
b8195a17e5 chore: sandbox extraction (#4286)
# Extract and Centralize Sandboxing
- Goal: Improve safety and clarity by centralizing sandbox planning and
  execution.
- Approach:
  - Add a planner (ExecPlan) and a backend registry
    (Direct/Seatbelt/Linux) with run_with_plan.
  - Refactor codex.rs to plan-then-execute; handle failures/escalation
    via the plan.
  - Delegate apply_patch to the codex binary and run it with an empty
    env for determinism.
2025-10-01 12:05:12 +01:00
Michael Bolin
5881c0d6d4 fix: remove mcp-types from app server protocol (#4537)
We continue the separation between `codex app-server` and `codex
mcp-server`.

In particular, we introduce a new crate, `codex-app-server-protocol`,
and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
`codex-rs/app-server-protocol/src/protocol.rs`.

Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
because it is referenced in a ton of places, we have to touch a lot of
files as part of this PR.

We also decided to move away from proper JSON-RPC 2.0 semantics, so we
introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is
basically the same `JSONRPCMessage` type defined in `mcp-types` except
with all of the `"jsonrpc": "2.0"` removed.
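Illustratively (the method name and payload are only examples):

```
before: {"jsonrpc": "2.0", "id": 1, "method": "newConversation", "params": {}}
after:  {"id": 1, "method": "newConversation", "params": {}}
```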

Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
considerably simpler, as we can lean more heavily on serde to serialize
directly into the wire format that we use now.
2025-10-01 02:16:26 +00:00
vishnu-oai
04c1782e52 OpenTelemetry events (#2103)
## otel

Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
that describe each run: outbound API requests, streamed responses, user
input, tool-approval decisions, and the result of every tool invocation.
Export is **disabled by default** so local runs remain self-contained.
Opt in by adding an `[otel]` table and choosing an exporter.
```toml
[otel]
environment = "staging"   # defaults to "dev"
exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
```

Codex tags every exported event with `service.name = "codex-cli"`, the
CLI version, and an `env` attribute so downstream collectors can
distinguish dev/staging/prod traffic. Only telemetry produced inside the
`codex_otel` crate—the events listed below—is forwarded to the exporter.

### Event catalog

Every event shares a common set of metadata fields: `event.timestamp`,
`conversation.id`, `app.version`, `auth_mode` (when available),
`user.account_id` (when available), `terminal.type`, `model`, and `slug`.

With OTEL enabled, Codex emits the following event types (in addition
to the metadata above):

- `codex.api_request`
  - `cf_ray` (optional)
  - `attempt`
  - `duration_ms`
  - `http.response.status_code` (optional)
  - `error.message` (failures)
- `codex.sse_event`
  - `event.kind`
  - `duration_ms`
  - `error.message` (failures)
  - `input_token_count` (completion only)
  - `output_token_count` (completion only)
  - `cached_token_count` (completion only, optional)
  - `reasoning_token_count` (completion only, optional)
  - `tool_token_count` (completion only)
- `codex.user_prompt`
  - `prompt_length`
  - `prompt` (redacted unless `log_user_prompt = true`)
- `codex.tool_decision`
  - `tool_name`
  - `call_id`
  - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
  - `source` (`config` or `user`)
- `codex.tool_result`
  - `tool_name`
  - `call_id`
  - `arguments`
  - `duration_ms` (execution time for the tool)
  - `success` (`"true"` or `"false"`)
  - `output`

### Choosing an exporter

Set `otel.exporter` to control where events go:

- `none` – leaves instrumentation active but skips exporting. This is
  the default.
- `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
  Specify the endpoint, protocol, and headers your collector expects:

  ```toml
  [otel]
  exporter = { otlp-http = {
    endpoint = "https://otel.example.com/v1/logs",
    protocol = "binary",
    headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
  }}
  ```

- `otlp-grpc` – streams OTLP log records over gRPC. Provide the
  endpoint and any metadata headers:

  ```toml
  [otel]
  exporter = { otlp-grpc = {
    endpoint = "https://otel.example.com:4317",
    headers = { "x-otlp-meta" = "abc123" }
  }}
  ```

If the exporter is `none`, nothing is written anywhere; otherwise you
must run, or point to, your own collector. All exporters run on a
background batch worker that is flushed on shutdown.

If you build Codex from source, the OTEL crate is still behind an
`otel` feature flag; the official prebuilt binaries ship with the
feature enabled. When the feature is disabled, the telemetry hooks
become no-ops so the CLI continues to function without the extra
dependencies.

---------

Co-authored-by: Anton Panasenko <apanasenko@openai.com>
2025-09-29 11:30:55 -07:00
Gabriel Peal
3a1be084f9 [MCP] Add experimental support for streamable HTTP MCP servers (#4317)
This PR adds support for streamable HTTP MCP servers when the
`experimental_use_rmcp_client` flag is enabled.

To set one up, simply add a new mcp server config with the url:
```
[mcp_servers.figma]
url = "http://127.0.0.1:3845/mcp"
```

It also supports an optional `bearer_token` which will be provided in an
authorization header. The full oauth flow is not supported yet.
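With the optional token, the config would look like this (the token value is a placeholder):

```toml
[mcp_servers.figma]
url = "http://127.0.0.1:3845/mcp"
bearer_token = "figd_..."  # sent in an Authorization header
```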

The config parsing will throw if it detects that the user mixed and
matched config fields (like command + bearer token or url + env).

The best way to review it is to review `core/src` and then
`rmcp-client/src/rmcp_client.rs` first. The rest is tests and
propagating the `Transport` struct around the codebase.

Example with the Figma MCP: screenshot omitted.
2025-09-26 21:24:01 -04:00
Jeremy Rose
43b63ccae8 update composer + user message styling (#4240)
Changes:

- the composer and user messages now have a colored background that
stretches the entire width of the terminal.
- the prompt character was changed from a cyan `▌` to a bold `›`.
- the "working" shimmer now follows the "dark gray" color of the
terminal, better matching the terminal's color scheme

Screenshots omitted: iTerm and Terminal.app, each with a dark, light,
and colored background.
2025-09-26 16:35:56 -07:00
Gabriel Peal
e555a36c6a [MCP] Introduce an experimental official rust sdk based mcp client (#4252)
The [official Rust
SDK](57fc428c57)
has come a long way since we first started our mcp client implementation
5 months ago and, today, it is much more complete than our own
stdio-only implementation.

This PR introduces a new config flag `experimental_use_rmcp_client`
which will use a new mcp client powered by the sdk instead of our own.

To keep this PR simple, I've only implemented the same stdio MCP
functionality that we had but will expand on it with future PRs.

---------

Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-09-26 13:13:37 -04:00
jif-oai
1fc3413a46 ref: state - 2 (#4229)
Extract tasks into a module and start abstracting them behind a trait
(more to come on this, but each task will be tackled in a dedicated PR).
The goal was to drop ActiveTask and to allow a (potential) set of tasks
during each turn
2025-09-26 13:49:08 +00:00
pakrym-oai
a10403d697 Actually mount sse once (#4264)
Mock server was responding with the same result many times.
2025-09-26 01:17:51 +00:00
pakrym-oai
8e3a048fec Add codex exec testing helpers (#4254)
Add a shortcut to create working directories and run codex exec with a
fake server.
2025-09-25 17:12:45 -07:00
Jeremy Rose
4a5f05c136 make tests pass cleanly in sandbox (#4067)
This changes the reqwest client used in tests to be sandbox-friendly,
and skips a bunch of other tests that don't work inside the
sandbox/without network.
2025-09-25 13:11:14 -07:00
pakrym-oai
e85742635f Send text parameter for non-gpt-5 models (#4195)
We had a hardcoded check for gpt-5 before.

Fixes: https://github.com/openai/codex/issues/4181
2025-09-24 22:00:06 +00:00
Ahmed Ibrahim
cb96f4f596 Add Reset in for rate limits (#4111)
- Parse the headers
- Reorganize the struct because it's getting too long
- Show the reset times in the TUI

<img width="324" height="79" alt="image"
src="https://github.com/user-attachments/assets/ca15cd48-f112-4556-91ab-1e3a9bc4683d"
/>
2025-09-24 15:31:08 +00:00
Ahmed Ibrahim
8227a5ba1b Send limits when getting rate limited (#4102)
Users need visibility on rate limits when they are rate limited.
2025-09-23 22:56:34 +00:00
pakrym-oai
fdb8dadcae Add exec output-schema parameter (#4079)
Adds structured output to `exec` via the `--structured-output`
parameter.
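Usage is presumably along these lines (the schema-file argument shape is an assumption, not shown in this commit):

```
codex exec --structured-output schema.json "list the open TODOs as JSON"
```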
2025-09-23 13:59:16 -07:00
jif-oai
b84a920067 chore: compact do not modify instructions (#4088)
Keep the developer instruction and insert the summarisation message as a
user message instead
2025-09-23 17:59:17 +01:00
pakrym-oai
76ecbb3d8e Use TestCodex builder in stream retry tests (#4096)
## Summary
- refactor the stream retry integration tests to construct conversations
through `TestCodex`
- remove bespoke config and tempdir setup now handled by the shared
builder

## Testing
- cargo test -p codex-core --test all
stream_error_allows_next_turn::continue_after_stream_error
- cargo test -p codex-core --test all
stream_no_completed::retries_on_early_close

------
https://chatgpt.com/codex/tasks/task_i_68d2b94d83888320bc75a0bc3bd77b49
2025-09-23 08:57:08 -07:00
pakrym-oai
5c7d9e27b1 Add notifier tests (#4064)
Proposal:
1. Use anyhow for tests and avoid unwrap
2. Extract a helper for starting a test instance of codex
2025-09-23 14:25:46 +00:00
Thibault Sottiaux
c93e77b68b feat: update default (#4076)
Changes:
- Default model and docs now use gpt-5-codex. 
- Disables the GPT-5 Codex NUX by default.
- Keeps presets available for API key users.
2025-09-22 20:10:52 -07:00
Ahmed Ibrahim
dd56750612 Change headers and struct of rate limits (#4060) 2025-09-22 21:06:20 +00:00