Commit Graph

48 Commits

Author SHA1 Message Date
pakrym-oai
14a115d488 Add non_sandbox_test helper (#3880)
Makes tests shorter
2025-09-22 14:50:41 +00:00
Ahmed Ibrahim
04504d8218 Forward Rate limits to the UI (#3965)
We currently get information about rate limits in the response headers.
We want to forward them to the clients to have better transparency.
UI/UX plans have been discussed and this information is needed.
2025-09-20 21:26:16 -07:00
pakrym-oai
881c7978f1 Move responses mocking helpers to a shared lib (#3878)
These are generally useful
2025-09-18 17:53:14 -07:00
Michael Bolin
8595237505 fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
Previous to this PR, both of these functions take a single `cwd`:


71038381aa/codex-rs/core/src/seatbelt.rs (L19-L25)


71038381aa/codex-rs/core/src/landlock.rs (L16-L23)

whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
this PR).

Added `sandbox_distinguishes_command_and_policy_cwds()` to
`codex-rs/exec/tests/suite/sandbox.rs` to verify this.
2025-09-18 14:37:06 -07:00
Jeremy Rose
b34e906396 Reland "refactor transcript view to handle HistoryCells" (#3753)
Reland of #3538
2025-09-18 20:55:53 +00:00
jif-oai
4a5d6f7c71 Make ESC button work when auto-compaction (#3857)
Only emit a task finished when the compaction comes from a `/compact`
2025-09-18 15:34:16 +00:00
dedrisian-oai
72733e34c4 Add dev message upon review out (#3758)
Proposal: We want to record a dev message like so:

```
{
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "<user_action>
  <context>User initiated a review task. Here's the full review output from reviewer model. User may select one or more comments to resolve.</context>
  <action>review</action>
  <results>
  {findings_str}
  </results>
</user_action>"
        }
      ]
    },
```

Without showing in the chat transcript.

Rough idea, but it fixes issue where the user finishes a review thread,
and asks the parent "fix the rest of the review issues" thinking that
the parent knows about it.

### Question: Why not a tool call?

Because the agent didn't make the call, it was a human. + we haven't
implemented sub-agents yet, and we'll need to think about the way we
represent these human-led tool calls for the agent.
2025-09-16 18:43:32 -07:00
dedrisian-oai
7fe4021f95 Review mode core updates (#3701)
1. Adds the environment prompt (including cwd) to review thread
2. Prepends the review prompt as a user message (temporary fix so the
instructions are not replaced on backend)
3. Sets reasoning to low
4. Sets default review model to `gpt-5-codex`
2025-09-16 13:36:51 -07:00
Dylan
11285655c4 fix: Record EnvironmentContext in SendUserTurn (#3678)
## Summary
SendUserTurn has not been correctly handling updates to policies. While
the tui protocol handles this in `Op::OverrideTurnContext`, the
SendUserTurn should be appending `EnvironmentContext` messages when the
sandbox settings change. MCP client behavior should match the cli
behavior, so we update `SendUserTurn` message to match.

## Testing
- [x] Added prompt caching tests
2025-09-16 11:32:20 -07:00
Ahmed Ibrahim
26f1246a89 Revert "refactor transcript view to handle HistoryCells" (#3614)
Reverts openai/codex#3538
It panics on forking first message. It also calculates the index in a
wrong way.
2025-09-15 03:39:36 +00:00
dedrisian-oai
2aa84b8891 Fix EventMsg Optional (#3604) 2025-09-15 00:34:33 +00:00
Ahmed Ibrahim
a30e5e40ee enable-resume (#3537)
Adding the ability to resume conversations.
we have one verb `resume`. 

Behavior:

`tui`:
`codex resume`: opens session picker
`codex resume --last`: continue last message
`codex resume <session id>`: continue conversation with `session id`

`exec`:
`codex resume --last`: continue last conversation
`codex resume <session id>`: continue conversation with `session id`

Implementation:
- I added a function to find the path in `~/.codex/sessions/` with a
`UUID`. This is helpful in resuming with session id.
- Added the above mentioned flags
- Added lots of testing
2025-09-14 19:33:19 -04:00
dedrisian-oai
b2f6fc3b9a Fix flaky windows test (#3564)
There are exactly 4 types of flaky tests in Windows x86 right now:

1. `review_input_isolated_from_parent_history` => Times out waiting for
closing events
2. `review_does_not_emit_agent_message_on_structured_output` => Times
out waiting for closing events
3. `auto_compact_runs_after_token_limit_hit` => Times out waiting for
closing events
4. `auto_compact_runs_after_token_limit_hit` => Also has a problem where
auto compact should add a third request, but receives 4 requests.

1, 2, and 3 seem to be solved with increasing threads on windows runner
from 2 -> 4.

Don't know yet why # 4 is happening, but probably also because of
WireMock issues on windows causing races.
2025-09-14 23:20:25 +00:00
pakrym-oai
863d9c237e Include command output when sending timeout to model (#3576)
Being able to see the output helps the model decide how to handle the
timeout.
2025-09-14 14:38:26 -07:00
Ahmed Ibrahim
bbea6bbf7e Handle resuming/forking after compact (#3533)
We need to construct the history different when compact happens. For
this, we need to just consider the history after compact and convert
compact to a response item.

This needs to change and use `build_compact_history` when this #3446 is
merged.
2025-09-14 13:23:31 +00:00
Jeremy Rose
4891ee29c5 refactor transcript view to handle HistoryCells (#3538)
No (intended) functional change.

This refactors the transcript view to hold a list of HistoryCells
instead of a list of Lines. This simplifies and makes much of the logic
more robust, as well as laying the groundwork for future changes, e.g.
live-updating history cells in the transcript.

Similar to #2879 in goal. Fixes #2755.
2025-09-13 19:23:14 -07:00
pakrym-oai
3d4acbaea0 Preserve IDs for more item types in azure (#3542)
https://github.com/openai/codex/issues/3509
2025-09-13 01:09:56 +00:00
pakrym-oai
414b8be8b6 Always request encrypted cot (#3539)
Otherwise future requests will fail with 500
2025-09-12 23:51:30 +00:00
dedrisian-oai
90a0fd342f Review Mode (Core) (#3401)
## 📝 Review Mode -- Core

This PR introduces the Core implementation for Review mode:

- New op `Op::Review { prompt: String }:` spawns a child review task
with isolated context, a review‑specific system prompt, and a
`Config.review_model`.
- `EnteredReviewMode`: emitted when the child review session starts.
Every event from this point onwards reflects the review session.
- `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
finishes or is interrupted, with optional structured findings:

```json
{
  "findings": [
    {
      "title": "<≤ 80 chars, imperative>",
      "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
      "confidence_score": <float 0.0-1.0>,
      "priority": <int 0-3>,
      "code_location": {
        "absolute_file_path": "<file path>",
        "line_range": {"start": <int>, "end": <int>}
      }
    }
  ],
  "overall_correctness": "patch is correct" | "patch is incorrect",
  "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
  "overall_confidence_score": <float 0.0-1.0>
}
```

## Questions

### Why separate out its own message history?

We want the review thread to match the training of our review models as
much as possible -- that means using a custom prompt, removing user
instructions, and starting a clean chat history.

We also want to make sure the review thread doesn't leak into the parent
thread.

### Why do this as a mode, vs. sub-agents?

1. We want review to be a synchronous task, so it's fine for now to do a
bespoke implementation.
2. We're still unclear about the final structure for sub-agents. We'd
prefer to land this quickly and then refactor into sub-agents without
rushing that implementation.
2025-09-12 23:25:10 +00:00
pakrym-oai
e3c6903199 Add Azure Responses API workaround (#3528)
Azure Responses API doesn't work well with store:false and response
items.

If store = false and id is sent an error is thrown that ID is not found
If store = false and id is not sent an error is thrown that ID is
required

Add detection for Azure urls and add a workaround to preserve reasoning
item IDs and send store:true
2025-09-12 13:52:15 -07:00
jif-oai
ea225df22e feat: context compaction (#3446)
## Compact feature:
1. Stops the model when the context window become too large
2. Add a user turn, asking for the model to summarize
3. Build a bridge that contains all the previous user message + the
summary. Rendered from a template
4. Start sampling again from a clean conversation with only that bridge
2025-09-12 13:07:10 -07:00
jif-oai
c6fd056aa6 feat: reasoning effort as optional (#3527)
Allow the reasoning effort to be optional
2025-09-12 12:06:33 -07:00
jif-oai
bba567cee9 bug: fix model save (#3525)
Fix those 2 behaviors:
1. The model does not get saved if we don't CTRL + S
2. The reasoning effort get saved
2025-09-12 10:38:12 -07:00
Ahmed Ibrahim
674e3d3c90 Add Compact and Turn Context to the rollout items (#3444)
Adding compact and turn context to the rollout items

based on #3440
2025-09-11 18:08:51 +00:00
jif-oai
114ce9ff4d NIT unified exec (#3479)
Fix the default value of the experimental flag of unified_exec
2025-09-11 16:19:12 +00:00
Eric Traut
e13b35ecb0 Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189)
This PR does the following:
* Adds the ability to paste or type an API key.
* Removes the `preferred_auth_method` config option. The last login
method is always persisted in auth.json, so this isn't needed.
* If OPENAI_API_KEY env variable is defined, the value is used to
prepopulate the new UI. The env variable is otherwise ignored by the
CLI.
* Adds a new MCP server entry point "login_api_key" so we can implement
this same API key behavior for the VS Code extension.
<img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
/>
<img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
/>
2025-09-11 09:16:34 -07:00
Ahmed Ibrahim
162e1235a8 Change forking to read the rollout from file (#3440)
This PR changes get history op to get path. Then, forking will use a
path. This will help us have one unified codepath for resuming/forking
conversations. Will also help in having rollout history in order. It
also fixes a bug where you won't see the UI when resuming after forking.
2025-09-10 17:42:54 -07:00
jif-oai
c09ed74a16 Unified execution (#3288)
## Unified PTY-Based Exec Tool

Note: this requires to have this flag in the config:
`use_experimental_unified_exec_tool=true`

- Adds a PTY-backed interactive exec feature (“unified_exec”) with
session reuse via
  session_id, bounded output (128 KiB), and timeout clamping (≤ 60 s).
- Protocol: introduces ResponseItem::UnifiedExec { session_id,
arguments, timeout_ms }.
- Tools: exposes unified_exec as a function tool (Responses API);
excluded from Chat
  Completions payload while still supported in tool lists.
- Path handling: resolves commands via PATH (or explicit paths), with
UTF‑8/newline‑aware
  truncation (truncate_middle).
- Tests: cover command parsing, path resolution, session
persistence/cleanup, multi‑session
  isolation, timeouts, and truncation behavior.
2025-09-10 17:38:11 -07:00
Jeremy Rose
f69f07b028 put workspace roots in the environment context (#3375)
to keep the tool description constant when the writable roots change.
2025-09-10 15:10:52 -07:00
Michael Bolin
51d9e05de7 Back out "feat: POSIX unification and snapshot sessions (#3179)" (#3430)
This reverts https://github.com/openai/codex/pull/3179.

#3179 appears to introduce a regression where sourcing dotfiles causes a
bunch of activity in the title bar (and potentially slows things down?)


https://github.com/user-attachments/assets/a68f7fb3-0749-4e0e-a321-2aa6993e01da

Verified this no longer happens after backing out #3179.

Original commit changeset: 62bd0e3d9d
2025-09-10 12:40:24 -07:00
Ahmed Ibrahim
45bd5ca4b9 Move initial history to protocol (#3422)
To fix an edge case of forking then resuming

#3419
2025-09-10 10:17:24 -07:00
Ahmed Ibrahim
43809a454e Introduce rollout items (#3380)
This PR introduces Rollout items. This enable us to rollout eventmsgs
and session meta.

This is mostly #3214 with rebase on main
2025-09-09 23:52:33 +00:00
Gabriel Peal
5eab4c7ab4 Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388)
The previous config approach had a few issues:
1. It is part of the config but not designed to be used externally
2. It had to be wired through many places (look at the +/- on this PR
3. It wasn't guaranteed to be set consistently everywhere because we
don't have a super well defined way that configs stack. For example, the
extension would configure during newConversation but anything that
happened outside of that (like login) wouldn't get it.

This env var approach is cleaner and also creates one less thing we have
to deal with when coming up with a better holistic story around configs.

One downside is that I removed the unit test testing for the override
because I don't want to deal with setting the global env or spawning
child processes and figuring out how to introspect their originator
header. The new code is sufficiently simple and I tested it e2e that I
feel as if this is still worth it.
2025-09-09 17:23:23 -04:00
jif-oai
62bd0e3d9d feat: POSIX unification and snapshot sessions (#3179)
## Session snapshot
For POSIX shell, the goal is to take a snapshot of the interactive shell
environment, store it in a session file located in `.codex/` and only
source this file for every command that is run.
As a result, if a snapshot files exist, `bash -lc <CALL>` get replaced
by `bash -c <CALL>`.

This also fixes the issue that `bash -lc` does not source `.bashrc`,
resulting in missing env variables and aliases in the codex session.
## POSIX unification
Unify `bash` and `zsh` shell into a POSIX shell. The rational is that
the tool will not use any `zsh` specific capabilities.

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
2025-09-08 18:09:45 -07:00
Jeremy Rose
ac58749bd3 allow mach-lookup for com.apple.system.opendirectoryd.libinfo (#3334)
in the base sandbox policy. this is [allowed in Chrome
renderers](https://source.chromium.org/chromium/chromium/src/+/main:sandbox/policy/mac/common.sb;l=266;drc=7afa0043cfcddb3ef9dafe5acbfc01c2f7e7df01),
so I feel it's fairly safe.
2025-09-08 16:28:52 -07:00
Gabriel Peal
5eaaf307e1 Generate more typescript types and return conversation id with ConversationSummary (#3219)
This PR does multiple things that are necessary for conversation resume
to work from the extension. I wanted to make sure everything worked so
these changes wound up in one PR:
1. Generate more ts types
2. Resume rollout history files rather than create a new one every time
it is resumed so you don't see a duplicate conversation in history for
every resume. Chatted with @aibrahim-oai to verify this
3. Return conversation_id in conversation summaries
4. [Cleanup] Use serde and strong types for a lot of the rollout file
parsing
2025-09-08 17:54:47 -04:00
Gabriel Peal
c8fab51372 Use ConversationId instead of raw Uuids (#3282)
We're trying to migrate from `session_id: Uuid` to `conversation_id:
ConversationId`. Not only does this give us more type safety but it
unifies our terminology across Codex and with the implementation of
session resuming, a conversation (which can span multiple sessions) is
more appropriate.

I started this impl on https://github.com/openai/codex/pull/3219 as part
of getting resume working in the extension but it's big enough that it
should be broken out.
2025-09-07 23:22:25 -04:00
pakrym-oai
5775174ec2 Never store requests (#3212)
When item ids are sent to Responses API it will load them from the
database ignoring the provided values. This adds extra latency.

Not having the mode to store requests also allows us to simplify the
code.

## Breaking change

The `disable_response_storage` configuration option is removed.
2025-09-05 10:41:47 -07:00
Ahmed Ibrahim
2b96f9f569 Dividing UserMsgs into categories to send it back to the tui (#3127)
This PR does the following:

- divides user msgs into 3 categories: plain, user instructions, and
environment context
- Centralizes adding user instructions and environment context to a
degree
- Improve the integration testing

Building on top of #3123

Specifically this
[comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
We need to send the user message while ignoring the User Instructions
and Environment Context we attach.
2025-09-04 05:34:50 +00:00
Ahmed Ibrahim
f2036572b6 Replay EventMsgs from Response Items when resuming a session with history. (#3123)
### Overview

This PR introduces the following changes:
	1.	Adds a unified mechanism to convert ResponseItem into EventMsg.
2. Ensures that when a session is initialized with initial history, a
vector of EventMsg is sent along with the session configuration. This
allows clients to re-render the UI accordingly.
	3. 	Added integration testing

### Caveats

This implementation does not send every EventMsg that was previously
dispatched to clients. The excluded events fall into two categories:
	•	“Arguably” rolled-out events
Examples include tool calls and apply-patch calls. While these events
are conceptually rolled out, we currently only roll out ResponseItems.
These events are already being handled elsewhere and transformed into
EventMsg before being sent.
	•	Non-rolled-out events
Certain events such as TurnDiff, Error, and TokenCount are not rolled
out at all.

### Future Directions

At present, resuming a session involves maintaining two states:
	•	UI State
Clients can replay most of the important UI from the provided EventMsg
history.
	•	Model State
The model receives the complete session history to reconstruct its
internal state.

This design provides a solid foundation. If, in the future, more precise
UI reconstruction is needed, we have two potential paths:
1. Introduce a third data structure that allows us to derive both
ResponseItems and EventMsgs.
2. Clearly divide responsibilities: the core system ensures the
integrity of the model state, while clients are responsible for
reconstructing the UI.
2025-09-04 04:47:00 +00:00
Ahmed Ibrahim
6b83c1c3f3 Fix failing CI (#3130)
In this test, the ChatGPT token path is used, and the auth layer tries
to refresh the token if it thinks the token is “old.” Your helper writes
a fixed last_refresh timestamp that has now aged past the 28‑day
threshold, so the code attempts a real refresh against auth.openai.com,
never reaches the mock, and you end up with
received_requests().await.unwrap() being empty.
2025-09-03 22:38:32 +00:00
pakrym-oai
c636f821ae Add a common way to create HTTP client (#3110)
Ensure User-Agent and originator are always sent.
2025-09-03 10:11:02 -07:00
pakrym-oai
03e2796ca4 Move CodexAuth and AuthManager to the core crate (#3074)
Fix a long standing layering issue.
2025-09-02 18:36:19 -07:00
Ahmed Ibrahim
431a10fc50 chore: unify history loading (#2736)
We have two ways of loading conversation with a previous history. Fork
conversation and the experimental resume that we had before. In this PR,
I am unifying their code path. The path is getting the history items and
recording them in a brand new conversation. This PR also constraint the
rollout recorder responsibilities to be only recording to the disk and
loading from the disk.

The PR also fixes a current bug when we have two forking in a row:
History 1:
<Environment Context>
UserMessage_1
UserMessage_2
UserMessage_3

**Fork with n = 1 (only remove one element)**
History 2:
<Environment Context>
UserMessage_1
UserMessage_2
<Environment Context>

**Fork with n = 1 (only remove one element)**
History 2:
<Environment Context>
UserMessage_1
UserMessage_2
**<Environment Context>**

This shouldn't happen but because we were appending the `<Environment
Context>` after each spawning and it's considered as _user message_.
Now, we don't add this message if restoring and old conversation.
2025-09-02 22:44:29 +00:00
Michael Bolin
74d2741729 chore: require uninlined_format_args from clippy (#2845)
- added `uninlined_format_args` to `[workspace.lints.clippy]` in the
`Cargo.toml` for the workspace
- ran `cargo clippy --tests --fix`
- ran `just fmt`
2025-08-28 11:25:23 -07:00
dedrisian-oai
4e9ad23864 Add "View Image" tool (#2723)
Adds a "View Image" tool so Codex can find and see images by itself:

<img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40
04 AM"
src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03"
/>
2025-08-27 17:41:23 -07:00
Ahmed Ibrahim
2d2f66f9c5 Bug fix: deduplicate assistant messages (#2758)
We are treating assistant messages in a different way than other
messages which resulted in a duplicated history.

See #2698
2025-08-27 01:29:16 -07:00
Jeremy Rose
32bbbbad61 test: faster test execution in codex-core (#2633)
this dramatically improves time to run `cargo test -p codex-core` (~25x
speedup).

before:
```
cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
```

after:
```
cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
```

both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
to exclude compile times.

approach inspired by [Delete Cargo Integration
Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
we move all test cases in tests/ into a single suite in order to have a
single binary, as there is significant overhead for each test binary
executed, and because test execution is only parallelized with a single
binary.
2025-08-24 11:10:53 -07:00