Commit Graph

713 Commits

Author SHA1 Message Date
pakrym-oai
91b16b8682 Don't request approval for safe commands in unified exec (#6380) 2025-11-07 16:36:04 -08:00
Alexander Smirnov
183fc8e01a core: replace Cloudflare 403 HTML with friendly message (#6252)
### Motivation

When Codex is launched from a region where Cloudflare blocks access (for
example, Russia), the CLI currently dumps Cloudflare’s entire HTML error
page. This isn’t actionable and makes it hard for users to understand
what happened. We want to detect the Cloudflare block and show a
concise, user-friendly explanation instead.

### What Changed

- Added CLOUDFLARE_BLOCKED_MESSAGE and a friendly_message() helper to
UnexpectedResponseError. Whenever we see a 403 whose body contains the
Cloudflare block notice, we now emit a single-line message (Access
blocked by Cloudflare…) while preserving the HTTP status and request id.
All other responses keep the original behaviour.
- Added two focused unit tests:
- unexpected_status_cloudflare_html_is_simplified ensures the Cloudflare
HTML case yields the friendly message.
- unexpected_status_non_html_is_unchanged confirms plain-text 403s still
return the raw body.

### Testing

- cargo build -p codex-cli
- cargo test -p codex-core
- just fix -p codex-core
- cargo test --all-features

---------

Co-authored-by: Eric Traut <etraut@openai.com>
2025-11-07 15:55:16 -08:00
pakrym-oai
4c1a6f0ee0 Promote shell config tool to model family config (#6351) 2025-11-07 10:11:11 -08:00
Celia Chen
e84e39940b [App-server] Implement account/read endpoint (#6336)
This PR does two things:
1. add a new function in core that maps the core-internal plan type to
the external plan type;
2. implement account/read that get account status (v2 of
`getAuthStatus`).
2025-11-06 19:43:13 -08:00
pakrym-oai
e8905f6d20 Prefer wait_for_event over wait_for_event_with_timeout (#6349) 2025-11-06 18:11:11 -08:00
pakrym-oai
f8b30af6dc Prefer wait_for_event over wait_for_event_with_timeout. (#6346)
No need to specify the timeout in most cases.
2025-11-06 16:14:43 -08:00
pakrym-oai
c368c6aeea Remove shell tool when unified exec is enabled (#6345)
Also drop streameable shell that's just an alias for unified exec.
2025-11-06 15:46:24 -08:00
Eric Traut
0c647bc566 Don't retry "insufficient_quota" errors (#6340)
This PR makes an "insufficient quota" error fatal so we don't attempt to
retry it multiple times in the agent loop.

We have multiple bug reports from users about intermittent retry
behaviors, and this could explain some of them. With this change, we'll
eliminate the retries and surface a clear error message.

The PR is a nearly identical copy of [this
PR](https://github.com/openai/codex/pull/4837) contributed by
@abimaelmartell. The original PR has gone stale. Rather than wait for
the contributor to resolve merge conflicts, I wanted to get this change
in.
2025-11-06 15:12:01 -08:00
pakrym-oai
b5349202e9 Freeform unified exec output formatting (#6233) 2025-11-06 22:14:27 +00:00
Jeremy Rose
8501b0b768 core: widen sandbox to allow certificate ops when network is enabled (#5980)
This allows `gh api` to work in the workspace-write sandbox w/ network
enabled. Without this we see e.g.

```
$ codex debug seatbelt --full-auto gh api repos/openai/codex/pulls --paginate -X GET -F state=all
Get "https://api.github.com/repos/openai/codex/pulls?per_page=100&state=all": tls: failed to verify certificate: x509: OSStatus -26276
```
2025-11-06 12:47:20 -08:00
Thibault Sottiaux
8c75ed39d5 feat: clarify that gpt-5-codex should not amend commits unless requested (#6333) 2025-11-06 11:42:47 -08:00
iceweasel-oai
871d442b8e Windows Sandbox: Show Everyone-writable directory warning (#6283)
Show a warning when Auto Sandbox mode becomes enabled, if we detect
Everyone-writable directories, since they cannot be protected by the
current implementation of the Sandbox.

This PR also includes changes to how we detect Everyone-writable to be
*much* faster
2025-11-06 10:44:42 -08:00
Eric Traut
d7953aed74 Fixes intermittent test failures in CI (#6282)
I'm seeing two tests fail intermittently in CI. This PR attempts to
address (or at least mitigate) the flakiness.

* summarize_context_three_requests_and_instructions - The test snapshots
server.received_requests() immediately after observing TaskComplete.
Because the OpenAI /v1/responses call is streamed, the HTTP request can
still be draining when that event fires, so wiremock occasionally
reports only two captured requests. Fix is to wait for async activity to
complete.
* archive_conversation_moves_rollout_into_archived_directory - times out
on a slow CI run. Mitigation is to increase timeout value from 10s to
20s.
2025-11-05 13:12:25 -08:00
Owen Lin
2ab1650d4d [app-server] feat: v2 Thread APIs (#6214)
Implements:
```
thread/list
thread/start
thread/resume
thread/archive
```

along with their integration tests. These are relatively light wrappers
around the existing core logic, and changes to core logic are minimal.

However, an improvement made for developer ergonomics:
- `thread/start` and `thread/resume` automatically attaches a
conversation listener internally, so clients don't have to make a
separate `AddConversationListener` call like they do today.

For consistency, also updated `model/list` and `feedback/upload` (naming
conventions, list API params).
2025-11-05 20:28:43 +00:00
Eric Traut
c4ebe4b078 Improved token refresh handling to address "Re-connecting" behavior (#6231)
Currently, when the access token expires, we attempt to use the refresh
token to acquire a new access token. This works most of the time.
However, there are situations where the refresh token is expired,
exhausted (already used to perform a refresh), or revoked. In those
cases, the current logic treats the error as transient and attempts to
retry it repeatedly.

This PR changes the token refresh logic to differentiate between
permanent and transient errors. It also changes callers to treat the
permanent errors as fatal rather than retrying them. And it provides
better error messages to users so they understand how to address the
problem. These error messages should also help us further understand why
we're seeing examples of refresh token exhaustion.

Here is the error message in the CLI. The same text appears within the
extension.

<img width="863" height="38" alt="image"
src="https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f"
/>

I also correct the spelling of "Re-connecting", which shouldn't have a
hyphen in it.

Testing: I manually tested these code paths by adding temporary code to
programmatically cause my refresh token to be exhausted (by calling the
token refresh endpoint in a tight loop more than 50 times). I then
simulated an access token expiration, which caused the token refresh
logic to be invoked. I confirmed that the updated logic properly handled
the error condition.

Note: We earlier discussed the idea of forcefully logging out the user
at the point where token refresh failed. I made several attempts to do
this, and all of them resulted in a bad UX. It's important to surface
this error to users in a way that explains the problem and tells them
that they need to log in again. We also previously discussed deleting
the auth.json file when this condition is detected. That also creates
problems because it effectively changes the auth status from logged in
to logged out, and this causes odd failures and inconsistent UX. I think
it's therefore better not to delete auth.json in this case. If the user
closes the CLI or VSCE and starts it again, we properly detect that the
access token is expired and the refresh token is "dead", and we force
the user to go through the login flow at that time.

This should address aspects of #6191, #5679, and #5505
2025-11-05 10:51:57 -08:00
Ahmed Ibrahim
1a89f70015 refactor Conversation history file into its own directory (#6229)
This is just a refactor of `conversation_history` file by breaking it up
into multiple smaller ones with helper. This refactor will help us move
more functionality related to context management here. in a clean way.
2025-11-05 10:49:35 -08:00
Andrew Dirksen
95af417923 allow codex to be run from pid 1 (#4200)
Previously it was not possible for codex to run commands as the init
process (pid 1) in linux. Commands run in containers tend to see their
own pid as 1. See https://github.com/openai/codex/issues/4198

This pr implements the solution mentioned in that issue.

Co-authored-by: Eric Traut <etraut@openai.com>
2025-11-04 17:54:46 -08:00
Soroush Yousefpour
fff576cf98 fix(core): load custom prompts from symlinked Markdown files (#3643)
- Discover prompts via fs::metadata to follow symlinks

- Add Unix-only symlink test in custom_prompts.rs

- Update docs/prompts.md to mention symlinks

Fixes #3637

---------

Signed-off-by: Soroush Yousefpour <h.yusefpour@gmail.com>
Co-authored-by: dedrisian-oai <dedrisian@openai.com>
Co-authored-by: Eric Traut <etraut@openai.com>
2025-11-04 17:44:02 -08:00
Ahmed Ibrahim
d40a6b7f73 fix: Update the deprecation message to link to the docs (#6211)
The deprecation message is currently a bit confusing. Users may not
understand what is `[features].x`. I updated the docs and the
deprecation message for more guidance.

---------

Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>
2025-11-04 21:02:27 +00:00
Ahmed Ibrahim
fe54c216a3 ignore deltas in codex_delegate (#6208)
ignore legacy deltas in codex-delegate to avoid this
[issue](https://github.com/openai/codex/pull/6202).
2025-11-04 19:21:35 +00:00
Ahmed Ibrahim
7e068e1094 fix: ignore reasoning deltas because we send it with turn item (#6202)
should fix this:

<img width="2418" height="242" alt="image"
src="https://github.com/user-attachments/assets/f818d00b-ed3a-479b-94a7-e4bc5db6326e"
/>
2025-11-04 08:27:16 -08:00
Celia Chen
d3187dbc17 [App-server] v2 for account/updated and account/logout (#6175)
V2 for `account/updated` and `account/logout` for app server. correspond
to old `authStatusChange` and `LogoutChatGpt` respectively. Followup PRs
will make other v2 endpoints call `account/updated` instead of
`authStatusChange` too.
2025-11-03 22:01:33 -08:00
Robby He
dc2f26f7b5 Fix is_api_message to correctly exclude reasoning messages (#6156)
## Problem

The `is_api_message` function in `conversation_history.rs` had a
misalignment between its documentation and implementation:

- **Comment stated**: "Anything that is not a system message or
'reasoning' message is considered an API message"
- **Code behavior**: Was returning `true` for `ResponseItem::Reasoning`,
meaning reasoning messages were incorrectly treated as API messages

This inconsistency could lead to reasoning messages being persisted in
conversation history when they should be filtered out.

## Root Cause

Investigation revealed that reasoning messages are explicitly excluded
throughout the codebase:

1. **Chat completions API** (lines 267-272 in `chat_completions.rs`)
omits reasoning from conversation history:
   ```rust
   ResponseItem::Reasoning { .. } | ResponseItem::Other => {
       // Omit these items from the conversation history.
       continue;
   }
   ```

2. **Existing tests** like `drops_reasoning_when_last_role_is_user` and
`ignores_reasoning_before_last_user` validate that reasoning should be
excluded from API payloads

## Solution

Fixed the `is_api_message` function to align with its documentation and
the rest of the codebase:

```rust
// Before: Reasoning was incorrectly returning true
ResponseItem::Reasoning { .. } | ResponseItem::WebSearchCall { .. } => true,

// After: Reasoning correctly returns false  
ResponseItem::WebSearchCall { .. } => true,
ResponseItem::Reasoning { .. } | ResponseItem::Other => false,
```

## Testing

- Enhanced existing test to verify reasoning messages are properly
filtered out
- All 264 core tests pass, including 8 chat completions tests that
validate reasoning behavior
- No regressions introduced

This ensures reasoning messages are consistently excluded from API
message processing across the entire codebase.
2025-11-03 20:55:41 -08:00
Eric Traut
1e0e553304 Fixed notify handler so it passes correct input_messages details (#6143)
This fixes bug #6121. 

The `input_messages` field passed to the notify handler is currently
empty because the logic is incorrectly including the OutputText rather
than InputText. I've fixed that and added proper filtering to remove
messages associated with AGENTS.md and other context injected by the
harness.

Testing: I wrote a notify handler and verified that the user prompt is
correctly passed through to the handler.
2025-11-03 14:23:04 -08:00
iceweasel-oai
07b7d28937 log sandbox commands to $CODEX_HOME instead of cwd (#6171)
Logging commands in the Windows Sandbox is temporary, but while we are
doing it, let's always write to CODEX_HOME instead of dirtying the cwd.
2025-11-03 13:12:33 -08:00
Ahmed Ibrahim
6ee7fbcfff feat: add the time after aborting (#5996)
Tell the model how much time passed after the user aborted the call.
2025-11-03 11:44:06 -08:00
iceweasel-oai
2eda75a8ee Do not skip trust prompt on Windows if sandbox is enabled. (#6167)
If the experimental windows sandbox is enabled, the trust prompt should
show on Windows.
2025-11-03 11:27:45 -08:00
Vinh Nguyen
a1ee10b438 fix: improve usage URLs in status card and snapshots (#6111)
Hi OpenAI Codex team, currently "Visit chatgpt.com/codex/settings/usage
for up-to-date information on rate limits and credits" message in status
card and error messages. For now, without the "https://" prefix, the
link cannot be clicked directly from most terminals or chat interfaces.

<img width="636" height="127" alt="Screenshot 2025-11-02 at 22 47 06"
src="https://github.com/user-attachments/assets/5ea11e8b-fb74-451c-85dc-f4d492b2678b"
/>

---

The fix is intent to improve this issue:

- It makes the link clickable in terminals that support it, hence better
accessibility
- It follows standard URL formatting practices
- It maintains consistency with other links in the application (like the
existing "https://openai.com/chatgpt/pricing" links)

Thank you!
2025-11-02 21:44:59 -08:00
Eric Traut
0c7efa0cfd Fix incorrect "deprecated" message about experimental config key (#6131)
When I enable `experimental_sandbox_command_assessment`, I get an
incorrect deprecation warning: "experimental_sandbox_command_assessment
is deprecated. Use experimental_sandbox_command_assessment instead."

This PR fixes this error.
2025-11-02 16:33:09 -08:00
Eric Traut
d5853d9c47 Changes to sandbox command assessment feature based on initial experiment feedback (#6091)
* Removed sandbox risk categories; feedback indicates that these are not
that useful and "less is more"
* Tweaked the assessment prompt to generate terser answers
* Fixed bug in orchestrator that prevents this feature from being
exposed in the extension
2025-11-01 14:52:23 -07:00
Thomas Stokes
d9118c04bf Parse the Azure OpenAI rate limit message (#5956)
Fixes #4161

Currently Codex uses a regex to parse the "Please try again in 1.898s"
OpenAI-style rate limit message, so that it can wait the correct
duration before retrying. Azure OpenAI returns a different error that
looks like "Rate limit exceeded. Try again in 35 seconds."

This PR extends the regex and parsing code to match in a more fuzzy
manner, handling anything matching the pattern "try again in
\<duration>\<unit>".
2025-11-01 09:33:13 -07:00
jif-oai
611e00c862 feat: compactor 2 (#6027)
Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-10-31 14:27:08 -07:00
Ahmed Ibrahim
c8ebb2a0dc Add warning on compact (#6052)
This PR introduces the ability for `core` to send `warnings` as it can
send `errors. It also sends a warning on compaction.

<img width="811" height="187" alt="image"
src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a"
/>
2025-10-31 13:27:33 -07:00
Dylan Hurd
88e083a9d0 chore: Add shell serialization tests for json (#6043)
## Summary
Can never have enough tests on this code path - checking that json
inside a shell call is deserialized correctly.

## Tests
- [x] These are tests 😎
2025-10-31 11:01:58 -07:00
Ahmed Ibrahim
1c8507b32a Truncate total tool calls text (#5979)
Put a cap on the aggregate output of text content on tool calls.

---------

Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>
2025-10-31 10:30:36 -07:00
jif-oai
0508823075 test: undo (#6034) 2025-10-31 14:46:24 +00:00
pakrym-oai
2371d771cc Update user instruction message format (#6010) 2025-10-30 18:44:02 -07:00
Ahmed Ibrahim
dc2aeac21f override verbosity for gpt-5-codex (#6007)
we are seeing [reports](https://github.com/openai/codex/issues/6004) of
users having verbosity in their config.toml and facing issues.
gpt-5-codex doesn't accept other values rather than medium for
verbosity.
2025-10-31 00:45:05 +00:00
Jack
f842849bec docs: Fix markdown list item spacing in codex-rs/core/review_prompt.md (#4144)
Fixes a Markdown parsing issue where a list item used `*` without a
following space (`*Line ranges ...`). Per CommonMark, a space after the
list marker is required. Updated to `* Line ranges ...` so the guideline
renders as a standalone bullet. This change improves readability and
prevents mis-parsing in renderers.

Co-authored-by: Eric Traut <etraut@openai.com>
2025-10-30 17:39:21 -07:00
zhao-oai
dcf73970d2 rate limit errors now provide absolute time (#6000) 2025-10-30 20:33:25 -04:00
Ahmed Ibrahim
a3d3719481 Remove last turn reasoning filtering (#5986) 2025-10-30 23:20:32 +00:00
iceweasel-oai
87cce88f48 Windows Sandbox - Alpha version (#4905)
- Added the new codex-windows-sandbox crate that builds both a library
entry point (run_windows_sandbox_capture) and a CLI executable to launch
commands inside a Windows restricted-token sandbox, including ACL
management, capability SID provisioning, network lockdown, and output
capture
(windows-sandbox-rs/src/lib.rs:167, windows-sandbox-rs/src/main.rs:54).
- Introduced the experimental WindowsSandbox feature flag and wiring so
Windows builds can opt into the sandbox:
SandboxType::WindowsRestrictedToken, the in-process execution path, and
platform sandbox selection now honor the flag (core/src/features.rs:47,
core/src/config.rs:1224, core/src/safety.rs:19,
core/src/sandboxing/mod.rs:69, core/src/exec.rs:79,
core/src/exec.rs:172).
- Updated workspace metadata to include the new crate and its
Windows-specific dependencies so the core crate can link against it
(codex-rs/
    Cargo.toml:91, core/Cargo.toml:86).
- Added a PowerShell bootstrap script that installs the Windows
toolchain, required CLI utilities, and builds the workspace to ease
development
    on the platform (scripts/setup-windows.ps1:1).
- Landed a Python smoke-test suite that exercises
read-only/workspace-write policies, ACL behavior, and network denial for
the Windows sandbox
    binary (windows-sandbox-rs/sandbox_smoketests.py:1).
2025-10-30 15:51:57 -07:00
Bernard Niset
ff6d4cec6b fix: Update seatbelt policy for java on macOS (#3987)
# Summary

This PR is related to the Issue #3978 and contains a fix to the seatbelt
profile for macOS that allows to run java/jdk tooling from the sandbox.
I have found that the included change is the minimum change to make it
run on my machine.

There is a unit test added by codex when making this fix. I wonder if it
is useful since you need java installed on the target machine for it to
be relevant. I can remove it it is better.

Fixes #3978
2025-10-30 14:25:04 -07:00
Celia Chen
6ef658a9f9 [Hygiene] Remove include_view_image_tool config (#5976)
There's still some debate about whether we want to expose
`tools.view_image` or `feature.view_image` so those are left unchanged
for now, but this old `include_view_image_tool` config is good-to-go.
Also updated the doc to reflect that `view_image` tool is now by default
true.
2025-10-30 13:23:24 -07:00
Anton Panasenko
9572cfc782 [codex] add developer instructions (#5897)
we are using developer instructions for code reviews, we need to pass
them in cli as well.
2025-10-30 11:18:31 -07:00
Dylan Hurd
4a55646a02 chore: testing on freeform apply_patch (#5952)
## Summary
Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform
apply_patch tool as opposed to the function call path. The good news is
that all the tests pass with zero logical tests, with the exception of
the heredoc, which doesn't really make sense in the freeform tool
context anyway.

@jif-oai since you wrote the original tests in #5557, I'd love your
opinion on the right way to DRY these test cases between the two. Happy
to set up a more sophisticated harness, but didn't want to go down the
rabbit hole until we agreed on the right pattern

## Testing
- [x] These are tests
2025-10-30 10:40:48 -07:00
jif-oai
f4f9695978 feat: compaction prompt configurable (#5959)
```
 codex -c compact_prompt="Summarize in bullet points"
 ```
2025-10-30 14:24:24 +00:00
Ahmed Ibrahim
5fcc380bd9 Pass initial history as an optional to codex delegate (#5950)
This will give us more freedom on controlling the delegation. i.e we can
fork our history and run `compact`.
2025-10-30 07:22:42 -07:00
jif-oai
aa76003e28 chore: unify config crates (#5958) 2025-10-30 10:28:32 +00:00
Ahmed Ibrahim
fac548e430 Send delegate header (#5942)
Send delegate type header
2025-10-30 09:49:40 +00:00