valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
pakrym-oai	91b16b8682	Don't request approval for safe commands in unified exec (#6380 )	2025-11-07 16:36:04 -08:00
Alexander Smirnov	183fc8e01a	core: replace Cloudflare 403 HTML with friendly message (#6252 ) ### Motivation When Codex is launched from a region where Cloudflare blocks access (for example, Russia), the CLI currently dumps Cloudflare’s entire HTML error page. This isn’t actionable and makes it hard for users to understand what happened. We want to detect the Cloudflare block and show a concise, user-friendly explanation instead. ### What Changed - Added CLOUDFLARE_BLOCKED_MESSAGE and a friendly_message() helper to UnexpectedResponseError. Whenever we see a 403 whose body contains the Cloudflare block notice, we now emit a single-line message (Access blocked by Cloudflare…) while preserving the HTTP status and request id. All other responses keep the original behaviour. - Added two focused unit tests: - unexpected_status_cloudflare_html_is_simplified ensures the Cloudflare HTML case yields the friendly message. - unexpected_status_non_html_is_unchanged confirms plain-text 403s still return the raw body. ### Testing - cargo build -p codex-cli - cargo test -p codex-core - just fix -p codex-core - cargo test --all-features --------- Co-authored-by: Eric Traut <etraut@openai.com>	2025-11-07 15:55:16 -08:00
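The Cloudflare detection described in #6252 can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the codex-core code: the struct fields, the rule used to recognise the block page (a 403 whose HTML body mentions "Cloudflare"), and the message wording are assumptions. The commit elides the real message text.

```rust
/// Hypothetical stand-in for the real constant; the actual wording in codex-core is not shown above.
const CLOUDFLARE_BLOCKED_MESSAGE: &str = "Access blocked by Cloudflare";

/// Simplified, hypothetical version of the error type named in the commit message.
struct UnexpectedResponseError {
    status: u16,
    body: String,
    request_id: Option<String>,
}

impl UnexpectedResponseError {
    /// Returns a one-line message for a Cloudflare HTML block page, or `None` otherwise.
    /// Assumption: a block page is a 403 whose body is HTML and mentions "Cloudflare".
    fn friendly_message(&self) -> Option<String> {
        let looks_like_html = self.body.to_ascii_lowercase().contains("<html");
        if self.status != 403 || !looks_like_html || !self.body.contains("Cloudflare") {
            return None;
        }
        // Keep the HTTP status and request id, but drop the HTML body.
        let mut message = format!("{} (status {})", CLOUDFLARE_BLOCKED_MESSAGE, self.status);
        if let Some(id) = &self.request_id {
            message.push_str(&format!(", request id: {}", id));
        }
        Some(message)
    }

    /// Every other response falls back to the raw body, unchanged.
    fn display_message(&self) -> String {
        self.friendly_message()
            .unwrap_or_else(|| format!("unexpected status {}: {}", self.status, self.body))
    }
}
```

Checking both the status and the body keeps plain-text 403s unchanged, which mirrors the two unit tests the commit describes.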
pakrym-oai	4c1a6f0ee0	Promote shell config tool to model family config (#6351 )	2025-11-07 10:11:11 -08:00
Celia Chen	e84e39940b	[App-server] Implement `account/read` endpoint (#6336 ) This PR does two things: 1. adds a new function in core that maps the core-internal plan type to the external plan type; 2. implements `account/read`, which gets the account status (v2 of `getAuthStatus`).	2025-11-06 19:43:13 -08:00
pakrym-oai	e8905f6d20	Prefer `wait_for_event` over `wait_for_event_with_timeout` (#6349 )	2025-11-06 18:11:11 -08:00
pakrym-oai	f8b30af6dc	Prefer `wait_for_event` over `wait_for_event_with_timeout`. (#6346 ) No need to specify the timeout in most cases.	2025-11-06 16:14:43 -08:00
pakrym-oai	c368c6aeea	Remove shell tool when unified exec is enabled (#6345 ) Also drop streamable shell, which is just an alias for unified exec.	2025-11-06 15:46:24 -08:00
Eric Traut	0c647bc566	Don't retry "insufficient_quota" errors (#6340 ) This PR makes an "insufficient quota" error fatal so we don't attempt to retry it multiple times in the agent loop. We have multiple bug reports from users about intermittent retry behaviors, and this could explain some of them. With this change, we'll eliminate the retries and surface a clear error message. The PR is a nearly identical copy of [this PR](https://github.com/openai/codex/pull/4837) contributed by @abimaelmartell. The original PR has gone stale. Rather than wait for the contributor to resolve merge conflicts, I wanted to get this change in.	2025-11-06 15:12:01 -08:00
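A minimal sketch of the idea in #6340: classify errors so that an insufficient-quota error is returned immediately while transient ones are retried. The `ApiError` variants, the 5xx rule, and `run_with_retries` are hypothetical names for illustration, not Codex's agent-loop API.

```rust
/// Hypothetical error kinds returned by a model request.
#[derive(Debug, PartialEq)]
enum ApiError {
    /// Account is out of quota: retrying cannot succeed, so fail fast.
    InsufficientQuota,
    /// Temporary throttling: worth retrying.
    RateLimited,
    /// Any other HTTP status from the server.
    Server(u16),
}

impl ApiError {
    fn is_retryable(&self) -> bool {
        match self {
            ApiError::InsufficientQuota => false,
            ApiError::RateLimited => true,
            // Assumption: only 5xx responses are treated as transient.
            ApiError::Server(status) => *status >= 500,
        }
    }
}

/// Calls `attempt` until it succeeds, hits a fatal error, or exhausts `max_retries`.
/// Returns the final result together with the number of attempts made.
fn run_with_retries<T>(
    max_retries: u32,
    mut attempt: impl FnMut() -> Result<T, ApiError>,
) -> (Result<T, ApiError>, u32) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match attempt() {
            Ok(value) => return (Ok(value), attempts),
            Err(err) if err.is_retryable() && attempts <= max_retries => continue,
            Err(err) => return (Err(err), attempts),
        }
    }
}
```

Putting the fatal/transient decision on the error type keeps the loop itself unaware of specific error codes.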
pakrym-oai	b5349202e9	Freeform unified exec output formatting (#6233 )	2025-11-06 22:14:27 +00:00
Jeremy Rose	8501b0b768	core: widen sandbox to allow certificate ops when network is enabled (#5980 ) This allows `gh api` to work in the workspace-write sandbox w/ network enabled. Without this we see e.g. ``` $ codex debug seatbelt --full-auto gh api repos/openai/codex/pulls --paginate -X GET -F state=all Get "https://api.github.com/repos/openai/codex/pulls?per_page=100&state=all": tls: failed to verify certificate: x509: OSStatus -26276 ```	2025-11-06 12:47:20 -08:00
Thibault Sottiaux	8c75ed39d5	feat: clarify that gpt-5-codex should not amend commits unless requested (#6333 )	2025-11-06 11:42:47 -08:00
iceweasel-oai	871d442b8e	Windows Sandbox: Show Everyone-writable directory warning (#6283 ) Show a warning when Auto Sandbox mode becomes enabled if we detect Everyone-writable directories, since they cannot be protected by the current implementation of the Sandbox. This PR also changes how we detect Everyone-writable directories, making detection much faster.	2025-11-06 10:44:42 -08:00
Eric Traut	d7953aed74	Fixes intermittent test failures in CI (#6282 ) I'm seeing two tests fail intermittently in CI. This PR attempts to address (or at least mitigate) the flakiness. * summarize_context_three_requests_and_instructions - The test snapshots server.received_requests() immediately after observing TaskComplete. Because the OpenAI /v1/responses call is streamed, the HTTP request can still be draining when that event fires, so wiremock occasionally reports only two captured requests. Fix is to wait for async activity to complete. * archive_conversation_moves_rollout_into_archived_directory - times out on a slow CI run. Mitigation is to increase timeout value from 10s to 20s.	2025-11-05 13:12:25 -08:00
Owen Lin	2ab1650d4d	[app-server] feat: v2 Thread APIs (#6214 ) Implements: ``` thread/list thread/start thread/resume thread/archive ``` along with their integration tests. These are relatively light wrappers around the existing core logic, and changes to core logic are minimal. However, one improvement was made for developer ergonomics: - `thread/start` and `thread/resume` automatically attach a conversation listener internally, so clients don't have to make a separate `AddConversationListener` call like they do today. For consistency, also updated `model/list` and `feedback/upload` (naming conventions, list API params).	2025-11-05 20:28:43 +00:00
Eric Traut	c4ebe4b078	Improved token refresh handling to address "Re-connecting" behavior (#6231 ) Currently, when the access token expires, we attempt to use the refresh token to acquire a new access token. This works most of the time. However, there are situations where the refresh token is expired, exhausted (already used to perform a refresh), or revoked. In those cases, the current logic treats the error as transient and attempts to retry it repeatedly. This PR changes the token refresh logic to differentiate between permanent and transient errors. It also changes callers to treat the permanent errors as fatal rather than retrying them. And it provides better error messages to users so they understand how to address the problem. These error messages should also help us further understand why we're seeing examples of refresh token exhaustion. Here is the error message in the CLI (the same text appears within the extension): [screenshot](https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f) I also corrected the spelling of "Re-connecting", which shouldn't have a hyphen in it. Testing: I manually tested these code paths by adding temporary code to programmatically cause my refresh token to be exhausted (by calling the token refresh endpoint in a tight loop more than 50 times). I then simulated an access token expiration, which caused the token refresh logic to be invoked. I confirmed that the updated logic properly handled the error condition. Note: We earlier discussed the idea of forcefully logging out the user at the point where token refresh failed. I made several attempts to do this, and all of them resulted in a bad UX. It's important to surface this error to users in a way that explains the problem and tells them that they need to log in again. We also previously discussed deleting the auth.json file when this condition is detected. That also creates problems because it effectively changes the auth status from logged in to logged out, and this causes odd failures and inconsistent UX. I think it's therefore better not to delete auth.json in this case. If the user closes the CLI or VSCE and starts it again, we properly detect that the access token is expired and the refresh token is "dead", and we force the user to go through the login flow at that time. This should address aspects of #6191, #5679, and #5505	2025-11-05 10:51:57 -08:00
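The permanent-versus-transient split described in #6231 could look roughly like this. The error-code strings, the treatment of a bare 401 as permanent, and the message wording are all assumptions made for illustration; the auth server's actual codes and Codex's actual messages are not given in this log.

```rust
/// Hypothetical classification of a failed refresh-token request.
#[derive(Debug, PartialEq)]
enum RefreshTokenError {
    /// Expired, already-used, or revoked token: retrying cannot help, so ask the user to log in again.
    Permanent(String),
    /// Network or server trouble: the caller may retry.
    Transient(String),
}

impl RefreshTokenError {
    fn should_retry(&self) -> bool {
        matches!(self, RefreshTokenError::Transient(_))
    }
}

/// The `error_code` values matched below are illustrative placeholders, not documented API codes.
fn classify_refresh_failure(http_status: u16, error_code: Option<&str>) -> RefreshTokenError {
    let relogin = "Please log out and sign in again.";
    match error_code {
        Some("refresh_token_expired") => {
            RefreshTokenError::Permanent(format!("Your refresh token has expired. {relogin}"))
        }
        Some("refresh_token_reused") => RefreshTokenError::Permanent(format!(
            "Your refresh token was already used to obtain a new access token. {relogin}"
        )),
        Some("refresh_token_invalidated") => {
            RefreshTokenError::Permanent(format!("Your refresh token was revoked. {relogin}"))
        }
        // Assumption: any other 401 also means the refresh token is unusable.
        _ if http_status == 401 => RefreshTokenError::Permanent(format!(
            "Your access token could not be refreshed. {relogin}"
        )),
        _ => RefreshTokenError::Transient(format!("Token refresh failed with status {http_status}")),
    }
}
```

Keeping the classification separate from the retry loop lets callers stop and show the `Permanent` message without logging the user out or touching auth.json.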
Ahmed Ibrahim	1a89f70015	refactor Conversation history file into its own directory (#6229 ) This is just a refactor of the `conversation_history` file, breaking it up into multiple smaller files with helpers. This refactor will help us move more context-management functionality here in a clean way.	2025-11-05 10:49:35 -08:00
Andrew Dirksen	95af417923	allow codex to be run from pid 1 (#4200 ) Previously it was not possible for codex to run commands as the init process (pid 1) in linux. Commands run in containers tend to see their own pid as 1. See https://github.com/openai/codex/issues/4198 This pr implements the solution mentioned in that issue. Co-authored-by: Eric Traut <etraut@openai.com>	2025-11-04 17:54:46 -08:00
Soroush Yousefpour	fff576cf98	fix(core): load custom prompts from symlinked Markdown files (#3643 ) - Discover prompts via fs::metadata to follow symlinks - Add Unix-only symlink test in custom_prompts.rs - Update docs/prompts.md to mention symlinks Fixes #3637 --------- Signed-off-by: Soroush Yousefpour <h.yusefpour@gmail.com> Co-authored-by: dedrisian-oai <dedrisian@openai.com> Co-authored-by: Eric Traut <etraut@openai.com>	2025-11-04 17:44:02 -08:00
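The symlink fix in #3643 relies on `std::fs::metadata` following symlinks, whereas `DirEntry::file_type` reports the link itself. Below is a minimal sketch of prompt discovery under that approach; `discover_prompt_names` is a hypothetical helper, not the function in `custom_prompts.rs`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the file stems of `*.md` prompt files in `dir`, sorted by name.
/// `fs::metadata` resolves symlinks, so a link pointing at a regular Markdown
/// file is accepted; broken links and directories are skipped.
fn discover_prompt_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // A broken symlink makes `metadata` fail, which we treat as "not a file".
        let is_file = fs::metadata(&path).map(|m| m.is_file()).unwrap_or(false);
        let is_markdown = path.extension().and_then(|e| e.to_str()) == Some("md");
        if is_file && is_markdown {
            if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {
                names.push(stem.to_string());
            }
        }
    }
    names.sort();
    Ok(names)
}
```

Using `entry.file_type()?.is_file()` instead would silently drop symlinked prompts, which is the behavior the commit fixes.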
Ahmed Ibrahim	d40a6b7f73	fix: Update the deprecation message to link to the docs (#6211 ) The deprecation message is currently a bit confusing. Users may not understand what `[features].x` means. I updated the docs and the deprecation message to provide more guidance. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-11-04 21:02:27 +00:00
Ahmed Ibrahim	fe54c216a3	ignore deltas in `codex_delegate` (#6208 ) ignore legacy deltas in codex-delegate to avoid this [issue](https://github.com/openai/codex/pull/6202).	2025-11-04 19:21:35 +00:00
Ahmed Ibrahim	7e068e1094	fix: ignore reasoning deltas because we send it with turn item (#6202 ) should fix this: [screenshot](https://github.com/user-attachments/assets/f818d00b-ed3a-479b-94a7-e4bc5db6326e)	2025-11-04 08:27:16 -08:00
Celia Chen	d3187dbc17	[App-server] v2 for account/updated and account/logout (#6175 ) V2 for `account/updated` and `account/logout` for app server. These correspond to the old `authStatusChange` and `LogoutChatGpt`, respectively. Follow-up PRs will make other v2 endpoints call `account/updated` instead of `authStatusChange` too.	2025-11-03 22:01:33 -08:00
Robby He	dc2f26f7b5	Fix is_api_message to correctly exclude reasoning messages (#6156 ) ## Problem The `is_api_message` function in `conversation_history.rs` had a misalignment between its documentation and implementation: - Comment stated: "Anything that is not a system message or 'reasoning' message is considered an API message" - Code behavior: Was returning `true` for `ResponseItem::Reasoning`, meaning reasoning messages were incorrectly treated as API messages This inconsistency could lead to reasoning messages being persisted in conversation history when they should be filtered out. ## Root Cause Investigation revealed that reasoning messages are explicitly excluded throughout the codebase: 1. Chat completions API (lines 267-272 in `chat_completions.rs`) omits reasoning from conversation history: ```rust ResponseItem::Reasoning { .. } | ResponseItem::Other => { // Omit these items from the conversation history. continue; } ``` 2. Existing tests like `drops_reasoning_when_last_role_is_user` and `ignores_reasoning_before_last_user` validate that reasoning should be excluded from API payloads ## Solution Fixed the `is_api_message` function to align with its documentation and the rest of the codebase: ```rust // Before: Reasoning was incorrectly returning true ResponseItem::Reasoning { .. } | ResponseItem::WebSearchCall { .. } => true, // After: Reasoning correctly returns false ResponseItem::WebSearchCall { .. } => true, ResponseItem::Reasoning { .. } | ResponseItem::Other => false, ``` ## Testing - Enhanced existing test to verify reasoning messages are properly filtered out - All 264 core tests pass, including 8 chat completions tests that validate reasoning behavior - No regressions introduced This ensures reasoning messages are consistently excluded from API message processing across the entire codebase.	2025-11-03 20:55:41 -08:00
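A self-contained sketch of the corrected filter from #6156, using a trimmed-down, hypothetical `ResponseItem` (the real enum has more variants and fields, and its system-message check may differ). Reasoning and `Other` items are dropped, system messages are excluded, and web search calls are kept.

```rust
/// Trimmed-down, hypothetical stand-in for codex-core's `ResponseItem`.
enum ResponseItem {
    Message { role: String },
    WebSearchCall,
    Reasoning,
    Other,
}

/// System messages, reasoning items, and `Other` are excluded;
/// everything else counts as an API message.
fn is_api_message(item: &ResponseItem) -> bool {
    match item {
        ResponseItem::Message { role } => role.as_str() != "system",
        ResponseItem::WebSearchCall => true,
        ResponseItem::Reasoning | ResponseItem::Other => false,
    }
}
```

Listing `Reasoning` and `Other` in an explicit `false` arm, rather than folding them into a catch-all, makes it easier to spot a mismatch like the one this commit fixed when new variants are added.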
Eric Traut	1e0e553304	Fixed notify handler so it passes correct `input_messages` details (#6143 ) This fixes bug #6121. The `input_messages` field passed to the notify handler is currently empty because the logic is incorrectly including the OutputText rather than InputText. I've fixed that and added proper filtering to remove messages associated with AGENTS.md and other context injected by the harness. Testing: I wrote a notify handler and verified that the user prompt is correctly passed through to the handler.	2025-11-03 14:23:04 -08:00
iceweasel-oai	07b7d28937	log sandbox commands to $CODEX_HOME instead of cwd (#6171 ) Logging commands in the Windows Sandbox is temporary, but while we are doing it, let's always write to CODEX_HOME instead of dirtying the cwd.	2025-11-03 13:12:33 -08:00
Ahmed Ibrahim	6ee7fbcfff	feat: add the time after aborting (#5996 ) Tell the model how much time passed after the user aborted the call.	2025-11-03 11:44:06 -08:00
iceweasel-oai	2eda75a8ee	Do not skip trust prompt on Windows if sandbox is enabled. (#6167 ) If the experimental windows sandbox is enabled, the trust prompt should show on Windows.	2025-11-03 11:27:45 -08:00
Vinh Nguyen	a1ee10b438	fix: improve usage URLs in status card and snapshots (#6111 ) Hi OpenAI Codex team, the status card and error messages currently show "Visit chatgpt.com/codex/settings/usage for up-to-date information on rate limits and credits". Without the "https://" prefix, the link cannot be clicked directly from most terminals or chat interfaces. [Screenshot 2025-11-02 at 22 47 06](https://github.com/user-attachments/assets/5ea11e8b-fb74-451c-85dc-f4d492b2678b) --- This fix is intended to address that issue: - It makes the link clickable in terminals that support it, improving accessibility - It follows standard URL formatting practices - It maintains consistency with other links in the application (like the existing "https://openai.com/chatgpt/pricing" links) Thank you!	2025-11-02 21:44:59 -08:00
Eric Traut	0c7efa0cfd	Fix incorrect "deprecated" message about experimental config key (#6131 ) When I enable `experimental_sandbox_command_assessment`, I get an incorrect deprecation warning: "experimental_sandbox_command_assessment is deprecated. Use experimental_sandbox_command_assessment instead." This PR fixes this error.	2025-11-02 16:33:09 -08:00
Eric Traut	d5853d9c47	Changes to sandbox command assessment feature based on initial experiment feedback (#6091 ) * Removed sandbox risk categories; feedback indicates that these are not that useful and "less is more" * Tweaked the assessment prompt to generate terser answers * Fixed bug in orchestrator that prevents this feature from being exposed in the extension	2025-11-01 14:52:23 -07:00
Thomas Stokes	d9118c04bf	Parse the Azure OpenAI rate limit message (#5956 ) Fixes #4161 Currently Codex uses a regex to parse the "Please try again in 1.898s" OpenAI-style rate limit message, so that it can wait the correct duration before retrying. Azure OpenAI returns a different error that looks like "Rate limit exceeded. Try again in 35 seconds." This PR extends the regex and parsing code to match in a more fuzzy manner, handling anything matching the pattern "try again in <duration><unit>".	2025-11-01 09:33:13 -07:00
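#5956 extends a regex. The sketch below instead uses a small hand-written, std-only scanner to show the same case-insensitive "try again in <duration><unit>" matching. `parse_retry_after` and the accepted unit spellings are assumptions, not the Codex implementation.

```rust
use std::time::Duration;

/// Extracts the delay from messages such as "Please try again in 1.898s" or
/// "Rate limit exceeded. Try again in 35 seconds." Returns `None` if no hint is found.
fn parse_retry_after(message: &str) -> Option<Duration> {
    // Lowercase ASCII so that both "try again" and "Try again" match.
    let lower = message.to_ascii_lowercase();
    let marker = "try again in ";
    let start = lower.find(marker)? + marker.len();
    let rest = &lower[start..];

    // The number: digits with an optional decimal point.
    let number_len = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(rest.len());
    let value: f64 = rest[..number_len].parse().ok()?;

    // The unit may follow directly ("1.898s") or after a space ("35 seconds").
    let unit: String = rest[number_len..]
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_alphabetic())
        .collect();
    let seconds = match unit.as_str() {
        "ms" | "millisecond" | "milliseconds" => value / 1000.0,
        "s" | "sec" | "secs" | "second" | "seconds" => value,
        "m" | "min" | "mins" | "minute" | "minutes" => value * 60.0,
        _ => return None,
    };
    // `try_from_secs_f64` rejects values too large for a Duration instead of panicking.
    Duration::try_from_secs_f64(seconds).ok()
}
```

Returning `None` for unknown units lets the caller fall back to its default backoff instead of guessing.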
jif-oai	611e00c862	feat: compactor 2 (#6027 ) Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-10-31 14:27:08 -07:00
Ahmed Ibrahim	c8ebb2a0dc	Add warning on compact (#6052 ) This PR introduces the ability for `core` to send `warnings`, just as it can send `errors`. It also sends a warning on compaction. [screenshot](https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a)	2025-10-31 13:27:33 -07:00
Dylan Hurd	88e083a9d0	chore: Add shell serialization tests for json (#6043 ) ## Summary Can never have enough tests on this code path - checking that json inside a shell call is deserialized correctly. ## Tests - [x] These are tests 😎	2025-10-31 11:01:58 -07:00
Ahmed Ibrahim	1c8507b32a	Truncate total tool calls text (#5979 ) Put a cap on the aggregate output of text content on tool calls. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-10-31 10:30:36 -07:00
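One way to cap the aggregate text across all tool-call outputs, as #5979 describes, is to spend a single shared byte budget across items and cut the item that crosses it at a UTF-8 boundary. The byte-based budget and `cap_total_text` are assumptions for illustration; Codex may measure the limit differently (for example in tokens) or append a truncation marker.

```rust
/// Keeps tool-call text items while their combined size fits in `max_total_bytes`.
/// The item that crosses the budget is cut at a UTF-8 character boundary and
/// everything after it is dropped.
fn cap_total_text(items: &[String], max_total_bytes: usize) -> Vec<String> {
    let mut remaining = max_total_bytes;
    let mut kept = Vec::new();
    for item in items {
        if item.len() <= remaining {
            remaining -= item.len();
            kept.push(item.clone());
            continue;
        }
        // Walk back to the nearest char boundary so a multi-byte character is never split.
        let mut cut = remaining;
        while !item.is_char_boundary(cut) {
            cut -= 1;
        }
        if cut > 0 {
            kept.push(item[..cut].to_string());
        }
        break;
    }
    kept
}
```

A shared budget bounds the total size even when many individually small outputs arrive in one turn, which a per-item limit alone would not.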
jif-oai	0508823075	test: undo (#6034 )	2025-10-31 14:46:24 +00:00
pakrym-oai	2371d771cc	Update user instruction message format (#6010 )	2025-10-30 18:44:02 -07:00
Ahmed Ibrahim	dc2aeac21f	override verbosity for gpt-5-codex (#6007 ) We are seeing [reports](https://github.com/openai/codex/issues/6004) of users who set verbosity in their config.toml and run into issues: gpt-5-codex doesn't accept any verbosity value other than medium.	2025-10-31 00:45:05 +00:00
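The override in #6007 amounts to ignoring a user-configured verbosity for gpt-5-codex. Below is a hypothetical sketch, assuming a `Verbosity` enum and a simple model-name prefix check; the real code may select the model family differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verbosity {
    Low,
    Medium,
    High,
}

/// Returns the verbosity to send for `model`.
/// gpt-5-codex only accepts `Medium`, so any configured value is replaced for it;
/// the prefix check used to recognise the model is an assumption.
fn effective_verbosity(model: &str, configured: Option<Verbosity>) -> Option<Verbosity> {
    match configured {
        Some(_) if model.starts_with("gpt-5-codex") => Some(Verbosity::Medium),
        other => other,
    }
}
```

Leaving `None` untouched means the request omits verbosity entirely when the user never set it.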
Jack	f842849bec	docs: Fix markdown list item spacing in codex-rs/core/review_prompt.md (#4144 ) Fixes a Markdown parsing issue where a list item used `*` without a following space (`*Line ranges ...`). Per CommonMark, a space after the list marker is required. Updated to `* Line ranges ...` so the guideline renders as a standalone bullet. This change improves readability and prevents mis-parsing in renderers. Co-authored-by: Eric Traut <etraut@openai.com>	2025-10-30 17:39:21 -07:00
zhao-oai	dcf73970d2	rate limit errors now provide absolute time (#6000 )	2025-10-30 20:33:25 -04:00
Ahmed Ibrahim	a3d3719481	Remove last turn reasoning filtering (#5986 )	2025-10-30 23:20:32 +00:00
iceweasel-oai	87cce88f48	Windows Sandbox - Alpha version (#4905 ) - Added the new codex-windows-sandbox crate that builds both a library entry point (run_windows_sandbox_capture) and a CLI executable to launch commands inside a Windows restricted-token sandbox, including ACL management, capability SID provisioning, network lockdown, and output capture (windows-sandbox-rs/src/lib.rs:167, windows-sandbox-rs/src/main.rs:54). - Introduced the experimental WindowsSandbox feature flag and wiring so Windows builds can opt into the sandbox: SandboxType::WindowsRestrictedToken, the in-process execution path, and platform sandbox selection now honor the flag (core/src/features.rs:47, core/src/config.rs:1224, core/src/safety.rs:19, core/src/sandboxing/mod.rs:69, core/src/exec.rs:79, core/src/exec.rs:172). - Updated workspace metadata to include the new crate and its Windows-specific dependencies so the core crate can link against it (codex-rs/Cargo.toml:91, core/Cargo.toml:86). - Added a PowerShell bootstrap script that installs the Windows toolchain, required CLI utilities, and builds the workspace to ease development on the platform (scripts/setup-windows.ps1:1). - Landed a Python smoke-test suite that exercises read-only/workspace-write policies, ACL behavior, and network denial for the Windows sandbox binary (windows-sandbox-rs/sandbox_smoketests.py:1).	2025-10-30 15:51:57 -07:00
Bernard Niset	ff6d4cec6b	fix: Update seatbelt policy for java on macOS (#3987 ) # Summary This PR is related to Issue #3978 and contains a fix to the seatbelt profile for macOS that allows running Java/JDK tooling from the sandbox. I have found that the included change is the minimum needed to make it run on my machine. Codex added a unit test while making this fix. I wonder if it is useful, since Java must be installed on the target machine for it to be relevant. I can remove it if that is better. Fixes #3978	2025-10-30 14:25:04 -07:00
Celia Chen	6ef658a9f9	[Hygiene] Remove `include_view_image_tool` config (#5976 ) There's still some debate about whether we want to expose `tools.view_image` or `feature.view_image` so those are left unchanged for now, but this old `include_view_image_tool` config is good-to-go. Also updated the doc to reflect that the `view_image` tool is now enabled by default.	2025-10-30 13:23:24 -07:00
Anton Panasenko	9572cfc782	[codex] add developer instructions (#5897 ) We are using developer instructions for code reviews, so we need to pass them in the CLI as well.	2025-10-30 11:18:31 -07:00
Dylan Hurd	4a55646a02	chore: testing on freeform apply_patch (#5952 ) ## Summary Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform apply_patch tool as opposed to the function call path. The good news is that all the tests pass with zero logical tests, with the exception of the heredoc, which doesn't really make sense in the freeform tool context anyway. @jif-oai since you wrote the original tests in #5557, I'd love your opinion on the right way to DRY these test cases between the two. Happy to set up a more sophisticated harness, but didn't want to go down the rabbit hole until we agreed on the right pattern ## Testing - [x] These are tests	2025-10-30 10:40:48 -07:00
jif-oai	f4f9695978	feat: compaction prompt configurable (#5959 ) ``` codex -c compact_prompt="Summarize in bullet points" ```	2025-10-30 14:24:24 +00:00
Ahmed Ibrahim	5fcc380bd9	Pass initial history as an optional to codex delegate (#5950 ) This will give us more freedom in controlling the delegation, i.e., we can fork our history and run `compact`.	2025-10-30 07:22:42 -07:00
jif-oai	aa76003e28	chore: unify config crates (#5958 )	2025-10-30 10:28:32 +00:00
Ahmed Ibrahim	fac548e430	Send delegate header (#5942 ) Send delegate type header	2025-10-30 09:49:40 +00:00


713 Commits