valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Eric Traut	d7953aed74	Fixes intermittent test failures in CI (#6282 ) I'm seeing two tests fail intermittently in CI. This PR attempts to address (or at least mitigate) the flakiness. * summarize_context_three_requests_and_instructions - The test snapshots server.received_requests() immediately after observing TaskComplete. Because the OpenAI /v1/responses call is streamed, the HTTP request can still be draining when that event fires, so wiremock occasionally reports only two captured requests. Fix is to wait for async activity to complete. * archive_conversation_moves_rollout_into_archived_directory - times out on a slow CI run. Mitigation is to increase timeout value from 10s to 20s.	2025-11-05 13:12:25 -08:00
Owen Lin	2ab1650d4d	[app-server] feat: v2 Thread APIs (#6214 ) Implements: ``` thread/list thread/start thread/resume thread/archive ``` along with their integration tests. These are relatively light wrappers around the existing core logic, and changes to core logic are minimal. However, an improvement made for developer ergonomics: - `thread/start` and `thread/resume` automatically attaches a conversation listener internally, so clients don't have to make a separate `AddConversationListener` call like they do today. For consistency, also updated `model/list` and `feedback/upload` (naming conventions, list API params).	2025-11-05 20:28:43 +00:00
Eric Traut	c4ebe4b078	Improved token refresh handling to address "Re-connecting" behavior (#6231 ) Currently, when the access token expires, we attempt to use the refresh token to acquire a new access token. This works most of the time. However, there are situations where the refresh token is expired, exhausted (already used to perform a refresh), or revoked. In those cases, the current logic treats the error as transient and attempts to retry it repeatedly. This PR changes the token refresh logic to differentiate between permanent and transient errors. It also changes callers to treat the permanent errors as fatal rather than retrying them. And it provides better error messages to users so they understand how to address the problem. These error messages should also help us further understand why we're seeing examples of refresh token exhaustion. Here is the error message in the CLI. The same text appears within the extension. <img width="863" height="38" alt="image" src="https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f" /> I also correct the spelling of "Re-connecting", which shouldn't have a hyphen in it. Testing: I manually tested these code paths by adding temporary code to programmatically cause my refresh token to be exhausted (by calling the token refresh endpoint in a tight loop more than 50 times). I then simulated an access token expiration, which caused the token refresh logic to be invoked. I confirmed that the updated logic properly handled the error condition. Note: We earlier discussed the idea of forcefully logging out the user at the point where token refresh failed. I made several attempts to do this, and all of them resulted in a bad UX. It's important to surface this error to users in a way that explains the problem and tells them that they need to log in again. We also previously discussed deleting the auth.json file when this condition is detected. That also creates problems because it effectively changes the auth status from logged in to logged out, and this causes odd failures and inconsistent UX. I think it's therefore better not to delete auth.json in this case. If the user closes the CLI or VSCE and starts it again, we properly detect that the access token is expired and the refresh token is "dead", and we force the user to go through the login flow at that time. This should address aspects of #6191, #5679, and #5505	2025-11-05 10:51:57 -08:00
Ahmed Ibrahim	1a89f70015	refactor Conversation history file into its own directory (#6229 ) This is just a refactor of `conversation_history` file by breaking it up into multiple smaller ones with helper. This refactor will help us move more functionality related to context management here. in a clean way.	2025-11-05 10:49:35 -08:00
Andrew Dirksen	95af417923	allow codex to be run from pid 1 (#4200 ) Previously it was not possible for codex to run commands as the init process (pid 1) in linux. Commands run in containers tend to see their own pid as 1. See https://github.com/openai/codex/issues/4198 This pr implements the solution mentioned in that issue. Co-authored-by: Eric Traut <etraut@openai.com>	2025-11-04 17:54:46 -08:00
Soroush Yousefpour	fff576cf98	fix(core): load custom prompts from symlinked Markdown files (#3643 ) - Discover prompts via fs::metadata to follow symlinks - Add Unix-only symlink test in custom_prompts.rs - Update docs/prompts.md to mention symlinks Fixes #3637 --------- Signed-off-by: Soroush Yousefpour <h.yusefpour@gmail.com> Co-authored-by: dedrisian-oai <dedrisian@openai.com> Co-authored-by: Eric Traut <etraut@openai.com>	2025-11-04 17:44:02 -08:00
Ahmed Ibrahim	d40a6b7f73	fix: Update the deprecation message to link to the docs (#6211 ) The deprecation message is currently a bit confusing. Users may not understand what is `[features].x`. I updated the docs and the deprecation message for more guidance. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-11-04 21:02:27 +00:00
Ahmed Ibrahim	fe54c216a3	ignore deltas in `codex_delegate` (#6208 ) ignore legacy deltas in codex-delegate to avoid this [issue](https://github.com/openai/codex/pull/6202).	2025-11-04 19:21:35 +00:00
Ahmed Ibrahim	7e068e1094	fix: ignore reasoning deltas because we send it with turn item (#6202 ) should fix this: <img width="2418" height="242" alt="image" src="https://github.com/user-attachments/assets/f818d00b-ed3a-479b-94a7-e4bc5db6326e" />	2025-11-04 08:27:16 -08:00
Celia Chen	d3187dbc17	[App-server] v2 for account/updated and account/logout (#6175 ) V2 for `account/updated` and `account/logout` for app server. correspond to old `authStatusChange` and `LogoutChatGpt` respectively. Followup PRs will make other v2 endpoints call `account/updated` instead of `authStatusChange` too.	2025-11-03 22:01:33 -08:00
Robby He	dc2f26f7b5	Fix is_api_message to correctly exclude reasoning messages (#6156 ) ## Problem The `is_api_message` function in `conversation_history.rs` had a misalignment between its documentation and implementation: - Comment stated: "Anything that is not a system message or 'reasoning' message is considered an API message" - Code behavior: Was returning `true` for `ResponseItem::Reasoning`, meaning reasoning messages were incorrectly treated as API messages This inconsistency could lead to reasoning messages being persisted in conversation history when they should be filtered out. ## Root Cause Investigation revealed that reasoning messages are explicitly excluded throughout the codebase: 1. Chat completions API (lines 267-272 in `chat_completions.rs`) omits reasoning from conversation history: ```rust ResponseItem::Reasoning { .. } \| ResponseItem::Other => { // Omit these items from the conversation history. continue; } ``` 2. Existing tests like `drops_reasoning_when_last_role_is_user` and `ignores_reasoning_before_last_user` validate that reasoning should be excluded from API payloads ## Solution Fixed the `is_api_message` function to align with its documentation and the rest of the codebase: ```rust // Before: Reasoning was incorrectly returning true ResponseItem::Reasoning { .. } \| ResponseItem::WebSearchCall { .. } => true, // After: Reasoning correctly returns false ResponseItem::WebSearchCall { .. } => true, ResponseItem::Reasoning { .. } \| ResponseItem::Other => false, ``` ## Testing - Enhanced existing test to verify reasoning messages are properly filtered out - All 264 core tests pass, including 8 chat completions tests that validate reasoning behavior - No regressions introduced This ensures reasoning messages are consistently excluded from API message processing across the entire codebase.	2025-11-03 20:55:41 -08:00
Eric Traut	1e0e553304	Fixed notify handler so it passes correct `input_messages` details (#6143 ) This fixes bug #6121. The `input_messages` field passed to the notify handler is currently empty because the logic is incorrectly including the OutputText rather than InputText. I've fixed that and added proper filtering to remove messages associated with AGENTS.md and other context injected by the harness. Testing: I wrote a notify handler and verified that the user prompt is correctly passed through to the handler.	2025-11-03 14:23:04 -08:00
iceweasel-oai	07b7d28937	log sandbox commands to $CODEX_HOME instead of cwd (#6171 ) Logging commands in the Windows Sandbox is temporary, but while we are doing it, let's always write to CODEX_HOME instead of dirtying the cwd.	2025-11-03 13:12:33 -08:00
Ahmed Ibrahim	6ee7fbcfff	feat: add the time after aborting (#5996 ) Tell the model how much time passed after the user aborted the call.	2025-11-03 11:44:06 -08:00
iceweasel-oai	2eda75a8ee	Do not skip trust prompt on Windows if sandbox is enabled. (#6167 ) If the experimental windows sandbox is enabled, the trust prompt should show on Windows.	2025-11-03 11:27:45 -08:00
Vinh Nguyen	a1ee10b438	fix: improve usage URLs in status card and snapshots (#6111 ) Hi OpenAI Codex team, currently "Visit chatgpt.com/codex/settings/usage for up-to-date information on rate limits and credits" message in status card and error messages. For now, without the "https://" prefix, the link cannot be clicked directly from most terminals or chat interfaces. <img width="636" height="127" alt="Screenshot 2025-11-02 at 22 47 06" src="https://github.com/user-attachments/assets/5ea11e8b-fb74-451c-85dc-f4d492b2678b" /> --- The fix is intent to improve this issue: - It makes the link clickable in terminals that support it, hence better accessibility - It follows standard URL formatting practices - It maintains consistency with other links in the application (like the existing "https://openai.com/chatgpt/pricing" links) Thank you!	2025-11-02 21:44:59 -08:00
Eric Traut	0c7efa0cfd	Fix incorrect "deprecated" message about experimental config key (#6131 ) When I enable `experimental_sandbox_command_assessment`, I get an incorrect deprecation warning: "experimental_sandbox_command_assessment is deprecated. Use experimental_sandbox_command_assessment instead." This PR fixes this error.	2025-11-02 16:33:09 -08:00
Eric Traut	d5853d9c47	Changes to sandbox command assessment feature based on initial experiment feedback (#6091 ) * Removed sandbox risk categories; feedback indicates that these are not that useful and "less is more" * Tweaked the assessment prompt to generate terser answers * Fixed bug in orchestrator that prevents this feature from being exposed in the extension	2025-11-01 14:52:23 -07:00
Thomas Stokes	d9118c04bf	Parse the Azure OpenAI rate limit message (#5956 ) Fixes #4161 Currently Codex uses a regex to parse the "Please try again in 1.898s" OpenAI-style rate limit message, so that it can wait the correct duration before retrying. Azure OpenAI returns a different error that looks like "Rate limit exceeded. Try again in 35 seconds." This PR extends the regex and parsing code to match in a more fuzzy manner, handling anything matching the pattern "try again in \<duration>\<unit>".	2025-11-01 09:33:13 -07:00
jif-oai	611e00c862	feat: compactor 2 (#6027 ) Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-10-31 14:27:08 -07:00
Ahmed Ibrahim	c8ebb2a0dc	Add warning on compact (#6052 ) This PR introduces the ability for `core` to send `warnings` as it can send `errors. It also sends a warning on compaction. <img width="811" height="187" alt="image" src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a" />	2025-10-31 13:27:33 -07:00
Dylan Hurd	88e083a9d0	chore: Add shell serialization tests for json (#6043 ) ## Summary Can never have enough tests on this code path - checking that json inside a shell call is deserialized correctly. ## Tests - [x] These are tests 😎	2025-10-31 11:01:58 -07:00
Ahmed Ibrahim	1c8507b32a	Truncate total tool calls text (#5979 ) Put a cap on the aggregate output of text content on tool calls. --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-10-31 10:30:36 -07:00
jif-oai	0508823075	test: undo (#6034 )	2025-10-31 14:46:24 +00:00
pakrym-oai	2371d771cc	Update user instruction message format (#6010 )	2025-10-30 18:44:02 -07:00
Ahmed Ibrahim	dc2aeac21f	override verbosity for gpt-5-codex (#6007 ) we are seeing [reports](https://github.com/openai/codex/issues/6004) of users having verbosity in their config.toml and facing issues. gpt-5-codex doesn't accept other values rather than medium for verbosity.	2025-10-31 00:45:05 +00:00
Jack	f842849bec	docs: Fix markdown list item spacing in codex-rs/core/review_prompt.md (#4144 ) Fixes a Markdown parsing issue where a list item used `` without a following space (`Line ranges ...`). Per CommonMark, a space after the list marker is required. Updated to `* Line ranges ...` so the guideline renders as a standalone bullet. This change improves readability and prevents mis-parsing in renderers. Co-authored-by: Eric Traut <etraut@openai.com>	2025-10-30 17:39:21 -07:00
zhao-oai	dcf73970d2	rate limit errors now provide absolute time (#6000 )	2025-10-30 20:33:25 -04:00
Ahmed Ibrahim	a3d3719481	Remove last turn reasoning filtering (#5986 )	2025-10-30 23:20:32 +00:00
iceweasel-oai	87cce88f48	Windows Sandbox - Alpha version (#4905 ) - Added the new codex-windows-sandbox crate that builds both a library entry point (run_windows_sandbox_capture) and a CLI executable to launch commands inside a Windows restricted-token sandbox, including ACL management, capability SID provisioning, network lockdown, and output capture (windows-sandbox-rs/src/lib.rs:167, windows-sandbox-rs/src/main.rs:54). - Introduced the experimental WindowsSandbox feature flag and wiring so Windows builds can opt into the sandbox: SandboxType::WindowsRestrictedToken, the in-process execution path, and platform sandbox selection now honor the flag (core/src/features.rs:47, core/src/config.rs:1224, core/src/safety.rs:19, core/src/sandboxing/mod.rs:69, core/src/exec.rs:79, core/src/exec.rs:172). - Updated workspace metadata to include the new crate and its Windows-specific dependencies so the core crate can link against it (codex-rs/ Cargo.toml:91, core/Cargo.toml:86). - Added a PowerShell bootstrap script that installs the Windows toolchain, required CLI utilities, and builds the workspace to ease development on the platform (scripts/setup-windows.ps1:1). - Landed a Python smoke-test suite that exercises read-only/workspace-write policies, ACL behavior, and network denial for the Windows sandbox binary (windows-sandbox-rs/sandbox_smoketests.py:1).	2025-10-30 15:51:57 -07:00
Bernard Niset	ff6d4cec6b	fix: Update seatbelt policy for java on macOS (#3987 ) # Summary This PR is related to the Issue #3978 and contains a fix to the seatbelt profile for macOS that allows to run java/jdk tooling from the sandbox. I have found that the included change is the minimum change to make it run on my machine. There is a unit test added by codex when making this fix. I wonder if it is useful since you need java installed on the target machine for it to be relevant. I can remove it it is better. Fixes #3978	2025-10-30 14:25:04 -07:00
Celia Chen	6ef658a9f9	[Hygiene] Remove `include_view_image_tool` config (#5976 ) There's still some debate about whether we want to expose `tools.view_image` or `feature.view_image` so those are left unchanged for now, but this old `include_view_image_tool` config is good-to-go. Also updated the doc to reflect that `view_image` tool is now by default true.	2025-10-30 13:23:24 -07:00
Anton Panasenko	9572cfc782	[codex] add developer instructions (#5897 ) we are using developer instructions for code reviews, we need to pass them in cli as well.	2025-10-30 11:18:31 -07:00
Dylan Hurd	4a55646a02	chore: testing on freeform apply_patch (#5952 ) ## Summary Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform apply_patch tool as opposed to the function call path. The good news is that all the tests pass with zero logical tests, with the exception of the heredoc, which doesn't really make sense in the freeform tool context anyway. @jif-oai since you wrote the original tests in #5557, I'd love your opinion on the right way to DRY these test cases between the two. Happy to set up a more sophisticated harness, but didn't want to go down the rabbit hole until we agreed on the right pattern ## Testing - [x] These are tests	2025-10-30 10:40:48 -07:00
jif-oai	f4f9695978	feat: compaction prompt configurable (#5959 ) ``` codex -c compact_prompt="Summarize in bullet points" ```	2025-10-30 14:24:24 +00:00
Ahmed Ibrahim	5fcc380bd9	Pass initial history as an optional to codex delegate (#5950 ) This will give us more freedom on controlling the delegation. i.e we can fork our history and run `compact`.	2025-10-30 07:22:42 -07:00
jif-oai	aa76003e28	chore: unify config crates (#5958 )	2025-10-30 10:28:32 +00:00
Ahmed Ibrahim	fac548e430	Send delegate header (#5942 ) Send delegate type header	2025-10-30 09:49:40 +00:00
zhao-oai	b34efde2f3	asdf (#5940 ) .	2025-10-30 01:10:41 +00:00
Ahmed Ibrahim	7aa46ab5fc	ignore agent message deltas for the review mode (#5937 ) The deltas produce the whole json output. ignore them.	2025-10-30 00:47:55 +00:00
pakrym-oai	3429e82e45	Add item streaming events (#5546 ) Adds AgentMessageContentDelta, ReasoningContentDelta, ReasoningRawContentDelta item streaming events while maintaining compatibility for old events. --------- Co-authored-by: Owen Lin <owen@openai.com>	2025-10-29 22:33:57 +00:00
Ahmed Ibrahim	13e1d0362d	Delegate review to codex instance (#5572 ) In this PR, I am exploring migrating task kind to an invocation of Codex. The main reason would be getting rid off multiple `ConversationHistory` state and streamlining our context/history management. This approach depends on opening a channel between the sub-codex and codex. This channel is responsible for forwarding `interactive` (`approvals`) and `non-interactive` events. The `task` is responsible for handling those events. This opens the door for implementing `codex as a tool`, replacing `compact` and `review`, and potentially subagents. One consideration is this code is very similar to `app-server` specially in the approval part. If in the future we wanted an interactive `sub-codex` we should consider using `codex-mcp`	2025-10-29 21:04:25 +00:00
jif-oai	db31f6966d	chore: config editor (#5878 ) The goal is to have a single place where we actually write files In a follow-up PR, will move everything config related in a dedicated module and move the helpers in a dedicated file	2025-10-29 20:52:46 +00:00
Rasmus Rygaard	39e09c289d	Add a wrapper around raw response items (#5923 ) We currently have nested enums when sending raw response items in the app-server protocol. This makes downstream schemas confusing because we need to embed `type`-discriminated enums within each other. This PR adds a small wrapper around the response item so we can keep the schemas separate	2025-10-29 20:32:40 +00:00
jif-oai	3183935bd7	feat: add output even in sandbox denied (#5908 )	2025-10-29 18:21:18 +00:00
jif-oai	060637b4d4	feat: deprecation warning (#5825 ) <img width="955" height="311" alt="Screenshot 2025-10-28 at 14 26 25" src="https://github.com/user-attachments/assets/99729b3d-3bc9-4503-aab3-8dc919220ab4" />	2025-10-29 12:29:28 +00:00
jif-oai	fa92cd92fa	chore: merge git crates (#5909 ) Merge `git-apply` and `git-tooling` into `utils/`	2025-10-29 12:11:44 +00:00
Abhishek Bhardwaj	89591e4246	feature: Add "!cmd" user shell execution (#2471 ) feature: Add "!cmd" user shell execution This change lets users run local shell commands directly from the TUI by prefixing their input with ! (e.g. !ls). Output is truncated to keep the exec cell usable, and Ctrl-C cleanly interrupts long-running commands (e.g. !sleep 10000). Summary of changes - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs. - Reuse the existing tool router: the task constructs a ToolCall for the local_shell tool and relies on ShellHandler, so no manual MCP tool lookup is required. - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the TUI can show command metadata, live output, and exit status. End-to-end flow TUI handling 1. ChatWidget::submit_user_message (TUI) intercepts messages starting with !. 2. Non-empty commands dispatch Op::RunUserShellCommand { command }; empty commands surface a help hint. 3. No UserInput items are created, so nothing is enqueued for the model. Core submission loop 4. The submission loop routes the op to handlers::run_user_shell_command (core/src/codex.rs). 5. A fresh TurnContext is created and Session::spawn_user_shell_command enqueues UserShellCommandTask. Task execution 6. UserShellCommandTask::run emits TaskStartedEvent, formats the command, and prepares a ToolCall targeting local_shell. 7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler. Shell tool runtime 8. ShellHandler::run_exec_like launches the process via the unified exec runtime, honoring sandbox and shell policies, and emits ExecCommandBegin/End. 9. Stdout/stderr are captured for the UI, but the task does not turn the resulting ToolOutput into a model response. Completion 10. After ExecCommandEnd, the task finishes without an assistant message; the session marks it complete and the exec cell displays the final output. Conversation context - The command and its output never enter the conversation history or the model prompt; the flow is local-only. - Only exec/task events are emitted for UI rendering. Demo video https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996	2025-10-29 00:31:20 -07:00
Axojhf	802d2440b4	Fix bash detection failure in VS Code Codex extension on Windows under certain conditions (#3421 ) Found that the VS Code Codex extension throws “Error starting conversation” when initializing a conversation with Git for Windows’ bash on PATH. Debugging showed the bash-detection logic did not return as expected; this change makes it reliable in that scenario. Possibly related to issue #2841.	2025-10-28 21:29:16 -07:00
pakrym-oai	ef3e075ad6	Refresh tokens more often and log a better message when both auth and token refresh fails (#5655 ) <img width="784" height="153" alt="image" src="https://github.com/user-attachments/assets/c44b0eb2-d65c-4fc2-8b54-b34f7e1c4d95" />	2025-10-28 18:55:53 -07:00

1 2 3 4 5 ...

701 Commits