valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Ed Bayes	839b2ae7cf	Change animation frames (#3627 ) ## Description - Changes animation frames to be smaller - Cleans up file names and popup logic ## tests - Passes local CI	2025-09-15 04:36:34 +00:00
easong-openai	6a8e743d57	initial mcp add interface (#3543 ) Adds `codex mcp add`, `codex mcp list`, `codex mcp remove`. Currently writes to global config.	2025-09-15 04:30:56 +00:00
Thibault Sottiaux	a797051921	chore: update swiftfox_prompt.md (#3624 )	2025-09-15 04:10:35 +00:00
Thibault Sottiaux	d7d9d96d6c	feat: add reasoning level to header (#3622 )	2025-09-15 03:59:22 +00:00
Ahmed Ibrahim	26f1246a89	Revert "refactor transcript view to handle HistoryCells" (#3614 ) Reverts openai/codex#3538 It panics on forking first message. It also calculates the index in a wrong way.	2025-09-15 03:39:36 +00:00
Ahmed Ibrahim	6581da9b57	Show the header when resuming a conversation (#3615 )	2025-09-15 03:31:08 +00:00
Eric Traut	900bb01486	When logging in using ChatGPT, make sure to overwrite API key (#3611 ) When logging in using ChatGPT using the `codex login` command, a successful login should write a new `auth.json` file with the ChatGPT token information. The old code attempted to retain the API key and merge the token information into the existing `auth.json` file. With the new simplified login mechanism, `auth.json` should have auth information for only ChatGPT or API Key, not both. The `codex login --api-key <key>` code path was already doing the right thing here, but the `codex login` command was incorrect. This PR fixes the problem and adds test cases for both commands.	2025-09-14 19:48:18 -07:00
Ahmed Ibrahim	2ad6a37192	Don't show the model for apikey (#3607 )	2025-09-15 01:32:18 +00:00
Eric Traut	e5dd7f0934	Fix get_auth_status response when using custom provider (#3581 ) This PR addresses an edge-case bug that appears in the VS Code extension in the following situation: 1. Log in using ChatGPT (using either the CLI or extension). This will create an `auth.json` file. 2. Manually modify `config.toml` to specify a custom provider. 3. Start a fresh copy of the VS Code extension. The profile menu in the VS Code extension will indicate that you are logged in using ChatGPT even though you're not. This is caused by the `get_auth_status` method returning an `auth_method: 'chatgpt'` when a custom provider is configured and it doesn't use OpenAI auth (i.e. `requires_openai_auth` is false). The method should always return `auth_method: None` if `requires_openai_auth` is false. The same bug also causes the NUX (new user experience) screen to be displayed in the VSCE in this situation.	2025-09-14 18:27:02 -07:00
Dylan	b6673838e8	fix: model family and apply_patch consistency (#3603 ) ## Summary Resolves a merge conflict between #3597 and #3560, and adds tests to double check our apply_patch configuration. ## Testing - [x] Added unit tests --------- Co-authored-by: dedrisian-oai <dedrisian@openai.com>	2025-09-14 18:20:37 -07:00
Fouad Matin	1823906215	fix(tui): update full-auto to default preset (#3608 ) Update `--full-auto` to use default preset	2025-09-14 18:14:11 -07:00
Fouad Matin	5185d69f13	fix(core): flaky test `completed_commands_do_not_persist_sessions` (#3596 ) Fix flaky test: ``` FAIL [ 2.641s] codex-core unified_exec::tests::completed_commands_do_not_persist_sessions stdout ─── running 1 test test unified_exec::tests::completed_commands_do_not_persist_sessions ... FAILED failures: failures: unified_exec::tests::completed_commands_do_not_persist_sessions test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 235 filtered out; finished in 2.63s stderr ─── thread 'unified_exec::tests::completed_commands_do_not_persist_sessions' panicked at core/src/unified_exec/mod.rs:582:9: assertion failed: result.output.contains("codex") ```	2025-09-14 18:04:05 -07:00
pakrym-oai	4dffa496ac	Skip frames files in codespell (#3606 ) Fixes CI	2025-09-14 18:00:23 -07:00
Ahmed Ibrahim	ce984b2c71	Add session header to chat widget (#3592 ) <img width="570" height="332" alt="image" src="https://github.com/user-attachments/assets/ca6dfcb0-f3a1-4b3e-978d-4f844ba77527" />	2025-09-14 17:53:50 -07:00
pakrym-oai	c47febf221	Append full raw reasoning event text (#3605 ) We don't emit correct delta events and only get full reasoning back. Append it to history.	2025-09-14 17:50:06 -07:00
jimmyfraiture2	76c37c5493	feat: UI animation (#3590 ) Add NUX animation --------- Co-authored-by: Thibault Sottiaux <tibo@openai.com>	2025-09-14 17:42:17 -07:00
dedrisian-oai	2aa84b8891	Fix EventMsg Optional (#3604 )	2025-09-15 00:34:33 +00:00
pakrym-oai	9177bdae5e	Only one branch for swiftfox (#3601 ) Make each model family have a single branch.	2025-09-14 16:56:22 -07:00
Ahmed Ibrahim	a30e5e40ee	enable-resume (#3537 ) Adding the ability to resume conversations. we have one verb `resume`. Behavior: `tui`: `codex resume`: opens session picker `codex resume --last`: continue last message `codex resume <session id>`: continue conversation with `session id` `exec`: `codex resume --last`: continue last conversation `codex resume <session id>`: continue conversation with `session id` Implementation: - I added a function to find the path in `~/.codex/sessions/` with a `UUID`. This is helpful in resuming with session id. - Added the above mentioned flags - Added lots of testing	2025-09-14 19:33:19 -04:00
jimmyfraiture2	99e1d33bd1	feat: update model save (#3589 ) Edit model save to save by default as global or on the profile depending on the session	2025-09-14 16:25:43 -07:00
dedrisian-oai	b2f6fc3b9a	Fix flaky windows test (#3564 ) There are exactly 4 types of flaky tests in Windows x86 right now: 1. `review_input_isolated_from_parent_history` => Times out waiting for closing events 2. `review_does_not_emit_agent_message_on_structured_output` => Times out waiting for closing events 3. `auto_compact_runs_after_token_limit_hit` => Times out waiting for closing events 4. `auto_compact_runs_after_token_limit_hit` => Also has a problem where auto compact should add a third request, but receives 4 requests. 1, 2, and 3 seem to be solved with increasing threads on windows runner from 2 -> 4. Don't know yet why # 4 is happening, but probably also because of WireMock issues on windows causing races.	2025-09-14 23:20:25 +00:00
pakrym-oai	51f88fd04a	Fix swiftfox model selector (#3598 ) The model shouldn't be saved with a suffix. The effort is a separate field.	2025-09-14 23:12:21 +00:00
pakrym-oai	916fdc2a37	Add per-model-family prompts (#3597 ) Allows more flexibility in defining prompts.	2025-09-14 22:45:15 +00:00
pakrym-oai	863d9c237e	Include command output when sending timeout to model (#3576 ) Being able to see the output helps the model decide how to handle the timeout.	2025-09-14 14:38:26 -07:00
Ahmed Ibrahim	7e1543f5d8	Align user history message prefix width (#3467 ) <img width="798" height="340" alt="image" src="https://github.com/user-attachments/assets/fdd63f40-9c94-4e3a-bce5-2d2f333a384f" />	2025-09-14 20:51:08 +00:00
Ahmed Ibrahim	d701eb32d7	Gate model upgrade prompt behind ChatGPT auth (#3586 ) - refresh the login_state after onboarding. - should be on chatgpt for upgrade	2025-09-14 13:08:24 -07:00
Michael Bolin	9baae77533	chore: update output_lines() to take a struct instead of a sequence of bools (#3591 ) I found the boolean literals hard to follow.	2025-09-14 13:07:38 -07:00
Ahmed Ibrahim	e932722292	Add spacing before queued status indicator messages (#3474 ) <img width="687" height="174" alt="image" src="https://github.com/user-attachments/assets/e68f5a29-cb2d-4aa6-9cbd-f492878d8d0a" />	2025-09-14 15:37:28 -04:00
Ahmed Ibrahim	bbea6bbf7e	Handle resuming/forking after compact (#3533 ) We need to construct the history differently when compact happens. For this, we need to just consider the history after compact and convert compact to a response item. This needs to change and use `build_compact_history` when #3446 is merged.	2025-09-14 13:23:31 +00:00
Jeremy Rose	4891ee29c5	refactor transcript view to handle HistoryCells (#3538 ) No (intended) functional change. This refactors the transcript view to hold a list of HistoryCells instead of a list of Lines. This simplifies and makes much of the logic more robust, as well as laying the groundwork for future changes, e.g. live-updating history cells in the transcript. Similar to #2879 in goal. Fixes #2755.	2025-09-13 19:23:14 -07:00
Thibault Sottiaux	bac8a427f3	chore: default swiftfox models to experimental reasoning summaries (#3560 )	2025-09-13 23:40:54 +00:00
Thibault Sottiaux	14ab1063a7	chore: rename	2025-09-12 23:17:41 -07:00
Thibault Sottiaux	a77364bbaa	chore: remove descriptions	2025-09-12 22:55:40 -07:00
Thibault Sottiaux	19b4ed3c96	w	2025-09-12 22:44:05 -07:00
pakrym-oai	3d4acbaea0	Preserve IDs for more item types in azure (#3542 ) https://github.com/openai/codex/issues/3509	2025-09-13 01:09:56 +00:00
pakrym-oai	414b8be8b6	Always request encrypted cot (#3539 ) Otherwise future requests will fail with 500	2025-09-12 23:51:30 +00:00
dedrisian-oai	90a0fd342f	Review Mode (Core) (#3401 ) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }`: spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining why this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.	2025-09-12 23:25:10 +00:00
jif-oai	8d56d2f655	fix: NIT None reasoning effort (#3536 ) Fix the reasoning effort not being set to None in the UI	2025-09-12 21:17:49 +00:00
jif-oai	8408f3e8ed	Fix NUX UI (#3534 ) Fix NUX UI	2025-09-12 14:09:31 -07:00
Jeremy Rose	b8ccfe9b65	core: expand default sandbox (#3483 ) this adds some more capabilities to the default sandbox which I feel are safe. Most are in the [renderer.sb](https://source.chromium.org/chromium/chromium/src/+/main:sandbox/policy/mac/renderer.sb) sandbox for chrome renderers, which i feel is fair game for codex commands. Specific changes: 1. Allow processes in the sandbox to send signals to any other process in the same sandbox (e.g. child processes or daemonized processes), instead of just themselves. 2. Allow user-preference-read 3. Allow process-info* to anything in the same sandbox. This is a bit wider than Chromium allows, but it seems OK to me to allow anything in the sandbox to get details about other processes in the same sandbox. Bazel uses these to e.g. wait for another process to exit. 4. Allow all CPU feature detection, this seems harmless to me. It's wider than Chromium, but Chromium is concerned about fingerprinting, and tightly controls what CPU features they actually care about, and we don't have either that restriction or that advantage. 5. Allow new sysctl-reads: ``` (sysctl-name "vm.loadavg") (sysctl-name-prefix "kern.proc.pgrp.") (sysctl-name-prefix "kern.proc.pid.") (sysctl-name-prefix "net.routetable.") ``` bazel needs these for waiting on child processes and for communicating with its local build server, i believe. I wonder if we should just allow all (sysctl-read), as reading any arbitrary info about the system seems fine to me. 6. Allow iokit-open on RootDomainUserClient. This has to do with power management I believe, and Chromium allows renderers to do this, so okay. Bazel needs it to boot successfully, possibly for sleep/wake callbacks? 7. Mach lookup to `com.apple.system.opendirectoryd.libinfo`, which has to do with user data, and which Chrome allows. 8. Mach lookup to `com.apple.PowerManagement.control`. Chromium allows its GPU process to do this, but not its renderers. Bazel needs this to boot, probably relatedly to sleep/wake stuff.	2025-09-12 14:03:02 -07:00
pakrym-oai	e3c6903199	Add Azure Responses API workaround (#3528 ) Azure Responses API doesn't work well with store:false and response items. If store = false and id is sent an error is thrown that ID is not found If store = false and id is not sent an error is thrown that ID is required Add detection for Azure urls and add a workaround to preserve reasoning item IDs and send store:true	2025-09-12 13:52:15 -07:00
Jeremy Rose	5f6e95b592	if a command parses as a patch, do not attempt to run it (#3382 ) sometimes the model forgets to actually invoke `apply_patch` and puts a patch as the script body. trying to execute this as bash sometimes creates files named `,` or `{` or does other unknown things, so catch this situation and return an error to the model.	2025-09-12 13:47:41 -07:00
Ahmed Ibrahim	a2e9cc5530	Update interruption error message styling (#3470 ) <img width="497" height="76" alt="image" src="https://github.com/user-attachments/assets/a1ad279d-1d01-41cd-ac14-b3343a392563" /> <img width="493" height="74" alt="image" src="https://github.com/user-attachments/assets/baf487ba-430e-40fe-8944-2071ec052962" />	2025-09-12 16:17:02 -04:00
jif-oai	ea225df22e	feat: context compaction (#3446 ) ## Compact feature: 1. Stops the model when the context window becomes too large 2. Adds a user turn asking the model to summarize 3. Builds a bridge that contains all the previous user messages plus the summary, rendered from a template 4. Starts sampling again from a clean conversation with only that bridge	2025-09-12 13:07:10 -07:00
Ahmed Ibrahim	d4848e558b	Add spacing before composer footer hints (#3469 ) <img width="647" height="82" alt="image" src="https://github.com/user-attachments/assets/867eb5d9-3076-4018-846e-260a50408185" />	2025-09-12 15:31:24 -04:00
Ahmed Ibrahim	1a6a95fb2a	Add spacing between dropdown headers and items (#3472 ) <img width="927" height="194" alt="image" src="https://github.com/user-attachments/assets/f4cb999b-16c3-448a-aed4-060bed8b96dd" /> <img width="1246" height="205" alt="image" src="https://github.com/user-attachments/assets/5d9ba5bd-0c02-46da-a809-b583a176528a" />	2025-09-12 15:31:15 -04:00
jif-oai	c6fd056aa6	feat: reasoning effort as optional (#3527 ) Allow the reasoning effort to be optional	2025-09-12 12:06:33 -07:00
Michael Bolin	abdcb40f4c	feat: change the behavior of SetDefaultModel RPC so None clears the value. (#3529 ) It turns out that we want slightly different behavior for the `SetDefaultModel` RPC because some models do not work with reasoning (like GPT-4.1), so we should be able to explicitly clear this value. Verified in `codex-rs/mcp-server/tests/suite/set_default_model.rs`.	2025-09-12 11:35:51 -07:00
Dylan	4ae6b9787a	standardize shell description (#3514 ) ## Summary Standardizes the shell description across sandbox_types, since we cover this in the prompt, and have moved necessary details (like network_access and writeable workspace roots) to EnvironmentContext messages. ## Test Plan - [x] updated unit tests	2025-09-12 14:24:09 -04:00
jif-oai	bba567cee9	bug: fix model save (#3525 ) Fix these 2 behaviors: 1. The model does not get saved if we don't CTRL + S 2. The reasoning effort gets saved	2025-09-12 10:38:12 -07:00
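
Note on ea225df22e (feat: context compaction): the four steps listed in that commit message can be sketched roughly as below. This is an illustrative Python sketch only, not the repository's implementation. The names `Message`, `BRIDGE_TEMPLATE`, `estimate_tokens`, and `compact`, the word-count token estimate, and the wording of the summary request are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


# Hypothetical bridge template; the real template is not shown in the log.
BRIDGE_TEMPLATE = (
    "Previous user messages:\n{user_messages}\n\n"
    "Summary of the conversation so far:\n{summary}"
)


def estimate_tokens(history: List[Message]) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return sum(len(m.text.split()) for m in history)


def compact(
    history: List[Message],
    summarize: Callable[[List[Message]], str],
    limit: int,
) -> List[Message]:
    # Step 1: only act once the context has grown past the limit.
    if estimate_tokens(history) <= limit:
        return history

    # Step 2: append a user turn asking the model for a summary.
    request = history + [Message("user", "Summarize the conversation so far.")]
    summary = summarize(request)

    # Step 3: build the bridge from all previous user messages plus the summary.
    user_messages = "\n".join(m.text for m in history if m.role == "user")
    bridge = BRIDGE_TEMPLATE.format(user_messages=user_messages, summary=summary)

    # Step 4: continue from a clean conversation containing only the bridge.
    return [Message("user", bridge)]
```

Here `summarize` stands in for the model call, so the sketch can be exercised with a stub such as `lambda request: "SUMMARY"`.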


1215 Commits