valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Kazuhiro Sera	db30a6f5d8	Fix #2391 Add Ctrl+H as backspace keyboard shortcut (#2412 ) This pull request resolves #2391. ctrl + h is not assigned to any other operations at this moment, and this feature request sounds valid to me. If we don't prefer having this, please feel free to close this.	2025-08-18 16:00:29 -07:00
Ahmed Ibrahim	ecb388045c	Add cache tests for UserTurn (#2432 )	2025-08-18 21:28:09 +00:00
Michael Bolin	fc6cfd5ecc	protocol-ts (#2425 )	2025-08-18 13:08:53 -07:00
Ahmed Ibrahim	c283f9f6ce	Add an operation to override current task context (#2431 ) - Added an operation to override current task context - Added a test to check that cache stays the same	2025-08-18 19:59:19 +00:00
Ahmed Ibrahim	c9963b52e9	consolidate reasoning enums into one (#2428 ) We have three enums for each of reasoning summaries and reasoning effort with same values. They can be consolidated into one.	2025-08-18 11:50:17 -07:00
Michael Bolin	a4f76bd75a	chore: add TS annotation to generated mcp-types (#2424 ) Adds the `TS` annotation from https://crates.io/crates/ts-rs to all types to facilitate codegen. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2424). * __->__ #2424 * #2423	2025-08-18 09:38:47 -07:00
Michael Bolin	712bfa04ac	chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423 ) The existing `wire_format.rs` should share more types with the `codex-protocol` crate (like `AskForApproval` instead of maintaining a parallel `CodexToolCallApprovalPolicy` enum), so this PR moves `wire_format.rs` into `codex-protocol`, renaming it as `mcp_protocol.rs`. We also de-dupe types, where appropriate. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423). * #2424 * __->__ #2423	2025-08-18 09:36:57 -07:00
ae	da69d50c60	fix: stop using ANSI blue (#2421 ) - One less color. - Replaced with cyan which looks better next to other cyan components.	2025-08-18 16:02:25 +00:00
ae	5bce369c4d	fix: clean up styles & colors and define in styles.md (#2401 ) New style guide: # Headers, primary, and secondary text - Headers: Use `bold`. For markdown with various header levels, leave in the `#` signs. - Primary text: Default. - Secondary text: Use `dim`. # Foreground colors - Default: Most of the time, just use the default foreground color. `reset` can help get it back. - Selection: Use ANSI `blue`. (Ed & AE want to make this cyan too, but we'll do that in a followup since it's riskier in different themes.) - User input tips and status indicators: Use ANSI `cyan`. - Success and additions: Use ANSI `green`. - Errors, failures and deletions: Use ANSI `red`. - Codex: Use ANSI `magenta`. # Avoid - Avoid custom colors because there's no guarantee that they'll contrast well or look good on various terminal color themes. - Avoid ANSI `black`, `white`, `yellow` as foreground colors because the terminal theme will do a better job. (Use `reset` if you need to in order to get those.) The exception is if you need contrast rendering over a manually colored background. (There are some rules to try to catch this in `clippy.toml`.) # Testing Tested in a variety of light and dark color themes in Terminal, iTerm2, and Ghostty.	2025-08-18 08:26:29 -07:00
Michael Bolin	a269754668	remove mcp-server/src/mcp_protocol.rs and the code that depends on it (#2360 )	2025-08-18 00:29:18 -07:00
Michael Bolin	b581498882	fix: introduce EventMsg::TurnAborted (#2365 ) Introduces `EventMsg::TurnAborted` that should be sent in response to `Op::Interrupt`. In the MCP server, updates the handling of a `ClientRequest::InterruptConversation` request such that it sends the `Op::Interrupt` but does not respond to the request until it sees an `EventMsg::TurnAborted`.	2025-08-17 21:40:31 -07:00
Michael Bolin	71cae06e66	fix: refactor login/src/server.rs so process_request() is a separate function (#2388 )	2025-08-17 12:32:56 -07:00
Eric Traut	350b00d54b	Added MCP server command to enable authentication using ChatGPT (#2373 ) This PR adds two new APIs for the MCP server: 1) loginChatGpt, and 2) cancelLoginChatGpt. The first starts a login server and returns a local URL that allows for browser-based authentication, and the second provides a way to cancel the login attempt. If the login attempt succeeds, a notification (in the form of an event) is sent to a subscriber. I also added a timeout mechanism for the existing login server. The loginChatGpt code path uses a 10-minute timeout by default, so if the user fails to complete the login flow in that timeframe, the login server automatically shuts down. I tested the timeout code by manually setting the timeout to a much lower number and confirming that it works as expected when used e2e.	2025-08-17 10:03:52 -07:00
Jeremy Rose	7a80d3c96c	replace /prompts with a rotating placeholder (#2314 )	2025-08-15 19:37:10 -07:00
aibrahim-oai	d3078b9adc	Show progress indicator for /diff command (#2245 ) ## Summary - Show a temporary "Working on diff" state in the bottom pane - Add `DiffResult` app event and dispatch git diff asynchronously ## Testing - `just fmt` - `just fix` (fails: `let` expressions in this position are unstable) - `cargo test --all-features` (fails: `let` expressions in this position are unstable) ------ https://chatgpt.com/codex/tasks/task_i_689a839f32b88321840a893551d5fbef	2025-08-15 15:32:41 -07:00
Jeremy Rose	1ad8ae2579	color the status letter in apply patch summary (#2337 ) <img width="440" height="77" alt="Screenshot 2025-08-14 at 8 30 30 PM" src="https://github.com/user-attachments/assets/c6169a3a-2e98-4ace-b7ee-918cf4368b7a" />	2025-08-15 20:25:48 +00:00
pakrym-oai	c1156a878b	Remove duplicated "Successfully logged in message" (#2357 )	2025-08-15 13:01:27 -07:00
Kazuhiro Sera	dcfdd2faf5	Fix #2296 Add "minimal" reasoning effort for GPT 5 models (#2326 ) This pull request resolves #2296; I've confirmed if it works by: 1. Add settings to ~/.codex/config.toml: ```toml model_reasoning_effort = "minimal" ``` 2. Run the CLI: ``` cd codex-rs cargo build && RUST_LOG=trace cargo run --bin codex /status tail -f ~/.codex/log/codex-tui.log ``` Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-08-15 12:59:52 -07:00
Michael Bolin	d262244725	fix: introduce codex-protocol crate (#2355 )	2025-08-15 12:44:40 -07:00
Jeremy Rose	7c26c8e091	tui: skip identical consecutive entries in local composer history (#2352 ) This PR avoids inserting duplicate consecutive messages into the Chat Composer's local history.	2025-08-15 10:55:44 -07:00
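The deduplication described in #2352 can be sketched as follows. This is a minimal illustration, not the Codex implementation: `LocalHistory` and `record` are hypothetical names, and the real composer history may store more than plain strings.

```rust
/// Hypothetical stand-in for the composer's local history (not the actual Codex type).
#[derive(Debug, Default)]
pub struct LocalHistory {
    entries: Vec<String>,
}

impl LocalHistory {
    /// Stores `text` unless it is identical to the most recent entry.
    /// Returns `true` when the entry was stored.
    pub fn record(&mut self, text: &str) -> bool {
        if self.entries.last().map(String::as_str) == Some(text) {
            // Same as the previous submission: skip it.
            return false;
        }
        self.entries.push(text.to_string());
        true
    }

    /// Read-only view of the stored entries, oldest first.
    pub fn entries(&self) -> &[String] {
        &self.entries
    }
}
```

Only consecutive duplicates are skipped: recording `ls`, `pwd`, `ls` keeps all three entries, because the repeated `ls` is not adjacent to the first one.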
Michael Bolin	eda50d8372	feat: introduce ClientRequest::SendUserTurn (#2345 ) This adds a new request type, `SendUserTurn`, that makes it possible to submit a `Op::UserTurn` operation (introduced in #2329) to a conversation. This PR also adds a new integration test that verifies that changing from `AskForApproval::UnlessTrusted` to `AskForApproval::Never` mid-conversation ensures that an elicitation is no longer sent for running `python3 -c print(42)`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345). * __->__ #2345 * #2329 * #2343 * #2340 * #2338	2025-08-15 10:05:58 -07:00
Michael Bolin	17aa394ae7	feat: introduce Op:UserTurn (#2329 ) This introduces `Op::UserTurn`, which makes it possible to override many of the fields that were set when the `Session` was originally created when creating a new conversation turn. This is one way we could support changing things like `model` or `cwd` in the middle of the conversation, though we may want to consider making each field optional, or alternatively having a separate `Op` that mutates the `TurnContext` associated with a `submission_loop()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2329). * #2345 * __->__ #2329 * #2343 * #2340 * #2338	2025-08-15 09:56:05 -07:00
Michael Bolin	13ed67cfc1	feat: introduce TurnContext (#2343 ) This PR introduces `TurnContext`, which is designed to hold a set of fields that should be constant for a turn of a conversation. Note that the fields of `TurnContext` were previously governed by `Session`. Ultimately, we want to enable users to change these values between turns (changing model, approval policy, etc.), though in the current implementation, the `TurnContext` is constant for the entire conversation. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345). * #2345 * #2329 * __->__ #2343 * #2340 * #2338	2025-08-15 09:40:02 -07:00
Jeremy Rose	45d6c74682	tui: align diff display by always showing sign char and keeping fixed gutter (#2353 ) diff lines without a sign char were misaligned.	2025-08-15 09:32:45 -07:00
Michael Bolin	265fd89e31	fix: try to fix flakiness in test_shell_command_approval_triggers_elicitation (#2344 ) I still see flakiness in `test_shell_command_approval_triggers_elicitation()` on occasion where `MockServer` claims it has not received all of its expected requests. I recently introduced a similar type of test in #2264, `test_codex_jsonrpc_conversation_flow()`, which I have not seen flake (yet!), so this PR pulls over two things I did in that test: - increased `worker_threads` from `2` to `4` - added an assertion to make sure the `task_complete` notification is received Honestly, I'm still not sure why `MockServer` claims it sometimes does not receive all its expected requests given that we assert that the final `JSONRPCResponse` is read on the stream, but let's give this a shot. Assuming this fixes things, my hypothesis is that the increase in `worker_threads` helps because perhaps there are async tasks in `MockServer` that do not reliably complete fully when there are not enough threads available? If that is correct, it seems like the test would still be flaky, though perhaps with lower frequency?	2025-08-15 09:17:20 -07:00
Michael Bolin	6730592433	fix: introduce MutexExt::lock_unchecked() so we stop ignoring unwrap() throughout codex.rs (#2340 ) This way we are sure a dangerous `unwrap()` does not sneak in! --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2340). * #2345 * #2329 * #2343 * __->__ #2340 * #2338	2025-08-15 09:14:44 -07:00
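An extension trait of the kind #2340 names might look like the sketch below. The trait and method names come from the commit title; the body is an assumption, since the log does not show the actual implementation. The idea is to confine the one sanctioned `unwrap()` to a single place so the lint can stay enabled everywhere else.

```rust
use std::sync::{Mutex, MutexGuard};

/// Extension trait that gives `Mutex` a single, audited place to unwrap.
pub trait MutexExt<T> {
    fn lock_unchecked(&self) -> MutexGuard<'_, T>;
}

impl<T> MutexExt<T> for Mutex<T> {
    // Assumption of this sketch: a poisoned mutex is treated as unrecoverable,
    // so panicking here is acceptable and the lint is silenced only for this fn.
    #[allow(clippy::unwrap_used)]
    fn lock_unchecked(&self) -> MutexGuard<'_, T> {
        self.lock().unwrap()
    }
}
```

Call sites then write `state.lock_unchecked()` instead of `state.lock().unwrap()`, so any other `unwrap()` in the file is flagged by the lint.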
Michael Bolin	26c8373821	fix: tighten up checks against writable folders for SandboxPolicy (#2338 ) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338	2025-08-15 09:06:15 -07:00
Dylan	6df8e35314	[tools] Add apply_patch tool (#2303 ) ## Summary We've been seeing a number of issues and reports with our synthetic `apply_patch` tool, e.g. #802. Let's make this a real tool - in my anecdotal testing, it's critical for GPT-OSS models, but I'd like to make it the standard across GPT-5 and codex models as well. ## Testing - [x] Tested locally - [x] Integration test	2025-08-15 11:55:53 -04:00
Jeremy Rose	917e29803b	tui: include optional full command line in history display (#2334 ) Add env var to show the raw, unparsed command line under parsed commands. When we have transcript mode we should show the full command there, but this is useful for debugging.	2025-08-14 22:06:42 -07:00
pakrym-oai	5552688621	Format multiline commands (#2333 ) <img width="966" height="729" alt="image" src="https://github.com/user-attachments/assets/fa45b7e1-cd46-427f-b2bc-8501e9e4760b" /> <img width="797" height="530" alt="image" src="https://github.com/user-attachments/assets/6993eec5-e157-4df7-b558-15643ad10d64" />	2025-08-14 19:49:42 -07:00
pakrym-oai	76df07350a	Cleanup rust login server a bit more (#2331 ) Remove some extra abstractions. --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-14 19:42:14 -07:00
easong-openai	d0b907d399	re-implement session id in status (#2332 ) Basically the same thing as https://github.com/openai/codex/pull/2297	2025-08-15 02:14:46 +00:00
Parker Thompson	a075424437	Added `allow-expect-in-tests` / `allow-unwrap-in-tests` (#2328 ) This PR: * Added the clippy.toml to configure allowable expect / unwrap usage in tests * Removed as many expect/allow lines as possible from tests * Moved a bunch of allows to expects where possible Note: in integration tests, non-`#[test]` helper functions are not covered by this, so we had to leave a few lingering `expect(expect_used)` checks around	2025-08-14 17:59:01 -07:00
Parker Thompson	c26d42ab69	Fix AF_UNIX, sockpair, recvfrom in linux sandbox (#2309 ) When using codex-tui on a Linux system I was unable to run `cargo clippy` inside of codex due to: ``` [pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET\|SOCK_CLOEXEC, 0, <unfinished ...> [pid 3548370] close(8 <unfinished ...> [pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted) ``` And ``` 3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted) ``` This PR: * Fixes a bug that disallowed AF_UNIX, so that it is now allowed on `socket()` * Adds recvfrom() to the syscall allow list; this should be fine since we disable opening new sockets, but we should validate that there is not an open socket inheritance issue. * Allows socketpair to be called for AF_UNIX * Adds tests for AF_UNIX components * All of which allows running `cargo clippy` within the sandbox on Linux, and possibly other tooling using a fork server model + AF_UNIX comms.	2025-08-14 17:12:41 -07:00
easong-openai	e9b597cfa3	Port login server to rust (#2294 ) Port the login server to rust. --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-08-14 17:11:26 -07:00
Jeremy Rose	afc377bae5	clear running commands in various places (#2325 ) we have a very unclear lifecycle for the chatwidget—this should only have to be added in one place! but this fixes the "hanging commands" issue where the active_exec_cell wasn't correctly cleared when commands finished. To repro w/o this PR: 1. prompt "run sleep 10" 2. once the command starts running, press <kbd>Esc</kbd> 3. prompt "run echo hi" Expected: ``` ✓ Completed └ ⌨️ echo hi codex hi ``` Actual: ``` ⚙︎ Working └ ⌨️ echo hi ▌ Ask Codex to do anything ``` i.e. the "Working" never changes to "Completed". The bug is fixed with this PR.	2025-08-15 00:01:19 +00:00
Jeremy Rose	235987843c	add a timer to running exec commands (#2321 ) sometimes i switch back to codex and i don't know how long a command has been running. <img width="744" height="462" alt="Screenshot 2025-08-14 at 3 30 07 PM" src="https://github.com/user-attachments/assets/bd80947f-5a47-43e6-ad19-69c2995a2a29" />	2025-08-14 19:32:45 -04:00
Michael Bolin	6a0f709cff	fix: add call_id to ApprovalParams in mcp-server/src/wire_format.rs (#2322 ) Clients still need this field.	2025-08-14 16:09:12 -07:00
Michael Bolin	2ecca79663	fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318 ) The high-order bit on this PR is that it makes it so `sandbox.rs` tests both Mac and Linux, as we introduce a general `spawn_command_under_sandbox()` function with platform-specific implementations for testing. An important, and interesting, discovery in porting the test to Linux is that (for reasons cited in the code comments), `/dev/shm` has to be added to `writable_roots` on Linux in order for `multiprocessing.Lock` to work there. Granting write access to `/dev/shm` comes with some degree of risk, so we do not make this the default for Codex CLI. Piggybacking on top of #2317, this moves the `python_multiprocessing_lock_works` test yet again, moving `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs` because in `codex-rs/exec/tests` we can use `cargo_bin()` like so: ``` let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec"); ``` which is necessary so we can use `codex_linux_sandbox_exe` and therefore `spawn_command_under_linux_sandbox` in an integration test. This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs` and into `landlock.rs`, which makes things more consistent with `seatbelt.rs` in `codex-core`. For reference, https://github.com/openai/codex/pull/1808 is the PR that made the change to Seatbelt to get this test to pass on Mac.	2025-08-14 15:47:48 -07:00
Michael Bolin	a8c7f5391c	fix: move general sandbox tests to codex-rs/core/tests/sandbox.rs (#2317 ) Previous to this PR, `codex-rs/core/tests/sandbox.rs` contained integration tests that were specific to Seatbelt. This PR moves those tests to `codex-rs/core/src/seatbelt.rs` and designates `codex-rs/core/tests/sandbox.rs` to be used as the home for cross-platform (well, Mac and Linux...) sandbox tests. To start, this migrates `python_multiprocessing_lock_works_under_seatbelt()` from #1823 to the new `sandbox.rs` because this is the type of thing that should work on both Mac _and_ Linux, though I still need to do some work to clean up the test so it works on both platforms.	2025-08-14 14:48:38 -07:00
David Z Hao	992e81d9b5	test(core): add seatbelt sem lock tests (#1823 ) ## Summary - add a unit test to ensure the macOS seatbelt policy allows POSIX semaphores - add a macOS-only test that runs a Python multiprocessing Lock under Seatbelt ## Testing - `cargo test -p codex_core seatbelt_base_policy_allows_ipc_posix_sem --no-fail-fast` (failed: failed to download from `https://static.crates.io/crates/tokio-stream/0.1.17/download`) - `cargo test -p codex_core seatbelt_base_policy_allows_ipc_posix_sem --no-fail-fast --offline` (failed: attempting to make an HTTP request, but --offline was specified) - `cargo test --all-features --no-fail-fast --offline` (failed: attempting to make an HTTP request, but --offline was specified) - `just fmt` (failed: command not found: just) - `just fix` (failed: command not found: just) Ran tests locally to confirm it passes on master and failed before my previous change ------ https://chatgpt.com/codex/tasks/task_i_6890f221e0a4833381cfb53e11499bcc	2025-08-14 14:23:06 -07:00
Jeremy Rose	7038827bf4	fix bash commands being incorrectly quoted in display (#2313 ) The "display format" of commands was sometimes producing incorrect quoting like `echo foo '>' bar`, which is importantly different from the actual command that was being run. This refactors ParsedCommand to have a string in `cmd` instead of a vec, as a `vec` can't accurately capture a full command.	2025-08-14 17:08:29 -04:00
Jeremy Rose	20cd61e2a4	use a central animation loop (#2268 ) instead of each shimmer needing to have its own animation thread, have render_ref schedule a new frame if it wants one and coalesce to the earliest next frame. this also makes the animations frame-timing-independent, based on start time instead of frame count.	2025-08-14 16:59:47 -04:00
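The two ideas in #2268, coalescing frame requests to the earliest one and deriving animation state from elapsed time, can be sketched roughly as below. `FrameScheduler`, `request_frame_at`, `take_due`, and `shimmer_phase` are hypothetical names for illustration; the actual TUI code is not shown in this log and may be structured differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical central scheduler: widgets ask for a redraw, and only the
/// earliest requested time is kept.
#[derive(Debug, Default)]
pub struct FrameScheduler {
    next_frame: Option<Instant>,
}

impl FrameScheduler {
    /// Requests a redraw at `at`, coalescing with any pending request.
    pub fn request_frame_at(&mut self, at: Instant) {
        self.next_frame = Some(match self.next_frame {
            Some(existing) => existing.min(at),
            None => at,
        });
    }

    /// Returns `true` and clears the request if a frame is due at `now`.
    pub fn take_due(&mut self, now: Instant) -> bool {
        match self.next_frame {
            Some(at) if at <= now => {
                self.next_frame = None;
                true
            }
            _ => false,
        }
    }
}

/// Animation phase in `[0, 1)` computed from the start time rather than a
/// frame counter, so dropped or extra frames do not change the speed.
/// `period` is assumed to be non-zero.
pub fn shimmer_phase(start: Instant, now: Instant, period: Duration) -> f32 {
    let elapsed = now.saturating_duration_since(start);
    (elapsed.as_secs_f32() / period.as_secs_f32()) % 1.0
}
```

With this shape, several shimmering widgets that each request a frame during one render pass produce a single wake-up at the earliest requested time, rather than one timer thread per widget.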
Jeremy Rose	fd2b059504	text elements in textarea for pasted content (#2302 ) This improves handling of pasted content in the textarea. It's no longer possible to partially delete a placeholder (e.g. by ^W or ^D), nor is it possible to place the cursor inside a placeholder. Also, we now render placeholders in a different color to make them more clearly differentiated. https://github.com/user-attachments/assets/2051b3c3-963d-4781-a610-3afee522ae29	2025-08-14 20:58:51 +00:00
Michael Bolin	c25f3ea53e	fix: do not allow dotenv to create/modify environment variables starting with CODEX_ (#2308 ) This ensures Codex cannot drop a `.env` file with a value of `CODEX_HOME` that points to a folder that Codex can control.	2025-08-14 13:57:15 -07:00
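The guard in #2308 amounts to filtering `.env` entries by key prefix before they reach the process environment. The sketch below assumes the entries have already been parsed into key/value pairs; `filter_dotenv_vars` is a hypothetical helper, and the case-insensitive comparison is a choice made for this sketch rather than something the commit message states.

```rust
/// Prefix reserved for Codex's own configuration variables.
const RESERVED_PREFIX: &str = "CODEX_";

/// Drops every `.env` entry whose key starts with the reserved prefix, so a
/// `.env` file cannot set values such as `CODEX_HOME`.
/// Hypothetical helper; the comparison ignores ASCII case (sketch assumption).
pub fn filter_dotenv_vars<I>(vars: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .filter(|(key, _)| !key.to_ascii_uppercase().starts_with(RESERVED_PREFIX))
        .collect()
}
```

Only the surviving pairs would then be applied to the environment, which closes the path where a `.env` file written into the workspace redirects `CODEX_HOME` to a folder Codex can control.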
Michael Bolin	8f11652458	fix: parallelize logic in Session::new() (#2305 ) #2291 made it so that `Session::new()` is on the critical path to `Codex::spawn()`, which means it is on the hot path to CLI startup. This refactors `Session::new()` to run a number of async tasks in parallel that were previously run serially to try to reduce latency.	2025-08-14 13:29:58 -07:00
aibrahim-oai	b62c2d9552	remove logs from composer by default (#2307 ) Currently the composer shows `handle_codex_event:<event name>` by default which feels confusing. Let's make it appear in trace.	2025-08-14 13:01:15 -07:00
Jeremy Rose	475ba13479	remove the · animation (#2271 ) the pulsing dot felt too noisy to me next to the shimmering "Working" text. we'll bring it back for streaming response text perhaps?	2025-08-14 19:30:41 +00:00
Dylan	544980c008	[context] Store context messages in rollouts (#2243 ) ## Summary Currently, we use request-time logic to determine the user_instructions and environment_context messages. This means that neither of these values can change over time as conversations go on. We want to add in additional details here, so we're migrating these to save these messages to the rollout file instead. This is simpler for the client, and allows us to append additional environment_context messages to each turn if we want ## Testing - [x] Integration test coverage - [x] Tested locally with a few turns, confirmed model could reference environment context and cached token metrics were reasonably high	2025-08-14 14:51:13 -04:00
Jeremy Rose	b42e679227	remove "status text" in bottom line (#2279 ) this used to hold the most recent log line, but it was kinda broken and not that useful.	2025-08-14 14:10:21 -04:00

514 Commits