valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
pakrym-oai	a90a58f7a1	Trim double Total output lines (#4787 )	2025-10-05 16:41:55 -07:00
pakrym-oai	b2d81a7cac	Make output assertions more explicit (#4784 ) Match using precise regexes.	2025-10-05 16:01:38 -07:00
Fouad Matin	77a8b7fdeb	add `codex sandbox {linux|macos}` (#4782 ) ## Summary - add a `codex sandbox` subcommand with macOS and Linux targets while keeping the legacy `codex debug` aliases - update documentation to highlight the new sandbox entrypoints and point existing references to the new command - clarify the core README about the linux sandbox helper alias ## Testing - just fmt - just fix -p codex-cli - cargo test -p codex-cli ------ https://chatgpt.com/codex/tasks/task_i_68e2e00ca1e8832d8bff53aa0b50b49e	2025-10-05 15:51:57 -07:00
Gabriel Peal	7fa5e95c1f	[MCP] Upgrade rmcp to 0.8 (#4774 ) The version with the well-known discovery and my MCP client name change were just released https://github.com/modelcontextprotocol/rust-sdk/releases	2025-10-05 18:12:37 -04:00
pakrym-oai	191d620707	Use response helpers when mounting SSE test responses (#4783 ) ## Summary - replace manual wiremock SSE mounts in the compact suite with the shared response helpers - simplify the exec auth_env integration test by using the mount_sse_once_match helper - rely on mount_sse_sequence plus server request collection to replace the bespoke SeqResponder utility in tests ## Testing - just fmt ------ https://chatgpt.com/codex/tasks/task_i_68e2e238f2a88320a337f0b9e4098093	2025-10-05 21:58:16 +00:00
pakrym-oai	5c42419b02	Use assert_matches (#4756 ) assert_matches is soon to be in std but is experimental for now.	2025-10-05 21:12:31 +00:00
pakrym-oai	aecbe0f333	Add helper for response created SSE events in tests (#4758 ) ## Summary - add a reusable `ev_response_created` helper that builds `response.created` SSE events for integration tests - update the exec and core integration suites to use the new helper instead of repeating manual JSON literals - keep the streaming fixtures consistent by relying on the shared helper in every touched test ## Testing - `just fmt` ------ https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61	2025-10-05 21:11:43 +00:00
Michael Bolin	a30a902db5	fix: use low-level stdin read logic to avoid a BufReader (#4778 ) `codex-responses-api-proxy` is designed so that there should be exactly one copy of the API key in memory (that is `mlock`'d on UNIX), but in practice, I was seeing two when I dumped the process data from `/proc/$PID/mem`. It appears that `std::io::stdin()` maintains an internal `BufReader` that we cannot zero out, so this PR changes the implementation on UNIX so that we use a low-level `read(2)` instead. Even though it seems like it would be incredibly unlikely, we also make this logic tolerant of short reads. Either `\n` or `EOF` must be sent to signal the end of the key written to stdin.	2025-10-05 13:58:30 -07:00
jif-oai	f3b4a26f32	chore: drop read-file for gpt-5-codex (#4739 ) Drop `read_file` for gpt-5-codex (will do the same for parallel tool call) and add `codex-` as internal model for this kind of feature	2025-10-05 16:26:04 +00:00
jif-oai	dc3c6bf62a	feat: parallel tool calls (#4663 ) Add parallel tool calls. This is configurable at model level and tool level	2025-10-05 16:10:49 +00:00
Dylan	3203862167	chore: update tool config (#4755 ) ## Summary Updates tool config for gpt-5-codex ## Test Plan - [x] Ran locally - [x] Updated unit tests	2025-10-04 22:47:26 -07:00
pakrym-oai	06853d94f0	Use wait_for_event helpers in tests (#4753 ) ## Summary - replace manual event polling loops in several core test suites with the shared wait_for_event helpers - keep prior assertions intact by using closure captures for stateful expectations, including plan updates, patch lifecycles, and review flow checks - rely on wait_for_event_with_timeout where longer waits are required, simplifying timeout handling ## Testing - just fmt ------ https://chatgpt.com/codex/tasks/task_i_68e1d58582d483208febadc5f90dd95e	2025-10-04 22:04:05 -07:00
Ahmed Ibrahim	cc2f4aafd7	Add truncation hint on truncated exec output. (#4740 ) When truncating output, add a hint of the total number of lines	2025-10-05 03:29:07 +00:00
Dylan	4764fc1ee7	feat: Freeform apply_patch with simple shell output (#4718 ) ## Summary This PR is an alternative approach to #4711, but instead of changing our storage, parses out shell calls in the client and reserializes them on the fly before we send them out as part of the request. What this changes: 1. Adds additional serialization logic when the ApplyPatchToolType::Freeform is in use. 2. Adds a --custom-apply-patch flag to enable this setting on a session-by-session basis. This change is delicate, but is not meant to be permanent. It is meant to be the first step in a migration: 1. (This PR) Add in-flight serialization with config 2. Update model_family default 3. Update serialization logic to store turn outputs in a structured format, with logic to serialize based on model_family setting. 4. Remove this rewrite in-flight logic. ## Test Plan - [x] Additional unit tests added - [x] Integration tests added - [x] Tested locally	2025-10-04 19:16:36 -07:00
Ahmed Ibrahim	90ef94d3b3	Surface context window error to the client (#4675 ) In the past, we were treating `input exceeded context window` as a streaming error and retrying on it. Retrying is pointless because the same input will fail the same way every time. In this PR, we surface the error to the client without retrying and also send a token count event to indicate that the context window is full. <img width="650" height="125" alt="image" src="https://github.com/user-attachments/assets/c26b1213-4c27-4bfc-90f4-51a270a3efd5" />	2025-10-05 01:40:06 +00:00
iceweasel-oai	6c2969d22d	add onboarding that informs Windows users of better support in WSL (#4697 )	2025-10-04 17:41:40 -07:00
Thibault Sottiaux	0ad1b0782b	feat: instruct model to use apply_patch + avoid destructive changes (#4742 )	2025-10-04 12:49:50 -07:00
Ahmed Ibrahim	d7acd146fb	fix: exec commands that blow up the context window (#4706 ) We truncate the output of exec commands so that it does not blow up the context window. However, in some cases we weren't doing that. This caused reports of people with 76% of the context window left hitting `input exceeded context window`, which is unexpected.	2025-10-04 11:49:56 -07:00
Fouad Matin	665341c9b1	login: device code text (#4616 ) Co-authored-by: rakesh <rakesh@openai.com>	2025-10-03 16:35:40 -07:00
dedrisian-oai	fae0e6c52c	Fix reasoning effort title (#4694 )	2025-10-03 16:17:30 -07:00
Jeremy Rose	1b4a79f03c	requery default colors on focus (#4673 ) fixes an issue when terminals change their color scheme, e.g. dark/light mode, the composer wouldn't update its background color.	2025-10-03 22:43:41 +00:00
Gabriel Peal	d13ee79c41	[MCP] Don't require experimental_use_rmcp_client for no-auth http servers (#4689 ) The `experimental_use_rmcp_client` flag is still useful to: 1. Toggle between stdio clients 2. Enable oauth because we want to land https://github.com/modelcontextprotocol/rust-sdk/pull/469, https://github.com/openai/codex/pull/4677, and binary signing before we enable it by default. However, for no-auth http servers, there is only one option, so we don't need the flag, and it seems to be working pretty well.	2025-10-03 17:15:23 -04:00
Gabriel Peal	bde468ff8d	Fix oauth .well-known metadata discovery (#4677 ) This picks up https://github.com/modelcontextprotocol/rust-sdk/pull/459 which is required for proper well-known metadata discovery for some MCPs such as Figma.	2025-10-03 17:15:19 -04:00
iceweasel-oai	de8d77274a	set gpt-5 as default model for Windows users (#4676 ) Codex isn’t great yet on Windows outside of WSL, and while we’ve merged https://github.com/openai/codex/pull/4269 to reduce the repetitive manual approvals on readonly commands, we’ve noticed that users seem to have more issues with GPT-5-Codex than with GPT-5 on Windows. This change makes GPT-5 the default for Windows users while we continue to improve the CLI harness and model for GPT-5-Codex on Windows.	2025-10-03 14:00:03 -07:00
Fouad Matin	a5b7675e42	add(core): managed config (#3868 ) ## Summary - Factor `load_config_as_toml` into `core::config_loader` so config loading is reusable across callers. - Layer `~/.codex/config.toml`, optional `~/.codex/managed_config.toml`, and macOS managed preferences (base64) with recursive table merging and scoped threads per source. ## Config Flow ``` Managed prefs (macOS profile: com.openai.codex/config_toml_base64) ▲ │ ~/.codex/managed_config.toml │ (optional file-based override) ▲ │ ~/.codex/config.toml (user-defined settings) ``` - The loader searches under the resolved `CODEX_HOME` directory (defaults to `~/.codex`). - Managed configs let administrators ship fleet-wide overrides via device profiles which is useful for enforcing certain settings like sandbox or approval defaults. - For nested hash tables: overlays merge recursively. Child tables are merged key-by-key, while scalar or array values replace the prior layer entirely. This lets admins add or tweak individual fields without clobbering unrelated user settings.	2025-10-03 13:02:26 -07:00
Gabriel Peal	1d17ca1fa3	[MCP] Add support for MCP Oauth credentials (#4517 ) This PR adds oauth login support to streamable http servers when `experimental_use_rmcp_client` is enabled. This PR is large but represents the minimal amount of work required for this to work. To keep this PR smaller, login can only be done with `codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp` or `codex mcp list` yet. Fingers crossed that this is the last large MCP PR and that subsequent PRs can be smaller. Under the hood, credentials are stored using platform credential managers using the [keyring crate](https://crates.io/crates/keyring). When the keyring isn't available, it falls back to storing credentials in `CODEX_HOME/.credentials.json` which is consistent with how other coding agents handle authentication. I tested this on macOS, Windows, WSL (ubuntu), and Linux. I wasn't able to test the dbus store on linux but did verify that the fallback works. One quirk: during development, if you have stored credentials, every build is its own ad-hoc binary, so the keyring won't recognize the reader as the same program as the writer and may ask for the user's password. I may add an override to disable this or allow users/enterprises to opt out of the keyring storage if it causes issues. <img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40" src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d" /> <img width="745" height="486" alt="image" src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e" />	2025-10-03 13:43:12 -04:00
jif-oai	bfe3328129	Fix flaky test (#4672 ) This issue was due to the timeout not always being long enough to produce enough characters for truncation, plus a race between the synthetic timeout and the process kill	2025-10-03 18:09:41 +01:00
jif-oai	e0b38bd7a2	feat: add `beta_supported_tools` (#4669 ) Gate the new read_file tool behind a new `beta_supported_tools` flag and only enable it for `gpt-5-codex`	2025-10-03 16:58:03 +00:00
Michael Bolin	153338c20f	docs: add barebones README for codex-app-server crate (#4671 )	2025-10-03 09:26:44 -07:00
Michael Bolin	042d4d55d9	feat: `codex exec` writes only the final message to stdout (#4644 ) This updates `codex exec` so that, by default, most of the agent's activity is written to stderr so that only the final agent message is written to stdout. This makes it easier to pipe `codex exec` into another tool without extra filtering. I introduced `#![deny(clippy::print_stdout)]` to help enforce this change and renamed the `ts_println!()` macro to `ts_msg()` because (1) it no longer calls `println!()` and (2), `ts_eprintln!()` seemed too long of a name. While here, this also adds `-o` as an alias for `--output-last-message`. Fixes https://github.com/openai/codex/issues/1670	2025-10-03 16:22:12 +00:00
jif-oai	33d3ecbccc	chore: refactor tool handling (#4510 ) # Tool System Refactor - Centralizes tool definitions and execution in `core/src/tools/`: specs (`spec.rs`), handlers (`handlers/`), router (`router.rs`), registry/dispatch (`registry.rs`), and shared context (`context.rs`). One registry now builds the model-visible tool list and binds handlers. - Router converts model responses to tool calls; Registry dispatches with consistent telemetry via `codex-rs/otel` and unified error handling. Function, Local Shell, MCP, and experimental `unified_exec` all flow through this path; legacy shell aliases still work. - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and make adding tools predictable and testable. Example: `read_file` - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`, registered by `build_specs`). - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`, 1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation). - E2E test: `core/tests/suite/read_file.rs` validates the tool returns the requested lines. ## Next steps: - Decompose `handle_container_exec_with_params` - Add parallel tool calls	2025-10-03 13:21:06 +01:00
jif-oai	69cb72f842	chore: sandbox refactor 2 (#4653 ) Revert the revert and fix the UI issue	2025-10-03 11:17:39 +01:00
Michael Bolin	69ac5153d4	fix: replace --api-key with --with-api-key in `codex login` (#4646 ) Previously, users could supply their API key directly via `codex login --api-key KEY`, but this has the drawback that `KEY` is more likely to end up in shell history, can be read from `/proc`, etc. This PR removes support for `--api-key` and replaces it with `--with-api-key`, which reads the key from stdin, so either of these is a better option: `printenv OPENAI_API_KEY | codex login --with-api-key` or `codex login --with-api-key < my_key.txt`. Other CLIs, such as `gh auth login --with-token`, follow the same practice.	2025-10-03 06:17:31 +00:00
dedrisian-oai	16b6951648	Nit: Pop model effort picker on esc (#4642 ) Pops the effort picker instead of dismissing the whole thing (on escape). https://github.com/user-attachments/assets/cef32291-cd07-4ac7-be8f-ce62b38145f9	2025-10-02 21:07:47 -07:00
dedrisian-oai	231c36f8d3	Move gpt-5-codex to top (#4641 ) In /model picker	2025-10-03 03:34:58 +00:00
dedrisian-oai	1e4541b982	Fix tab+enter regression on slash commands (#4639 ) Previously, when you typed `/di`, pressed tab to complete `/diff`, and then pressed enter, it would execute `/diff`. After the regression, it was sent as plain text instead. This fixes the issue.	2025-10-02 20:14:28 -07:00
Shijie Rao	7be3b484ad	feat: add file name to fuzzy search response (#4619 ) ### Summary * Updated fuzzy search result to include the file name. * This should not affect CLI usage and the UI there will be addressed in a separate PR. ### Testing Tested locally and with the extension. ### Screenshot <img width="431" height="244" alt="Screenshot 2025-10-02 at 11 08 44 AM" src="https://github.com/user-attachments/assets/ba2ca299-a81d-4453-9242-1750e945aea2" /> --------- Co-authored-by: shijie.rao <shijie.rao@squareup.com>	2025-10-02 18:19:13 -07:00
Jeremy Rose	9617b69c8a	tui: • Working, 100% context dim (#4629 ) - add a `•` before the "Working" shimmer - make the percentage in "X% context left" dim instead of bold <img width="751" height="480" alt="Screenshot 2025-10-02 at 2 29 57 PM" src="https://github.com/user-attachments/assets/cf3e771f-ddb3-48f4-babe-1eaf1f0c2959" />	2025-10-03 01:17:34 +00:00
pakrym-oai	1d94b9111c	Use supports_color in codex exec (#4633 ) It knows how to detect github actions	2025-10-03 01:15:03 +00:00
Michael Bolin	37786593a0	feat: write pid in addition to port to server info (#4571 ) This is nice to have for debugging. While here, also cleaned up a bunch of unnecessary noise in `write_server_info()`.	2025-10-02 17:15:09 -07:00
Jeremy Rose	c0a84473a4	fix false "task complete" state during agent message (#4627 ) fixes an issue where user messages wouldn't be queued and ctrl + c would quit the app instead of canceling the stream during the final agent message.	2025-10-02 15:41:25 -07:00
pakrym-oai	c405d8c06c	Rename assistant message to agent message and fix item type field naming (#4610 ) Naming cleanup	2025-10-02 15:07:14 -07:00
Jeremy Rose	25a2e15ec5	tui: tweaks to dialog display (#4622 ) - prefix command approval reasons with "Reason:" - show keyboard shortcuts for some ListSelectionItems - remove "description" lines for approval options, and make the labels more verbose - add a spacer line in diff display after the path and some other minor refactors that go along with the above. <img width="859" height="508" alt="Screenshot 2025-10-02 at 1 24 50 PM" src="https://github.com/user-attachments/assets/4fa7ecaf-3d3a-406a-bb4d-23e30ce3e5cf" />	2025-10-02 21:41:29 +00:00
pakrym-oai	f895d4cbb3	Minor cleanup of codex exec output (#4585 ) <img width="850" height="723" alt="image" src="https://github.com/user-attachments/assets/2ae067bf-ba6b-47bf-9ffe-d1c3f3aa1870" /> <img width="872" height="547" alt="image" src="https://github.com/user-attachments/assets/9058be24-6513-4423-9dae-2d5fd4cbf162" />	2025-10-02 14:17:42 -07:00
Ahmed Ibrahim	ed5d656fa8	Revert "chore: sanbox extraction" (#4626 ) Reverts openai/codex#4286	2025-10-02 21:09:21 +00:00
pakrym-oai	4c566d484a	Separate interactive and non-interactive sessions (#4612 ) Do not show exec session in VSCode/TUI selector.	2025-10-02 13:06:21 -07:00
easong-openai	06e34d4607	Make model switcher two-stage (#4178 ) https://github.com/user-attachments/assets/16d5c67c-e580-4a29-983c-a315f95424ee	2025-10-02 19:38:24 +00:00
Jeremy Rose	45936f8fbd	show "Viewed Image" when the model views an image (#4475 ) <img width="1022" height="339" alt="Screenshot 2025-09-29 at 4 22 00 PM" src="https://github.com/user-attachments/assets/12da7358-19be-4010-a71b-496ede6dfbbf" />	2025-10-02 18:36:03 +00:00
Jeremy Rose	ec98445abf	normalize key hints (#4586 ) render key hints the same everywhere. [Before/after screenshots of four key-hint views, taken 2025-10-01]	2025-10-02 18:34:47 +00:00
dedrisian-oai	b07aafa5f5	Fix status usage ratio (#4584 ) 1. Removes "Token usage" line for chatgpt sub users 2. Adds the word "used" to the context window line	2025-10-02 10:27:10 -07:00


1081 Commits
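Commit aecbe0f333 (#4758) adds an `ev_response_created` test helper that builds `response.created` SSE events. The sketch below shows what such a helper could look like; the `sse_event` function and the exact JSON payload (only `type` and `response.id`) are assumptions, not the helper's actual output. It relies only on standard SSE framing: an `event:` line, a `data:` line, and a blank line that ends the frame.

```rust
/// Format one Server-Sent Events frame: an `event:` line, a `data:` line,
/// and a blank line that tells the client the frame is complete.
fn sse_event(event_name: &str, data_json: &str) -> String {
    format!("event: {event_name}\ndata: {data_json}\n\n")
}

/// Hypothetical sketch of a `response.created` helper.
/// ASSUMPTION: the payload shape below is illustrative; the real helper
/// may include more fields.
fn ev_response_created(id: &str) -> String {
    // `{{` and `}}` are escaped braces inside `format!`.
    let data = format!(r#"{{"type":"response.created","response":{{"id":"{id}"}}}}"#);
    sse_event("response.created", &data)
}
```

Centralizing the framing in one function keeps every test fixture byte-for-byte consistent, which is the point of replacing repeated JSON literals with a shared helper.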
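Commit a30a902db5 (#4778) reads the API key with a low-level `read(2)` loop that tolerates short reads and stops at `\n` or EOF. A minimal sketch of that loop follows, under the assumption that it is written against `std::io::Read` so it can be exercised in tests; `read_key_into` is a hypothetical name. Passing `std::io::stdin()` to it would reintroduce the internal buffering the commit set out to avoid; on UNIX the reader would presumably wrap a raw `read(2)` on file descriptor 0 instead.

```rust
use std::io::{self, Read};

/// Hypothetical sketch: read a secret into a caller-owned, fixed-size buffer,
/// stopping at the first `\n` or at EOF, and return the key length.
/// Each `read` may return fewer bytes than requested (a short read), so the
/// loop keeps reading until it sees a terminator.
fn read_key_into<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut len = 0;
    loop {
        if len == buf.len() {
            // The buffer must be strictly larger than the key.
            return Err(io::Error::new(io::ErrorKind::InvalidData, "key too long"));
        }
        let n = match reader.read(&mut buf[len..]) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // EINTR: retry
            Err(e) => return Err(e),
        };
        if n == 0 {
            return Ok(len); // EOF terminates the key
        }
        if let Some(pos) = buf[len..len + n].iter().position(|&b| b == b'\n') {
            let end = len + pos;
            // Zero the newline and anything read after it.
            buf[end..len + n].fill(0);
            return Ok(end);
        }
        len += n;
    }
}
```

Because the caller owns `buf`, it can lock that memory and zero it after use; the sketch itself never copies the key anywhere else.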
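Commit dc3c6bf62a (#4663) makes parallel tool calls configurable at both the model level and the tool level. One way to combine the two switches is sketched below under stated assumptions: the `ModelFamily` and `ToolSpec` fields are hypothetical, a batch runs concurrently only when the model and every tool in the batch opt in, and `std::thread::scope` stands in for whatever runtime the real code uses.

```rust
/// ASSUMPTION: hypothetical flags; the real settings may be named differently.
struct ModelFamily {
    supports_parallel_tool_calls: bool,
}

struct ToolSpec {
    name: &'static str,
    supports_parallel: bool,
}

/// A batch may run concurrently only if the model and every tool in it opt in.
fn can_run_in_parallel(model: &ModelFamily, batch: &[&ToolSpec]) -> bool {
    model.supports_parallel_tool_calls && batch.iter().all(|tool| tool.supports_parallel)
}

/// Execute a batch of tool calls; outputs come back in call order either way.
fn run_batch<F>(model: &ModelFamily, batch: &[&ToolSpec], exec: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    if can_run_in_parallel(model, batch) {
        let exec = &exec; // share one executor across the scoped threads
        std::thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|tool| s.spawn(move || exec(tool.name)))
                .collect();
            // Joining in spawn order preserves the original call order.
            handles
                .into_iter()
                .map(|h| h.join().expect("tool thread panicked"))
                .collect()
        })
    } else {
        batch.iter().map(|tool| exec(tool.name)).collect()
    }
}
```

Keeping results in call order, regardless of which path ran, keeps each output easy to match back to the call that produced it.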
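Commit 06853d94f0 (#4753) replaces hand-written polling loops with shared `wait_for_event` and `wait_for_event_with_timeout` helpers that take closures, including closures that capture state. The real helpers' signatures are not shown in this log; this synchronous sketch over `std::sync::mpsc` only illustrates the shape, and the one-second `DEFAULT_TIMEOUT` is an assumption.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// ASSUMPTION: hypothetical default; the real helper's timeout may differ.
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(1);

/// Block until an event satisfies `predicate`, skipping non-matching events,
/// or return `None` once `timeout` has elapsed or the sender is gone.
fn wait_for_event_with_timeout<T>(
    rx: &Receiver<T>,
    timeout: Duration,
    mut predicate: impl FnMut(&T) -> bool,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // `None` here means the deadline has already passed.
        let remaining = deadline.checked_duration_since(Instant::now())?;
        match rx.recv_timeout(remaining) {
            Ok(event) => {
                if predicate(&event) {
                    return Some(event);
                }
                // Not the event we want: keep waiting.
            }
            Err(RecvTimeoutError::Timeout | RecvTimeoutError::Disconnected) => return None,
        }
    }
}

fn wait_for_event<T>(rx: &Receiver<T>, predicate: impl FnMut(&T) -> bool) -> Option<T> {
    wait_for_event_with_timeout(rx, DEFAULT_TIMEOUT, predicate)
}
```

Taking `FnMut` rather than `Fn` is what allows a test to record every event it sees while waiting, which covers the "stateful expectations" the commit mentions.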
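Commit cc2f4aafd7 (#4740) appends a hint with the total number of lines when exec output is truncated. Below is a head-only sketch of that idea. The `truncate_with_hint` name and the hint wording are assumptions (the later commit #4787, "Trim double Total output lines", suggests the real text mentions "Total output lines"), and the real truncation may keep more than just the first lines.

```rust
/// Hypothetical sketch: keep the first `max_lines` lines of command output
/// and, if anything was cut, append a hint with the total line count.
/// ASSUMPTION: the hint wording is illustrative.
fn truncate_with_hint(output: &str, max_lines: usize) -> String {
    let total = output.lines().count();
    if total <= max_lines {
        return output.to_string(); // nothing cut, so no hint
    }
    let kept: Vec<&str> = output.lines().take(max_lines).collect();
    format!("{}\n[... truncated; Total output lines: {total}]", kept.join("\n"))
}
```

The hint tells the model how much it did not see, so it can decide whether to rerun the command with narrower output instead of assuming it has the full result.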
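Commit 90ef94d3b3 (#4675) stops retrying when the input exceeds the context window and surfaces the error instead. A sketch of that retry policy with a hypothetical `StreamError` type follows: transient errors are retried up to a limit, while a context-window error is returned on the first attempt because resending the same input cannot succeed.

```rust
/// ASSUMPTION: hypothetical error type for illustration.
#[derive(Debug, PartialEq)]
enum StreamError {
    /// The request itself is too large; resending it cannot succeed.
    ContextWindowExceeded,
    /// Network hiccups, 5xx responses, and similar recoverable failures.
    Transient(String),
}

fn should_retry(err: &StreamError, attempt: u32, max_retries: u32) -> bool {
    match err {
        StreamError::ContextWindowExceeded => false,
        StreamError::Transient(_) => attempt < max_retries,
    }
}

/// Run `request`, retrying only errors that a retry could plausibly fix.
fn run_with_retries<T>(
    max_retries: u32,
    mut request: impl FnMut() -> Result<T, StreamError>,
) -> Result<T, StreamError> {
    let mut attempt = 0;
    loop {
        match request() {
            Ok(value) => return Ok(value),
            Err(e) if should_retry(&e, attempt, max_retries) => attempt += 1,
            Err(e) => return Err(e), // surface to the client immediately
        }
    }
}
```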
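Commit a5b7675e42 (#3868) layers `~/.codex/config.toml`, an optional `~/.codex/managed_config.toml`, and macOS managed preferences, merging tables recursively while scalars and arrays replace the previous layer. The sketch below implements that merge rule on a minimal stand-in `Value` enum; the enum, `table`, and `load_layers` are assumptions for illustration, since the real loader works on parsed TOML.

```rust
use std::collections::BTreeMap;

/// ASSUMPTION: minimal stand-in for a parsed TOML value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Build a table from (key, value) pairs.
fn table(pairs: Vec<(&str, Value)>) -> Value {
    Value::Table(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Overlay `overlay` onto `base`: tables merge key-by-key, recursing into
/// nested tables; any other value (scalar or array) replaces the base value.
fn merge(base: &mut Value, overlay: &Value) {
    match (base, overlay) {
        (Value::Table(b), Value::Table(o)) => {
            for (k, v) in o {
                b.entry(k.clone())
                    .and_modify(|existing| merge(existing, v))
                    .or_insert_with(|| v.clone());
            }
        }
        (base, overlay) => *base = overlay.clone(),
    }
}

/// Apply layers lowest-precedence first: user config, then the managed
/// file, then managed preferences.
fn load_layers(layers: &[Value]) -> Value {
    let mut merged = table(vec![]);
    for layer in layers {
        merge(&mut merged, layer);
    }
    merged
}
```

For example, a managed layer that sets only one key inside a nested table overrides that key but leaves its sibling keys from the user layer untouched.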
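Commit 042d4d55d9 (#4644) sends most of `codex exec`'s activity to stderr so that stdout carries only the final agent message. A minimal sketch of that split follows, with a hypothetical `report` function that is generic over its writers so it can be tested with in-memory buffers; in the CLI the writers would be `std::io::stdout()` and `std::io::stderr()`.

```rust
use std::io::{self, Write};

/// Hypothetical sketch: intermediate activity goes to `err` (stderr in the
/// CLI) and only the final agent message goes to `out` (stdout).
fn report<O: Write, E: Write>(
    out: &mut O,
    err: &mut E,
    activity: &[&str],
    final_message: &str,
) -> io::Result<()> {
    for line in activity {
        writeln!(err, "{line}")?;
    }
    writeln!(out, "{final_message}")?;
    out.flush()
}
```

Because only `out` receives the final message, piping `codex exec` into another tool passes along just the answer; the commit enforces the same discipline crate-wide with `#![deny(clippy::print_stdout)]`.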
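Commit 33d3ecbccc (#4510) centralizes tools so that one registry both builds the model-visible tool list and binds handlers, with unified error handling on dispatch. Below is a heavily simplified registry in that spirit. Registering a tool records its spec and handler in one call so the two cannot drift apart, and unknown tool names produce a uniform error. The type and method names are illustrative assumptions, not the actual contents of `core/src/tools/registry.rs`.

```rust
use std::collections::HashMap;

/// ASSUMPTION: hypothetical handler type; raw argument string in,
/// tool output or error message out.
type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

/// What the model sees for each tool (a real spec would carry a JSON schema).
struct ToolSpec {
    name: String,
    description: String,
}

#[derive(Default)]
struct ToolRegistry {
    specs: Vec<ToolSpec>,
    handlers: HashMap<String, Handler>,
}

impl ToolRegistry {
    /// Register the spec and its handler together so they stay in sync.
    fn register<F>(&mut self, name: &str, description: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        self.specs.push(ToolSpec { name: name.to_string(), description: description.to_string() });
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    /// The tool list exposed to the model.
    fn specs(&self) -> &[ToolSpec] {
        &self.specs
    }

    /// Dispatch a model tool call; unknown tools become a uniform error.
    fn dispatch(&self, name: &str, args: &str) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(args),
            None => Err(format!("unsupported tool: {name}")),
        }
    }
}
```

Adding a tool then means one `register` call, which is the "predictable and testable" property the commit message is aiming for.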