valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Anton Panasenko	9572cfc782	[codex] add developer instructions (#5897 ) We use developer instructions for code reviews, so we need to pass them in the CLI as well.	2025-10-30 11:18:31 -07:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screenshot of the approval prompt showing the risk assessment and summary: https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b	2025-10-24 15:23:44 -07:00
jif-oai	5e4f3bbb0b	chore: rework tools execution workflow (#5278 ) Re-work the tool execution flow. Read `orchestrator.rs` to understand the structure	2025-10-20 20:57:37 +01:00
Michael Bolin	995f5c3614	feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222 ) This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent` in the core protocol (`protocol/src/protocol.rs`), which is also what this field is named on `ExecCommandBeginEvent`. Honestly, I don't love the name (it sounds like a single command, but it is actually a list of them), but I don't want to get distracted by a naming discussion right now. This also adds `parsed_cmd` to `ExecCommandApprovalParams` in `codex-rs/app-server-protocol/src/protocol.rs`, so it will be available via `codex app-server`, as well. For consistency, I also updated `ExecApprovalElicitRequestParams` in `codex-rs/mcp-server/src/exec_approval.rs` to include this field under the name `codex_parsed_cmd`, as that struct already has a number of special `codex_*` fields. Note this is the code for when Codex is used as an MCP _server_ and therefore has to conform to the official spec for an MCP elicitation type.	2025-10-15 13:58:40 -07:00
Michael Bolin	d9dbf48828	fix: separate `codex mcp` into `codex mcp-server` and `codex app-server` (#4471 ) This is a very large PR with some non-backwards-compatible changes. Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish server that had two overlapping responsibilities: - Running an MCP server, providing some basic tool calls. - Running the app server used to power experiences such as the VS Code extension. This PR aims to separate these into distinct concepts: - `codex mcp-server` for the MCP server - `codex app-server` for the "application server" Note `codex mcp` still exists because it already has its own subcommands for MCP management (`list`, `add`, etc.) The MCP logic continues to live in `codex-rs/mcp-server` whereas the refactored app server logic is in the new `codex-rs/app-server` folder. Note that most of the existing integration tests in `codex-rs/mcp-server/tests/suite` were actually for the app server, so all the tests have been moved with the exception of `codex-rs/mcp-server/tests/suite/mod.rs`. Because this is already a large diff, I tried not to change more than I had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses the name `McpProcess` for now, but I will do some mechanical renamings to things like `AppServer` in subsequent PRs. While `mcp-server` and `app-server` share some overlapping functionality (like reading streams of JSONL and dispatching based on message types) and some differences (completely different message types), I ended up doing a bit of copypasta between the two crates, as both have somewhat similar `message_processor.rs` and `outgoing_message.rs` files for now, though I expect them to diverge more in the near future. One material change is that of the initialize handshake for `codex app-server`, as we no longer use the MCP types for that handshake. 
Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an `Initialize` variant to `ClientRequest`, which takes the `ClientInfo` object we need to update the `USER_AGENT_SUFFIX` in `codex-rs/app-server/src/message_processor.rs`. One other material change is in `codex-rs/app-server/src/codex_message_processor.rs` where I eliminated a use of the `send_event_as_notification()` method I am generally trying to deprecate (because it blindly maps an `EventMsg` into a `JSONNotification`) in favor of `send_server_notification()`, which takes a `ServerNotification`, as that is intended to be a custom enum of all notification types supported by the app server. So to make this update, I had to introduce a new variant of `ServerNotification`, `SessionConfigured`, which is a non-backwards compatible change with the old `codex mcp`, and clients will have to be updated after the next release that contains this PR. Note that `codex-rs/app-server/tests/suite/list_resume.rs` also had to be updated to reflect this change. I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility crate to avoid some of the copying between `mcp-server` and `app-server`.	2025-09-30 07:06:18 +00:00
Dylan	197f45a3be	[mcp-server] Expose fuzzy file search in MCP (#2677 ) ## Summary Expose a simple fuzzy file search implementation for mcp clients to work with ## Testing - [x] Tested locally	2025-09-29 12:19:09 -07:00
Jeremy Rose	4a5f05c136	make tests pass cleanly in sandbox (#4067 ) This changes the reqwest client used in tests to be sandbox-friendly, and skips a bunch of other tests that don't work inside the sandbox/without network.	2025-09-25 13:11:14 -07:00
Thibault Sottiaux	c93e77b68b	feat: update default (#4076 ) Changes: - Default model and docs now use gpt-5-codex. - Disables the GPT-5 Codex NUX by default. - Keeps presets available for API key users.	2025-09-22 20:10:52 -07:00
jif-oai	be366a31ab	chore: clippy on redundant closure (#4058 ) Add redundant closure clippy rules and let Codex fix it by minimising FQP	2025-09-22 19:30:16 +00:00
pakrym-oai	14a115d488	Add non_sandbox_test helper (#3880 ) Makes tests shorter	2025-09-22 14:50:41 +00:00
pakrym-oai	d4aba772cb	Switch to uuid_v7 and tighten ConversationId usage (#3819 ) Make sure conversations have a timestamp.	2025-09-18 14:37:03 +00:00
Eric Traut	e5dd7f0934	Fix get_auth_status response when using custom provider (#3581 ) This PR addresses an edge-case bug that appears in the VS Code extension in the following situation: 1. Log in using ChatGPT (using either the CLI or extension). This will create an `auth.json` file. 2. Manually modify `config.toml` to specify a custom provider. 3. Start a fresh copy of the VS Code extension. The profile menu in the VS Code extension will indicate that you are logged in using ChatGPT even though you're not. This is caused by the `get_auth_status` method returning an `auth_method: 'chatgpt'` when a custom provider is configured and it doesn't use OpenAI auth (i.e. `requires_openai_auth` is false). The method should always return `auth_method: None` if `requires_openai_auth` is false. The same bug also causes the NUX (new user experience) screen to be displayed in the VSCE in this situation.	2025-09-14 18:27:02 -07:00
jif-oai	c6fd056aa6	feat: reasoning effort as optional (#3527 ) Allow the reasoning effort to be optional	2025-09-12 12:06:33 -07:00
Michael Bolin	abdcb40f4c	feat: change the behavior of SetDefaultModel RPC so None clears the value. (#3529 ) It turns out that we want slightly different behavior for the `SetDefaultModel` RPC because some models do not work with reasoning (like GPT-4.1), so we should be able to explicitly clear this value. Verified in `codex-rs/mcp-server/tests/suite/set_default_model.rs`.	2025-09-12 11:35:51 -07:00
Michael Bolin	c172e8e997	feat: added SetDefaultModel to JSON-RPC server (#3512 ) This adds `SetDefaultModel`, which takes `model` and `reasoning_effort` as optional fields. If set, the field will overwrite what is in the user's `config.toml`. This reuses logic that was added to support the `/model` command in the TUI: https://github.com/openai/codex/pull/2799.	2025-09-11 23:44:17 -07:00
Michael Bolin	9bbeb75361	feat: include reasoning_effort in NewConversationResponse (#3506 ) `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.	2025-09-11 21:04:40 -07:00
Eric Traut	e13b35ecb0	Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189 ) This PR does the following: * Adds the ability to paste or type an API key. * Removes the `preferred_auth_method` config option. The last login method is always persisted in auth.json, so this isn't needed. * If OPENAI_API_KEY env variable is defined, the value is used to prepopulate the new UI. The env variable is otherwise ignored by the CLI. * Adds a new MCP server entry point "login_api_key" so we can implement this same API key behavior for the VS Code extension. Screenshots: https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9 and https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9	2025-09-11 09:16:34 -07:00
Michael Bolin	65f3528cad	feat: add UserInfo request to JSON-RPC server (#3428 ) This adds a simple endpoint that provides the email address encoded in `$CODEX_HOME/auth.json`. As noted, for now, we do not hit the server to verify this is the user's true email address.	2025-09-10 17:03:35 -07:00
Eric Traut	acb28bf914	Improved resiliency of two auth-related tests (#3427 ) This PR improves two existing auth-related tests. They were failing when run in an environment where an `OPENAI_API_KEY` env variable was defined. The change makes them more resilient.	2025-09-10 11:46:02 -07:00
Gabriel Peal	8636bff46d	Set a user agent suffix when used as a mcp server (#3395 ) This automatically adds a user agent suffix whenever the CLI is used as an MCP server	2025-09-10 02:32:57 +00:00
Ahmed Ibrahim	43809a454e	Introduce rollout items (#3380 ) This PR introduces Rollout items. This enables us to rollout eventmsgs and session meta. This is mostly #3214 with rebase on main	2025-09-09 23:52:33 +00:00
Gabriel Peal	5eab4c7ab4	Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388 ) The previous config approach had a few issues: 1. It is part of the config but not designed to be used externally. 2. It had to be wired through many places (look at the +/- on this PR). 3. It wasn't guaranteed to be set consistently everywhere because we don't have a super well defined way that configs stack. For example, the extension would configure during newConversation but anything that happened outside of that (like login) wouldn't get it. This env var approach is cleaner and also creates one less thing we have to deal with when coming up with a better holistic story around configs. One downside is that I removed the unit test testing for the override because I don't want to deal with setting the global env or spawning child processes and figuring out how to introspect their originator header. The new code is sufficiently simple and I tested it e2e that I feel as if this is still worth it.	2025-09-09 17:23:23 -04:00
Michael Bolin	ace14e8d36	feat: add ArchiveConversation to ClientRequest (#3353 ) Adds support for `ArchiveConversation` in the JSON-RPC server that takes a `(ConversationId, PathBuf)` pair and: - verifies the `ConversationId` corresponds to the rollout id at the `PathBuf` - if so, invokes `ConversationManager.remove_conversation(ConversationId)` - if the `CodexConversation` was in memory, send `Shutdown` and wait for `ShutdownComplete` with a timeout - moves the `.jsonl` file to `$CODEX_HOME/archived_sessions` --------- Co-authored-by: Gabriel Peal <gabriel@openai.com>	2025-09-09 11:39:00 -04:00
Michael Bolin	2a76a08a9e	fix: include rollout_path in NewConversationResponse (#3352 ) Adding the `rollout_path` to the `NewConversationResponse` makes it so a client can perform subsequent operations on a `(ConversationId, PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.	2025-09-09 00:11:48 -07:00
Gabriel Peal	5eaaf307e1	Generate more typescript types and return conversation id with ConversationSummary (#3219 ) This PR does multiple things that are necessary for conversation resume to work from the extension. I wanted to make sure everything worked so these changes wound up in one PR: 1. Generate more ts types 2. Resume rollout history files rather than create a new one every time it is resumed so you don't see a duplicate conversation in history for every resume. Chatted with @aibrahim-oai to verify this 3. Return conversation_id in conversation summaries 4. [Cleanup] Use serde and strong types for a lot of the rollout file parsing	2025-09-08 17:54:47 -04:00
Gabriel Peal	5c1416d99b	Add a getUserAgent MCP method (#3320 ) This will allow the extension to pass this user agent + a suffix for its requests	2025-09-08 13:30:13 -04:00
Gabriel Peal	c8fab51372	Use ConversationId instead of raw Uuids (#3282 ) We're trying to migrate from `session_id: Uuid` to `conversation_id: ConversationId`. Not only does this give us more type safety but it unifies our terminology across Codex and with the implementation of session resuming, a conversation (which can span multiple sessions) is more appropriate. I started this impl on https://github.com/openai/codex/pull/3219 as part of getting resume working in the extension but it's big enough that it should be broken out.	2025-09-07 23:22:25 -04:00
pakrym-oai	5775174ec2	Never store requests (#3212 ) When item ids are sent to Responses API it will load them from the database ignoring the provided values. This adds extra latency. Not having the mode to store requests also allows us to simplify the code. ## Breaking change The `disable_response_storage` configuration option is removed.	2025-09-05 10:41:47 -07:00
Ahmed Ibrahim	907d3dd348	MCP: add session resume + history listing; (#3185 )	2025-09-04 23:44:18 +00:00
Dylan	82ed7bd285	[mcp-server] Update read config interface (#3093 ) ## Summary Follow-up to #3056 This PR updates the mcp-server interface for reading the config settings saved by the user. At risk of introducing _another_ Config struct, I think it makes sense to avoid tying our protocol to ConfigToml, as it's become a bit unwieldy. GetConfigTomlResponse was a de-facto struct for this already - better to make it explicit, in my opinion. This is technically a breaking change of the mcp-server protocol, but given the previous interface was introduced so recently in #2725, and we have not yet even started to call it, I propose proceeding with the breaking change - but am open to preserving the old endpoint. ## Testing - [x] Added additional integration test coverage	2025-09-04 16:26:41 -07:00
pakrym-oai	03e2796ca4	Move CodexAuth and AuthManager to the core crate (#3074 ) Fix a long standing layering issue.	2025-09-02 18:36:19 -07:00
Michael Bolin	1e9e703b96	chore: try to make it easier to debug the flakiness of test_shell_command_approval_triggers_elicitation (#2848 ) `test_shell_command_approval_triggers_elicitation()` is one of a number of integration tests that we have observed to be flaky on GitHub CI, so this PR tries to reduce the flakiness _and_ to provide us with more information when it flakes. Specifically: - Changed the command that we use to trigger the elicitation from `git init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'` because running `git` seems more likely to invite variance. - Increased the timeout to wait for the task response from 10s to 20s. - Added more logging.	2025-08-28 12:33:33 -07:00
Dylan	0cec0770e2	[mcp-server] Add GetConfig endpoint (#2725 ) ## Summary Adds a GetConfig request to the MCP Protocol, so MCP clients can evaluate the resolved config.toml settings which the harness is using. ## Testing - [x] Added an end to end test of the endpoint	2025-08-27 09:59:03 -07:00
Jeremy Rose	32bbbbad61	test: faster test execution in codex-core (#2633 ) this dramatically improves time to run `cargo test -p codex-core` (~25x speedup). before: ``` cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total ``` after: ``` cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total ``` both tests measured "hot", i.e. on a 2nd run with no filesystem changes, to exclude compile times. approach inspired by [Delete Cargo Integration Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html), we move all test cases in tests/ into a single suite in order to have a single binary, as there is significant overhead for each test binary executed, and because test execution is only parallelized with a single binary.	2025-08-24 11:10:53 -07:00

34 Commits
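
Illustrative note on e5dd7f0934 (#3581): the rule stated in that commit message, that `get_auth_status` should always return `auth_method: None` when `requires_openai_auth` is false, can be sketched as a small pure function. This is only a sketch of the stated rule, not the code from the repository: the `AuthMethod` enum, the `reported_auth_method` function, and the `stored` parameter are hypothetical names introduced here for illustration.

```rust
/// Hypothetical stand-in for the login methods mentioned in the commit
/// message; the real type in the repository may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthMethod {
    ChatGpt,
    ApiKey,
}

/// Returns the auth method a status endpoint should report.
///
/// `stored` is whatever login method was persisted earlier (for example by
/// a previous ChatGPT login). If the active provider does not use OpenAI
/// auth, the stored method is irrelevant and `None` is reported instead.
pub fn reported_auth_method(
    requires_openai_auth: bool,
    stored: Option<AuthMethod>,
) -> Option<AuthMethod> {
    if requires_openai_auth {
        stored
    } else {
        // Custom provider without OpenAI auth: never claim a login.
        None
    }
}
```

With this shape, a leftover ChatGPT login combined with a custom provider yields `None`, which matches the behavior the commit message describes for the VS Code extension's profile menu.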