valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Eric Traut	dc42ec0eb4	Add AuthManager and enhance GetAuthStatus command (#2577 ) This PR adds a central `AuthManager` struct that manages the auth information used across conversations and the MCP server. Prior to this, each conversation and the MCP server got their own private snapshots of the auth information, and changes to one (such as a logout or token refresh) were not seen by others. This is especially problematic when multiple instances of the CLI are run. For example, consider the case where you start CLI 1 and log in to ChatGPT account X and then start CLI 2 and log out and then log in to ChatGPT account Y. The conversation in CLI 1 is still using account X, but if you create a new conversation, it will suddenly (and unexpectedly) switch to account Y. With the `AuthManager`, auth information is read from disk at the time the `ConversationManager` is constructed, and it is cached in memory. All new conversations use this same auth information, as do any token refreshes. The `AuthManager` is also used by the MCP server's GetAuthStatus command, which now returns the auth method currently used by the MCP server. This PR also includes an enhancement to the GetAuthStatus command. It now accepts two new (optional) input parameters: `include_token` and `refresh_token`. Callers can use this to request the in-use auth token and can optionally request to refresh the token. The PR also adds tests for the login and auth APIs that I recently added to the MCP server.	2025-08-22 13:10:11 -07:00
easong-openai	8ad56be06e	Parse and expose stream errors (#2540 )	2025-08-21 01:15:24 -07:00
Dylan	d2b2a6d13a	[prompt] xml-format EnvironmentContext (#2272 ) ## Summary Before we land #2243, let's start printing environment_context in our preferred format. This struct will evolve over time with new information, xml gives us a balance of human readable without too much parsing, llm readable, and extensible. Also moves us over to an Option-based struct, so we can easily provide diffs to the model. ## Testing - [x] Updated tests to reflect new format	2025-08-20 23:45:16 -07:00
eddy-win	050b9baeb6	Bridge command generation to powershell when on Windows (#2319 ) ## What? Why? How? - When running on Windows, codex often tries to invoke bash commands, which commonly fail (unless WSL is installed) - Fix: Detect if powershell is available and, if so, route commands to it - Also add a shell_name property to environmental context for codex to default to powershell commands when running in that environment ## Testing - Tested within WSL and powershell (e.g. get top 5 largest files within a folder and validated that commands generated were powershell commands) - Tested within Zsh - Updated unit tests --------- Co-authored-by: Eddy Escardo <eddy@openai.com>	2025-08-20 16:30:34 -07:00
Michael Bolin	50c48e88f5	chore: upgrade to Rust 1.89 (#2465 ) Codex created this PR from the following prompt: > upgrade this entire repo to Rust 1.89. Note that this requires updating codex-rs/rust-toolchain.toml as well as the workflows in .github/. Make sure that things are "clippy clean" as this change will likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests` are sufficient to check for correctness Note this modifies a lot of lines because it folds nested `if` statements using `&&`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465). * #2467 * __->__ #2465	2025-08-19 13:22:02 -07:00
Dylan	e7e5fe91c8	[tui] Support /mcp command (#2430 ) ## Summary Adds a `/mcp` command to list active tools. We can extend this command to allow configuration of MCP tools, but for now a simple list command will help debug if your config.toml and your tools are working as expected.	2025-08-19 09:00:31 -07:00
Ahmed Ibrahim	c283f9f6ce	Add an operation to override current task context (#2431 ) - Added an operation to override current task context - Added a test to check that cache stays the same	2025-08-18 19:59:19 +00:00
Ahmed Ibrahim	c9963b52e9	consolidate reasoning enums into one (#2428 ) We have three enums for each of reasoning summaries and reasoning effort with same values. They can be consolidated into one.	2025-08-18 11:50:17 -07:00
Michael Bolin	b581498882	fix: introduce EventMsg::TurnAborted (#2365 ) Introduces `EventMsg::TurnAborted` that should be sent in response to `Op::Interrupt`. In the MCP server, updates the handling of a `ClientRequest::InterruptConversation` request such that it sends the `Op::Interrupt` but does not respond to the request until it sees an `EventMsg::TurnAborted`.	2025-08-17 21:40:31 -07:00
Michael Bolin	d262244725	fix: introduce codex-protocol crate (#2355 )	2025-08-15 12:44:40 -07:00
Michael Bolin	17aa394ae7	feat: introduce Op:UserTurn (#2329 ) This introduces `Op::UserTurn`, which makes it possible to override many of the fields that were set when the `Session` was originally created when creating a new conversation turn. This is one way we could support changing things like `model` or `cwd` in the middle of the conversation, though we may want to consider making each field optional, or alternatively having a separate `Op` that mutates the `TurnContext` associated with a `submission_loop()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2329). * #2345 * __->__ #2329 * #2343 * #2340 * #2338	2025-08-15 09:56:05 -07:00
Michael Bolin	13ed67cfc1	feat: introduce TurnContext (#2343 ) This PR introduces `TurnContext`, which is designed to hold a set of fields that should be constant for a turn of a conversation. Note that the fields of `TurnContext` were previously governed by `Session`. Ultimately, we want to enable users to change these values between turns (changing model, approval policy, etc.), though in the current implementation, the `TurnContext` is constant for the entire conversation. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345). * #2345 * #2329 * __->__ #2343 * #2340 * #2338	2025-08-15 09:40:02 -07:00
Michael Bolin	6730592433	fix: introduce MutexExt::lock_unchecked() so we stop ignoring unwrap() throughout codex.rs (#2340 ) This way we are sure a dangerous `unwrap()` does not sneak in! --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2340). * #2345 * #2329 * #2343 * __->__ #2340 * #2338	2025-08-15 09:14:44 -07:00
Michael Bolin	26c8373821	fix: tighten up checks against writable folders for SandboxPolicy (#2338 ) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338	2025-08-15 09:06:15 -07:00
Dylan	6df8e35314	[tools] Add apply_patch tool (#2303 ) ## Summary We've been seeing a number of issues and reports with our synthetic `apply_patch` tool, e.g. #802. Let's make this a real tool - in my anecdotal testing, it's critical for GPT-OSS models, but I'd like to make it the standard across GPT-5 and codex models as well. ## Testing - [x] Tested locally - [x] Integration test	2025-08-15 11:55:53 -04:00
Parker Thompson	a075424437	Added `allow-expect-in-tests` / `allow-unwrap-in-tests` (#2328 ) This PR: * Added the clippy.toml to configure allowable expect / unwrap usage in tests * Removed as many expect/allow lines as possible from tests * moved a bunch of allows to expects where possible Note: in integration tests, non `#[test]` helper functions are not covered by this so we had to leave a few lingering `expect(expect_used` checks around	2025-08-14 17:59:01 -07:00
Michael Bolin	8f11652458	fix: parallelize logic in Session::new() (#2305 ) #2291 made it so that `Session::new()` is on the critical path to `Codex::spawn()`, which means it is on the hot path to CLI startup. This refactors `Session::new()` to run a number of async tasks in parallel that were previously run serially to try to reduce latency.	2025-08-14 13:29:58 -07:00
Dylan	544980c008	[context] Store context messages in rollouts (#2243 ) ## Summary Currently, we use request-time logic to determine the user_instructions and environment_context messages. This means that neither of these values can change over time as conversations go on. We want to add in additional details here, so we're migrating these to save these messages to the rollout file instead. This is simpler for the client, and allows us to append additional environment_context messages to each turn if we want ## Testing - [x] Integration test coverage - [x] Tested locally with a few turns, confirmed model could reference environment context and cached token metrics were reasonably high	2025-08-14 14:51:13 -04:00
Michael Bolin	cf7a7e63a3	exploration: create Session as part of Codex::spawn() (#2291 ) Historically, `Codex::spawn()` would create the instance of `Codex` and enforce, by construction, that `Op::ConfigureSession` was the first `Op` submitted via `submit()`. Then over in `submission_loop()`, it would handle the case for taking the parameters of `Op::ConfigureSession` and turning it into a `Session`. This approach has two challenges from a state management perspective: `f968a1327a/codex-rs/core/src/codex.rs (L718)` - The local `sess` variable in `submission_loop()` has to be `mut` and `Option<Arc<Session>>` because it is not invariant that a `Session` is present for the lifetime of the loop, so there is a lot of logic to deal with the case where `sess` is `None` (e.g., the `send_no_session_event` function and all of its callsites). - `submission_loop()` is written in such a way that `Op::ConfigureSession` could be observed multiple times, but in practice, it is only observed exactly once at the start of the loop. In this PR, we try to simplify the state management by _removing_ the `Op::ConfigureSession` enum variant and constructing the `Session` as part of `Codex::spawn()` so that it can be passed to `submission_loop()` as `Arc<Session>`. The original logic from the `Op::ConfigureSession` has largely been moved to the new `Session::new()` constructor. --- Incidentally, I also noticed that the handling of `Op::ConfigureSession` can result in events being dispatched in addition to `EventMsg::SessionConfigured`, as an `EventMsg::Error` is created for every MCP initialization error, so it is important to preserve that behavior: `f968a1327a/codex-rs/core/src/codex.rs (L901-L916)` Though admittedly, I believe this does not play nice with #2264, as these error messages will likely be dispatched before the client has a chance to call `addConversationListener`, so we likely need to make it so `newConversation` automatically creates the subscription, but we must also guarantee that the "ack" from `newConversation` is returned before any other conversation-related notifications are sent so the client knows what `conversation_id` to match on.	2025-08-14 09:55:28 -07:00
Michael Bolin	085f166707	fix: make all fields of Session private (#2285 ) As `Session` needs a bit of work, it will make things easier to move around if we can start by reducing the extent of its public API. This makes all the fields private, though adds three `pub(crate)` getters. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2285). * #2287 * #2286 * __->__ #2285	2025-08-13 22:53:54 -07:00
pakrym-oai	f1be7978cf	Parse reasoning text content (#2277 ) Sometimes COT is returns as text content instead of `ReasoningText`. We should parse it but not serialize back on requests. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-13 18:39:58 -07:00
pakrym-oai	41eb59a07d	Wait for requested delay in rate limit errors (#2266 ) Fixes: https://github.com/openai/codex/issues/2131 Response doesn't have the delay in a separate field (yet) so parse the message.	2025-08-13 15:43:54 -07:00
Michael Bolin	08ed618f72	chore: introduce ConversationManager as a clearinghouse for all conversations (#2240 ) This PR does two things because after I got deep into the first one I started pulling on the thread to the second: - Makes `ConversationManager` the place where all in-memory conversations are created and stored. Previously, `MessageProcessor` in the `codex-mcp-server` crate was doing this via its `session_map`, but this is something that should be done in `codex-core`. - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded throughout our code. I think this made sense at one time, but now that we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event, I don't think this was quite right, so I removed it. For `codex exec` and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we no longer make `Notify` a field of `Codex` or `CodexConversation`. Changes of note: - Adds the files `conversation_manager.rs` and `codex_conversation.rs` to `codex-core`. - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`: other crates must use `CodexConversation` instead (which is created via `ConversationManager`). - `core/src/codex_wrapper.rs` has been deleted in favor of `ConversationManager`. - `ConversationManager::new_conversation()` returns `NewConversation`, which is in line with the `new_conversation` tool we want to add to the MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so we eliminate checks in cases like `codex-rs/core/tests/client.rs` to verify `SessionConfiguredEvent` is the first event because that is now internal to `ConversationManager`. - Quite a bit of code was deleted from `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to manage multiple conversations itself: it goes through `ConversationManager` instead. - `core/tests/live_agent.rs` has been deleted because I had to update a bunch of tests and all the tests in here were ignored, and I don't think anyone ever ran them, so this was just technical debt, at this point. - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I hope to refactor the blandly-named `util.rs` into more descriptive files). - In general, I started replacing local variables named `codex` as `conversation`, where appropriate, though admittedly I didn't do it through all the integration tests because that would have added a lot of noise to this PR. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240). * #2264 * #2263 * __->__ #2240	2025-08-13 13:38:18 -07:00
easong-openai	6340acd885	Re-add markdown streaming (#2029 ) Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.	2025-08-12 17:37:28 -07:00
Michael Bolin	596a9d6a96	fix: take ExecToolCallOutput by value to avoid clone() (#2197 ) Since the output could be a large string, it seemed like a win to avoid the `clone()` in the common case.	2025-08-12 08:59:35 -07:00
pakrym-oai	0cf57e1f42	Include output truncation message in tool call results (#2183 ) To avoid model being confused about incomplete output.	2025-08-11 11:52:05 -07:00
Gabriel Peal	7f6408720b	[1/3] Parse exec commands and format them more nicely in the UI (#2095 ) # Note for reviewers The bulk of this PR is in in the new file, `parse_command.rs`. This file is designed to be written TDD and implemented with Codex. Do not worry about reviewing the code, just review the unit tests (if you want). If any cases are missing, we'll add more tests and have Codex fix them. I think the best approach will be to land and iterate. I have some follow-ups I want to do after this lands. The next PR after this will let us merge (and dedupe) multiple sequential cells of the same such as multiple read commands. The deduping will also be important because the model often reads the same file multiple times in a row in chunks === This PR formats common commands like reading, formatting, testing, etc more nicely: It tries to extract things like file names, tests and falls back to the cmd if it doesn't. It also only shows stdout/err if the command failed. <img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15" src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e" /> <img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32" src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9" /> <img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2" src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f" /> Part 2: https://github.com/openai/codex/pull/2097 Part 3: https://github.com/openai/codex/pull/2110	2025-08-11 14:26:15 -04:00
Dylan	0091930f5a	[core] Allow resume after client errors (#2053 ) ## Summary Allow tui conversations to resume after the client fails out of retries. I tested this with exec / mocked api failures as well, and it appears to be fine. But happy to add an exec integration test as well! ## Testing - [x] Added integration test - [x] Tested locally	2025-08-08 18:21:19 -07:00
aibrahim-oai	6cfee15612	Moving the compact prompt near where it's used (#2031 ) - Moved the prompt for compact to core - Renamed it to be more clear	2025-08-08 12:43:43 -07:00
pakrym-oai	fa0051190b	Adjust error messages (#1969 ) <img width="1378" height="285" alt="image" src="https://github.com/user-attachments/assets/f0283378-f839-4a1f-8331-909694a04b1f" />	2025-08-07 18:24:34 -07:00
pakrym-oai	62ed5907f9	Better usage errors (#1941 ) <img width="771" height="279" alt="image" src="https://github.com/user-attachments/assets/e56f967f-bcd7-49f7-8a94-3d88df68b65a" />	2025-08-07 09:46:13 -07:00
Ed Bayes	eb80614a7c	Tint chat composer background (#1921 ) ## Summary - give the chat composer a subtle custom background and apply it across the full area drawn <img width="1008" height="718" alt="composer-bg" src="https://github.com/user-attachments/assets/4b0f7f69-722a-438a-b4e9-0165ae8865a6" /> - update turn interrupted to be more human readable <img width="648" height="170" alt="CleanShot 2025-08-06 at 22 44 47@2x" src="https://github.com/user-attachments/assets/8d35e53a-bbfa-48e7-8612-c280a54e01dd" /> ## Testing - `cargo test --all-features` (fails: `let` expressions in `core/src/client.rs` require newer rustc) - `just fix` (fails: `let` expressions in `core/src/client.rs` require newer rustc) ------ https://chatgpt.com/codex/tasks/task_i_68941f32c1008322bbcc39ee1d29a526	2025-08-07 00:46:45 -07:00
aibrahim-oai	f15e0fe1df	Ensure exec command end always emitted (#1908 ) ## Summary - defer ExecCommandEnd emission until after sandbox resolution - make sandbox error handler return final exec output and response - align sandbox error stderr with response content and rename to `final_output` - replace unstable `let` chains in client command header logic ## Testing - `just fmt` - `just fix` - `cargo test --all-features` (fails: NotPresent in core/tests/client.rs) ------ https://chatgpt.com/codex/tasks/task_i_6893e63b0c408321a8e1ff2a052c4c51	2025-08-07 06:25:56 +00:00
Dylan	dc468d563f	[env] Remove git config for now (#1884 ) ## Summary Forgot to remove this in #1869 last night! Too much of a performance hit on the main thread. We can bring it back via an async thread on startup.	2025-08-06 08:05:17 -07:00
Dylan	3e8bcf0247	[prompts] Add <environment_context> (#1869 ) ## Summary Includes a new user message in the api payload which provides useful environment context for the model, so it knows about things like the current working directory and the sandbox. ## Testing Updated unit tests	2025-08-06 01:13:31 -07:00
Dylan	725dd6be6a	[approval_policy] Add OnRequest approval_policy (#1865 ) ## Summary A split-up PR of #1763 , stacked on top of a tools refactor #1858 to make the change clearer. From the previous summary: > Let's try something new: tell the model about the sandbox, and let it decide when it will need to break the sandbox. Some local testing suggests that it works pretty well with zero iteration on the prompt! ## Testing - [x] Added unit tests - [x] Tested locally and it appears to work smoothly!	2025-08-05 20:44:20 -07:00
Dylan	aff97ed7dd	[core] Separate tools config from openai client (#1858 ) ## Summary In an effort to make tools easier to work with and more configurable, I'm introducing `ToolConfig` and updating `Prompt` to take in a general list of Tools. I think this is simpler and better for a few reasons: - We can easily assemble tools from various sources (our own harness, mcp servers, etc.) and we can consolidate the logic for constructing the logic in one place that is separate from serialization. - client.rs no longer needs arbitrary config values, it just takes in a list of tools to serialize A hefty portion of the PR is now updating our conversion of `mcp_types::Tool` to `OpenAITool`, but considering that @bolinfest accurately called this out as a TODO long ago, I think it's time we tackled it. ## Testing - [x] Experimented locally, no changes, as expected - [x] Added additional unit tests - [x] Responded to rust-review	2025-08-05 19:27:52 -07:00
Dylan	ea7d3f27bd	[core] Stop escalating timeouts (#1853 ) ## Summary Escalating out of sandbox is (almost always) not going to fix long-running commands timing out - therefore we should just pass the failure back to the model instead of asking the user to re-run a command that took a long time anyway. ## Testing - [x] Ran locally with a timeout and confirmed this worked as expected	2025-08-05 17:52:25 -07:00
easong-openai	e0303dbac0	Rescue chat completion changes (#1846 ) https://github.com/openai/codex/pull/1835 has some messed up history. This adds support for streaming chat completions, which is useful for ollama. We should probably take a very skeptical eye to the code introduced in this PR. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-05 08:56:13 +00:00
easong-openai	906d449760	Stream model responses (#1810 ) Stream models thoughts and responses instead of waiting for the whole thing to come through. Very rough right now, but I'm making the risk call to push through.	2025-08-05 04:23:22 +00:00
Ahmed Ibrahim	e38ce39c51	Revert to `3f13ebce10` without rewriting history. Wrong merge	2025-08-04 17:03:24 -07:00
Ahmed Ibrahim	1a33de34b0	unify flag	2025-08-04 16:56:52 -07:00
Ahmed Ibrahim	bd171e5206	add raw reasoning	2025-08-04 16:49:42 -07:00
Gabriel Peal	1f3318c1c5	Add a TurnDiffTracker to create a unified diff for an entire turn (#1770 ) This lets us show an accumulating diff across all patches in a turn. Refer to the docs for TurnDiffTracker for implementation details. There are multiple ways this could have been done and this felt like the right tradeoff between reliability and completeness: Pros * It will pick up all changes to files that the model touched including if they prettier or another command that updates them. * It will not pick up changes made by the user or other agents to files it didn't modify. Cons * It will pick up changes that the user made to a file that the model also touched * It will not pick up changes to codegen or files that were not modified with apply_patch	2025-08-04 11:57:04 -04:00
Jeremy Rose	78a1d49fac	fix command duration display (#1806 ) we were always displaying "0ms" before. <img width="731" height="101" alt="Screenshot 2025-08-02 at 10 51 22 PM" src="https://github.com/user-attachments/assets/f56814ed-b9a4-4164-9e78-181c60ce19b7" />	2025-08-03 11:33:44 -07:00
aibrahim-oai	81bb1c9e26	Fix compact (#1798 ) We are not recording the summary in the history.	2025-08-02 12:05:06 -07:00
aibrahim-oai	bc7beddaa2	feat: stream exec stdout events (#1786 ) ## Summary - stream command stdout as `ExecCommandStdout` events - forward streamed stdout to clients and ignore in human output processor - adjust call sites for new streaming API	2025-08-01 13:04:34 -07:00
aibrahim-oai	e2c994e32a	Add /compact (#1527 ) - Add operation to summarize the context so far. - The operation runs a compact task that summarizes the context. - The operation clear the previous context to free the context window - The operation didn't use `run_task` to avoid corrupting the session - Add /compact in the tui https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c	2025-07-31 21:34:32 -07:00
Michael Bolin	06c786b2da	fix: ensure PatchApplyBeginEvent and PatchApplyEndEvent are dispatched reliably (#1760 ) This is a follow-up to https://github.com/openai/codex/pull/1705, as that PR inadvertently lost the logic where `PatchApplyBeginEvent` and `PatchApplyEndEvent` events were sent when patches were auto-approved. Though as part of this fix, I believe this also makes an important safety fix to `assess_patch_safety()`, as there was a case that returned `SandboxType::None`, which arguably is the thing we were trying to avoid in #1705. On a high level, we want there to be only one codepath where `apply_patch` happens, which should be unified with the patch to run `exec`, in general, so that sandboxing is applied consistently for both cases. Prior to this change, `apply_patch()` in `core` would either: * exit early, delegating to `exec()` to shell out to `apply_patch` using the appropriate sandbox * proceed to run the logic for `apply_patch` in memory `549846b29a/codex-rs/core/src/apply_patch.rs (L61-L63)` In this implementation, only the latter would dispatch `PatchApplyBeginEvent` and `PatchApplyEndEvent`, though the former would dispatch `ExecCommandBeginEvent` and `ExecCommandEndEvent` for the `apply_patch` call (or, more specifically, the `codex --codex-run-as-apply-patch PATCH` call). To unify things in this PR, we: * Eliminate the back half of the `apply_patch()` function, and instead have it also return with `DelegateToExec`, though we add an extra field to the return value, `user_explicitly_approved_this_action`. * In `codex.rs` where we process `DelegateToExec`, we use `SandboxType::None` when `user_explicitly_approved_this_action` is `true`. This means we no longer run the apply_patch logic in memory, as we always `exec()`. (Note this is what allowed us to delete so much code in `apply_patch.rs`.) * In `codex.rs`, we further update `notify_exec_command_begin()` and `notify_exec_command_end()` to take additional fields to determine what type of notification to send: `ExecCommand` or `PatchApply`. Admittedly, this PR also drops some of the functionality about giving the user the opportunity to expand the set of writable roots as part of approving the `apply_patch` command. I'm not sure how much that was used, and we should probably rethink how that works as we are currently tidying up the protocol to the TUI, in general.	2025-07-31 11:13:57 -07:00
Michael Bolin	221ebfcccc	fix: run apply_patch calls through the sandbox (#1705 ) Building on the work of https://github.com/openai/codex/pull/1702, this changes how a shell call to `apply_patch` is handled. Previously, a shell call to `apply_patch` was always handled in-process, never leveraging a sandbox. To determine whether the `apply_patch` operation could be auto-approved, the `is_write_patch_constrained_to_writable_paths()` function would check if all the paths listed in the paths were writable. If so, the agent would apply the changes listed in the patch. Unfortunately, this approach afforded a loophole: symlinks! * For a soft link, we could fix this issue by tracing the link and checking whether the target is in the set of writable paths, however... * ...For a hard link, things are not as simple. We can run `stat FILE` to see if the number of links is greater than 1, but then we would have to do something potentially expensive like `find . -inum <inode_number>` to find the other paths for `FILE`. Further, even if this worked, this approach runs the risk of a [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) race condition, so it is not robust. The solution, implemented in this PR, is to take the virtual execution of the `apply_patch` CLI into an _actual_ execution using `codex --codex-run-as-apply-patch PATCH`, which we can run under the sandbox the user specified, just like any other `shell` call. This, of course, assumes that the sandbox prevents writing through symlinks as a mechanism to write to folders that are not in the writable set configured by the sandbox. I verified this by testing the following on both Mac and Linux: ```shell #!/usr/bin/env bash set -euo pipefail # Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR? # Codex is run in SANDBOX_DIR, so writes should be constrianed to this directory. SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX) # EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it. EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX) echo "SANDBOX_DIR: $SANDBOX_DIR" echo "EXPLOIT_DIR: $EXPLOIT_DIR" cleanup() { # Only remove if it looks sane and still exists [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR" [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR" } trap cleanup EXIT echo "I am the original content" > "${EXPLOIT_DIR}/original.txt" # Drop the -s to test hard links. ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt" cat "${SANDBOX_DIR}/link-to-original.txt" if [[ "$(uname)" == "Linux" ]]; then SANDBOX_SUBCOMMAND=landlock else SANDBOX_SUBCOMMAND=seatbelt fi # Attempt the exploit cd "${SANDBOX_DIR}" codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" \|\| true cat "${EXPLOIT_DIR}/original.txt" ``` Admittedly, this change merits a proper integration test, but I think I will have to do that in a follow-up PR.	2025-07-30 16:45:08 -07:00

1 2 3

113 Commits