valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	08ed618f72	chore: introduce ConversationManager as a clearinghouse for all conversations (#2240 ) This PR does two things because after I got deep into the first one I started pulling on the thread to the second: - Makes `ConversationManager` the place where all in-memory conversations are created and stored. Previously, `MessageProcessor` in the `codex-mcp-server` crate was doing this via its `session_map`, but this is something that should be done in `codex-core`. - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded throughout our code. I think this made sense at one time, but now that we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event, I don't think this was quite right, so I removed it. For `codex exec` and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we no longer make `Notify` a field of `Codex` or `CodexConversation`. Changes of note: - Adds the files `conversation_manager.rs` and `codex_conversation.rs` to `codex-core`. - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`: other crates must use `CodexConversation` instead (which is created via `ConversationManager`). - `core/src/codex_wrapper.rs` has been deleted in favor of `ConversationManager`. - `ConversationManager::new_conversation()` returns `NewConversation`, which is in line with the `new_conversation` tool we want to add to the MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so we eliminate checks in cases like `codex-rs/core/tests/client.rs` to verify `SessionConfiguredEvent` is the first event because that is now internal to `ConversationManager`. - Quite a bit of code was deleted from `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to manage multiple conversations itself: it goes through `ConversationManager` instead. 
- `core/tests/live_agent.rs` has been deleted because I had to update a bunch of tests and all the tests in here were ignored, and I don't think anyone ever ran them, so this was just technical debt, at this point. - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I hope to refactor the blandly-named `util.rs` into more descriptive files). - In general, I started replacing local variables named `codex` as `conversation`, where appropriate, though admittedly I didn't do it through all the integration tests because that would have added a lot of noise to this PR. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240). * #2264 * #2263 * __->__ #2240	2025-08-13 13:38:18 -07:00
easong-openai	6340acd885	Re-add markdown streaming (#2029 ) Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.	2025-08-12 17:37:28 -07:00
Dylan	d33793d31d	[prompts] integration test prompt caching (#2189 ) ## Summary Our current approach to prompt caching is fragile! The current approach works, but we are planning to update to a more resilient system (storing them in the rollout file). Let's start adding some integration tests to ensure stability while we migrate it. ## Testing - [x] These are the tests 😎	2025-08-11 17:03:13 -07:00
pakrym-oai	0cf57e1f42	Include output truncation message in tool call results (#2183 ) To avoid model being confused about incomplete output.	2025-08-11 11:52:05 -07:00
Dylan	0091930f5a	[core] Allow resume after client errors (#2053 ) ## Summary Allow tui conversations to resume after the client fails out of retries. I tested this with exec / mocked api failures as well, and it appears to be fine. But happy to add an exec integration test as well! ## Testing - [x] Added integration test - [x] Tested locally	2025-08-08 18:21:19 -07:00
Michael Bolin	295abf3e51	chore: change CodexAuth::from_api_key() to take &str instead of String (#1970 ) Good practice and simplifies some of the call sites. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1970). * #1971 * __->__ #1970 * #1966 * #1965 * #1962	2025-08-07 16:55:33 -07:00
Michael Bolin	db76f32888	chore: rename CodexAuth::new() to create_dummy_codex_auth_for_testing() because it is not for general consumption (#1962 ) `CodexAuth::new()` was the first method listed in `CodexAuth`, but it is only meant to be used by tests. Rename it to `create_dummy_chatgpt_auth_for_testing()` and move it to the end of the implementation. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1962). * #1971 * #1970 * #1966 * #1965 * __->__ #1962	2025-08-07 16:33:29 -07:00
ae	0334476894	feat: parse info from auth.json and show in /status (#1923 ) - `/status` renders ``` signed in with chatgpt login: example@example.com plan: plus ``` - Setup for using this info in a few more places. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-08-07 01:27:45 -07:00
Michael Bolin	cd5f9074af	feat: add /tmp by default (#1919 ) Replaces the `include_default_writable_roots` option on `sandbox_workspace_write` (that defaulted to `true`, which was slightly weird/annoying) with `exclude_tmpdir_env_var`, which defaults to `false`. Perhaps more importantly, `/tmp` is now enabled by default as part of `sandbox_mode = "workspace-write"`, though `exclude_slash_tmp = true` can be used to disable this.	2025-08-07 00:17:00 -07:00
pakrym-oai	8262ba58b2	Prefer env var auth over default codex auth (#1861 ) ## Summary - Prioritize provider-specific API keys over default Codex auth when building requests - Add test to ensure provider env var auth overrides default auth ## Testing - `just fmt` - `just fix` (fails: `let` expressions in this position are unstable) - `cargo test --all-features` (fails: `let` expressions in this position are unstable) ------ https://chatgpt.com/codex/tasks/task_i_68926a104f7483208f2c8fd36763e0e3	2025-08-06 13:02:00 -07:00
Dylan	3e8bcf0247	[prompts] Add <environment_context> (#1869 ) ## Summary Includes a new user message in the api payload which provides useful environment context for the model, so it knows about things like the current working directory and the sandbox. ## Testing Updated unit tests	2025-08-06 01:13:31 -07:00
Dylan	725dd6be6a	[approval_policy] Add OnRequest approval_policy (#1865 ) ## Summary A split-up PR of #1763 , stacked on top of a tools refactor #1858 to make the change clearer. From the previous summary: > Let's try something new: tell the model about the sandbox, and let it decide when it will need to break the sandbox. Some local testing suggests that it works pretty well with zero iteration on the prompt! ## Testing - [x] Added unit tests - [x] Tested locally and it appears to work smoothly!	2025-08-05 20:44:20 -07:00
Dylan	063083af15	[prompts] Better user_instructions handling (#1836 ) ## Summary Our recent change in #1737 can sometimes lead to the model confusing AGENTS.md context as part of the message. But a little prompting and formatting can help fix this! ## Testing - Ran locally with a few different prompts to verify the model behaves well. - Updated unit tests	2025-08-04 18:55:57 -07:00
pakrym-oai	84bcadb8d9	Restore API key and query param overrides (#1826 ) Addresses https://github.com/openai/codex/issues/1796	2025-08-04 18:07:49 -07:00
Dylan	e3565a3f43	[sandbox] Filter out certain non-sandbox errors (#1804 ) ## Summary Users frequently complain about re-approving commands that have failed for non-sandbox reasons. We can't diagnose with complete accuracy which errors happened because of a sandbox failure, but we can start to eliminate some common simple cases. This PR captures the most common case I've seen, which is a `command not found` error. ## Testing - [x] Added unit tests - [x] Ran a few cases locally	2025-08-03 13:05:48 -07:00
Jeremy Rose	78a1d49fac	fix command duration display (#1806 ) we were always displaying "0ms" before.	2025-08-03 11:33:44 -07:00
Michael Bolin	80555d4ff2	feat: make .git read-only within a writable root when using Seatbelt (#1765 ) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.	2025-08-01 16:11:24 -07:00
aibrahim-oai	f20de21cb6	collapse `stdout` and `stderr` delta events into one (#1787 )	2025-08-01 14:00:19 -07:00
aibrahim-oai	bc7beddaa2	feat: stream exec stdout events (#1786 ) ## Summary - stream command stdout as `ExecCommandStdout` events - forward streamed stdout to clients and ignore in human output processor - adjust call sites for new streaming API	2025-08-01 13:04:34 -07:00
pakrym-oai	88ea215c80	Add a custom originator setting (#1781 )	2025-08-01 09:55:23 -07:00
aibrahim-oai	b67c485d84	ci fix (#1782 )	2025-08-01 09:17:13 -07:00
aibrahim-oai	e2c994e32a	Add /compact (#1527 ) - Add operation to summarize the context so far. - The operation runs a compact task that summarizes the context. - The operation clears the previous context to free the context window. - The operation does not use `run_task`, to avoid corrupting the session. - Add /compact in the tui. https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c	2025-07-31 21:34:32 -07:00
pakrym-oai	0935e6a875	Send account id when available (#1767 ) For users with multiple accounts we need to specify the account to use.	2025-07-31 15:40:19 -07:00
Michael Bolin	5a0ad5ab8f	chore: refactor exec.rs: create separate seatbelt.rs and spawn.rs files (#1762 ) At 550 lines, `exec.rs` was a bit large. In particular, I found it hard to locate the Seatbelt-related code quickly without a file with `seatbelt` in the name, so this refactors things so: - `spawn_command_under_seatbelt()` and dependent code moves to a new `seatbelt.rs` file - `spawn_child_async()` and dependent code moves to a new `spawn.rs` file	2025-07-31 13:11:47 -07:00
pakrym-oai	549846b29a	Add codex login --api-key (#1759 ) Allow setting the API key via `codex login --api-key`	2025-07-31 17:48:49 +00:00
Jeremy Rose	be0cd34300	fix git tests (#1747 ) the git tests were failing on my local machine due to gpg signing config in my ~/.gitconfig. tests should not be affected by ~/.gitconfig, so configure them to ignore it.	2025-07-31 09:17:59 -07:00
pakrym-oai	e0e245cc1c	Send AGENTS.md as a separate user message (#1737 )	2025-07-30 13:56:24 -07:00
pakrym-oai	ea01a5ffe2	Add support for a separate chatgpt auth endpoint (#1712 ) Adds a `CodexAuth` type that encapsulates information about available auth modes and logic for refreshing the token. Changes `Responses` API to send requests to different endpoints based on the auth type. Updates login_with_chatgpt to support API-less mode and skip the key exchange.	2025-07-30 19:40:15 +00:00
easong-openai	f8fcaaaf6f	Relative instruction file (#1722 ) Passing in an instruction file with a bad path led to silent failures, and relative instruction paths were handled in an unintuitive fashion.	2025-07-29 10:06:05 -07:00
Michael Bolin	2405c40026	chore: update Codex::spawn() to return a struct instead of a tuple (#1677 ) Also update `init_codex()` to return a `struct` instead of a tuple, as well.	2025-07-27 20:01:35 -07:00
Michael Bolin	7af9cedbd7	fix: create separate test_support crates to eliminate #[allow(dead_code)] (#1667 ) Because of a quirk of how integration tests work in Rust, we had a number of `#[allow(dead_code)]` annotations that were misleading because the functions _were_ being used, just not by all integration tests in a `tests/` folder, so when compiling a test that did not use the function, clippy would complain that it was unused. This fixes things by creating a "test_support" crate under the `tests/` folder that is imported as a dev dependency for the respective crate.	2025-07-24 12:19:46 -07:00
vishnu-oai	2437a8d17a	Record Git metadata to rollout (#1598 ) # Summary - Writing effective evals for codex sessions requires context of the overall repository state at the moment the session began - This change adds this metadata (git repository, branch, commit hash) to the top of the rollout of the session (if available - if not it doesn't add anything) - Currently, this is only effective on a clean working tree, as we can't track uncommitted/untracked changes with the current metadata set. Ideally in the future we may want to track unclean changes somehow, or perhaps prompt the user to stash or commit them. # Testing - Added unit tests - `cargo test && cargo clippy --tests && cargo fmt -- --config imports_granularity=Item`	2025-07-24 11:35:28 -07:00
pakrym-oai	591cb6149a	Always send entire request context (#1641 ) Always store the entire conversation history. Request encrypted COT when not storing Responses. Send entire input context instead of sending previous_response_id	2025-07-23 10:37:45 -07:00
pakrym-oai	6d82907082	Add support for custom base instructions (#1645 ) Allows providing custom instructions file as a config parameter and custom instruction text via MCP tool call.	2025-07-22 09:42:22 -07:00
Dylan	18b2b30841	[mcp-server] Add reply tool call (#1643 ) ## Summary Adds a new mcp tool call, `codex-reply`, so we can continue existing sessions. This is a first draft and does not yet support sessions from previous processes. ## Testing - [x] tested with mcp client	2025-07-21 21:01:56 -07:00
aibrahim-oai	83eefb55fb	Add session loading support to Codex (#1602 ) ## Summary - extend rollout format to store all session data in JSON - add resume/write helpers for rollouts - track session state after each conversation - support `LoadSession` op to resume a previous rollout - allow starting Codex with an existing session via `experimental_resume` config variable We need a way later for exploring the available sessions in a user friendly way. ## Testing - `cargo test --no-run` (fails: `cargo: command not found`) ------ https://chatgpt.com/codex/tasks/task_i_68792a29dd5c832190bf6930d3466fba This video is outdated. you should use `-c experimental_resume:<full path>` instead of `--resume <full path>` https://github.com/user-attachments/assets/7a9975c7-aa04-4f4e-899a-9e87defd947a	2025-07-18 17:04:04 -07:00
aibrahim-oai	9846adeabf	Refactor env settings into config (#1601 ) ## Summary - add OpenAI retry and timeout fields to Config - inject these settings in tests instead of mutating env vars - plumb Config values through client and chat completions logic - document new configuration options ## Testing - `cargo test -p codex-core --no-run` ------ https://chatgpt.com/codex/tasks/task_i_68792c5b04cc832195c03050c8b6ea94 --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-07-18 19:12:39 +00:00
pakrym-oai	6f2b01bb6b	feat: ensure session ID header is sent in Response API request (#1614 ) Include the current session id in Responses API requests.	2025-07-18 09:59:07 -07:00
aibrahim-oai	fcbcc40f51	Storing the sessions in a more organized way for easier look up. (#1596 ) now storing the sessions in `~/.codex/sessions/YYYY/MM/DD/<file>`	2025-07-17 10:12:15 -07:00
aibrahim-oai	2bd3314886	support deltas in core (#1587 ) - Added support for message and reasoning deltas - Skipped adding the support in the cli and tui for later - Commented out a failing test (wrong merge) that needs a fix in a separate PR. Side note: I think we need to disable merging when the CI doesn't pass.	2025-07-16 15:11:18 -07:00
aibrahim-oai	3777e18243	Add CLI streaming integration tests (#1542 ) ## Summary - add integration test for chat mode streaming via CLI using wiremock - add integration test for Responses API streaming via fixture - call `cargo run` to invoke the CLI during tests ## Testing - `cargo test -p codex-core --test cli_stream -- --nocapture` - `cargo clippy --all-targets --all-features -- -D warnings` ------ https://chatgpt.com/codex/tasks/task_i_68715980bbec8321999534fdd6a013c1	2025-07-12 18:05:58 -07:00
aibrahim-oai	0f8ac92390	Allow deadcode in test_support (#1555 ) #1546 Was pushed while not passing the clippy integration tests. This is fixing it.	2025-07-12 17:20:35 -07:00
aibrahim-oai	c46bb67d77	Improve SSE tests (#1546 ) ## Summary - support fixture-based SSE data in tests - add helpers to load SSE JSON fixtures - add table-driven SSE unit tests - let integration tests use fixture loading - fix clippy errors from format! calls ## Testing - `cargo clippy --tests` - `cargo test --workspace --exclude codex-linux-sandbox` ------ https://chatgpt.com/codex/tasks/task_i_68717468c3e48321b51c9ecac6ba0f09	2025-07-12 16:53:55 -07:00
Rene Leonhardt	82b0cebe8b	chore(rs): update dependencies (#1494 ) ### Chores - Update cargo dependencies - Remove unused cargo dependencies - Fix clippy warnings - Update Dockerfile (package.json requires node 22) - Let Dependabot update bun, cargo, devcontainers, docker, github-actions, npm (nix still not supported) ### TODO - Upgrade dependencies with breaking changes ```shell $ cargo update --verbose Unchanged crossterm v0.28.1 (available: v0.29.0) Unchanged schemars v0.8.22 (available: v1.0.4) ```	2025-07-10 11:08:16 -07:00
Michael Bolin	c221eab0b5	feat: support custom HTTP headers for model providers (#1473 ) This adds support for two new model provider config options: - `http_headers` for hardcoded (key, value) pairs - `env_http_headers` for headers whose values should be read from environment variables This also updates the built-in `openai` provider to use this feature to set the following headers: - `originator` => `codex_cli_rs` - `version` => [CLI version] - `OpenAI-Organization` => `OPENAI_ORGANIZATION` env var - `OpenAI-Project` => `OPENAI_PROJECT` env var for consistency with the TypeScript implementation: `bd5a9e8ba9/codex-cli/src/utils/agent/agent-loop.ts (L321-L329)` While here, this also consolidates some logic that was duplicated across `client.rs` and `chat_completions.rs` by introducing `ModelProviderInfo.create_request_builder()`. Resolves https://github.com/openai/codex/discussions/1152	2025-07-07 13:09:16 -07:00
Michael Bolin	6dad5c3b17	feat: add query_params option to ModelProviderInfo to support Azure (#1435 ) As discovered in https://github.com/openai/codex/issues/1365, the Azure provider needs to be able to specify `api-version` as a query param, so this PR introduces a generic `query_params` option to the `model_providers` config so that an Azure provider can be defined as follows: ```toml [model_providers.azure] name = "Azure" base_url = "https://YOUR_PROJECT_NAME.openai.azure.com/openai" env_key = "AZURE_OPENAI_API_KEY" query_params = { api-version = "2025-04-01-preview" } ``` This PR also updates the docs with this example. While here, we also update `wire_api` to default to `"chat"`, as that is likely the common case for someone defining an external provider. Fixes https://github.com/openai/codex/issues/1365.	2025-06-30 11:39:54 -07:00
Michael Bolin	d766e845b3	feat: experimental --output-last-message flag to exec subcommand (#1037 ) This introduces an experimental `--output-last-message` flag that can be used to identify a file where the final message from the agent will be written. Two use cases: - Ultimately, we will likely add a `--quiet` option to `exec`, but even if the user does not want any output written to the terminal, they probably want to know what the agent did. Writing the output to a file makes it possible to get that information in a clean way. - Relatedly, when using `exec` in CI, it is easier to review the transcript written "normally," (i.e., not as JSON or something with extra escapes), but getting programmatic access to the last message is likely helpful, so writing the last message to a file gets the best of both worlds. I am calling this "experimental" because it is possible that we are overfitting and will want a more general solution to this problem that would justify removing this flag.	2025-05-19 16:08:18 -07:00
Michael Bolin	5fc9fc3e3e	chore: expose codex_home via Config (#941 )	2025-05-15 00:30:13 -07:00
Michael Bolin	34aa1991f1	chore: handle all cases for EventMsg (#936 ) For now, this removes the `#[non_exhaustive]` directive on `EventMsg` so that we are forced to handle all `EventMsg` by default. (We may revisit this if/when we publish `core/` as a `lib` crate.) For now, it is helpful to have this as a forcing function because we have effectively two UIs (`tui` and `exec`) and usually when we add a new variant to `EventMsg`, we want to be sure that we update both.	2025-05-14 13:36:43 -07:00
Michael Bolin	a5f3a34827	fix: change EventMsg enum so every variant takes a single struct (#925 ) https://github.com/openai/codex/pull/922 did this for the `SessionConfigured` enum variant, and I think it is generally helpful to be able to work with the value of each enum variant as its own type, so this converts the remaining variants and updates all of the callsites. Added a simple unit test to verify that the JSON-serialized version of `Event` does not have any unexpected nesting.	2025-05-13 20:44:42 -07:00
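
The pattern described in a5f3a34827 and 34aa1991f1 (each enum variant wraps a single struct, and matches stay exhaustive) can be illustrated with a minimal sketch. The type and function names here (`Msg`, `AgentMessage`, `ExecFinished`, `render`, `render_exec`) are hypothetical and are not taken from `codex-core`; the sketch uses only the standard library and leaves out JSON serialization.

```rust
// Hypothetical payload structs: these names are illustrative only and are
// not the real types from codex-core.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentMessage {
    pub message: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ExecFinished {
    pub exit_code: i32,
}

// Every variant wraps exactly one struct, so each payload has its own type.
// The enum is not marked #[non_exhaustive], so crates that depend on it can
// also match on it without a wildcard arm.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    AgentMessage(AgentMessage),
    ExecFinished(ExecFinished),
}

// A helper can accept one payload type instead of the whole enum.
pub fn render_exec(ev: &ExecFinished) -> String {
    format!("command exited with code {}", ev.exit_code)
}

// No wildcard arm: adding a variant to Msg makes this match fail to compile
// until the new case is handled.
pub fn render(msg: &Msg) -> String {
    match msg {
        Msg::AgentMessage(ev) => format!("agent: {}", ev.message),
        Msg::ExecFinished(ev) => render_exec(ev),
    }
}
```

With this shape, a second UI that matches on the same enum gets a compile error, rather than a silent fall-through, whenever a variant is added.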

63 Commits