valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
pakrym-oai	f23c3066c8	Add capacity error (#1947 )	2025-08-07 10:46:43 -07:00
pakrym-oai	a593b1c3ab	Use different field for error type (#1945 )	2025-08-07 10:20:33 -07:00
Michael Bolin	107d2ce4e7	fix: change OPENAI_DEFAULT_MODEL to "gpt-5" (#1943 )	2025-08-07 10:13:13 -07:00
pakrym-oai	62ed5907f9	Better usage errors (#1941 )	2025-08-07 09:46:13 -07:00
Dylan	bc28b87c7b	[config] Onboarding flow with persistence (#1929 ) ## Summary In collaboration with @gpeal: upgrade the onboarding flow, and persist user settings. --------- Co-authored-by: Gabriel Peal <gabriel@openai.com>	2025-08-07 09:27:38 -07:00
pakrym-oai	7e9ecfbc6a	Rename the model (#1942 )	2025-08-07 09:07:51 -07:00
Michael Bolin	c2c327c723	feat: change shell_environment_policy to default to inherit="all" (#1904 ) Trying to use `core` as the default has been "too clever." Users can always take responsibility for controlling the env without this setting at all by specifying the `env` they use when calling `codex` in the first place. See https://github.com/openai/codex/issues/1249.	2025-08-07 01:55:41 -07:00
Michael Bolin	13982d6b4e	chore: fix outstanding review comments from the bot on #1919 (#1928 ) I should have read the comments before submitting!	2025-08-07 01:30:13 -07:00
ae	28395df957	[fix] fix absolute and % token counts (#1931 ) - For absolute, use non-cached input + output. - For estimating what % of the model's context window is used, we need to account for reasoning output tokens from prior turns being dropped from the context window. We approximate this here by subtracting reasoning output tokens from the total. This will be off for the current turn and pending function calls. We can improve it later.	2025-08-07 08:13:36 +00:00
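The two counts described in #1931 can be sketched as below. This is a minimal illustration, not the code from the PR: the `TokenUsage` struct, its field names, and the method names are assumptions, as is the premise that cached input tokens are a subset of input tokens.

```rust
/// Hypothetical per-session usage record; all field names are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct TokenUsage {
    pub input_tokens: u64,
    /// Assumed to be a subset of `input_tokens`.
    pub cached_input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_output_tokens: u64,
    pub total_tokens: u64,
}

impl TokenUsage {
    /// Absolute count: non-cached input plus output.
    pub fn absolute(&self) -> u64 {
        self.input_tokens.saturating_sub(self.cached_input_tokens) + self.output_tokens
    }

    /// Approximate tokens still occupying the context window: reasoning
    /// output from prior turns is dropped, so subtract it from the total.
    pub fn tokens_in_context(&self) -> u64 {
        self.total_tokens.saturating_sub(self.reasoning_output_tokens)
    }

    /// Estimated percentage of the context window in use, clamped to 0..=100.
    pub fn percent_of_context_used(&self, context_window: u64) -> u8 {
        if context_window == 0 {
            return 100;
        }
        let pct = self.tokens_in_context() as f64 / context_window as f64 * 100.0;
        pct.clamp(0.0, 100.0).round() as u8
    }
}
```

As the commit message notes, the percentage is only an approximation: it will be off for the current turn and for pending function calls.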
Ed Bayes	eb80614a7c	Tint chat composer background (#1921 ) ## Summary - give the chat composer a subtle custom background and apply it across the full area drawn - update turn interrupted to be more human readable ## Testing - `cargo test --all-features` (fails: `let` expressions in `core/src/client.rs` require newer rustc) - `just fix` (fails: `let` expressions in `core/src/client.rs` require newer rustc) ------ https://chatgpt.com/codex/tasks/task_i_68941f32c1008322bbcc39ee1d29a526	2025-08-07 00:46:45 -07:00
Michael Bolin	cd5f9074af	feat: add /tmp by default (#1919 ) Replaces the `include_default_writable_roots` option on `sandbox_workspace_write` (that defaulted to `true`, which was slightly weird/annoying) with `exclude_tmpdir_env_var`, which defaults to `false`. Perhaps more importantly, `/tmp` is now enabled by default as part of `sandbox_mode = "workspace-write"`, though `exclude_slash_tmp = true` can be used to disable this.	2025-08-07 00:17:00 -07:00
aibrahim-oai	f15e0fe1df	Ensure exec command end always emitted (#1908 ) ## Summary - defer ExecCommandEnd emission until after sandbox resolution - make sandbox error handler return final exec output and response - align sandbox error stderr with response content and rename to `final_output` - replace unstable `let` chains in client command header logic ## Testing - `just fmt` - `just fix` - `cargo test --all-features` (fails: NotPresent in core/tests/client.rs) ------ https://chatgpt.com/codex/tasks/task_i_6893e63b0c408321a8e1ff2a052c4c51	2025-08-07 06:25:56 +00:00
Gabriel Peal	8a990b5401	Migrate GitWarning to OnboardingScreen (#1915 ) This paves the way to do per-directory approval settings (https://github.com/openai/codex/pull/1912). This also lets us pass in a Config/ChatWidgetArgs into onboarding which can then mutate it and emit the ChatWidgetArgs it wants at the end which may be modified by the said approval settings.	2025-08-06 22:39:07 -04:00
pakrym-oai	57c973b571	Add 2025-08-06 model family (#1899 )	2025-08-06 23:14:02 +00:00
Gabriel Peal	2d5de795aa	First pass at a TUI onboarding (#1876 ) This sets up the scaffolding and basic flow for a TUI onboarding experience. It covers sign in with ChatGPT, env auth, as well as some safety guidance. Next up: 1. Replace the git warning screen 2. Use this to configure default approval/sandbox modes Note the shimmer flashes are from me slicing the video, not jank. https://github.com/user-attachments/assets/0fbe3479-fdde-41f3-87fb-a7a83ab895b8	2025-08-06 18:22:14 -04:00
pakrym-oai	8262ba58b2	Prefer env var auth over default codex auth (#1861 ) ## Summary - Prioritize provider-specific API keys over default Codex auth when building requests - Add test to ensure provider env var auth overrides default auth ## Testing - `just fmt` - `just fix` (fails: `let` expressions in this position are unstable) - `cargo test --all-features` (fails: `let` expressions in this position are unstable) ------ https://chatgpt.com/codex/tasks/task_i_68926a104f7483208f2c8fd36763e0e3	2025-08-06 13:02:00 -07:00
Michael Bolin	64f2f2eca2	fix: support $CODEX_HOME/AGENTS.md instead of $CODEX_HOME/instructions.md (#1891 ) The docs and code do not match. It turns out the docs are "right" in they are what we have been meaning to support, so this PR updates the code: `ae88b69b09/README.md (L298-L302)` Support for `instructions.md` is a holdover from the TypeScript CLI, so we are just going to drop support for it altogether rather than maintain it in perpetuity.	2025-08-06 11:48:03 -07:00
Dylan	dc468d563f	[env] Remove git config for now (#1884 ) ## Summary Forgot to remove this in #1869 last night! Too much of a performance hit on the main thread. We can bring it back via an async thread on startup.	2025-08-06 08:05:17 -07:00
Dylan	3e8bcf0247	[prompts] Add <environment_context> (#1869 ) ## Summary Includes a new user message in the api payload which provides useful environment context for the model, so it knows about things like the current working directory and the sandbox. ## Testing Updated unit tests	2025-08-06 01:13:31 -07:00
easong-openai	f8d70d67b6	Add OSS model info (#1860 ) Add somewhat arbitrarily chosen context window/output limit.	2025-08-05 22:35:00 -07:00
Dylan	725dd6be6a	[approval_policy] Add OnRequest approval_policy (#1865 ) ## Summary A split-up PR of #1763 , stacked on top of a tools refactor #1858 to make the change clearer. From the previous summary: > Let's try something new: tell the model about the sandbox, and let it decide when it will need to break the sandbox. Some local testing suggests that it works pretty well with zero iteration on the prompt! ## Testing - [x] Added unit tests - [x] Tested locally and it appears to work smoothly!	2025-08-05 20:44:20 -07:00
Dylan	aff97ed7dd	[core] Separate tools config from openai client (#1858 ) ## Summary In an effort to make tools easier to work with and more configurable, I'm introducing `ToolConfig` and updating `Prompt` to take in a general list of Tools. I think this is simpler and better for a few reasons: - We can easily assemble tools from various sources (our own harness, mcp servers, etc.) and we can consolidate the logic for constructing the logic in one place that is separate from serialization. - client.rs no longer needs arbitrary config values, it just takes in a list of tools to serialize A hefty portion of the PR is now updating our conversion of `mcp_types::Tool` to `OpenAITool`, but considering that @bolinfest accurately called this out as a TODO long ago, I think it's time we tackled it. ## Testing - [x] Experimented locally, no changes, as expected - [x] Added additional unit tests - [x] Responded to rust-review	2025-08-05 19:27:52 -07:00
Dylan	ea7d3f27bd	[core] Stop escalating timeouts (#1853 ) ## Summary Escalating out of sandbox is (almost always) not going to fix long-running commands timing out - therefore we should just pass the failure back to the model instead of asking the user to re-run a command that took a long time anyway. ## Testing - [x] Ran locally with a timeout and confirmed this worked as expected	2025-08-05 17:52:25 -07:00
Michael Bolin	42bd73e150	chore: remove unnecessary default_ prefix (#1854 ) This prefix is not in line with the other fields on the `ConfigOverrides` struct.	2025-08-05 14:42:49 -07:00
Michael Bolin	d365cae077	fix: when using `--oss`, ensure correct configuration is threaded through correctly (#1859 ) This PR started as an investigation with the goal of eliminating the use of `unsafe { std::env::set_var() }` in `ollama/src/client.rs`, as setting environment variables in a multithreaded context is indeed unsafe and these tests were observed to be flaky, as a result. Though as I dug deeper into the issue, I discovered that the logic for instantiating `OllamaClient` under test scenarios was not quite right. In this PR, I aimed to: - share more code between the two creation codepaths, `try_from_oss_provider()` and `try_from_provider_with_base_url()` - use the values from `Config` when setting up Ollama, as we have various mechanisms for overriding config values, so we should be sure that we are always using the ultimate `Config` for things such as the `ModelProviderInfo` associated with the `oss` id Once this was in place, `OllamaClient::try_from_provider_with_base_url()` could be used in unit tests for `OllamaClient` so it was possible to create a properly configured client without having to set environment variables.	2025-08-05 13:55:32 -07:00
easong-openai	9285350842	Introduce `--oss` flag to use gpt-oss models (#1848 ) This adds support for easily running Codex backed by a local Ollama instance running our new open source models. See https://github.com/openai/gpt-oss for details. If you pass in `--oss` you'll be prompted to install/launch ollama, and it will automatically download the 20b model and attempt to use it. We'll likely want to expand this with some options later to make the experience smoother for users who can't run the 20b or want to run the 120b. Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-08-05 11:31:11 -07:00
easong-openai	e0303dbac0	Rescue chat completion changes (#1846 ) https://github.com/openai/codex/pull/1835 has some messed up history. This adds support for streaming chat completions, which is useful for ollama. We should probably take a very skeptical eye to the code introduced in this PR. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-08-05 08:56:13 +00:00
Dylan	d31e149cb1	[prompt] Update prompt.md (#1839 ) ## Summary Additional clarifications to our prompt. Still very concise, but we'll continue to add more here.	2025-08-05 00:43:23 -07:00
Michael Bolin	136b3ee5bf	chore: introduce ModelFamily abstraction (#1838 ) To date, we have a number of hardcoded OpenAI model slug checks spread throughout the codebase, which makes it hard to audit the various special cases for each model. To mitigate this issue, this PR introduces the idea of a `ModelFamily` that has fields to represent the existing special cases, such as `supports_reasoning_summaries` and `uses_local_shell_tool`. There is a `find_family_for_model()` function that maps the raw model slug to a `ModelFamily`. This function hardcodes all the knowledge about the special attributes for each model. This PR then replaces the hardcoded model name checks with checks against a `ModelFamily`. Note `ModelFamily` is now available as `Config::model_family`. We should ultimately remove `Config::model` in favor of `Config::model_family::slug`.	2025-08-04 23:50:03 -07:00
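The shape of the `ModelFamily` abstraction from #1838 can be sketched as follows. The struct name, the function name, and the fields `slug`, `supports_reasoning_summaries`, and `uses_local_shell_tool` come from the commit message. The `Option` return type and the `example-*` slug prefixes are placeholders invented for illustration and do not reflect the real mapping.

```rust
/// Per-family special cases, gathered in one place instead of scattered slug checks.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelFamily {
    pub slug: String,
    pub supports_reasoning_summaries: bool,
    pub uses_local_shell_tool: bool,
}

/// Maps a raw model slug to its family. All knowledge about special
/// attributes lives here. The prefixes below are hypothetical placeholders.
pub fn find_family_for_model(slug: &str) -> Option<ModelFamily> {
    let (supports_reasoning_summaries, uses_local_shell_tool) =
        if slug.starts_with("example-reasoning") {
            (true, false)
        } else if slug.starts_with("example-shell") {
            (false, true)
        } else {
            // Unknown slug: no family (an assumption about the real behavior).
            return None;
        };

    Some(ModelFamily {
        slug: slug.to_string(),
        supports_reasoning_summaries,
        uses_local_shell_tool,
    })
}
```

Call sites can then branch on a field such as `family.supports_reasoning_summaries` rather than comparing model names, which keeps the per-model special cases auditable in a single function.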
easong-openai	906d449760	Stream model responses (#1810 ) Stream the model's thoughts and responses instead of waiting for the whole thing to come through. Very rough right now, but I'm making the risk call to push through.	2025-08-05 04:23:22 +00:00
Dylan	063083af15	[prompts] Better user_instructions handling (#1836 ) ## Summary Our recent change in #1737 can sometimes lead to the model confusing AGENTS.md context as part of the message. But a little prompting and formatting can help fix this! ## Testing - Ran locally with a few different prompts to verify the model behaves well. - Updated unit tests	2025-08-04 18:55:57 -07:00
pakrym-oai	84bcadb8d9	Restore API key and query param overrides (#1826 ) Addresses https://github.com/openai/codex/issues/1796	2025-08-04 18:07:49 -07:00
Ahmed Ibrahim	e38ce39c51	Revert to `3f13ebce10` without rewriting history. Wrong merge	2025-08-04 17:03:24 -07:00
Ahmed Ibrahim	1a33de34b0	unify flag	2025-08-04 16:56:52 -07:00
Ahmed Ibrahim	bd171e5206	add raw reasoning	2025-08-04 16:49:42 -07:00
ae	dc15a5cf0b	feat: accept custom instructions in profiles (#1803 ) Allows users to set their experimental_instructions_file in configs. For example the below enables experimental instructions when running `codex -p foo`. ``` [profiles.foo] experimental_instructions_file = "/Users/foo/.codex/prompt.md" ``` # Testing - ✅ Running against a profile with experimental_instructions_file works. - ✅ Running against a profile without experimental_instructions_file works. - ✅ Running against no profile with experimental_instructions_file works. - ✅ Running against no profile without experimental_instructions_file works.	2025-08-04 09:34:46 -07:00
Gabriel Peal	1f3318c1c5	Add a TurnDiffTracker to create a unified diff for an entire turn (#1770 ) This lets us show an accumulating diff across all patches in a turn. Refer to the docs for TurnDiffTracker for implementation details. There are multiple ways this could have been done and this felt like the right tradeoff between reliability and completeness: Pros * It will pick up all changes to files that the model touched, including changes made by prettier or another command that updates them. * It will not pick up changes made by the user or other agents to files it didn't modify. Cons * It will pick up changes that the user made to a file that the model also touched * It will not pick up changes to codegen or files that were not modified with apply_patch	2025-08-04 11:57:04 -04:00
Dylan	e3565a3f43	[sandbox] Filter out certain non-sandbox errors (#1804 ) ## Summary Users frequently complain about re-approving commands that have failed for non-sandbox reasons. We can't diagnose with complete accuracy which errors happened because of a sandbox failure, but we can start to eliminate some common simple cases. This PR captures the most common case I've seen, which is a `command not found` error. ## Testing - [x] Added unit tests - [x] Ran a few cases locally	2025-08-03 13:05:48 -07:00
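The kind of filter described in #1804 might look like the sketch below. It is not the heuristic from the PR: the function name and the exact checks are assumptions. The one external fact it relies on is the POSIX shell convention of exiting with status 127 when a command cannot be found.

```rust
/// Hypothetical heuristic: decide whether a failed command is worth
/// retrying outside the sandbox. Returns `false` for failures that the
/// sandbox clearly did not cause.
pub fn is_likely_sandbox_denied(exit_code: i32, stderr: &str) -> bool {
    // A successful command was not denied by anything.
    if exit_code == 0 {
        return false;
    }
    // POSIX shells exit with 127 when the command is not found;
    // re-running without the sandbox would fail the same way.
    if exit_code == 127 || stderr.contains("command not found") {
        return false;
    }
    // Anything else might be a sandbox failure; we cannot tell for sure.
    true
}
```

The default stays conservative: only the cases that are known not to be sandbox-related are filtered out, and every other failure still goes through the approval flow.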
Jeremy Rose	78a1d49fac	fix command duration display (#1806 ) we were always displaying "0ms" before.	2025-08-03 11:33:44 -07:00
David Z Hao	75eecb656e	Fix MacOS multiprocessing by relaxing sandbox (#1808 ) The following test script fails in the codex sandbox: ``` import multiprocessing from multiprocessing import Lock, Process def f(lock): with lock: print("Lock acquired in child process") if __name__ == '__main__': lock = Lock() p = Process(target=f, args=(lock,)) p.start() p.join() ``` with ``` Traceback (most recent call last): File "/Users/david.hao/code/codex/codex-rs/cli/test.py", line 9, in <module> lock = Lock() ^^^^^^ File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/context.py", line 68, in Lock return Lock(ctx=self.get_context()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/synchronize.py", line 169, in __init__ SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx) File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/synchronize.py", line 57, in __init__ sl = self._semlock = _multiprocessing.SemLock( ^^^^^^^^^^^^^^^^^^^^^^^^^ PermissionError: [Errno 1] Operation not permitted ``` After reading, adding this line to the sandbox configs fixes things - MacOS multiprocessing appears to use sem_lock(), which opens an IPC which is considered a disk write even though no file is created. I interrogated ChatGPT about whether it's okay to loosen, and my impression after reading is that it is, although would appreciate a close look Breadcrumb: You can run `cargo run -- debug seatbelt --full-auto <cmd>` to test the sandbox	2025-08-03 06:59:26 -07:00
aibrahim-oai	81bb1c9e26	Fix compact (#1798 ) We are not recording the summary in the history.	2025-08-02 12:05:06 -07:00
Michael Bolin	80555d4ff2	feat: make .git read-only within a writable root when using Seatbelt (#1765 ) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.	2025-08-01 16:11:24 -07:00
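The idea of a `WritableRoot` carrying read-only subpaths (#1765) can be sketched as below. Only the type name and the concept come from the commit message. The field names `root` and `read_only_subpaths` and the `is_path_writable` helper are assumptions made for illustration, and the real enforcement happens in the Seatbelt policy rather than in a check like this.

```rust
use std::path::{Path, PathBuf};

/// A writable root plus the subpaths beneath it that must stay read-only.
/// Field names are assumptions; the commit only names the type.
#[derive(Debug, Clone)]
pub struct WritableRoot {
    pub root: PathBuf,
    pub read_only_subpaths: Vec<PathBuf>,
}

impl WritableRoot {
    /// Hypothetical helper: a path is writable if it lies under `root`
    /// and under none of the read-only subpaths.
    pub fn is_path_writable(&self, path: &Path) -> bool {
        // `Path::starts_with` compares whole components, so `/repo/.github`
        // does not count as being under `/repo/.git`.
        path.starts_with(&self.root)
            && !self
                .read_only_subpaths
                .iter()
                .any(|read_only| path.starts_with(read_only))
    }
}
```

With `root = /repo` and `read_only_subpaths = [/repo/.git]`, a write to `/repo/src/main.rs` would be allowed while a write to `/repo/.git/config` would be rejected.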
Michael Bolin	92f3566d78	chore: introduce SandboxPolicy::WorkspaceWrite::include_default_writable_roots (#1785 ) Without this change, it is challenging to create integration tests to verify that the folders not included in `writable_roots` in `SandboxPolicy::WorkspaceWrite` are read-only because, by default, `get_writable_roots_with_cwd()` includes `TMPDIR`, which is where most integration tests do their work. This introduces a `use_exact_writable_roots` option to disable the default includes returned by `get_writable_roots_with_cwd()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1785). * #1765 * __->__ #1785	2025-08-01 14:15:55 -07:00
aibrahim-oai	f20de21cb6	collapse `stdout` and `stderr` delta events into one (#1787 )	2025-08-01 14:00:19 -07:00
aibrahim-oai	bc7beddaa2	feat: stream exec stdout events (#1786 ) ## Summary - stream command stdout as `ExecCommandStdout` events - forward streamed stdout to clients and ignore in human output processor - adjust call sites for new streaming API	2025-08-01 13:04:34 -07:00
pakrym-oai	88ea215c80	Add a custom originator setting (#1781 )	2025-08-01 09:55:23 -07:00
aibrahim-oai	e2c994e32a	Add /compact (#1527 ) - Add operation to summarize the context so far. - The operation runs a compact task that summarizes the context. - The operation clears the previous context to free the context window - The operation doesn't use `run_task`, to avoid corrupting the session - Add /compact in the tui https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c	2025-07-31 21:34:32 -07:00
pakrym-oai	0935e6a875	Send account id when available (#1767 ) For users with multiple accounts we need to specify the account to use.	2025-07-31 15:40:19 -07:00
Michael Bolin	5a0ad5ab8f	chore: refactor exec.rs: create separate seatbelt.rs and spawn.rs files (#1762 ) At 550 lines, `exec.rs` was a bit large. In particular, I found it hard to locate the Seatbelt-related code quickly without a file with `seatbelt` in the name, so this refactors things so: - `spawn_command_under_seatbelt()` and dependent code moves to a new `seatbelt.rs` file - `spawn_child_async()` and dependent code moves to a new `spawn.rs` file	2025-07-31 13:11:47 -07:00
Michael Bolin	06c786b2da	fix: ensure PatchApplyBeginEvent and PatchApplyEndEvent are dispatched reliably (#1760 ) This is a follow-up to https://github.com/openai/codex/pull/1705, as that PR inadvertently lost the logic where `PatchApplyBeginEvent` and `PatchApplyEndEvent` events were sent when patches were auto-approved. Though as part of this fix, I believe this also makes an important safety fix to `assess_patch_safety()`, as there was a case that returned `SandboxType::None`, which arguably is the thing we were trying to avoid in #1705. On a high level, we want there to be only one codepath where `apply_patch` happens, which should be unified with the patch to run `exec`, in general, so that sandboxing is applied consistently for both cases. Prior to this change, `apply_patch()` in `core` would either: * exit early, delegating to `exec()` to shell out to `apply_patch` using the appropriate sandbox * proceed to run the logic for `apply_patch` in memory `549846b29a/codex-rs/core/src/apply_patch.rs (L61-L63)` In this implementation, only the latter would dispatch `PatchApplyBeginEvent` and `PatchApplyEndEvent`, though the former would dispatch `ExecCommandBeginEvent` and `ExecCommandEndEvent` for the `apply_patch` call (or, more specifically, the `codex --codex-run-as-apply-patch PATCH` call). To unify things in this PR, we: * Eliminate the back half of the `apply_patch()` function, and instead have it also return with `DelegateToExec`, though we add an extra field to the return value, `user_explicitly_approved_this_action`. * In `codex.rs` where we process `DelegateToExec`, we use `SandboxType::None` when `user_explicitly_approved_this_action` is `true`. This means we no longer run the apply_patch logic in memory, as we always `exec()`. (Note this is what allowed us to delete so much code in `apply_patch.rs`.) 
* In `codex.rs`, we further update `notify_exec_command_begin()` and `notify_exec_command_end()` to take additional fields to determine what type of notification to send: `ExecCommand` or `PatchApply`. Admittedly, this PR also drops some of the functionality about giving the user the opportunity to expand the set of writable roots as part of approving the `apply_patch` command. I'm not sure how much that was used, and we should probably rethink how that works as we are currently tidying up the protocol to the TUI, in general.	2025-07-31 11:13:57 -07:00


188 Commits