valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Dylan	725dd6be6a	[approval_policy] Add OnRequest approval_policy (#1865) ## Summary A split-up PR of #1763, stacked on top of a tools refactor #1858 to make the change clearer. From the previous summary: > Let's try something new: tell the model about the sandbox, and let it decide when it will need to break the sandbox. Some local testing suggests that it works pretty well with zero iteration on the prompt! ## Testing - [x] Added unit tests - [x] Tested locally and it appears to work smoothly!	2025-08-05 20:44:20 -07:00
Michael Bolin	92f3566d78	chore: introduce SandboxPolicy::WorkspaceWrite::include_default_writable_roots (#1785) Without this change, it is challenging to create integration tests to verify that the folders not included in `writable_roots` in `SandboxPolicy::WorkspaceWrite` are read-only because, by default, `get_writable_roots_with_cwd()` includes `TMPDIR`, which is where most integration tests do their work. This introduces a `use_exact_writable_roots` option to disable the default includes returned by `get_writable_roots_with_cwd()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1785). * #1765 * __->__ #1785	2025-08-01 14:15:55 -07:00
pakrym-oai	51b6bdefbe	Auto format toml (#1745 ) Add recommended extension and configure it to auto format prompt.	2025-07-30 18:37:00 -07:00
aibrahim-oai	83eefb55fb	Add session loading support to Codex (#1602) ## Summary - extend rollout format to store all session data in JSON - add resume/write helpers for rollouts - track session state after each conversation - support `LoadSession` op to resume a previous rollout - allow starting Codex with an existing session via `experimental_resume` config variable Later, we will need a user-friendly way to explore the available sessions. ## Testing - `cargo test --no-run` (fails: `cargo: command not found`) ------ https://chatgpt.com/codex/tasks/task_i_68792a29dd5c832190bf6930d3466fba This video is outdated. You should use `-c experimental_resume:<full path>` instead of `--resume <full path>` https://github.com/user-attachments/assets/7a9975c7-aa04-4f4e-899a-9e87defd947a	2025-07-18 17:04:04 -07:00
Rene Leonhardt	82b0cebe8b	chore(rs): update dependencies (#1494 ) ### Chores - Update cargo dependencies - Remove unused cargo dependencies - Fix clippy warnings - Update Dockerfile (package.json requires node 22) - Let Dependabot update bun, cargo, devcontainers, docker, github-actions, npm (nix still not supported) ### TODO - Upgrade dependencies with breaking changes ```shell $ cargo update --verbose Unchanged crossterm v0.28.1 (available: v0.29.0) Unchanged schemars v0.8.22 (available: v1.0.4) ```	2025-07-10 11:08:16 -07:00
Michael Bolin	e0c08cea4f	feat: add support for --sandbox flag (#1476) At a high level, we try to design `config.toml` so that you don't have to "comment out a lot of stuff" when testing different options. Previously, defining a sandbox policy was somewhat at odds with this principle because you would define the policy as attributes of `[sandbox]` like so: ```toml [sandbox] mode = "workspace-write" writable_roots = [ "/tmp" ] ``` but if you wanted to temporarily change to a read-only sandbox, you might feel compelled to modify your file to be: ```toml [sandbox] mode = "read-only" # mode = "workspace-write" # writable_roots = [ "/tmp" ] ``` Technically, commenting out `writable_roots` would not be strictly necessary, as `mode = "read-only"` would ignore `writable_roots`, but it's still a reasonable thing to do to keep things tidy. Currently, the various values for `mode` do not support that many attributes, so this is not that hard to maintain, but one could imagine this becoming more complex in the future. In this PR, we change Codex CLI so that it no longer recognizes `[sandbox]`. Instead, it introduces a top-level option, `sandbox_mode`, and `[sandbox_workspace_write]` is used to further configure the sandbox when `sandbox_mode = "workspace-write"` is used: ```toml sandbox_mode = "workspace-write" [sandbox_workspace_write] writable_roots = [ "/tmp" ] ``` This feels a bit more future-proof in that it is less tedious to configure different sandboxes: ```toml sandbox_mode = "workspace-write" [sandbox_read_only] # read-only options here... [sandbox_workspace_write] writable_roots = [ "/tmp" ] [sandbox_danger_full_access] # danger-full-access options here... ``` In this scheme, you never need to comment out the configuration for an individual sandbox type: you only need to redefine `sandbox_mode`. Relatedly, prior to this change, a user had to do `-c sandbox.mode=read-only` to change the mode on the command line. With this change, things are arguably a bit cleaner because the equivalent option is `-c sandbox_mode=read-only` (and now `-c sandbox_workspace_write=...` can be set separately). More importantly, we introduce the `-s/--sandbox` option to the CLI, which maps directly to `sandbox_mode` in `config.toml`, making config override behavior easier to reason about. Moreover, as you can see in the updates to the various Markdown files, it is much easier to explain how to configure sandboxing when things like `--sandbox read-only` can be used as an example. Relatedly, this cleanup also made it straightforward to add support for a `sandbox` option for Codex when used as an MCP server (see the changes to `mcp-server/src/codex_tool_config.rs`). Fixes https://github.com/openai/codex/issues/1248.	2025-07-07 22:31:30 -07:00
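The scheme described in #1476 can be sketched roughly as follows: a top-level `sandbox_mode` selects the active sandbox, per-mode tables stay in the file untouched, and `--sandbox` overrides the mode from `config.toml`. This is an illustrative sketch, not the project's code; the names `Config`, `WorkspaceWriteOptions`, `parse_mode`, `effective_mode`, and `effective_writable_roots` are hypothetical.

```rust
/// The three mode strings named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

/// Hypothetical stand-in for the `[sandbox_workspace_write]` table.
#[derive(Debug, Default)]
struct WorkspaceWriteOptions {
    writable_roots: Vec<String>,
}

/// Hypothetical, heavily reduced view of `config.toml`.
#[derive(Debug)]
struct Config {
    sandbox_mode: SandboxMode,
    sandbox_workspace_write: WorkspaceWriteOptions,
}

/// Map a mode string (from config or `--sandbox`) to the enum.
fn parse_mode(raw: &str) -> Option<SandboxMode> {
    match raw {
        "read-only" => Some(SandboxMode::ReadOnly),
        "workspace-write" => Some(SandboxMode::WorkspaceWrite),
        "danger-full-access" => Some(SandboxMode::DangerFullAccess),
        _ => None,
    }
}

/// A `--sandbox` value, when present, wins over `sandbox_mode` from the file.
/// Returns `None` if the CLI value is not a recognized mode.
fn effective_mode(config: &Config, cli_sandbox: Option<&str>) -> Option<SandboxMode> {
    match cli_sandbox {
        Some(raw) => parse_mode(raw),
        None => Some(config.sandbox_mode),
    }
}

/// Only `workspace-write` consults its table; other modes ignore it,
/// so the table never has to be commented out when switching modes.
fn effective_writable_roots(config: &Config, mode: SandboxMode) -> &[String] {
    match mode {
        SandboxMode::WorkspaceWrite => config.sandbox_workspace_write.writable_roots.as_slice(),
        SandboxMode::ReadOnly | SandboxMode::DangerFullAccess => &[],
    }
}
```

Switching to `read-only` here changes only the selected mode; the `writable_roots` entry stays in place and is simply not consulted.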
Michael Bolin	72082164c1	chore: rename AskForApproval::UnlessAllowListed to AskForApproval::UnlessTrusted (#1385 ) We could just rename to `Untrusted` instead of `UnlessTrusted`, but I think `AskForApproval::UnlessTrusted` reads a bit better.	2025-06-25 12:26:13 -07:00
Michael Bolin	86d5a9d80d	chore: rename unless-allow-listed to untrusted (#1378 ) For the `approval_policy` config option, renames `unless-allow-listed` to `untrusted`. In general, when it comes to exec'ing commands, I think "trusted" is a more accurate term than "safe." Also drops the `AskForApproval::AutoEdit` variant, as we were not really making use of it, anyway. Fixes https://github.com/openai/codex/issues/1250. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1378). * #1379 * __->__ #1378	2025-06-24 22:19:21 -07:00
Michael Bolin	531ce7626f	fix: pretty-print the sandbox config in the TUI/exec modes (#1376 ) Now that https://github.com/openai/codex/pull/1373 simplified the sandbox config, we can print something much simpler in the TUI (and in `codex exec`) to summarize the sandbox config. Before: ![Screenshot 2025-06-24 at 5 45 52 PM](https://github.com/user-attachments/assets/b7633efb-a619-43e1-9abe-7bb0be2d0ec0) With this change: ![Screenshot 2025-06-24 at 5 46 44 PM](https://github.com/user-attachments/assets/8d099bdd-a429-4796-a08d-70931d984e4f) For reference, my `config.toml` contains: ``` [sandbox] mode = "workspace-write" writable_roots = ["/tmp", "/Users/mbolin/.pyenv/shims"] ``` Fixes https://github.com/openai/codex/issues/1248	2025-06-24 17:48:51 -07:00
Michael Bolin	0776d78357	feat: redesign sandbox config (#1373) This is a major redesign of how sandbox configuration works and aims to fix https://github.com/openai/codex/issues/1248. Specifically, it replaces `sandbox_permissions` in `config.toml` (and the `-s`/`--sandbox-permission` CLI flags) with a "table" with effectively three variants: ```toml # Safest option: full disk is readable, but writes and network access are disallowed. [sandbox] mode = "read-only" # The cwd of the Codex task is writable, as well as $TMPDIR on macOS. # writable_roots can be used to specify additional writable folders. [sandbox] mode = "workspace-write" writable_roots = [] # Optional, defaults to the empty list. network_access = false # Optional, defaults to false. # Disable sandboxing: use at your own risk!!! [sandbox] mode = "danger-full-access" ``` This should make sandboxing easier to reason about. While we have dropped support for `-s`, the way it works now is: - no flags => `read-only` - `--full-auto` => `workspace-write` - currently, there is no way to specify `danger-full-access` via a CLI flag, but we will revisit that as part of https://github.com/openai/codex/issues/1254 Outstanding issue: - As noted in the `TODO` on `SandboxPolicy::is_unrestricted()`, we are still conflating sandbox preferences with approval preferences in that case, which needs to be cleaned up.	2025-06-24 16:59:47 -07:00
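As a rough illustration of the three variants and the flag mapping listed in #1373, the sketch below models the policy as an enum. It is an assumption-laden simplification: the enum shape and the helpers `policy_from_flags`, `is_writable`, and `network_allowed` are hypothetical, and the macOS `$TMPDIR` default is deliberately left out.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical model of the three `mode` variants from the commit message.
#[derive(Debug)]
enum SandboxPolicy {
    ReadOnly,
    WorkspaceWrite {
        writable_roots: Vec<PathBuf>,
        network_access: bool,
    },
    DangerFullAccess,
}

/// Flag mapping as stated in the commit: no flags => read-only,
/// `--full-auto` => workspace-write with the documented defaults.
fn policy_from_flags(full_auto: bool) -> SandboxPolicy {
    if full_auto {
        SandboxPolicy::WorkspaceWrite {
            writable_roots: Vec::new(), // "defaults to the empty list"
            network_access: false,      // "defaults to false"
        }
    } else {
        SandboxPolicy::ReadOnly
    }
}

/// Whether `target` may be written. The `$TMPDIR` rule for macOS is omitted here.
fn is_writable(policy: &SandboxPolicy, cwd: &Path, target: &Path) -> bool {
    match policy {
        SandboxPolicy::ReadOnly => false,
        SandboxPolicy::DangerFullAccess => true,
        SandboxPolicy::WorkspaceWrite { writable_roots, .. } => {
            target.starts_with(cwd) || writable_roots.iter().any(|root| target.starts_with(root))
        }
    }
}

/// Network access is only configurable in workspace-write mode.
fn network_allowed(policy: &SandboxPolicy) -> bool {
    match policy {
        SandboxPolicy::ReadOnly => false,
        SandboxPolicy::WorkspaceWrite { network_access, .. } => *network_access,
        SandboxPolicy::DangerFullAccess => true,
    }
}
```

`Path::starts_with` compares whole path components, so `/work/project-old` is not treated as being inside `/work/project`.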
Fouad Matin	828e2062c2	fix(codex-rs): use codex-mini-latest as default (#1164 )	2025-05-29 16:55:19 -07:00
Michael Bolin	d60f350cf8	feat: add support for -c/--config to override individual config items (#1137 ) This PR introduces support for `-c`/`--config` so users can override individual config values on the command line using `--config name=value`. Example: ``` codex --config model=o4-mini ``` Making it possible to set arbitrary config values on the command line results in a more flexible configuration scheme and makes it easier to provide single-line examples that can be copy-pasted from documentation. Effectively, it means there are four levels of configuration for some values: - Default value (e.g., `model` currently defaults to `o4-mini`) - Value in `config.toml` (e.g., user could override the default to be `model = "o3"` in their `config.toml`) - Specifying `-c` or `--config` to override `model` (e.g., user can include `-c model=o3` in their list of args to Codex) - If available, a config-specific flag can be used, which takes precedence over `-c` (e.g., user can specify `--model o3` in their list of args to Codex) Now that it is possible to specify anything that could be configured in `config.toml` on the command line using `-c`, we do not need to have a custom flag for every possible config option (which can clutter the output of `--help`). To that end, as part of this PR, we drop support for the `--disable-response-storage` flag, as users can now specify `-c disable_response_storage=true` to get the equivalent functionality. Under the hood, this works by loading the `config.toml` into a `toml::Value`. Then for each `key=value`, we create a small synthetic TOML file with `value` so that we can run the TOML parser to get the equivalent `toml::Value`. We then parse `key` to determine the point in the original `toml::Value` to do the insert/replace. Once all of the overrides from `-c` args have been applied, the `toml::Value` is deserialized into a `ConfigToml` and then the `ConfigOverrides` are applied, as before.	2025-05-27 23:11:44 -07:00
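The override mechanism described in #1137 (parse each `key=value`, walk the dotted `key` into the loaded config tree, then insert or replace) can be sketched as below. This is a simplified illustration that uses only the standard library: the `Value` enum is a hypothetical stand-in for `toml::Value`, `parse_value` replaces the real TOML parsing step with a trivial rule, and `resolve` is a hypothetical helper showing the four precedence levels.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `toml::Value`; only what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Bool(bool),
    Table(BTreeMap<String, Value>),
}

/// Stand-in for running the TOML parser on the right-hand side:
/// `true`/`false` become booleans, anything else is treated as a string.
fn parse_value(raw: &str) -> Value {
    match raw {
        "true" => Value::Bool(true),
        "false" => Value::Bool(false),
        other => Value::Str(other.trim_matches('"').to_string()),
    }
}

/// View `value` as a table, replacing any non-table value with an empty table.
fn as_table(value: &mut Value) -> &mut BTreeMap<String, Value> {
    if !matches!(*value, Value::Table(_)) {
        *value = Value::Table(BTreeMap::new());
    }
    match value {
        Value::Table(table) => table,
        _ => unreachable!("value was just replaced with a table"),
    }
}

/// Insert or replace the value at a dotted path such as `sandbox.mode`,
/// creating intermediate tables as needed.
fn apply_override(root: &mut Value, key: &str, value: Value) {
    let mut segments: Vec<&str> = key.split('.').collect();
    let last = segments.pop().expect("split always yields at least one segment");
    let mut current = root;
    for segment in segments {
        current = as_table(current)
            .entry(segment.to_string())
            .or_insert_with(|| Value::Table(BTreeMap::new()));
    }
    as_table(current).insert(last.to_string(), value);
}

/// Precedence from lowest to highest: default, config.toml, `-c`, dedicated flag.
fn resolve<'a>(
    default: &'a str,
    config_toml: Option<&'a str>,
    dash_c: Option<&'a str>,
    flag: Option<&'a str>,
) -> &'a str {
    flag.or(dash_c).or(config_toml).unwrap_or(default)
}
```

For example, applying `model=o3` and then `sandbox.mode=read-only` to an empty table yields a top-level `model` entry and a nested `sandbox` table containing `mode`.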
Michael Bolin	30cbfdfa87	chore: update exec crate to use std::time instead of chrono (#952) When I originally wrote `elapsed.rs`, I realized we were using both `std::time` and `chrono` with no real benefit of having both. We should try to keep the `exec` subcommand trim (as it is also buildable as a standalone executable), so this helps tighten things up.	2025-05-16 08:14:50 -07:00
jcoens-openai	87cf120873	Workspace lints and disallow unwrap (#855 ) Sets submodules to use workspace lints. Added denying unwrap as a workspace level lint, which found a couple of cases where we could have propagated errors. Also manually labeled ones that were fine by my eye.	2025-05-08 09:46:18 -07:00
jcoens-openai	a080d7b0fd	Update submodules version to come from the workspace (#850 ) Tie the version of submodules to the workspace version.	2025-05-07 10:08:06 -07:00
jcoens-openai	8a89d3aeda	Update cargo to 2024 edition (#842) Some effects of this change: - New formatting changes across many files. No functionality changes should occur from that. - Calls to `set_var` are considered unsafe; since this only happens in tests, we wrap them in `unsafe` blocks	2025-05-07 08:37:48 -07:00
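For context on the second bullet in #842: in the Rust 2024 edition, `std::env::set_var` is an `unsafe fn`, so callers need an `unsafe` block. A minimal sketch of the kind of wrapper a test might use follows; the helper name `set_test_env` and the variable name in the usage are hypothetical, not taken from the repository.

```rust
use std::env;

/// Hypothetical test helper that wraps the call the 2024 edition marks unsafe.
fn set_test_env(key: &str, value: &str) {
    // SAFETY (assumed): called only from test setup, before any other thread
    // reads or writes the process environment.
    unsafe {
        env::set_var(key, value);
    }
}
```

On earlier editions the same code still compiles; the `unsafe` block is merely reported as unnecessary.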
Michael Bolin	c577e94b67	chore: introduce codex-common crate (#843 ) I started this PR because I wanted to share the `format_duration()` utility function in `codex-rs/exec/src/event_processor.rs` with the TUI. The question was: where to put it? `core` should have as few dependencies as possible, so moving it there would introduce a dependency on `chrono`, which seemed undesirable. `core` already had this `cli` feature to deal with a similar situation around sharing common utility functions, so I decided to: * make `core` feature-free * introduce `common` * `common` can have as many "special interest" features as it needs, each of which can declare their own deps * the first two features of common are `cli` and `elapsed` In practice, this meant updating a number of `Cargo.toml` files, replacing this line: ```toml codex-core = { path = "../core", features = ["cli"] } ``` with these: ```toml codex-core = { path = "../core" } codex-common = { path = "../common", features = ["cli"] } ``` Moving `format_duration()` into its own file gave it some "breathing room" to add a unit test, so I had Codex generate some tests and new support for durations over 1 minute.	2025-05-06 17:38:56 -07:00
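A `format_duration()` helper of the kind mentioned in #843 might look like the sketch below, built on `std::time::Duration`. The output formats (milliseconds below one second, fractional seconds below one minute, `MmSSs` above) are assumptions chosen for illustration; the commit message only states that durations over 1 minute are supported.

```rust
use std::time::Duration;

/// Render a duration compactly. The exact formats are assumed, not taken
/// from the original source.
fn format_duration(duration: Duration) -> String {
    let millis = duration.as_millis();
    if millis < 1_000 {
        // Sub-second: whole milliseconds.
        format!("{millis}ms")
    } else if millis < 60_000 {
        // Under a minute: seconds with two decimal places.
        format!("{:.2}s", duration.as_secs_f64())
    } else {
        // A minute or more: minutes plus zero-padded seconds.
        let total_secs = duration.as_secs();
        format!("{}m{:02}s", total_secs / 60, total_secs % 60)
    }
}
```

Keeping the function free of `chrono` matches the stated goal of keeping dependencies small in crates that share it.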

17 Commits