valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
iceweasel-oai	87cce88f48	Windows Sandbox - Alpha version (#4905 ) - Added the new codex-windows-sandbox crate that builds both a library entry point (run_windows_sandbox_capture) and a CLI executable to launch commands inside a Windows restricted-token sandbox, including ACL management, capability SID provisioning, network lockdown, and output capture (windows-sandbox-rs/src/lib.rs:167, windows-sandbox-rs/src/main.rs:54). - Introduced the experimental WindowsSandbox feature flag and wiring so Windows builds can opt into the sandbox: SandboxType::WindowsRestrictedToken, the in-process execution path, and platform sandbox selection now honor the flag (core/src/features.rs:47, core/src/config.rs:1224, core/src/safety.rs:19, core/src/sandboxing/mod.rs:69, core/src/exec.rs:79, core/src/exec.rs:172). - Updated workspace metadata to include the new crate and its Windows-specific dependencies so the core crate can link against it (codex-rs/Cargo.toml:91, core/Cargo.toml:86). - Added a PowerShell bootstrap script that installs the Windows toolchain, required CLI utilities, and builds the workspace to ease development on the platform (scripts/setup-windows.ps1:1). - Landed a Python smoke-test suite that exercises read-only/workspace-write policies, ACL behavior, and network denial for the Windows sandbox binary (windows-sandbox-rs/sandbox_smoketests.py:1).	2025-10-30 15:51:57 -07:00
jif-oai	3183935bd7	feat: add output even in sandbox denied (#5908 )	2025-10-29 18:21:18 +00:00
Abhishek Bhardwaj	89591e4246	feature: Add "!cmd" user shell execution (#2471 ) This change lets users run local shell commands directly from the TUI by prefixing their input with ! (e.g. !ls). Output is truncated to keep the exec cell usable, and Ctrl-C cleanly interrupts long-running commands (e.g. !sleep 10000). Summary of changes - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs. - Reuse the existing tool router: the task constructs a ToolCall for the local_shell tool and relies on ShellHandler, so no manual MCP tool lookup is required. - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the TUI can show command metadata, live output, and exit status. End-to-end flow TUI handling 1. ChatWidget::submit_user_message (TUI) intercepts messages starting with !. 2. Non-empty commands dispatch Op::RunUserShellCommand { command }; empty commands surface a help hint. 3. No UserInput items are created, so nothing is enqueued for the model. Core submission loop 4. The submission loop routes the op to handlers::run_user_shell_command (core/src/codex.rs). 5. A fresh TurnContext is created and Session::spawn_user_shell_command enqueues UserShellCommandTask. Task execution 6. UserShellCommandTask::run emits TaskStartedEvent, formats the command, and prepares a ToolCall targeting local_shell. 7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler. Shell tool runtime 8. ShellHandler::run_exec_like launches the process via the unified exec runtime, honoring sandbox and shell policies, and emits ExecCommandBegin/End. 9. Stdout/stderr are captured for the UI, but the task does not turn the resulting ToolOutput into a model response. Completion 10. After ExecCommandEnd, the task finishes without an assistant message; the session marks it complete and the exec cell displays the final output. Conversation context - The command and its output never enter the conversation history or the model prompt; the flow is local-only. - Only exec/task events are emitted for UI rendering. Demo video https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996	2025-10-29 00:31:20 -07:00
Gabriel Peal	b0bdc04c30	[MCP] Render MCP tool call result images to the model (#5600 ) It's pretty amazing we have gotten here without the ability for the model to see image content from MCP tool calls. This PR builds off of 4391 and fixes #4819. I would like @KKcorps to get adequate credit here, but I also want to get this fix in ASAP. I gave him a week to update it and haven't gotten a response, so I'm going to take it across the finish line. This test highlights how absurd the current situation is. I asked the model to read this image using the Chrome MCP: [screenshot] After this change, it correctly outputs: > Captured the page: image dhows a dark terminal-style UI labeled `OpenAI Codex (v0.0.0)` with prompt `model: gpt-5-codex medium` and working directory `/codex/codex-rs` (and more) Before this change, it said: > Took the full-page screenshot you asked for. It shows a long, horizontally repeating pattern of stylized people in orange, light-blue, and mustard clothing, holding hands in alternating poses against a white background. No text or other graphics-just rows of flat illustration stretching off to the right. Without this change, the Figma, Playwright, Chrome, and other visual MCP servers are pretty much entirely useless. I tested this change with the OpenAI Responses API as well as a third-party Completions API	2025-10-27 17:55:57 -04:00
Ahmed Ibrahim	7226365397	Centralize truncation in conversation history (#5652 ) Move the truncation logic into the conversation history so it applies to any tool output. This helps us avoid edge cases when truncating tool calls and MCP calls.	2025-10-27 14:05:35 -07:00
jif-oai	e92c4f6561	feat: async ghost commit (#5618 )	2025-10-27 10:09:10 +00:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screenshot of the approval prompt showing the risk assessment and summary: [screenshot]	2025-10-24 15:23:44 -07:00
jif-oai	a6b9471548	feat: end events on unified exec (#5551 )	2025-10-23 18:51:34 +01:00
jif-oai	6745b12427	chore: testing on apply_path (#5557 )	2025-10-23 17:00:48 +01:00
Ahmed Ibrahim	f59978ed3d	Handle cancelling/aborting while processing a turn (#5543 ) Currently we collect all turn items in a vector, then add them to the history on success. This results in losing those items on errors, including aborting (`ctrl+c`). This PR: - Adds the ability for the tool call to handle cancellation - Bubbles the turn items up to where we record this info Admittedly, this is ad-hoc logic that doesn't handle a lot of error edge cases. The right thing to do is to record to the history on the spot as `items`/`tool calls output` come in. However, this isn't possible because different `task_kind`s have different `conversation_histories`. `try_run_turn` has no idea which thread we are using. We also cannot pass an `arc` to the `conversation_histories` because it's a private element of `state`. That said, `abort` is the most common case and we should cover it until we remove `task kind`	2025-10-23 08:47:10 -07:00
jif-oai	892eaff46d	fix: approval issue (#5525 )	2025-10-23 11:13:53 +01:00
jif-oai	8e291a1706	chore: clean `handle_container_exec_with_params` (#5516 ) Drop `handle_container_exec_with_params` to have a simpler and more straightforward execution path	2025-10-23 09:24:01 +01:00
jif-oai	bac7acaa7c	chore: clean spec tests (#5517 )	2025-10-22 18:30:33 +01:00
jif-oai	00b1e130b3	chore: align unified_exec (#5442 ) Align `unified_exec` with b implementation	2025-10-22 11:50:18 +01:00
jif-oai	da82153a8d	fix: fix UI issue when 0 omitted lines (#5451 )	2025-10-21 16:45:05 +00:00
jif-oai	4bd68e4d9e	feat: emit events for unified_exec (#5448 )	2025-10-21 17:32:39 +01:00
pakrym-oai	1b10a3a1b2	Enable plan tool by default (#5384 ) ## Summary - make the plan tool available by default by removing the feature flag and always registering the handler - drop plan-tool CLI and API toggles across the exec, TUI, MCP server, and app server code paths - update tests and configs to reflect the always-on plan tool and guard workspace restriction tests against env leakage ## Testing Manually tested the extension. ------ https://chatgpt.com/codex/tasks/task_i_68f67a3ff2d083209562a773f814c1f9	2025-10-21 16:25:05 +00:00
pakrym-oai	789e65b9d2	Pass TurnContext around instead of sub_id (#5421 ) Today `sub_id` is the ID of a single incoming Codex Op submission. We then associate all events triggered by this operation with the same `sub_id`. At the same time, we also create a TurnContext per submission, and we'd like to start associating some events (item added/item completed) with an entire turn instead of just the operation that started it. Using the turn context when sending events gives us flexibility to change the notification scheme.	2025-10-21 08:04:16 -07:00
pakrym-oai	9c903c4716	Add ItemStarted/ItemCompleted events for UserInputItem (#5306 ) Adds a new ItemStarted event and delivers UserMessage as the first item type (more to come). Renames `InputItem` to `UserInput` considering we're using the `Item` suffix for actual items.	2025-10-20 13:34:44 -07:00
jif-oai	5e4f3bbb0b	chore: rework tools execution workflow (#5278 ) Re-work the tool execution flow. Read `orchestrator.rs` to understand the structure	2025-10-20 20:57:37 +01:00
jif-oai	6915ba2100	feat: better UX during refusal (#5260 ) [screenshot]	2025-10-17 11:06:55 +02:00
Gabriel Peal	40fba1bb4c	[MCP] Add support for resources (#5239 ) This PR adds support for [MCP resources](https://modelcontextprotocol.io/specification/2025-06-18/server/resources) by adding three new tools for the model: 1. `list_resources` 2. `list_resource_templates` 3. `read_resource` These 3 tools correspond to the [three primary MCP resource protocol messages](https://modelcontextprotocol.io/specification/2025-06-18/server/resources#protocol-messages). Example of listing and reading a GitHub resource template: [screenshot] `/mcp` with Figma configured: [screenshot] Fixes #4956	2025-10-17 01:05:15 -04:00
jif-oai	f7b4e29609	feat: feature flag (#4948 ) Add a proper feature flag system instead of having custom flags for everything. This is only for experimental/WIP parts of the code. It can be used through the CLI: ```bash codex --enable unified_exec --disable view_image_tool ``` Or in `config.toml`: ```toml # Global toggles applied to every profile unless overridden. [features] apply_patch_freeform = true view_image_tool = false ``` Follow-up: in a later PR, the goal is to have default `bundles` of features that we can associate with a model	2025-10-14 17:50:00 +00:00
jif-oai	0026b12615	feat: indentation mode for read_file (#4887 ) Add a read_file mode that selects the region of the file based on the indentation level	2025-10-09 15:55:02 +00:00
jif-oai	f52320be86	feat: grep_files as a tool (#4820 ) Add `grep_files` to be able to perform more actions in parallel	2025-10-08 11:02:50 +01:00
jif-oai	226215f36d	feat: `list_dir` tool (#4817 ) Add a `list_dir` tool. It is useful because it can be marked as non-mutating and therefore run in parallel	2025-10-07 19:33:19 +01:00
pakrym-oai	f2555422b9	Simplify parallel (#4829 ) Make tool processing return a future and then collect the futures. Handle cleanup on Drop	2025-10-07 10:12:38 -07:00
jif-oai	dc3c6bf62a	feat: parallel tool calls (#4663 ) Add parallel tool calls. This is configurable at model level and tool level	2025-10-05 16:10:49 +00:00
Dylan	3203862167	chore: update tool config (#4755 ) ## Summary Updates tool config for gpt-5-codex ## Test Plan - [x] Ran locally - [x] Updated unit tests	2025-10-04 22:47:26 -07:00
Ahmed Ibrahim	cc2f4aafd7	Add truncation hint on truncated exec output. (#4740 ) When truncating output, add a hint of the total number of lines	2025-10-05 03:29:07 +00:00
Dylan	4764fc1ee7	feat: Freeform apply_patch with simple shell output (#4718 ) ## Summary This PR is an alternative approach to #4711, but instead of changing our storage, parses out shell calls in the client and reserializes them on the fly before we send them out as part of the request. What this changes: 1. Adds additional serialization logic when the ApplyPatchToolType::Freeform is in use. 2. Adds a --custom-apply-patch flag to enable this setting on a session-by-session basis. This change is delicate, but is not meant to be permanent. It is meant to be the first step in a migration: 1. (This PR) Add in-flight serialization with config 2. Update model_family default 3. Update serialization logic to store turn outputs in a structured format, with logic to serialize based on model_family setting. 4. Remove this rewrite in-flight logic. ## Test Plan - [x] Additional unit tests added - [x] Integration tests added - [x] Tested locally	2025-10-04 19:16:36 -07:00
Ahmed Ibrahim	d7acd146fb	fix: exec commands that blow up the context window (#4706 ) We truncate the output of exec commands so they do not blow up the context window. However, in some cases we weren't doing that. This caused reports of people with 76% of their context window left hitting `input exceeded context window`, which is weird.	2025-10-04 11:49:56 -07:00
jif-oai	e0b38bd7a2	feat: add `beta_supported_tools` (#4669 ) Gate the new read_file tool behind a new `beta_supported_tools` flag and only enable it for `gpt-5-codex`	2025-10-03 16:58:03 +00:00
jif-oai	33d3ecbccc	chore: refactor tool handling (#4510 ) # Tool System Refactor - Centralizes tool definitions and execution in `core/src/tools/`: specs (`spec.rs`), handlers (`handlers/`), router (`router.rs`), registry/dispatch (`registry.rs`), and shared context (`context.rs`). One registry now builds the model-visible tool list and binds handlers. - Router converts model responses to tool calls; Registry dispatches with consistent telemetry via `codex-rs/otel` and unified error handling. Function, Local Shell, MCP, and experimental `unified_exec` all flow through this path; legacy shell aliases still work. - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and make adding tools predictable and testable. Example: `read_file` - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`, registered by `build_specs`). - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`, 1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation). - E2E test: `core/tests/suite/read_file.rs` validates the tool returns the requested lines. ## Next steps: - Decompose `handle_container_exec_with_params` - Add parallel tool calls	2025-10-03 13:21:06 +01:00

34 Commits
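Several commits above (#4740, #4706, #5652, #5451) concern truncating exec/tool output and reporting how many lines were dropped. The following is a minimal, hypothetical Rust sketch of head/tail truncation with a total-line hint; the function name `truncate_with_hint`, the head/tail split, and the marker wording are assumptions for illustration, not the codex-rs implementation. It also skips the marker entirely when nothing is omitted, which is one way to avoid the "0 omitted lines" case mentioned in #5451.

```rust
/// Hypothetical helper (not the codex-rs API): keep the first `head` and last
/// `tail` lines of `output`, replacing the middle with a hint that reports how
/// many lines were omitted out of the total.
fn truncate_with_hint(output: &str, head: usize, tail: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    let total = lines.len();

    // Nothing to drop: return the input unchanged instead of printing
    // an "omitted 0 lines" marker.
    if total <= head + tail {
        return output.to_string();
    }

    let omitted = total - head - tail;
    let mut parts: Vec<String> = lines[..head].iter().map(|s| s.to_string()).collect();
    parts.push(format!("[... omitted {omitted} of {total} lines ...]"));
    parts.extend(lines[total - tail..].iter().map(|s| s.to_string()));
    parts.join("\n")
}

fn main() {
    let output = "a\nb\nc\nd\ne\nf";
    // Keeps "a", "b", then the hint, then "e", "f".
    println!("{}", truncate_with_hint(output, 2, 2));
}
```

Counting lines (rather than bytes) keeps the hint human-readable; a real implementation would likely also need a byte budget so a single very long line cannot overflow the context window.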
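The "!cmd" commit (#2471) describes the TUI step that intercepts input starting with `!`: non-empty commands dispatch `Op::RunUserShellCommand { command }`, empty ones surface a help hint, and nothing is enqueued for the model. The sketch below shows only that routing decision. The `Submission` enum and `route_input` function are hypothetical stand-ins, not the real `ChatWidget::submit_user_message` or `Op` types, and trimming the command is an assumption.

```rust
/// Hypothetical stand-in for what the TUI might send after reading a line of
/// input; these are not the real codex-rs `Op` variants.
#[derive(Debug, PartialEq)]
enum Submission {
    /// Run locally; never added to the conversation history.
    UserShellCommand { command: String },
    /// Input was just "!" (or "!" plus whitespace): show a usage hint.
    ShellHelpHint,
    /// Ordinary message that would go to the model.
    ModelMessage(String),
}

/// Route input: a leading '!' marks a local shell command.
fn route_input(input: &str) -> Submission {
    match input.strip_prefix('!') {
        Some(rest) => {
            let command = rest.trim();
            if command.is_empty() {
                Submission::ShellHelpHint
            } else {
                Submission::UserShellCommand { command: command.to_string() }
            }
        }
        None => Submission::ModelMessage(input.to_string()),
    }
}

fn main() {
    for input in ["!ls -la", "!", "hello"] {
        println!("{:?}", route_input(input));
    }
}
```

Deciding the route before any `UserInput` item is built is what keeps the command and its output out of the model prompt, matching the "local-only" flow the commit describes.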