valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Eric Traut	d5853d9c47	Changes to sandbox command assessment feature based on initial experiment feedback (#6091) * Removed sandbox risk categories; feedback indicates that these are not that useful and "less is more" * Tweaked the assessment prompt to generate terser answers * Fixed a bug in the orchestrator that prevented this feature from being exposed in the extension	2025-11-01 14:52:23 -07:00
jif-oai	611e00c862	feat: compactor 2 (#6027) Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-10-31 14:27:08 -07:00
Ahmed Ibrahim	13e1d0362d	Delegate review to codex instance (#5572) In this PR, I am exploring migrating the task kind to an invocation of Codex. The main reason would be getting rid of multiple `ConversationHistory` states and streamlining our context/history management. This approach depends on opening a channel between the sub-codex and codex. This channel is responsible for forwarding `interactive` (`approvals`) and `non-interactive` events. The `task` is responsible for handling those events. This opens the door for implementing `codex as a tool`, replacing `compact` and `review`, and potentially subagents. One consideration is that this code is very similar to `app-server`, especially in the approval part. If we want an interactive `sub-codex` in the future, we should consider using `codex-mcp`.	2025-10-29 21:04:25 +00:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by passing the failed command back to the model and asking it to summarize the command and assign it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screenshot of the approval prompt showing the risk assessment and summary. <img width="723" height="282" alt="image" src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b" />	2025-10-24 15:23:44 -07:00
jif-oai	ea225df22e	feat: context compaction (#3446) Compact feature: 1. Stops the model when the context window becomes too large 2. Adds a user turn asking the model to summarize 3. Builds a bridge, rendered from a template, that contains all previous user messages plus the summary 4. Starts sampling again from a clean conversation containing only that bridge	2025-09-12 13:07:10 -07:00
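
The four compaction steps listed in the #3446 commit can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code: `Message`, `BRIDGE_TEMPLATE`, `SUMMARY_PROMPT`, `estimate_tokens`, and `compact` are hypothetical names, the word-count token estimate is a stand-in, and the real template wording is not given in the log.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


# Hypothetical prompt and template; the actual wording is not shown in the commit.
SUMMARY_PROMPT = "Summarize the conversation so far."
BRIDGE_TEMPLATE = (
    "Earlier user messages:\n{user_messages}\n\n"
    "Summary of prior work:\n{summary}"
)


def estimate_tokens(history: List[Message]) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return sum(len(m.text.split()) for m in history)


def compact(
    history: List[Message],
    summarize: Callable[[List[Message]], str],
    limit: int,
) -> List[Message]:
    # Step 1: only act once the context has grown past the limit.
    if estimate_tokens(history) <= limit:
        return history

    # Step 2: add a user turn asking the model for a summary.
    summary = summarize(history + [Message("user", SUMMARY_PROMPT)])

    # Step 3: build the bridge from all previous user messages plus the summary.
    user_messages = "\n".join(f"- {m.text}" for m in history if m.role == "user")
    bridge = BRIDGE_TEMPLATE.format(user_messages=user_messages, summary=summary)

    # Step 4: restart from a clean conversation containing only the bridge.
    return [Message("user", bridge)]
```

Here `summarize` stands in for the model call; passing it as a parameter keeps the sketch testable without a network dependency.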

5 Commits
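
The fallback described in the #5536 commit (use the model's assessment, but revert to the existing approval prompt if the call fails or takes longer than 5 seconds) might look something like the sketch below. It is written in Python for illustration only; `assess_with_fallback` and the shape of the returned dictionary are assumptions, and only the 5-second limit comes from the commit message.

```python
import concurrent.futures
from typing import Callable, Optional

ASSESSMENT_TIMEOUT_SECS = 5.0  # limit stated in the commit message


def assess_with_fallback(
    assess: Callable[[str], dict],
    command: str,
    timeout: float = ASSESSMENT_TIMEOUT_SECS,
) -> Optional[dict]:
    """Return the model's assessment, or None to signal the plain approval prompt.

    `assess` is a hypothetical stand-in for the model call; it is assumed to
    return something like {"summary": "...", "risk_level": "low"}.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(assess, command)
    try:
        # Raises a timeout error if the call exceeds the limit, or re-raises
        # whatever exception the call itself raised.
        return future.result(timeout=timeout)
    except Exception:
        # Any failure or timeout falls back to the existing behavior.
        return None
    finally:
        # Do not block the caller on a slow assessment (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
```

A caller would show the summary and risk level when a dictionary comes back, and render the unmodified approval prompt when the result is `None`.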