feat: Complete LLMX v0.1.0 - Rebrand from Codex with LiteLLM Integration
This release represents a comprehensive transformation of the codebase from Codex to LLMX, enhanced with LiteLLM integration to support 100+ LLM providers through a unified API.

## Major Changes

### Phase 1: Repository & Infrastructure Setup
- Established new repository structure and branching strategy
- Created comprehensive project documentation (CLAUDE.md, LITELLM-SETUP.md)
- Set up development environment and tooling configuration

### Phase 2: Rust Workspace Transformation
- Renamed all Rust crates from `codex-*` to `llmx-*` (30+ crates)
- Updated package names, binary names, and workspace members
- Renamed core modules: codex.rs → llmx.rs, codex_delegate.rs → llmx_delegate.rs
- Updated all internal references, imports, and type names
- Renamed directories: codex-rs/ → llmx-rs/, codex-backend-openapi-models/ → llmx-backend-openapi-models/
- Fixed all Rust compilation errors after the mass rename

### Phase 3: LiteLLM Integration
- Integrated LiteLLM for multi-provider LLM support (Anthropic, OpenAI, Azure, Google AI, AWS Bedrock, etc.)
- Implemented OpenAI-compatible Chat Completions API support
- Added model family detection and provider-specific handling
- Updated authentication to support LiteLLM API keys
- Renamed environment variables: OPENAI_BASE_URL → LLMX_BASE_URL
- Added LLMX_API_KEY for unified authentication
- Enhanced error handling for Chat Completions API responses
- Implemented fallback mechanisms between the Responses API and the Chat Completions API

### Phase 4: TypeScript/Node.js Components
- Renamed npm package: @codex/codex-cli → @valknar/llmx
- Updated the TypeScript SDK to use the new LLMX APIs and endpoints
- Fixed all TypeScript compilation and linting errors
- Updated SDK tests to support both API backends
- Enhanced the mock server to handle multiple API formats
- Updated build scripts for cross-platform packaging

### Phase 5: Configuration & Documentation
- Updated all configuration files to use LLMX naming
- Rewrote README and documentation for LLMX branding
- Updated config paths: ~/.codex/ → ~/.llmx/
- Added a comprehensive LiteLLM setup guide
- Updated all user-facing strings and help text
- Created release plan and migration documentation

### Phase 6: Testing & Validation
- Fixed all Rust tests for the new naming scheme
- Updated snapshot tests in the TUI (36 frame files)
- Fixed authentication storage tests
- Updated Chat Completions payload and SSE tests
- Fixed SDK tests for the new API endpoints
- Ensured compatibility with the Claude Sonnet 4.5 model
- Fixed test environment variables (LLMX_API_KEY, LLMX_BASE_URL)

### Phase 7: Build & Release Pipeline
- Updated GitHub Actions workflows for LLMX binary names
- Fixed rust-release.yml to reference llmx-rs/ instead of codex-rs/
- Updated CI/CD pipelines for the new package names
- Made Apple code signing optional in the release workflow
- Enhanced npm packaging resilience for partial platform builds
- Added Windows sandbox support to the workspace
- Updated the dotslash configuration for the new binary names

### Phase 8: Final Polish
- Renamed all assets (.github images, labels, templates)
- Updated VSCode and DevContainer configurations
- Fixed all clippy warnings and formatting issues
- Applied cargo fmt and prettier formatting across the codebase
- Updated issue and pull request templates
- Fixed all remaining UI text references

## Technical Details

**Breaking Changes:**
- Binary name changed from `codex` to `llmx`
- Config directory changed from `~/.codex/` to `~/.llmx/`
- Environment variables renamed (CODEX_* → LLMX_*)
- npm package renamed to `@valknar/llmx`

**New Features:**
- Support for 100+ LLM providers via LiteLLM
- Unified authentication with LLMX_API_KEY
- Enhanced model provider detection and handling
- Improved error handling and fallback mechanisms

**Files Changed:**
- 578 files modified across Rust, TypeScript, and documentation
- 30+ Rust crates renamed and updated
- Complete rebrand of UI, CLI, and documentation
- All tests updated and passing

**Dependencies:**
- Updated Cargo.lock with new package names
- Updated npm dependencies in llmx-cli
- Enhanced OpenAPI models for the LLMX backend

This release establishes LLMX as a standalone project with comprehensive LiteLLM integration, maintaining full backward compatibility with existing functionality while opening support for a wide ecosystem of LLM providers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Sebastian Krüger <support@pivoine.art>
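As a quick illustration of the renamed environment variables, here is a minimal sketch of how a client might resolve them. The localhost fallback is an assumption for illustration (a typical LiteLLM proxy address), not something this commit specifies:

```rust
use std::env;

/// Resolve LLMX connection settings from the renamed environment variables.
/// The fallback base URL is a hypothetical LiteLLM proxy default.
fn llmx_settings() -> (String, Option<String>) {
    let base_url =
        env::var("LLMX_BASE_URL").unwrap_or_else(|_| "http://localhost:4000".to_string());
    let api_key = env::var("LLMX_API_KEY").ok(); // unified key introduced by this release
    (base_url, api_key)
}
```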
llmx-rs/utils/cache/Cargo.toml (vendored, new file)
@@ -0,0 +1,15 @@
[package]
name = "llmx-utils-cache"
version.workspace = true
edition.workspace = true

[lints]
workspace = true

[dependencies]
lru = { workspace = true }
sha1 = { workspace = true }
tokio = { workspace = true, features = ["sync", "rt"] }

[dev-dependencies]
tokio = { workspace = true, features = ["macros", "rt", "rt-multi-thread"] }
llmx-rs/utils/cache/src/lib.rs (vendored, new file)
@@ -0,0 +1,159 @@
use std::borrow::Borrow;
use std::hash::Hash;
use std::num::NonZeroUsize;

use lru::LruCache;
use sha1::Digest;
use sha1::Sha1;
use tokio::sync::Mutex;
use tokio::sync::MutexGuard;

/// A minimal LRU cache protected by a Tokio mutex.
pub struct BlockingLruCache<K, V> {
    inner: Mutex<LruCache<K, V>>,
}

impl<K, V> BlockingLruCache<K, V>
where
    K: Eq + Hash,
{
    /// Creates a cache with the provided non-zero capacity.
    #[must_use]
    pub fn new(capacity: NonZeroUsize) -> Self {
        Self {
            inner: Mutex::new(LruCache::new(capacity)),
        }
    }

    /// Returns a clone of the cached value for `key`, or computes and inserts it.
    pub fn get_or_insert_with(&self, key: K, value: impl FnOnce() -> V) -> V
    where
        V: Clone,
    {
        let mut guard = lock_blocking(&self.inner);
        if let Some(v) = guard.get(&key) {
            return v.clone();
        }
        let v = value();
        // Insert and return a clone to keep ownership in the cache.
        guard.put(key, v.clone());
        v
    }

    /// Like `get_or_insert_with`, but the value factory may fail.
    pub fn get_or_try_insert_with<E>(
        &self,
        key: K,
        value: impl FnOnce() -> Result<V, E>,
    ) -> Result<V, E>
    where
        V: Clone,
    {
        let mut guard = lock_blocking(&self.inner);
        if let Some(v) = guard.get(&key) {
            return Ok(v.clone());
        }
        let v = value()?;
        guard.put(key, v.clone());
        Ok(v)
    }

    /// Builds a cache if `capacity` is non-zero, returning `None` otherwise.
    #[must_use]
    pub fn try_with_capacity(capacity: usize) -> Option<Self> {
        NonZeroUsize::new(capacity).map(Self::new)
    }

    /// Returns a clone of the cached value corresponding to `key`, if present.
    pub fn get<Q>(&self, key: &Q) -> Option<V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
        V: Clone,
    {
        lock_blocking(&self.inner).get(key).cloned()
    }

    /// Inserts `value` for `key`, returning the previous entry if it existed.
    pub fn insert(&self, key: K, value: V) -> Option<V> {
        lock_blocking(&self.inner).put(key, value)
    }

    /// Removes the entry for `key` if it exists, returning it.
    pub fn remove<Q>(&self, key: &Q) -> Option<V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        lock_blocking(&self.inner).pop(key)
    }

    /// Clears all entries from the cache.
    pub fn clear(&self) {
        lock_blocking(&self.inner).clear();
    }

    /// Executes `callback` with a mutable reference to the underlying cache.
    pub fn with_mut<R>(&self, callback: impl FnOnce(&mut LruCache<K, V>) -> R) -> R {
        let mut guard = lock_blocking(&self.inner);
        callback(&mut guard)
    }

    /// Provides direct access to the cache guard for advanced use cases.
    pub fn blocking_lock(&self) -> MutexGuard<'_, LruCache<K, V>> {
        lock_blocking(&self.inner)
    }
}

fn lock_blocking<K, V>(m: &Mutex<LruCache<K, V>>) -> MutexGuard<'_, LruCache<K, V>>
where
    K: Eq + Hash,
{
    match tokio::runtime::Handle::try_current() {
        Ok(_) => tokio::task::block_in_place(|| m.blocking_lock()),
        Err(_) => m.blocking_lock(),
    }
}

/// Computes the SHA-1 digest of `bytes`.
///
/// Useful for content-based cache keys when you want to avoid staleness
/// caused by path-only keys.
#[must_use]
pub fn sha1_digest(bytes: &[u8]) -> [u8; 20] {
    let mut hasher = Sha1::new();
    hasher.update(bytes);
    let result = hasher.finalize();
    let mut out = [0; 20];
    out.copy_from_slice(&result);
    out
}

#[cfg(test)]
mod tests {
    use super::BlockingLruCache;
    use std::num::NonZeroUsize;

    #[tokio::test(flavor = "multi_thread")]
    async fn stores_and_retrieves_values() {
        let cache = BlockingLruCache::new(NonZeroUsize::new(2).expect("capacity"));

        assert!(cache.get(&"first").is_none());
        cache.insert("first", 1);
        assert_eq!(cache.get(&"first"), Some(1));
    }

    #[tokio::test(flavor = "multi_thread")]
    async fn evicts_least_recently_used() {
        let cache = BlockingLruCache::new(NonZeroUsize::new(2).expect("capacity"));
        cache.insert("a", 1);
        cache.insert("b", 2);
        assert_eq!(cache.get(&"a"), Some(1));

        cache.insert("c", 3);

        assert!(cache.get(&"b").is_none());
        assert_eq!(cache.get(&"a"), Some(1));
        assert_eq!(cache.get(&"c"), Some(3));
    }
}
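As a usage sketch (editorial, not part of the diff): the `sha1_digest` helper pairs naturally with the cache for content-keyed lookups, so a changed file misses the cache even when its path is unchanged. This assumes the crate above is consumed as `llmx_utils_cache`:

```rust
use llmx_utils_cache::{sha1_digest, BlockingLruCache};
use std::num::NonZeroUsize;

fn main() {
    let cache: BlockingLruCache<[u8; 20], String> =
        BlockingLruCache::new(NonZeroUsize::new(64).expect("capacity"));

    let bytes = b"fn main() {}";
    // Key by content hash rather than by path, per the `sha1_digest` doc comment.
    let key = sha1_digest(bytes);
    let rendered = cache.get_or_insert_with(key, || format!("{} bytes", bytes.len()));
    assert_eq!(rendered, "12 bytes");
}
```

Note that `lock_blocking` lets this work both inside and outside a Tokio runtime: within a runtime it wraps the lock in `block_in_place`, otherwise it locks directly.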
llmx-rs/utils/git/Cargo.toml (new file)
@@ -0,0 +1,26 @@
[package]
name = "llmx-git"
version.workspace = true
edition.workspace = true
readme = "README.md"

[lints]
workspace = true

[dependencies]
once_cell = "1"
regex = "1"
schemars = { workspace = true }
serde = { workspace = true, features = ["derive"] }
tempfile = { workspace = true }
thiserror = { workspace = true }
ts-rs = { workspace = true, features = [
    "uuid-impl",
    "serde-json-impl",
    "no-serde-warnings",
] }
walkdir = { workspace = true }

[dev-dependencies]
assert_matches = { workspace = true }
pretty_assertions = { workspace = true }
llmx-rs/utils/git/README.md (new file)
@@ -0,0 +1,33 @@
# llmx-git

Helpers for interacting with git, including patch application and worktree
snapshot utilities.

```rust,no_run
use std::path::Path;

use llmx_git::{
    apply_git_patch, create_ghost_commit, restore_ghost_commit, ApplyGitRequest,
    CreateGhostCommitOptions,
};

let repo = Path::new("/path/to/repo");

// Apply a patch (omitted here) to the repository.
let request = ApplyGitRequest {
    cwd: repo.to_path_buf(),
    diff: String::from("...diff contents..."),
    revert: false,
    preflight: false,
};
let result = apply_git_patch(&request)?;

// Capture the current working tree as an unreferenced commit.
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo))?;

// Later, undo back to that state.
restore_ghost_commit(repo, &ghost)?;
```

Pass a custom message with `.message("…")` or force-include ignored files with
`.force_include(["ignored.log".into()])`.
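For example, those two options compose on the same builder (a short sketch using only the API shown in this crate; the message text and path are illustrative):

```rust
use llmx_git::CreateGhostCommitOptions;
use std::path::{Path, PathBuf};

let repo = Path::new("/path/to/repo");
let options = CreateGhostCommitOptions::new(repo)
    .message("pre-refactor snapshot")
    .force_include(vec![PathBuf::from("ignored.log")]);
```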
llmx-rs/utils/git/src/apply.rs (new file)
@@ -0,0 +1,715 @@
//! Helpers for applying unified diffs using the system `git` binary.
//!
//! The entry point is [`apply_git_patch`], which writes a diff to a temporary
//! file, shells out to `git apply` with the right flags, and then parses the
//! command’s output into structured details. Callers can opt into dry-run
//! mode via [`ApplyGitRequest::preflight`] and inspect the resulting paths to
//! learn what would change before applying for real.

use once_cell::sync::Lazy;
use regex::Regex;
use std::ffi::OsStr;
use std::io;
use std::path::Path;
use std::path::PathBuf;

/// Parameters for invoking [`apply_git_patch`].
#[derive(Debug, Clone)]
pub struct ApplyGitRequest {
    pub cwd: PathBuf,
    pub diff: String,
    pub revert: bool,
    pub preflight: bool,
}

/// Result of running [`apply_git_patch`], including paths gleaned from stdout/stderr.
#[derive(Debug, Clone)]
pub struct ApplyGitResult {
    pub exit_code: i32,
    pub applied_paths: Vec<String>,
    pub skipped_paths: Vec<String>,
    pub conflicted_paths: Vec<String>,
    pub stdout: String,
    pub stderr: String,
    pub cmd_for_log: String,
}

/// Apply a unified diff to the target repository by shelling out to `git apply`.
///
/// When [`ApplyGitRequest::preflight`] is `true`, this behaves like `git apply --check` and
/// leaves the working tree untouched while still parsing the command output for diagnostics.
pub fn apply_git_patch(req: &ApplyGitRequest) -> io::Result<ApplyGitResult> {
    let git_root = resolve_git_root(&req.cwd)?;

    // Write unified diff into a temporary file
    let (tmpdir, patch_path) = write_temp_patch(&req.diff)?;
    // Keep tmpdir alive until function end to ensure the file exists
    let _guard = tmpdir;

    if req.revert && !req.preflight {
        // Stage WT paths first to avoid index mismatch on revert.
        stage_paths(&git_root, &req.diff)?;
    }

    // Build git args
    let mut args: Vec<String> = vec!["apply".into(), "--3way".into()];
    if req.revert {
        args.push("-R".into());
    }

    // Optional: additional git config via env knob (defaults OFF)
    let mut cfg_parts: Vec<String> = Vec::new();
    if let Ok(cfg) = std::env::var("LLMX_APPLY_GIT_CFG") {
        for pair in cfg.split(',') {
            let p = pair.trim();
            if p.is_empty() || !p.contains('=') {
                continue;
            }
            cfg_parts.push("-c".into());
            cfg_parts.push(p.to_string());
        }
    }

    args.push(patch_path.to_string_lossy().to_string());

    // Optional preflight: dry-run only; do not modify working tree
    if req.preflight {
        let mut check_args = vec!["apply".to_string(), "--check".to_string()];
        if req.revert {
            check_args.push("-R".to_string());
        }
        check_args.push(patch_path.to_string_lossy().to_string());
        let rendered = render_command_for_log(&git_root, &cfg_parts, &check_args);
        let (c_code, c_out, c_err) = run_git(&git_root, &cfg_parts, &check_args)?;
        let (mut applied_paths, mut skipped_paths, mut conflicted_paths) =
            parse_git_apply_output(&c_out, &c_err);
        applied_paths.sort();
        applied_paths.dedup();
        skipped_paths.sort();
        skipped_paths.dedup();
        conflicted_paths.sort();
        conflicted_paths.dedup();
        return Ok(ApplyGitResult {
            exit_code: c_code,
            applied_paths,
            skipped_paths,
            conflicted_paths,
            stdout: c_out,
            stderr: c_err,
            cmd_for_log: rendered,
        });
    }

    let cmd_for_log = render_command_for_log(&git_root, &cfg_parts, &args);
    let (code, stdout, stderr) = run_git(&git_root, &cfg_parts, &args)?;

    let (mut applied_paths, mut skipped_paths, mut conflicted_paths) =
        parse_git_apply_output(&stdout, &stderr);
    applied_paths.sort();
    applied_paths.dedup();
    skipped_paths.sort();
    skipped_paths.dedup();
    conflicted_paths.sort();
    conflicted_paths.dedup();

    Ok(ApplyGitResult {
        exit_code: code,
        applied_paths,
        skipped_paths,
        conflicted_paths,
        stdout,
        stderr,
        cmd_for_log,
    })
}

fn resolve_git_root(cwd: &Path) -> io::Result<PathBuf> {
    let out = std::process::Command::new("git")
        .arg("rev-parse")
        .arg("--show-toplevel")
        .current_dir(cwd)
        .output()?;
    let code = out.status.code().unwrap_or(-1);
    if code != 0 {
        return Err(io::Error::other(format!(
            "not a git repository (exit {}): {}",
            code,
            String::from_utf8_lossy(&out.stderr)
        )));
    }
    let root = String::from_utf8_lossy(&out.stdout).trim().to_string();
    Ok(PathBuf::from(root))
}

fn write_temp_patch(diff: &str) -> io::Result<(tempfile::TempDir, PathBuf)> {
    let dir = tempfile::tempdir()?;
    let path = dir.path().join("patch.diff");
    std::fs::write(&path, diff)?;
    Ok((dir, path))
}

fn run_git(cwd: &Path, git_cfg: &[String], args: &[String]) -> io::Result<(i32, String, String)> {
    let mut cmd = std::process::Command::new("git");
    for p in git_cfg {
        cmd.arg(p);
    }
    for a in args {
        cmd.arg(a);
    }
    let out = cmd.current_dir(cwd).output()?;
    let code = out.status.code().unwrap_or(-1);
    let stdout = String::from_utf8_lossy(&out.stdout).into_owned();
    let stderr = String::from_utf8_lossy(&out.stderr).into_owned();
    Ok((code, stdout, stderr))
}

fn quote_shell(s: &str) -> String {
    let simple = s
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || "-_.:/@%+".contains(c));
    if simple {
        s.to_string()
    } else {
        format!("'{}'", s.replace('\'', "'\\''"))
    }
}

fn render_command_for_log(cwd: &Path, git_cfg: &[String], args: &[String]) -> String {
    let mut parts: Vec<String> = Vec::new();
    parts.push("git".to_string());
    for a in git_cfg {
        parts.push(quote_shell(a));
    }
    for a in args {
        parts.push(quote_shell(a));
    }
    format!(
        "(cd {} && {})",
        quote_shell(&cwd.display().to_string()),
        parts.join(" ")
    )
}

/// Collect every path referenced by the diff headers inside `diff --git` sections.
pub fn extract_paths_from_patch(diff_text: &str) -> Vec<String> {
    static RE: Lazy<Regex> = Lazy::new(|| {
        Regex::new(r"(?m)^diff --git a/(.*?) b/(.*)$")
            .unwrap_or_else(|e| panic!("invalid regex: {e}"))
    });
    let mut set = std::collections::BTreeSet::new();
    for caps in RE.captures_iter(diff_text) {
        if let Some(a) = caps.get(1).map(|m| m.as_str())
            && a != "/dev/null"
            && !a.trim().is_empty()
        {
            set.insert(a.to_string());
        }
        if let Some(b) = caps.get(2).map(|m| m.as_str())
            && b != "/dev/null"
            && !b.trim().is_empty()
        {
            set.insert(b.to_string());
        }
    }
    set.into_iter().collect()
}

/// Stage only the files that actually exist on disk for the given diff.
pub fn stage_paths(git_root: &Path, diff: &str) -> io::Result<()> {
    let paths = extract_paths_from_patch(diff);
    let mut existing: Vec<String> = Vec::new();
    for p in paths {
        let joined = git_root.join(&p);
        if std::fs::symlink_metadata(&joined).is_ok() {
            existing.push(p);
        }
    }
    if existing.is_empty() {
        return Ok(());
    }
    let mut cmd = std::process::Command::new("git");
    cmd.arg("add");
    cmd.arg("--");
    for p in &existing {
        cmd.arg(OsStr::new(p));
    }
    let out = cmd.current_dir(git_root).output()?;
    let _code = out.status.code().unwrap_or(-1);
    // We do not hard fail staging; best-effort is OK. Return Ok even on non-zero.
    Ok(())
}

// ============ Parser ported from VS Code (TS) ============

/// Parse `git apply` output into applied/skipped/conflicted path groupings.
pub fn parse_git_apply_output(
    stdout: &str,
    stderr: &str,
) -> (Vec<String>, Vec<String>, Vec<String>) {
    let combined = [stdout, stderr]
        .iter()
        .filter(|s| !s.is_empty())
        .cloned()
        .collect::<Vec<&str>>()
        .join("\n");

    let mut applied = std::collections::BTreeSet::new();
    let mut skipped = std::collections::BTreeSet::new();
    let mut conflicted = std::collections::BTreeSet::new();
    let mut last_seen_path: Option<String> = None;

    fn add(set: &mut std::collections::BTreeSet<String>, raw: &str) {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            return;
        }
        let first = trimmed.chars().next().unwrap_or('\0');
        let last = trimmed.chars().last().unwrap_or('\0');
        let unquoted = if (first == '"' || first == '\'') && last == first && trimmed.len() >= 2 {
            &trimmed[1..trimmed.len() - 1]
        } else {
            trimmed
        };
        if !unquoted.is_empty() {
            set.insert(unquoted.to_string());
        }
    }

    static APPLIED_CLEAN: Lazy<Regex> =
        Lazy::new(|| regex_ci("^Applied patch(?: to)?\\s+(?P<path>.+?)\\s+cleanly\\.?$"));
    static APPLIED_CONFLICTS: Lazy<Regex> =
        Lazy::new(|| regex_ci("^Applied patch(?: to)?\\s+(?P<path>.+?)\\s+with conflicts\\.?$"));
    static APPLYING_WITH_REJECTS: Lazy<Regex> = Lazy::new(|| {
        regex_ci("^Applying patch\\s+(?P<path>.+?)\\s+with\\s+\\d+\\s+rejects?\\.{0,3}$")
    });
    static CHECKING_PATCH: Lazy<Regex> =
        Lazy::new(|| regex_ci("^Checking patch\\s+(?P<path>.+?)\\.\\.\\.$"));
    static UNMERGED_LINE: Lazy<Regex> = Lazy::new(|| regex_ci("^U\\s+(?P<path>.+)$"));
    static PATCH_FAILED: Lazy<Regex> =
        Lazy::new(|| regex_ci("^error:\\s+patch failed:\\s+(?P<path>.+?)(?::\\d+)?(?:\\s|$)"));
    static DOES_NOT_APPLY: Lazy<Regex> =
        Lazy::new(|| regex_ci("^error:\\s+(?P<path>.+?):\\s+patch does not apply$"));
    static THREE_WAY_START: Lazy<Regex> = Lazy::new(|| {
        regex_ci("^(?:Performing three-way merge|Falling back to three-way merge)\\.\\.\\.$")
    });
    static THREE_WAY_FAILED: Lazy<Regex> =
        Lazy::new(|| regex_ci("^Failed to perform three-way merge\\.\\.\\.$"));
    static FALLBACK_DIRECT: Lazy<Regex> =
        Lazy::new(|| regex_ci("^Falling back to direct application\\.\\.\\.$"));
    static LACKS_BLOB: Lazy<Regex> = Lazy::new(|| {
        regex_ci(
            "^(?:error: )?repository lacks the necessary blob to (?:perform|fall back on) 3-?way merge\\.?$",
        )
    });
    static INDEX_MISMATCH: Lazy<Regex> =
        Lazy::new(|| regex_ci("^error:\\s+(?P<path>.+?):\\s+does not match index\\b"));
    static NOT_IN_INDEX: Lazy<Regex> =
        Lazy::new(|| regex_ci("^error:\\s+(?P<path>.+?):\\s+does not exist in index\\b"));
    static ALREADY_EXISTS_WT: Lazy<Regex> = Lazy::new(|| {
        regex_ci("^error:\\s+(?P<path>.+?)\\s+already exists in (?:the )?working directory\\b")
    });
    static FILE_EXISTS: Lazy<Regex> =
        Lazy::new(|| regex_ci("^error:\\s+patch failed:\\s+(?P<path>.+?)\\s+File exists"));
    static RENAMED_DELETED: Lazy<Regex> =
        Lazy::new(|| regex_ci("^error:\\s+path\\s+(?P<path>.+?)\\s+has been renamed\\/deleted"));
    static CANNOT_APPLY_BINARY: Lazy<Regex> = Lazy::new(|| {
        regex_ci(
            "^error:\\s+cannot apply binary patch to\\s+['\\\"]?(?P<path>.+?)['\\\"]?\\s+without full index line$",
        )
    });
    static BINARY_DOES_NOT_APPLY: Lazy<Regex> = Lazy::new(|| {
        regex_ci("^error:\\s+binary patch does not apply to\\s+['\\\"]?(?P<path>.+?)['\\\"]?$")
    });
    static BINARY_INCORRECT_RESULT: Lazy<Regex> = Lazy::new(|| {
        regex_ci(
            "^error:\\s+binary patch to\\s+['\\\"]?(?P<path>.+?)['\\\"]?\\s+creates incorrect result\\b",
        )
    });
    static CANNOT_READ_CURRENT: Lazy<Regex> = Lazy::new(|| {
        regex_ci("^error:\\s+cannot read the current contents of\\s+['\\\"]?(?P<path>.+?)['\\\"]?$")
    });
    static SKIPPED_PATCH: Lazy<Regex> =
        Lazy::new(|| regex_ci("^Skipped patch\\s+['\\\"]?(?P<path>.+?)['\\\"]\\.$"));
    static CANNOT_MERGE_BINARY_WARN: Lazy<Regex> = Lazy::new(|| {
        regex_ci(
            "^warning:\\s*Cannot merge binary files:\\s+(?P<path>.+?)\\s+\\(ours\\s+vs\\.\\s+theirs\\)",
        )
    });

    for raw_line in combined.lines() {
        let line = raw_line.trim();
        if line.is_empty() {
            continue;
        }

        // === "Checking patch <path>..." tracking ===
        if let Some(c) = CHECKING_PATCH.captures(line) {
            if let Some(m) = c.name("path") {
                last_seen_path = Some(m.as_str().to_string());
            }
            continue;
        }

        // === Status lines ===
        if let Some(c) = APPLIED_CLEAN.captures(line) {
            if let Some(m) = c.name("path") {
                add(&mut applied, m.as_str());
                let p = applied.iter().next_back().cloned();
                if let Some(p) = p {
                    conflicted.remove(&p);
                    skipped.remove(&p);
                    last_seen_path = Some(p);
                }
            }
            continue;
        }
        if let Some(c) = APPLIED_CONFLICTS.captures(line) {
            if let Some(m) = c.name("path") {
                add(&mut conflicted, m.as_str());
                let p = conflicted.iter().next_back().cloned();
                if let Some(p) = p {
                    applied.remove(&p);
                    skipped.remove(&p);
                    last_seen_path = Some(p);
                }
            }
            continue;
        }
        if let Some(c) = APPLYING_WITH_REJECTS.captures(line) {
            if let Some(m) = c.name("path") {
                add(&mut conflicted, m.as_str());
                let p = conflicted.iter().next_back().cloned();
                if let Some(p) = p {
                    applied.remove(&p);
                    skipped.remove(&p);
                    last_seen_path = Some(p);
                }
            }
            continue;
        }

        // === “U <path>” after conflicts ===
        if let Some(c) = UNMERGED_LINE.captures(line) {
            if let Some(m) = c.name("path") {
                add(&mut conflicted, m.as_str());
                let p = conflicted.iter().next_back().cloned();
                if let Some(p) = p {
                    applied.remove(&p);
                    skipped.remove(&p);
                    last_seen_path = Some(p);
                }
            }
            continue;
        }

        // === Early hints ===
        if PATCH_FAILED.is_match(line) || DOES_NOT_APPLY.is_match(line) {
            if let Some(c) = PATCH_FAILED
                .captures(line)
                .or_else(|| DOES_NOT_APPLY.captures(line))
                && let Some(m) = c.name("path")
            {
                add(&mut skipped, m.as_str());
                last_seen_path = Some(m.as_str().to_string());
            }
            continue;
        }

        // === Ignore narration ===
        if THREE_WAY_START.is_match(line) || FALLBACK_DIRECT.is_match(line) {
            continue;
        }

        // === 3-way failed entirely; attribute to last_seen_path ===
        if THREE_WAY_FAILED.is_match(line) || LACKS_BLOB.is_match(line) {
            if let Some(p) = last_seen_path.clone() {
                add(&mut skipped, &p);
                applied.remove(&p);
                conflicted.remove(&p);
            }
            continue;
        }

        // === Skips / I/O problems ===
        if let Some(c) = INDEX_MISMATCH
            .captures(line)
            .or_else(|| NOT_IN_INDEX.captures(line))
            .or_else(|| ALREADY_EXISTS_WT.captures(line))
            .or_else(|| FILE_EXISTS.captures(line))
            .or_else(|| RENAMED_DELETED.captures(line))
            .or_else(|| CANNOT_APPLY_BINARY.captures(line))
            .or_else(|| BINARY_DOES_NOT_APPLY.captures(line))
            .or_else(|| BINARY_INCORRECT_RESULT.captures(line))
            .or_else(|| CANNOT_READ_CURRENT.captures(line))
            .or_else(|| SKIPPED_PATCH.captures(line))
        {
            if let Some(m) = c.name("path") {
                add(&mut skipped, m.as_str());
                let p_now = skipped.iter().next_back().cloned();
                if let Some(p) = p_now {
                    applied.remove(&p);
                    conflicted.remove(&p);
                    last_seen_path = Some(p);
                }
            }
            continue;
        }

        // === Warnings that imply conflicts ===
        if let Some(c) = CANNOT_MERGE_BINARY_WARN.captures(line) {
            if let Some(m) = c.name("path") {
                add(&mut conflicted, m.as_str());
                let p = conflicted.iter().next_back().cloned();
                if let Some(p) = p {
                    applied.remove(&p);
                    skipped.remove(&p);
                    last_seen_path = Some(p);
                }
            }
            continue;
        }
    }

    // Final precedence: conflicts > applied > skipped
    for p in conflicted.iter() {
        applied.remove(p);
        skipped.remove(p);
    }
    for p in applied.iter() {
        skipped.remove(p);
    }

    (
        applied.into_iter().collect(),
        skipped.into_iter().collect(),
        conflicted.into_iter().collect(),
    )
}

fn regex_ci(pat: &str) -> Regex {
    Regex::new(&format!("(?i){pat}")).unwrap_or_else(|e| panic!("invalid regex: {e}"))
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::path::Path;
    use std::sync::Mutex;
    use std::sync::OnceLock;

    fn env_lock() -> &'static Mutex<()> {
        static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
        LOCK.get_or_init(|| Mutex::new(()))
    }

    fn run(cwd: &Path, args: &[&str]) -> (i32, String, String) {
        let out = std::process::Command::new(args[0])
            .args(&args[1..])
            .current_dir(cwd)
            .output()
            .expect("spawn ok");
        (
            out.status.code().unwrap_or(-1),
            String::from_utf8_lossy(&out.stdout).into_owned(),
            String::from_utf8_lossy(&out.stderr).into_owned(),
        )
    }

    fn init_repo() -> tempfile::TempDir {
        let dir = tempfile::tempdir().expect("tempdir");
        let root = dir.path();
        // git init and minimal identity
        let _ = run(root, &["git", "init"]);
        let _ = run(root, &["git", "config", "user.email", "llmx@example.com"]);
        let _ = run(root, &["git", "config", "user.name", "Llmx"]);
        dir
    }

    fn read_file_normalized(path: &Path) -> String {
        std::fs::read_to_string(path)
            .expect("read file")
            .replace("\r\n", "\n")
    }

    #[test]
    fn apply_add_success() {
        let _g = env_lock().lock().unwrap();
        let repo = init_repo();
        let root = repo.path();

        let diff = "diff --git a/hello.txt b/hello.txt\nnew file mode 100644\n--- /dev/null\n+++ b/hello.txt\n@@ -0,0 +1,2 @@\n+hello\n+world\n";
        let req = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: false,
            preflight: false,
        };
        let r = apply_git_patch(&req).expect("run apply");
        assert_eq!(r.exit_code, 0, "exit code 0");
        // File exists now
        assert!(root.join("hello.txt").exists());
    }

    #[test]
    fn apply_modify_conflict() {
        let _g = env_lock().lock().unwrap();
        let repo = init_repo();
        let root = repo.path();
        // seed file and commit
        std::fs::write(root.join("file.txt"), "line1\nline2\nline3\n").unwrap();
        let _ = run(root, &["git", "add", "file.txt"]);
        let _ = run(root, &["git", "commit", "-m", "seed"]);
        // local edit (unstaged)
        std::fs::write(root.join("file.txt"), "line1\nlocal2\nline3\n").unwrap();
        // patch wants to change the same line differently
        let diff = "diff --git a/file.txt b/file.txt\n--- a/file.txt\n+++ b/file.txt\n@@ -1,3 +1,3 @@\n line1\n-line2\n+remote2\n line3\n";
        let req = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: false,
            preflight: false,
        };
        let r = apply_git_patch(&req).expect("run apply");
        assert_ne!(r.exit_code, 0, "non-zero exit on conflict");
    }

    #[test]
    fn apply_modify_skipped_missing_index() {
        let _g = env_lock().lock().unwrap();
        let repo = init_repo();
        let root = repo.path();
        // Try to modify a file that is not in the index
        let diff = "diff --git a/ghost.txt b/ghost.txt\n--- a/ghost.txt\n+++ b/ghost.txt\n@@ -1,1 +1,1 @@\n-old\n+new\n";
        let req = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: false,
            preflight: false,
        };
        let r = apply_git_patch(&req).expect("run apply");
        assert_ne!(r.exit_code, 0, "non-zero exit on missing index");
    }

    #[test]
    fn apply_then_revert_success() {
        let _g = env_lock().lock().unwrap();
        let repo = init_repo();
        let root = repo.path();
        // Seed file and commit original content
        std::fs::write(root.join("file.txt"), "orig\n").unwrap();
        let _ = run(root, &["git", "add", "file.txt"]);
        let _ = run(root, &["git", "commit", "-m", "seed"]);

        // Forward patch: orig -> ORIG
        let diff = "diff --git a/file.txt b/file.txt\n--- a/file.txt\n+++ b/file.txt\n@@ -1,1 +1,1 @@\n-orig\n+ORIG\n";
        let apply_req = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: false,
            preflight: false,
        };
        let res_apply = apply_git_patch(&apply_req).expect("apply ok");
        assert_eq!(res_apply.exit_code, 0, "forward apply succeeded");
        let after_apply = read_file_normalized(&root.join("file.txt"));
        assert_eq!(after_apply, "ORIG\n");

        // Revert patch: ORIG -> orig (stage paths first; engine handles it)
        let revert_req = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: true,
            preflight: false,
        };
        let res_revert = apply_git_patch(&revert_req).expect("revert ok");
        assert_eq!(res_revert.exit_code, 0, "revert apply succeeded");
        let after_revert = read_file_normalized(&root.join("file.txt"));
        assert_eq!(after_revert, "orig\n");
    }

    #[test]
    fn revert_preflight_does_not_stage_index() {
        let _g = env_lock().lock().unwrap();
        let repo = init_repo();
        let root = repo.path();
        // Seed repo and apply forward patch so the working tree reflects the change.
        std::fs::write(root.join("file.txt"), "orig\n").unwrap();
        let _ = run(root, &["git", "add", "file.txt"]);
        let _ = run(root, &["git", "commit", "-m", "seed"]);

        let diff = "diff --git a/file.txt b/file.txt\n--- a/file.txt\n+++ b/file.txt\n@@ -1,1 +1,1 @@\n-orig\n+ORIG\n";
        let apply_req = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: false,
            preflight: false,
        };
        let res_apply = apply_git_patch(&apply_req).expect("apply ok");
        assert_eq!(res_apply.exit_code, 0, "forward apply succeeded");
        let (commit_code, _, commit_err) = run(root, &["git", "commit", "-am", "apply change"]);
        assert_eq!(commit_code, 0, "commit applied change: {commit_err}");

        let (_code_before, staged_before, _stderr_before) =
            run(root, &["git", "diff", "--cached", "--name-only"]);

        let preflight_req = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: true,
            preflight: true,
        };
        let res_preflight = apply_git_patch(&preflight_req).expect("preflight ok");
        assert_eq!(res_preflight.exit_code, 0, "revert preflight succeeded");
        let (_code_after, staged_after, _stderr_after) =
            run(root, &["git", "diff", "--cached", "--name-only"]);
        assert_eq!(
            staged_after.trim(),
            staged_before.trim(),
            "preflight should not stage new paths",
        );

        let after_preflight = read_file_normalized(&root.join("file.txt"));
        assert_eq!(after_preflight, "ORIG\n");
    }

    #[test]
    fn preflight_blocks_partial_changes() {
        let _g = env_lock().lock().unwrap();
        let repo = init_repo();
        let root = repo.path();
        // Build a multi-file diff: one valid add (ok.txt) and one invalid modify (ghost.txt)
        let diff = "diff --git a/ok.txt b/ok.txt\nnew file mode 100644\n--- /dev/null\n+++ b/ok.txt\n@@ -0,0 +1,2 @@\n+alpha\n+beta\n\n\
            diff --git a/ghost.txt b/ghost.txt\n--- a/ghost.txt\n+++ b/ghost.txt\n@@ -1,1 +1,1 @@\n-old\n+new\n";

        // 1) With preflight enabled, nothing should be changed (even though ok.txt could be added)
        let req1 = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: false,
            preflight: true,
        };
        let r1 = apply_git_patch(&req1).expect("preflight apply");
        assert_ne!(r1.exit_code, 0, "preflight reports failure");
        assert!(
            !root.join("ok.txt").exists(),
            "preflight must prevent adding ok.txt"
        );
        assert!(
            r1.cmd_for_log.contains("--check"),
            "preflight path recorded --check"
        );

        // 2) Without preflight, we should see no --check in the executed command
        let req2 = ApplyGitRequest {
            cwd: root.to_path_buf(),
            diff: diff.to_string(),
            revert: false,
            preflight: false,
        };
        let r2 = apply_git_patch(&req2).expect("direct apply");
        assert_ne!(r2.exit_code, 0, "apply is expected to fail overall");
        assert!(
            !r2.cmd_for_log.contains("--check"),
            "non-preflight path should not use --check"
        );
    }
}
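A caller-side sketch (editorial, not part of the diff) of the preflight flow above: run the `--check` dry run first, then apply for real only if it reports a clean result. The repository path is illustrative, and the `LLMX_APPLY_GIT_CFG` pairs shown in the comment are examples of real git config keys, forwarded as `git -c key=value` flags by the env knob in `apply_git_patch`:

```rust
// Optionally export LLMX_APPLY_GIT_CFG="core.autocrlf=false,apply.whitespace=nowarn"
// in the environment before running; each pair becomes a `git -c` flag.
use llmx_git::{apply_git_patch, ApplyGitRequest};
use std::path::PathBuf;

fn preview_then_apply(diff: String) -> std::io::Result<()> {
    let repo = PathBuf::from("/path/to/repo"); // illustrative path
    let preview = apply_git_patch(&ApplyGitRequest {
        cwd: repo.clone(),
        diff: diff.clone(),
        revert: false,
        preflight: true, // dry run: executes `git apply --check`
    })?;
    if preview.exit_code == 0 {
        // No conflicts or skips reported by the check; apply for real.
        apply_git_patch(&ApplyGitRequest {
            cwd: repo,
            diff,
            revert: false,
            preflight: false,
        })?;
    }
    Ok(())
}
```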
llmx-rs/utils/git/src/errors.rs (new file)
@@ -0,0 +1,35 @@
use std::path::PathBuf;
use std::process::ExitStatus;
use std::string::FromUtf8Error;

use thiserror::Error;
use walkdir::Error as WalkdirError;

/// Errors returned while managing git worktree snapshots.
#[derive(Debug, Error)]
pub enum GitToolingError {
    #[error("git command `{command}` failed with status {status}: {stderr}")]
    GitCommand {
        command: String,
        status: ExitStatus,
        stderr: String,
    },
    #[error("git command `{command}` produced non-UTF-8 output")]
    GitOutputUtf8 {
        command: String,
        #[source]
        source: FromUtf8Error,
    },
    #[error("{path:?} is not a git repository")]
    NotAGitRepository { path: PathBuf },
    #[error("path {path:?} must be relative to the repository root")]
    NonRelativePath { path: PathBuf },
    #[error("path {path:?} escapes the repository root")]
    PathEscapesRepository { path: PathBuf },
    #[error("failed to process path inside worktree")]
    PathPrefix(#[from] std::path::StripPrefixError),
    #[error(transparent)]
    Walkdir(#[from] WalkdirError),
    #[error(transparent)]
    Io(#[from] std::io::Error),
}
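A brief sketch of branching on these variants at a call site (assuming the enum is exported from the crate root; the handling shown is hypothetical):

```rust
use llmx_git::GitToolingError;

fn describe(err: &GitToolingError) -> String {
    match err {
        GitToolingError::NotAGitRepository { path } => {
            format!("{path:?} is not a git repository; run inside a worktree")
        }
        GitToolingError::GitCommand { command, status, stderr } => {
            format!("`{command}` exited with {status}: {stderr}")
        }
        // The remaining variants derive Display via thiserror.
        other => other.to_string(),
    }
}
```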
llmx-rs/utils/git/src/ghost_commits.rs (new file)
@@ -0,0 +1,709 @@
use std::collections::HashSet;
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;
use std::path::PathBuf;

use tempfile::Builder;

use crate::GhostCommit;
use crate::GitToolingError;
use crate::operations::apply_repo_prefix_to_force_include;
use crate::operations::ensure_git_repository;
use crate::operations::normalize_relative_path;
use crate::operations::repo_subdir;
use crate::operations::resolve_head;
use crate::operations::resolve_repository_root;
use crate::operations::run_git_for_status;
use crate::operations::run_git_for_stdout;
use crate::operations::run_git_for_stdout_all;

/// Default commit message used for ghost commits when none is provided.
const DEFAULT_COMMIT_MESSAGE: &str = "llmx snapshot";

/// Options to control ghost commit creation.
pub struct CreateGhostCommitOptions<'a> {
    pub repo_path: &'a Path,
    pub message: Option<&'a str>,
    pub force_include: Vec<PathBuf>,
}

impl<'a> CreateGhostCommitOptions<'a> {
    /// Creates options scoped to the provided repository path.
    pub fn new(repo_path: &'a Path) -> Self {
        Self {
            repo_path,
            message: None,
            force_include: Vec::new(),
        }
    }

    /// Sets a custom commit message for the ghost commit.
    pub fn message(mut self, message: &'a str) -> Self {
        self.message = Some(message);
        self
    }

    /// Supplies the entire force-include path list at once.
    pub fn force_include<I>(mut self, paths: I) -> Self
    where
        I: IntoIterator<Item = PathBuf>,
    {
        self.force_include = paths.into_iter().collect();
        self
    }

    /// Adds a single path to the force-include list.
    pub fn push_force_include<P>(mut self, path: P) -> Self
    where
        P: Into<PathBuf>,
    {
        self.force_include.push(path.into());
        self
    }
}

/// Create a ghost commit capturing the current state of the repository's working tree.
pub fn create_ghost_commit(
    options: &CreateGhostCommitOptions<'_>,
) -> Result<GhostCommit, GitToolingError> {
    ensure_git_repository(options.repo_path)?;

    let repo_root = resolve_repository_root(options.repo_path)?;
    let repo_prefix = repo_subdir(repo_root.as_path(), options.repo_path);
    let parent = resolve_head(repo_root.as_path())?;
    let existing_untracked =
        capture_existing_untracked(repo_root.as_path(), repo_prefix.as_deref())?;

    let normalized_force = options
        .force_include
        .iter()
        .map(|path| normalize_relative_path(path))
        .collect::<Result<Vec<_>, _>>()?;
    let force_include =
        apply_repo_prefix_to_force_include(repo_prefix.as_deref(), &normalized_force);
    let index_tempdir = Builder::new().prefix("llmx-git-index-").tempdir()?;
    let index_path = index_tempdir.path().join("index");
    let base_env = vec![(
        OsString::from("GIT_INDEX_FILE"),
        OsString::from(index_path.as_os_str()),
    )];

    // Pre-populate the temporary index with HEAD so unchanged tracked files
    // are included in the snapshot tree.
    if let Some(parent_sha) = parent.as_deref() {
        run_git_for_status(
            repo_root.as_path(),
            vec![OsString::from("read-tree"), OsString::from(parent_sha)],
            Some(base_env.as_slice()),
        )?;
    }

    let mut add_args = vec![OsString::from("add"), OsString::from("--all")];
    if let Some(prefix) = repo_prefix.as_deref() {
        add_args.extend([OsString::from("--"), prefix.as_os_str().to_os_string()]);
    }

    run_git_for_status(repo_root.as_path(), add_args, Some(base_env.as_slice()))?;
    if !force_include.is_empty() {
        let mut args = Vec::with_capacity(force_include.len() + 2);
        args.push(OsString::from("add"));
        args.push(OsString::from("--force"));
        args.extend(
            force_include
                .iter()
                .map(|path| OsString::from(path.as_os_str())),
        );
        run_git_for_status(repo_root.as_path(), args, Some(base_env.as_slice()))?;
    }

    let tree_id = run_git_for_stdout(
        repo_root.as_path(),
        vec![OsString::from("write-tree")],
        Some(base_env.as_slice()),
    )?;

    let mut commit_env = base_env;
    commit_env.extend(default_commit_identity());
    let message = options.message.unwrap_or(DEFAULT_COMMIT_MESSAGE);
    let commit_args = {
        let mut result = vec![OsString::from("commit-tree"), OsString::from(&tree_id)];
        if let Some(parent) = parent.as_deref() {
            result.extend([OsString::from("-p"), OsString::from(parent)]);
        }
        result.extend([OsString::from("-m"), OsString::from(message)]);
        result
    };

    // Retrieve commit ID.
    let commit_id = run_git_for_stdout(
        repo_root.as_path(),
        commit_args,
        Some(commit_env.as_slice()),
    )?;

    Ok(GhostCommit::new(
        commit_id,
        parent,
        existing_untracked.files,
        existing_untracked.dirs,
    ))
}

/// Restore the working tree to match the provided ghost commit.
pub fn restore_ghost_commit(repo_path: &Path, commit: &GhostCommit) -> Result<(), GitToolingError> {
    ensure_git_repository(repo_path)?;

    let repo_root = resolve_repository_root(repo_path)?;
    let repo_prefix = repo_subdir(repo_root.as_path(), repo_path);
    let current_untracked =
        capture_existing_untracked(repo_root.as_path(), repo_prefix.as_deref())?;
    restore_to_commit_inner(repo_root.as_path(), repo_prefix.as_deref(), commit.id())?;
    remove_new_untracked(
        repo_root.as_path(),
        commit.preexisting_untracked_files(),
        commit.preexisting_untracked_dirs(),
        current_untracked,
    )
}

/// Restore the working tree to match the given commit ID.
pub fn restore_to_commit(repo_path: &Path, commit_id: &str) -> Result<(), GitToolingError> {
    ensure_git_repository(repo_path)?;

    let repo_root = resolve_repository_root(repo_path)?;
    let repo_prefix = repo_subdir(repo_root.as_path(), repo_path);
    restore_to_commit_inner(repo_root.as_path(), repo_prefix.as_deref(), commit_id)
}

/// Restores the working tree and index to the given commit using `git restore`.
/// The repository root and optional repository-relative prefix limit the restore scope.
fn restore_to_commit_inner(
    repo_root: &Path,
    repo_prefix: Option<&Path>,
    commit_id: &str,
) -> Result<(), GitToolingError> {
    let mut restore_args = vec![
        OsString::from("restore"),
        OsString::from("--source"),
        OsString::from(commit_id),
        OsString::from("--worktree"),
        OsString::from("--staged"),
        OsString::from("--"),
    ];
    if let Some(prefix) = repo_prefix {
        restore_args.push(prefix.as_os_str().to_os_string());
    } else {
        restore_args.push(OsString::from("."));
    }

    run_git_for_status(repo_root, restore_args, None)?;
    Ok(())
}

#[derive(Default)]
struct UntrackedSnapshot {
    files: Vec<PathBuf>,
    dirs: Vec<PathBuf>,
}

/// Captures the untracked and ignored entries under `repo_root`, optionally limited by `repo_prefix`.
/// Returns the result as an `UntrackedSnapshot`.
fn capture_existing_untracked(
    repo_root: &Path,
    repo_prefix: Option<&Path>,
) -> Result<UntrackedSnapshot, GitToolingError> {
    // Ask git for the zero-delimited porcelain status so we can enumerate
    // every untracked or ignored path (including ones filtered by prefix).
    let mut args = vec![
        OsString::from("status"),
        OsString::from("--porcelain=2"),
        OsString::from("-z"),
        OsString::from("--ignored=matching"),
        OsString::from("--untracked-files=all"),
    ];
    if let Some(prefix) = repo_prefix {
        args.push(OsString::from("--"));
        args.push(prefix.as_os_str().to_os_string());
    }

    let output = run_git_for_stdout_all(repo_root, args, None)?;
    if output.is_empty() {
        return Ok(UntrackedSnapshot::default());
    }

    let mut snapshot = UntrackedSnapshot::default();
    // Each entry is of the form "<code> <path>" where code is '?' (untracked)
    // or '!' (ignored); everything else is irrelevant to this snapshot.
    for entry in output.split('\0') {
        if entry.is_empty() {
            continue;
        }
        let mut parts = entry.splitn(2, ' ');
        let code = parts.next();
        let path_part = parts.next();
        let (Some(code), Some(path_part)) = (code, path_part) else {
            continue;
        };
        if code != "?" && code != "!" {
            continue;
        }
        if path_part.is_empty() {
            continue;
        }

        let normalized = normalize_relative_path(Path::new(path_part))?;
        let absolute = repo_root.join(&normalized);
        let is_dir = absolute.is_dir();
        if is_dir {
            snapshot.dirs.push(normalized);
        } else {
            snapshot.files.push(normalized);
        }
    }

    Ok(snapshot)
}

/// Removes untracked files and directories that were not present when the snapshot was captured.
fn remove_new_untracked(
    repo_root: &Path,
    preserved_files: &[PathBuf],
    preserved_dirs: &[PathBuf],
    current: UntrackedSnapshot,
) -> Result<(), GitToolingError> {
    if current.files.is_empty() && current.dirs.is_empty() {
        return Ok(());
    }

    let preserved_file_set: HashSet<PathBuf> = preserved_files.iter().cloned().collect();
    let preserved_dirs_vec: Vec<PathBuf> = preserved_dirs.to_vec();

    for path in current.files {
        if should_preserve(&path, &preserved_file_set, &preserved_dirs_vec) {
            continue;
        }
        remove_path(&repo_root.join(&path))?;
    }

    for dir in current.dirs {
        if should_preserve(&dir, &preserved_file_set, &preserved_dirs_vec) {
            continue;
        }
        remove_path(&repo_root.join(&dir))?;
    }

    Ok(())
}

/// Determines whether an untracked path should be kept because it existed in the snapshot.
fn should_preserve(
    path: &Path,
    preserved_files: &HashSet<PathBuf>,
    preserved_dirs: &[PathBuf],
) -> bool {
    if preserved_files.contains(path) {
        return true;
    }

    preserved_dirs
        .iter()
        .any(|dir| path.starts_with(dir.as_path()))
}

/// Deletes the file or directory at the provided path, ignoring if it is already absent.
fn remove_path(path: &Path) -> Result<(), GitToolingError> {
    match fs::symlink_metadata(path) {
        Ok(metadata) => {
            if metadata.is_dir() {
                fs::remove_dir_all(path)?;
            } else {
                fs::remove_file(path)?;
            }
        }
        Err(err) => {
            if err.kind() == io::ErrorKind::NotFound {
                return Ok(());
            }
            return Err(err.into());
        }
    }
    Ok(())
}

/// Returns the default author and committer identity for ghost commits.
fn default_commit_identity() -> Vec<(OsString, OsString)> {
    vec![
        (
            OsString::from("GIT_AUTHOR_NAME"),
            OsString::from("Llmx Snapshot"),
        ),
        (
            OsString::from("GIT_AUTHOR_EMAIL"),
            OsString::from("snapshot@llmx.local"),
        ),
        (
            OsString::from("GIT_COMMITTER_NAME"),
            OsString::from("Llmx Snapshot"),
        ),
        (
            OsString::from("GIT_COMMITTER_EMAIL"),
            OsString::from("snapshot@llmx.local"),
        ),
    ]
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::operations::run_git_for_stdout;
    use assert_matches::assert_matches;
    use pretty_assertions::assert_eq;
    use std::process::Command;

    /// Runs a git command in the test repository and asserts success.
    fn run_git_in(repo_path: &Path, args: &[&str]) {
        let status = Command::new("git")
            .current_dir(repo_path)
            .args(args)
            .status()
            .expect("git command");
        assert!(status.success(), "git command failed: {args:?}");
    }

    /// Runs a git command and returns its trimmed stdout output.
    fn run_git_stdout(repo_path: &Path, args: &[&str]) -> String {
        let output = Command::new("git")
            .current_dir(repo_path)
            .args(args)
            .output()
            .expect("git command");
        assert!(output.status.success(), "git command failed: {args:?}");
        String::from_utf8_lossy(&output.stdout).trim().to_string()
    }

    /// Initializes a repository with consistent settings for cross-platform tests.
    fn init_test_repo(repo: &Path) {
        run_git_in(repo, &["init", "--initial-branch=main"]);
        run_git_in(repo, &["config", "core.autocrlf", "false"]);
    }

    #[test]
    /// Verifies a ghost commit can be created and restored end to end.
    fn create_and_restore_roundtrip() -> Result<(), GitToolingError> {
        let temp = tempfile::tempdir()?;
        let repo = temp.path();
        init_test_repo(repo);
        std::fs::write(repo.join("tracked.txt"), "initial\n")?;
        std::fs::write(repo.join("delete-me.txt"), "to be removed\n")?;
        run_git_in(repo, &["add", "tracked.txt", "delete-me.txt"]);
        run_git_in(
            repo,
            &[
                "-c",
                "user.name=Tester",
                "-c",
                "user.email=test@example.com",
                "commit",
                "-m",
                "init",
            ],
        );

        let preexisting_untracked = repo.join("notes.txt");
        std::fs::write(&preexisting_untracked, "notes before\n")?;

        let tracked_contents = "modified contents\n";
        std::fs::write(repo.join("tracked.txt"), tracked_contents)?;
        std::fs::remove_file(repo.join("delete-me.txt"))?;
        let new_file_contents = "hello ghost\n";
        std::fs::write(repo.join("new-file.txt"), new_file_contents)?;
        std::fs::write(repo.join(".gitignore"), "ignored.txt\n")?;
        let ignored_contents = "ignored but captured\n";
        std::fs::write(repo.join("ignored.txt"), ignored_contents)?;

        let options =
            CreateGhostCommitOptions::new(repo).force_include(vec![PathBuf::from("ignored.txt")]);
        let ghost = create_ghost_commit(&options)?;

        assert!(ghost.parent().is_some());
        let cat = run_git_for_stdout(
            repo,
            vec![
                OsString::from("show"),
                OsString::from(format!("{}:ignored.txt", ghost.id())),
            ],
            None,
        )?;
        assert_eq!(cat, ignored_contents.trim());

        std::fs::write(repo.join("tracked.txt"), "other state\n")?;
        std::fs::write(repo.join("ignored.txt"), "changed\n")?;
        std::fs::remove_file(repo.join("new-file.txt"))?;
        std::fs::write(repo.join("ephemeral.txt"), "temp data\n")?;
        std::fs::write(&preexisting_untracked, "notes after\n")?;

        restore_ghost_commit(repo, &ghost)?;

        let tracked_after = std::fs::read_to_string(repo.join("tracked.txt"))?;
        assert_eq!(tracked_after, tracked_contents);
        let ignored_after = std::fs::read_to_string(repo.join("ignored.txt"))?;
        assert_eq!(ignored_after, ignored_contents);
        let new_file_after = std::fs::read_to_string(repo.join("new-file.txt"))?;
        assert_eq!(new_file_after, new_file_contents);
        assert_eq!(repo.join("delete-me.txt").exists(), false);
        assert!(!repo.join("ephemeral.txt").exists());
        let notes_after = std::fs::read_to_string(&preexisting_untracked)?;
        assert_eq!(notes_after, "notes before\n");

        Ok(())
    }

    #[test]
    /// Ensures ghost commits succeed in repositories without an existing HEAD.
    fn create_snapshot_without_existing_head() -> Result<(), GitToolingError> {
        let temp = tempfile::tempdir()?;
        let repo = temp.path();
        init_test_repo(repo);

        let tracked_contents = "first contents\n";
        std::fs::write(repo.join("tracked.txt"), tracked_contents)?;
        let ignored_contents = "ignored but captured\n";
        std::fs::write(repo.join(".gitignore"), "ignored.txt\n")?;
        std::fs::write(repo.join("ignored.txt"), ignored_contents)?;

        let options =
            CreateGhostCommitOptions::new(repo).force_include(vec![PathBuf::from("ignored.txt")]);
        let ghost = create_ghost_commit(&options)?;

        assert!(ghost.parent().is_none());

        let message = run_git_stdout(repo, &["log", "-1", "--format=%s", ghost.id()]);
        assert_eq!(message, DEFAULT_COMMIT_MESSAGE);

        let ignored = run_git_stdout(repo, &["show", &format!("{}:ignored.txt", ghost.id())]);
        assert_eq!(ignored, ignored_contents.trim());

        Ok(())
    }

    #[test]
    /// Confirms custom messages are used when creating ghost commits.
    fn create_ghost_commit_uses_custom_message() -> Result<(), GitToolingError> {
        let temp = tempfile::tempdir()?;
        let repo = temp.path();
        init_test_repo(repo);

        std::fs::write(repo.join("tracked.txt"), "contents\n")?;
        run_git_in(repo, &["add", "tracked.txt"]);
        run_git_in(
            repo,
            &[
                "-c",
                "user.name=Tester",
                "-c",
                "user.email=test@example.com",
                "commit",
                "-m",
                "initial",
            ],
        );

        let message = "custom message";
        let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo).message(message))?;
        let commit_message = run_git_stdout(repo, &["log", "-1", "--format=%s", ghost.id()]);
        assert_eq!(commit_message, message);

        Ok(())
    }

    #[test]
    /// Rejects force-included paths that escape the repository.
    fn create_ghost_commit_rejects_force_include_parent_path() {
        let temp = tempfile::tempdir().expect("tempdir");
        let repo = temp.path();
        init_test_repo(repo);
        let options = CreateGhostCommitOptions::new(repo)
            .force_include(vec![PathBuf::from("../outside.txt")]);
        let err = create_ghost_commit(&options).unwrap_err();
        assert_matches!(err, GitToolingError::PathEscapesRepository { .. });
    }

    #[test]
    /// Restoring a ghost commit from a non-git directory fails.
    fn restore_requires_git_repository() {
        let temp = tempfile::tempdir().expect("tempdir");
        let err = restore_to_commit(temp.path(), "deadbeef").unwrap_err();
        assert_matches!(err, GitToolingError::NotAGitRepository { .. });
    }

    #[test]
    /// Restoring from a subdirectory affects only that subdirectory.
    fn restore_from_subdirectory_restores_files_relatively() -> Result<(), GitToolingError> {
        let temp = tempfile::tempdir()?;
        let repo = temp.path();
        init_test_repo(repo);

        std::fs::create_dir_all(repo.join("workspace"))?;
        let workspace = repo.join("workspace");
        std::fs::write(repo.join("root.txt"), "root contents\n")?;
        std::fs::write(workspace.join("nested.txt"), "nested contents\n")?;
        run_git_in(repo, &["add", "."]);
        run_git_in(
            repo,
            &[
                "-c",
                "user.name=Tester",
                "-c",
                "user.email=test@example.com",
                "commit",
                "-m",
                "initial",
            ],
        );

        std::fs::write(repo.join("root.txt"), "root modified\n")?;
        std::fs::write(workspace.join("nested.txt"), "nested modified\n")?;

        let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(&workspace))?;

        std::fs::write(repo.join("root.txt"), "root after\n")?;
        std::fs::write(workspace.join("nested.txt"), "nested after\n")?;

        restore_ghost_commit(&workspace, &ghost)?;

        let root_after = std::fs::read_to_string(repo.join("root.txt"))?;
|
||||
assert_eq!(root_after, "root after\n");
|
||||
let nested_after = std::fs::read_to_string(workspace.join("nested.txt"))?;
|
||||
assert_eq!(nested_after, "nested modified\n");
|
||||
assert!(!workspace.join("llmx-rs").exists());
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
/// Restoring from a subdirectory preserves ignored files in parent folders.
|
||||
fn restore_from_subdirectory_preserves_parent_vscode() -> Result<(), GitToolingError> {
|
||||
let temp = tempfile::tempdir()?;
|
||||
let repo = temp.path();
|
||||
init_test_repo(repo);
|
||||
|
||||
let workspace = repo.join("llmx-rs");
|
||||
std::fs::create_dir_all(&workspace)?;
|
||||
std::fs::write(repo.join(".gitignore"), ".vscode/\n")?;
|
||||
std::fs::write(workspace.join("tracked.txt"), "snapshot version\n")?;
|
||||
run_git_in(repo, &["add", "."]);
|
||||
run_git_in(
|
||||
repo,
|
||||
&[
|
||||
"-c",
|
||||
"user.name=Tester",
|
||||
"-c",
|
||||
"user.email=test@example.com",
|
||||
"commit",
|
||||
"-m",
|
||||
"initial",
|
||||
],
|
||||
);
|
||||
|
||||
std::fs::write(workspace.join("tracked.txt"), "snapshot delta\n")?;
|
||||
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(&workspace))?;
|
||||
|
||||
std::fs::write(workspace.join("tracked.txt"), "post-snapshot\n")?;
|
||||
let vscode = repo.join(".vscode");
|
||||
std::fs::create_dir_all(&vscode)?;
|
||||
std::fs::write(vscode.join("settings.json"), "{\n \"after\": true\n}\n")?;
|
||||
|
||||
restore_ghost_commit(&workspace, &ghost)?;
|
||||
|
||||
let tracked_after = std::fs::read_to_string(workspace.join("tracked.txt"))?;
|
||||
assert_eq!(tracked_after, "snapshot delta\n");
|
||||
assert!(vscode.join("settings.json").exists());
|
||||
let settings_after = std::fs::read_to_string(vscode.join("settings.json"))?;
|
||||
assert_eq!(settings_after, "{\n \"after\": true\n}\n");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
/// Restoring from the repository root keeps ignored files intact.
|
||||
fn restore_preserves_ignored_files() -> Result<(), GitToolingError> {
|
||||
let temp = tempfile::tempdir()?;
|
||||
let repo = temp.path();
|
||||
init_test_repo(repo);
|
||||
|
||||
std::fs::write(repo.join(".gitignore"), ".vscode/\n")?;
|
||||
std::fs::write(repo.join("tracked.txt"), "snapshot version\n")?;
|
||||
let vscode = repo.join(".vscode");
|
||||
std::fs::create_dir_all(&vscode)?;
|
||||
std::fs::write(vscode.join("settings.json"), "{\n \"before\": true\n}\n")?;
|
||||
run_git_in(repo, &["add", ".gitignore", "tracked.txt"]);
|
||||
run_git_in(
|
||||
repo,
|
||||
&[
|
||||
"-c",
|
||||
"user.name=Tester",
|
||||
"-c",
|
||||
"user.email=test@example.com",
|
||||
"commit",
|
||||
"-m",
|
||||
"initial",
|
||||
],
|
||||
);
|
||||
|
||||
std::fs::write(repo.join("tracked.txt"), "snapshot delta\n")?;
|
||||
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo))?;
|
||||
|
||||
std::fs::write(repo.join("tracked.txt"), "post-snapshot\n")?;
|
||||
std::fs::write(vscode.join("settings.json"), "{\n \"after\": true\n}\n")?;
|
||||
std::fs::write(repo.join("temp.txt"), "new file\n")?;
|
||||
|
||||
restore_ghost_commit(repo, &ghost)?;
|
||||
|
||||
let tracked_after = std::fs::read_to_string(repo.join("tracked.txt"))?;
|
||||
assert_eq!(tracked_after, "snapshot delta\n");
|
||||
assert!(vscode.join("settings.json").exists());
|
||||
let settings_after = std::fs::read_to_string(vscode.join("settings.json"))?;
|
||||
assert_eq!(settings_after, "{\n \"after\": true\n}\n");
|
||||
assert!(!repo.join("temp.txt").exists());
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
/// Restoring removes ignored directories created after the snapshot.
|
||||
fn restore_removes_new_ignored_directory() -> Result<(), GitToolingError> {
|
||||
let temp = tempfile::tempdir()?;
|
||||
let repo = temp.path();
|
||||
init_test_repo(repo);
|
||||
|
||||
std::fs::write(repo.join(".gitignore"), ".vscode/\n")?;
|
||||
std::fs::write(repo.join("tracked.txt"), "snapshot version\n")?;
|
||||
run_git_in(repo, &["add", ".gitignore", "tracked.txt"]);
|
||||
run_git_in(
|
||||
repo,
|
||||
&[
|
||||
"-c",
|
||||
"user.name=Tester",
|
||||
"-c",
|
||||
"user.email=test@example.com",
|
||||
"commit",
|
||||
"-m",
|
||||
"initial",
|
||||
],
|
||||
);
|
||||
|
||||
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo))?;
|
||||
|
||||
let vscode = repo.join(".vscode");
|
||||
std::fs::create_dir_all(&vscode)?;
|
||||
std::fs::write(vscode.join("settings.json"), "{\n \"after\": true\n}\n")?;
|
||||
|
||||
restore_ghost_commit(repo, &ghost)?;
|
||||
|
||||
assert!(!vscode.exists());
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
llmx-rs/utils/git/src/lib.rs (new file, 79 lines)
@@ -0,0 +1,79 @@
use std::fmt;
use std::path::PathBuf;

mod apply;
mod errors;
mod ghost_commits;
mod operations;
mod platform;

pub use apply::ApplyGitRequest;
pub use apply::ApplyGitResult;
pub use apply::apply_git_patch;
pub use apply::extract_paths_from_patch;
pub use apply::parse_git_apply_output;
pub use apply::stage_paths;
pub use errors::GitToolingError;
pub use ghost_commits::CreateGhostCommitOptions;
pub use ghost_commits::create_ghost_commit;
pub use ghost_commits::restore_ghost_commit;
pub use ghost_commits::restore_to_commit;
pub use platform::create_symlink;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
use ts_rs::TS;

type CommitID = String;

/// Details of a ghost commit created from a repository state.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, JsonSchema, TS)]
pub struct GhostCommit {
    id: CommitID,
    parent: Option<CommitID>,
    preexisting_untracked_files: Vec<PathBuf>,
    preexisting_untracked_dirs: Vec<PathBuf>,
}

impl GhostCommit {
    /// Create a new ghost commit wrapper from a raw commit ID and optional parent.
    pub fn new(
        id: CommitID,
        parent: Option<CommitID>,
        preexisting_untracked_files: Vec<PathBuf>,
        preexisting_untracked_dirs: Vec<PathBuf>,
    ) -> Self {
        Self {
            id,
            parent,
            preexisting_untracked_files,
            preexisting_untracked_dirs,
        }
    }

    /// Commit ID for the snapshot.
    pub fn id(&self) -> &str {
        &self.id
    }

    /// Parent commit ID, if the repository had a `HEAD` at creation time.
    pub fn parent(&self) -> Option<&str> {
        self.parent.as_deref()
    }

    /// Untracked or ignored files that already existed when the snapshot was captured.
    pub fn preexisting_untracked_files(&self) -> &[PathBuf] {
        &self.preexisting_untracked_files
    }

    /// Untracked or ignored directories that already existed when the snapshot was captured.
    pub fn preexisting_untracked_dirs(&self) -> &[PathBuf] {
        &self.preexisting_untracked_dirs
    }
}

impl fmt::Display for GhostCommit {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.id)
    }
}
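For reviewers, a minimal sketch of how the exported ghost-commit API fits together. This is not part of the diff; the crate import name `llmx_utils_git` and the call site are assumptions based on the exports above:

// Hypothetical usage sketch; assumes the crate is importable as `llmx_utils_git`.
use std::path::Path;

use llmx_utils_git::CreateGhostCommitOptions;
use llmx_utils_git::GitToolingError;
use llmx_utils_git::create_ghost_commit;
use llmx_utils_git::restore_ghost_commit;

fn snapshot_then_rollback(repo: &Path) -> Result<(), GitToolingError> {
    // Capture the current working tree as an unreferenced "ghost" commit.
    let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo))?;
    // ... mutate the working tree here ...
    // Roll the working tree back to the captured snapshot.
    restore_ghost_commit(repo, &ghost)?;
    Ok(())
}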
llmx-rs/utils/git/src/operations.rs (new file, 239 lines)
@@ -0,0 +1,239 @@
use std::ffi::OsStr;
use std::ffi::OsString;
use std::path::Component;
use std::path::Path;
use std::path::PathBuf;
use std::process::Command;

use crate::GitToolingError;

pub(crate) fn ensure_git_repository(path: &Path) -> Result<(), GitToolingError> {
    match run_git_for_stdout(
        path,
        vec![
            OsString::from("rev-parse"),
            OsString::from("--is-inside-work-tree"),
        ],
        None,
    ) {
        Ok(output) if output.trim() == "true" => Ok(()),
        Ok(_) => Err(GitToolingError::NotAGitRepository {
            path: path.to_path_buf(),
        }),
        Err(GitToolingError::GitCommand { status, .. }) if status.code() == Some(128) => {
            Err(GitToolingError::NotAGitRepository {
                path: path.to_path_buf(),
            })
        }
        Err(err) => Err(err),
    }
}

pub(crate) fn resolve_head(path: &Path) -> Result<Option<String>, GitToolingError> {
    match run_git_for_stdout(
        path,
        vec![
            OsString::from("rev-parse"),
            OsString::from("--verify"),
            OsString::from("HEAD"),
        ],
        None,
    ) {
        Ok(sha) => Ok(Some(sha)),
        Err(GitToolingError::GitCommand { status, .. }) if status.code() == Some(128) => Ok(None),
        Err(other) => Err(other),
    }
}

pub(crate) fn normalize_relative_path(path: &Path) -> Result<PathBuf, GitToolingError> {
    let mut result = PathBuf::new();
    let mut saw_component = false;
    for component in path.components() {
        saw_component = true;
        match component {
            Component::Normal(part) => result.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                if !result.pop() {
                    return Err(GitToolingError::PathEscapesRepository {
                        path: path.to_path_buf(),
                    });
                }
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err(GitToolingError::NonRelativePath {
                    path: path.to_path_buf(),
                });
            }
        }
    }

    if !saw_component {
        return Err(GitToolingError::NonRelativePath {
            path: path.to_path_buf(),
        });
    }

    Ok(result)
}

pub(crate) fn resolve_repository_root(path: &Path) -> Result<PathBuf, GitToolingError> {
    let root = run_git_for_stdout(
        path,
        vec![
            OsString::from("rev-parse"),
            OsString::from("--show-toplevel"),
        ],
        None,
    )?;
    Ok(PathBuf::from(root))
}

pub(crate) fn apply_repo_prefix_to_force_include(
    prefix: Option<&Path>,
    paths: &[PathBuf],
) -> Vec<PathBuf> {
    if paths.is_empty() {
        return Vec::new();
    }

    match prefix {
        Some(prefix) => paths.iter().map(|path| prefix.join(path)).collect(),
        None => paths.to_vec(),
    }
}

pub(crate) fn repo_subdir(repo_root: &Path, repo_path: &Path) -> Option<PathBuf> {
    if repo_root == repo_path {
        return None;
    }

    repo_path
        .strip_prefix(repo_root)
        .ok()
        .and_then(non_empty_path)
        .or_else(|| {
            let repo_root_canon = repo_root.canonicalize().ok()?;
            let repo_path_canon = repo_path.canonicalize().ok()?;
            repo_path_canon
                .strip_prefix(&repo_root_canon)
                .ok()
                .and_then(non_empty_path)
        })
}

fn non_empty_path(path: &Path) -> Option<PathBuf> {
    if path.as_os_str().is_empty() {
        None
    } else {
        Some(path.to_path_buf())
    }
}

pub(crate) fn run_git_for_status<I, S>(
    dir: &Path,
    args: I,
    env: Option<&[(OsString, OsString)]>,
) -> Result<(), GitToolingError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,
{
    run_git(dir, args, env)?;
    Ok(())
}

pub(crate) fn run_git_for_stdout<I, S>(
    dir: &Path,
    args: I,
    env: Option<&[(OsString, OsString)]>,
) -> Result<String, GitToolingError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,
{
    let run = run_git(dir, args, env)?;
    String::from_utf8(run.output.stdout)
        .map(|value| value.trim().to_string())
        .map_err(|source| GitToolingError::GitOutputUtf8 {
            command: run.command,
            source,
        })
}

/// Executes `git` and returns the full stdout without trimming so callers
/// can parse delimiter-sensitive output, propagating UTF-8 errors with context.
pub(crate) fn run_git_for_stdout_all<I, S>(
    dir: &Path,
    args: I,
    env: Option<&[(OsString, OsString)]>,
) -> Result<String, GitToolingError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,
{
    // Keep the raw stdout untouched so callers can parse delimiter-sensitive
    // output (e.g. NUL-separated paths) without trimming artefacts.
    let run = run_git(dir, args, env)?;
    // Propagate UTF-8 conversion failures with the command context for debugging.
    String::from_utf8(run.output.stdout).map_err(|source| GitToolingError::GitOutputUtf8 {
        command: run.command,
        source,
    })
}

fn run_git<I, S>(
    dir: &Path,
    args: I,
    env: Option<&[(OsString, OsString)]>,
) -> Result<GitRun, GitToolingError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,
{
    let iterator = args.into_iter();
    let (lower, upper) = iterator.size_hint();
    let mut args_vec = Vec::with_capacity(upper.unwrap_or(lower));
    for arg in iterator {
        args_vec.push(OsString::from(arg.as_ref()));
    }
    let command_string = build_command_string(&args_vec);
    let mut command = Command::new("git");
    command.current_dir(dir);
    if let Some(envs) = env {
        for (key, value) in envs {
            command.env(key, value);
        }
    }
    command.args(&args_vec);
    let output = command.output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr).trim().to_string();
        return Err(GitToolingError::GitCommand {
            command: command_string,
            status: output.status,
            stderr,
        });
    }
    Ok(GitRun {
        command: command_string,
        output,
    })
}

fn build_command_string(args: &[OsString]) -> String {
    if args.is_empty() {
        return "git".to_string();
    }
    let joined = args
        .iter()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect::<Vec<_>>()
        .join(" ");
    format!("git {joined}")
}

struct GitRun {
    command: String,
    output: std::process::Output,
}
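The path normalization above is the guard behind the `PathEscapesRepository` errors seen in the tests. A short test-style sketch of its behaviour (hypothetical, in-crate since the function is `pub(crate)`; not in the diff):

#[cfg(test)]
mod normalize_sketch {
    use std::path::Path;
    use std::path::PathBuf;

    use super::normalize_relative_path;

    #[test]
    fn collapses_dots_and_rejects_escapes() {
        // `a/./b/../c` collapses to `a/c`: `.` is dropped, `..` pops `b`.
        let normalized = normalize_relative_path(Path::new("a/./b/../c")).expect("relative path");
        assert_eq!(normalized, PathBuf::from("a/c"));
        // A leading `..` has nothing to pop, so it would escape the
        // repository root and is rejected.
        assert!(normalize_relative_path(Path::new("../outside.txt")).is_err());
    }
}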
llmx-rs/utils/git/src/platform.rs (new file, 37 lines)
@@ -0,0 +1,37 @@
use std::path::Path;

use crate::GitToolingError;

#[cfg(unix)]
pub fn create_symlink(
    _source: &Path,
    link_target: &Path,
    destination: &Path,
) -> Result<(), GitToolingError> {
    use std::os::unix::fs::symlink;

    symlink(link_target, destination)?;
    Ok(())
}

#[cfg(windows)]
pub fn create_symlink(
    source: &Path,
    link_target: &Path,
    destination: &Path,
) -> Result<(), GitToolingError> {
    use std::os::windows::fs::FileTypeExt;
    use std::os::windows::fs::symlink_dir;
    use std::os::windows::fs::symlink_file;

    let metadata = std::fs::symlink_metadata(source)?;
    if metadata.file_type().is_symlink_dir() {
        symlink_dir(link_target, destination)?;
    } else {
        symlink_file(link_target, destination)?;
    }
    Ok(())
}

#[cfg(not(any(unix, windows)))]
compile_error!("llmx-git symlink support is only implemented for Unix and Windows");
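A hedged call-site sketch (the paths are illustrative, not from the diff). The first argument is only consulted on Windows, where the filesystem needs to know whether the link points at a directory:

// Mirror an existing symlink when materializing a snapshot elsewhere:
// create_symlink(
//     Path::new("snapshot/link"),   // existing link to inspect (Windows only)
//     Path::new("../target"),       // what the new link should point at
//     Path::new("worktree/link"),   // where to create the new link
// )?;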
llmx-rs/utils/image/Cargo.toml (new file, 18 lines)
@@ -0,0 +1,18 @@
[package]
name = "llmx-utils-image"
version.workspace = true
edition.workspace = true

[lints]
workspace = true

[dependencies]
base64 = { workspace = true }
image = { workspace = true, features = ["jpeg", "png"] }
llmx-utils-cache = { workspace = true }
thiserror = { workspace = true }
tokio = { workspace = true, features = ["fs", "rt", "rt-multi-thread", "macros"] }

[dev-dependencies]
image = { workspace = true, features = ["jpeg", "png"] }
tempfile = { workspace = true }
llmx-rs/utils/image/src/error.rs (new file, 25 lines)
@@ -0,0 +1,25 @@
use image::ImageFormat;
use std::path::PathBuf;
use thiserror::Error;

#[derive(Debug, Error)]
pub enum ImageProcessingError {
    #[error("failed to read image at {path}: {source}")]
    Read {
        path: PathBuf,
        #[source]
        source: std::io::Error,
    },
    #[error("failed to decode image at {path}: {source}")]
    Decode {
        path: PathBuf,
        #[source]
        source: image::ImageError,
    },
    #[error("failed to encode image as {format:?}: {source}")]
    Encode {
        format: ImageFormat,
        #[source]
        source: image::ImageError,
    },
}
llmx-rs/utils/image/src/lib.rs (new file, 252 lines)
@@ -0,0 +1,252 @@
use std::num::NonZeroUsize;
use std::path::Path;
use std::sync::LazyLock;

use crate::error::ImageProcessingError;
use base64::Engine;
use base64::engine::general_purpose::STANDARD as BASE64_STANDARD;
use image::ColorType;
use image::DynamicImage;
use image::GenericImageView;
use image::ImageEncoder;
use image::ImageFormat;
use image::codecs::jpeg::JpegEncoder;
use image::codecs::png::PngEncoder;
use image::imageops::FilterType;
use llmx_utils_cache::BlockingLruCache;
use llmx_utils_cache::sha1_digest;

/// Maximum width used when resizing images before uploading.
pub const MAX_WIDTH: u32 = 2048;
/// Maximum height used when resizing images before uploading.
pub const MAX_HEIGHT: u32 = 768;

pub mod error;

#[derive(Debug, Clone)]
pub struct EncodedImage {
    pub bytes: Vec<u8>,
    pub mime: String,
    pub width: u32,
    pub height: u32,
}

impl EncodedImage {
    pub fn into_data_url(self) -> String {
        let encoded = BASE64_STANDARD.encode(&self.bytes);
        format!("data:{};base64,{}", self.mime, encoded)
    }
}

static IMAGE_CACHE: LazyLock<BlockingLruCache<[u8; 20], EncodedImage>> =
    LazyLock::new(|| BlockingLruCache::new(NonZeroUsize::new(32).unwrap_or(NonZeroUsize::MIN)));

pub fn load_and_resize_to_fit(path: &Path) -> Result<EncodedImage, ImageProcessingError> {
    let path_buf = path.to_path_buf();

    let file_bytes = read_file_bytes(path, &path_buf)?;

    let key = sha1_digest(&file_bytes);

    IMAGE_CACHE.get_or_try_insert_with(key, move || {
        let format = match image::guess_format(&file_bytes) {
            Ok(ImageFormat::Png) => Some(ImageFormat::Png),
            Ok(ImageFormat::Jpeg) => Some(ImageFormat::Jpeg),
            _ => None,
        };

        let dynamic = image::load_from_memory(&file_bytes).map_err(|source| {
            ImageProcessingError::Decode {
                path: path_buf.clone(),
                source,
            }
        })?;

        let (width, height) = dynamic.dimensions();

        let encoded = if width <= MAX_WIDTH && height <= MAX_HEIGHT {
            if let Some(format) = format {
                let mime = format_to_mime(format);
                EncodedImage {
                    bytes: file_bytes,
                    mime,
                    width,
                    height,
                }
            } else {
                let (bytes, output_format) = encode_image(&dynamic, ImageFormat::Png)?;
                let mime = format_to_mime(output_format);
                EncodedImage {
                    bytes,
                    mime,
                    width,
                    height,
                }
            }
        } else {
            let resized = dynamic.resize(MAX_WIDTH, MAX_HEIGHT, FilterType::Triangle);
            let target_format = format.unwrap_or(ImageFormat::Png);
            let (bytes, output_format) = encode_image(&resized, target_format)?;
            let mime = format_to_mime(output_format);
            EncodedImage {
                bytes,
                mime,
                width: resized.width(),
                height: resized.height(),
            }
        };

        Ok(encoded)
    })
}

fn read_file_bytes(path: &Path, path_for_error: &Path) -> Result<Vec<u8>, ImageProcessingError> {
    match tokio::runtime::Handle::try_current() {
        // If we're inside a Tokio runtime, avoid block_on (it panics on worker threads).
        // Use block_in_place and do a standard blocking read safely.
        Ok(_) => tokio::task::block_in_place(|| std::fs::read(path)).map_err(|source| {
            ImageProcessingError::Read {
                path: path_for_error.to_path_buf(),
                source,
            }
        }),
        // Outside a runtime, just read synchronously.
        Err(_) => std::fs::read(path).map_err(|source| ImageProcessingError::Read {
            path: path_for_error.to_path_buf(),
            source,
        }),
    }
}

fn encode_image(
    image: &DynamicImage,
    preferred_format: ImageFormat,
) -> Result<(Vec<u8>, ImageFormat), ImageProcessingError> {
    let target_format = match preferred_format {
        ImageFormat::Jpeg => ImageFormat::Jpeg,
        _ => ImageFormat::Png,
    };

    let mut buffer = Vec::new();

    match target_format {
        ImageFormat::Png => {
            let rgba = image.to_rgba8();
            let encoder = PngEncoder::new(&mut buffer);
            encoder
                .write_image(
                    rgba.as_raw(),
                    image.width(),
                    image.height(),
                    ColorType::Rgba8.into(),
                )
                .map_err(|source| ImageProcessingError::Encode {
                    format: target_format,
                    source,
                })?;
        }
        ImageFormat::Jpeg => {
            let mut encoder = JpegEncoder::new_with_quality(&mut buffer, 85);
            encoder
                .encode_image(image)
                .map_err(|source| ImageProcessingError::Encode {
                    format: target_format,
                    source,
                })?;
        }
        _ => unreachable!("unsupported target_format should have been handled earlier"),
    }

    Ok((buffer, target_format))
}

fn format_to_mime(format: ImageFormat) -> String {
    match format {
        ImageFormat::Jpeg => "image/jpeg".to_string(),
        _ => "image/png".to_string(),
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use image::GenericImageView;
    use image::ImageBuffer;
    use image::Rgba;
    use tempfile::NamedTempFile;

    #[tokio::test(flavor = "multi_thread")]
    async fn returns_original_image_when_within_bounds() {
        let temp_file = NamedTempFile::new().expect("temp file");
        let image = ImageBuffer::from_pixel(64, 32, Rgba([10u8, 20, 30, 255]));
        image
            .save_with_format(temp_file.path(), ImageFormat::Png)
            .expect("write png to temp file");

        let original_bytes = std::fs::read(temp_file.path()).expect("read written image");

        let encoded = load_and_resize_to_fit(temp_file.path()).expect("process image");

        assert_eq!(encoded.width, 64);
        assert_eq!(encoded.height, 32);
        assert_eq!(encoded.mime, "image/png");
        assert_eq!(encoded.bytes, original_bytes);
    }

    #[tokio::test(flavor = "multi_thread")]
    async fn downscales_large_image() {
        let temp_file = NamedTempFile::new().expect("temp file");
        let image = ImageBuffer::from_pixel(4096, 2048, Rgba([200u8, 10, 10, 255]));
        image
            .save_with_format(temp_file.path(), ImageFormat::Png)
            .expect("write png to temp file");

        let processed = load_and_resize_to_fit(temp_file.path()).expect("process image");

        assert!(processed.width <= MAX_WIDTH);
        assert!(processed.height <= MAX_HEIGHT);

        let loaded =
            image::load_from_memory(&processed.bytes).expect("read resized bytes back into image");
        assert_eq!(loaded.dimensions(), (processed.width, processed.height));
    }

    #[tokio::test(flavor = "multi_thread")]
    async fn fails_cleanly_for_invalid_images() {
        let temp_file = NamedTempFile::new().expect("temp file");
        std::fs::write(temp_file.path(), b"not an image").expect("write bytes");

        let err = load_and_resize_to_fit(temp_file.path()).expect_err("invalid image should fail");
        match err {
            ImageProcessingError::Decode { .. } => {}
            _ => panic!("unexpected error variant"),
        }
    }

    #[tokio::test(flavor = "multi_thread")]
    async fn reprocesses_updated_file_contents() {
        IMAGE_CACHE.clear();

        let temp_file = NamedTempFile::new().expect("temp file");
        let first_image = ImageBuffer::from_pixel(32, 16, Rgba([20u8, 120, 220, 255]));
        first_image
            .save_with_format(temp_file.path(), ImageFormat::Png)
            .expect("write initial image");

        let first = load_and_resize_to_fit(temp_file.path()).expect("process first image");

        let second_image = ImageBuffer::from_pixel(96, 48, Rgba([50u8, 60, 70, 255]));
        second_image
            .save_with_format(temp_file.path(), ImageFormat::Png)
            .expect("write updated image");

        let second = load_and_resize_to_fit(temp_file.path()).expect("process updated image");

        assert_eq!(first.width, 32);
        assert_eq!(first.height, 16);
        assert_eq!(second.width, 96);
        assert_eq!(second.height, 48);
        assert_ne!(second.bytes, first.bytes);
    }
}
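A hedged call-site sketch of the intended flow (the file name is illustrative and the crate import name `llmx_utils_image` is an assumption; not in the diff):

use std::path::Path;

use llmx_utils_image::error::ImageProcessingError;
use llmx_utils_image::load_and_resize_to_fit;

fn screenshot_as_data_url() -> Result<String, ImageProcessingError> {
    // Reads the file (cached by content hash) and downscales it to fit
    // within MAX_WIDTH x MAX_HEIGHT when necessary.
    let encoded = load_and_resize_to_fit(Path::new("screenshot.png"))?;
    Ok(encoded.into_data_url()) // e.g. "data:image/png;base64,..."
}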
llmx-rs/utils/json-to-toml/Cargo.toml (new file, 14 lines)
@@ -0,0 +1,14 @@
[package]
edition.workspace = true
name = "llmx-utils-json-to-toml"
version.workspace = true

[dependencies]
serde_json = { workspace = true }
toml = { workspace = true }

[dev-dependencies]
pretty_assertions = { workspace = true }

[lints]
workspace = true
llmx-rs/utils/json-to-toml/src/lib.rs (new file, 83 lines)
@@ -0,0 +1,83 @@
use serde_json::Value as JsonValue;
use toml::Value as TomlValue;

/// Convert a `serde_json::Value` into a semantically equivalent `toml::Value`.
pub fn json_to_toml(v: JsonValue) -> TomlValue {
    match v {
        JsonValue::Null => TomlValue::String(String::new()),
        JsonValue::Bool(b) => TomlValue::Boolean(b),
        JsonValue::Number(n) => {
            if let Some(i) = n.as_i64() {
                TomlValue::Integer(i)
            } else if let Some(f) = n.as_f64() {
                TomlValue::Float(f)
            } else {
                TomlValue::String(n.to_string())
            }
        }
        JsonValue::String(s) => TomlValue::String(s),
        JsonValue::Array(arr) => TomlValue::Array(arr.into_iter().map(json_to_toml).collect()),
        JsonValue::Object(map) => {
            let tbl = map
                .into_iter()
                .map(|(k, v)| (k, json_to_toml(v)))
                .collect::<toml::value::Table>();
            TomlValue::Table(tbl)
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use pretty_assertions::assert_eq;
    use serde_json::json;

    #[test]
    fn json_number_to_toml() {
        let json_value = json!(123);
        assert_eq!(TomlValue::Integer(123), json_to_toml(json_value));
    }

    #[test]
    fn json_array_to_toml() {
        let json_value = json!([true, 1]);
        assert_eq!(
            TomlValue::Array(vec![TomlValue::Boolean(true), TomlValue::Integer(1)]),
            json_to_toml(json_value)
        );
    }

    #[test]
    fn json_bool_to_toml() {
        let json_value = json!(false);
        assert_eq!(TomlValue::Boolean(false), json_to_toml(json_value));
    }

    #[test]
    fn json_float_to_toml() {
        let json_value = json!(1.25);
        assert_eq!(TomlValue::Float(1.25), json_to_toml(json_value));
    }

    #[test]
    fn json_null_to_toml() {
        let json_value = serde_json::Value::Null;
        assert_eq!(TomlValue::String(String::new()), json_to_toml(json_value));
    }

    #[test]
    fn json_object_nested() {
        let json_value = json!({ "outer": { "inner": 2 } });
        let expected = {
            let mut inner = toml::value::Table::new();
            inner.insert("inner".into(), TomlValue::Integer(2));

            let mut outer = toml::value::Table::new();
            outer.insert("outer".into(), TomlValue::Table(inner));
            TomlValue::Table(outer)
        };

        assert_eq!(json_to_toml(json_value), expected);
    }
}
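A minimal usage sketch of the conversion (hypothetical call site; note that JSON `null` becomes an empty TOML string because TOML has no null):

use serde_json::json;

fn demo() {
    // A JSON object maps to a TOML table; scalars map to matching TOML scalars.
    let value = json_to_toml(json!({ "retries": 3, "verbose": true }));
    assert_eq!(value["retries"].as_integer(), Some(3));
    assert_eq!(value["verbose"].as_bool(), Some(true));
}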
llmx-rs/utils/pty/Cargo.toml (new file, 16 lines)
@@ -0,0 +1,16 @@
[package]
edition = "2021"
name = "llmx-utils-pty"
version = { workspace = true }

[lints]
workspace = true

[dependencies]
anyhow = { workspace = true }
portable-pty = { workspace = true }
tokio = { workspace = true, features = [
    "macros",
    "rt-multi-thread",
    "sync",
] }
llmx-rs/utils/pty/src/lib.rs (new file, 211 lines)
@@ -0,0 +1,211 @@
use std::collections::HashMap;
use std::io::ErrorKind;
use std::io::Read; // needed for `reader.read` in the blocking reader task below
use std::path::Path;
use std::sync::atomic::AtomicBool;
use std::sync::Arc;
use std::sync::Mutex as StdMutex;
use std::time::Duration;

use anyhow::Result;
use portable_pty::native_pty_system;
use portable_pty::CommandBuilder;
use portable_pty::PtySize;
use tokio::sync::broadcast;
use tokio::sync::mpsc;
use tokio::sync::oneshot;
use tokio::sync::Mutex as TokioMutex;
use tokio::task::JoinHandle;

#[derive(Debug)]
pub struct ExecCommandSession {
    writer_tx: mpsc::Sender<Vec<u8>>,
    output_tx: broadcast::Sender<Vec<u8>>,
    killer: StdMutex<Option<Box<dyn portable_pty::ChildKiller + Send + Sync>>>,
    reader_handle: StdMutex<Option<JoinHandle<()>>>,
    writer_handle: StdMutex<Option<JoinHandle<()>>>,
    wait_handle: StdMutex<Option<JoinHandle<()>>>,
    exit_status: Arc<AtomicBool>,
    exit_code: Arc<StdMutex<Option<i32>>>,
}

impl ExecCommandSession {
    #[allow(clippy::too_many_arguments)]
    pub fn new(
        writer_tx: mpsc::Sender<Vec<u8>>,
        output_tx: broadcast::Sender<Vec<u8>>,
        killer: Box<dyn portable_pty::ChildKiller + Send + Sync>,
        reader_handle: JoinHandle<()>,
        writer_handle: JoinHandle<()>,
        wait_handle: JoinHandle<()>,
        exit_status: Arc<AtomicBool>,
        exit_code: Arc<StdMutex<Option<i32>>>,
    ) -> (Self, broadcast::Receiver<Vec<u8>>) {
        let initial_output_rx = output_tx.subscribe();
        (
            Self {
                writer_tx,
                output_tx,
                killer: StdMutex::new(Some(killer)),
                reader_handle: StdMutex::new(Some(reader_handle)),
                writer_handle: StdMutex::new(Some(writer_handle)),
                wait_handle: StdMutex::new(Some(wait_handle)),
                exit_status,
                exit_code,
            },
            initial_output_rx,
        )
    }

    pub fn writer_sender(&self) -> mpsc::Sender<Vec<u8>> {
        self.writer_tx.clone()
    }

    pub fn output_receiver(&self) -> broadcast::Receiver<Vec<u8>> {
        self.output_tx.subscribe()
    }

    pub fn has_exited(&self) -> bool {
        self.exit_status.load(std::sync::atomic::Ordering::SeqCst)
    }

    pub fn exit_code(&self) -> Option<i32> {
        self.exit_code.lock().ok().and_then(|guard| *guard)
    }
}

impl Drop for ExecCommandSession {
    fn drop(&mut self) {
        if let Ok(mut killer_opt) = self.killer.lock() {
            if let Some(mut killer) = killer_opt.take() {
                let _ = killer.kill();
            }
        }

        if let Ok(mut h) = self.reader_handle.lock() {
            if let Some(handle) = h.take() {
                handle.abort();
            }
        }
        if let Ok(mut h) = self.writer_handle.lock() {
            if let Some(handle) = h.take() {
                handle.abort();
            }
        }
        if let Ok(mut h) = self.wait_handle.lock() {
            if let Some(handle) = h.take() {
                handle.abort();
            }
        }
    }
}

#[derive(Debug)]
pub struct SpawnedPty {
    pub session: ExecCommandSession,
    pub output_rx: broadcast::Receiver<Vec<u8>>,
    pub exit_rx: oneshot::Receiver<i32>,
}

pub async fn spawn_pty_process(
    program: &str,
    args: &[String],
    cwd: &Path,
    env: &HashMap<String, String>,
    arg0: &Option<String>,
) -> Result<SpawnedPty> {
    if program.is_empty() {
        anyhow::bail!("missing program for PTY spawn");
    }

    let pty_system = native_pty_system();
    let pair = pty_system.openpty(PtySize {
        rows: 24,
        cols: 80,
        pixel_width: 0,
        pixel_height: 0,
    })?;

    let mut command_builder = CommandBuilder::new(arg0.as_ref().unwrap_or(&program.to_string()));
    command_builder.cwd(cwd);
    command_builder.env_clear();
    for arg in args {
        command_builder.arg(arg);
    }
    for (key, value) in env {
        command_builder.env(key, value);
    }

    let mut child = pair.slave.spawn_command(command_builder)?;
    let killer = child.clone_killer();

    let (writer_tx, mut writer_rx) = mpsc::channel::<Vec<u8>>(128);
    let (output_tx, _) = broadcast::channel::<Vec<u8>>(256);

    let mut reader = pair.master.try_clone_reader()?;
    let output_tx_clone = output_tx.clone();
    let reader_handle: JoinHandle<()> = tokio::task::spawn_blocking(move || {
        let mut buf = [0u8; 8_192];
        loop {
            match reader.read(&mut buf) {
                Ok(0) => break,
                Ok(n) => {
                    let _ = output_tx_clone.send(buf[..n].to_vec());
                }
                Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(ref e) if e.kind() == ErrorKind::WouldBlock => {
                    std::thread::sleep(Duration::from_millis(5));
                    continue;
                }
                Err(_) => break,
            }
        }
    });

    let writer = pair.master.take_writer()?;
    let writer = Arc::new(TokioMutex::new(writer));
    let writer_handle: JoinHandle<()> = tokio::spawn({
        let writer = Arc::clone(&writer);
        async move {
            while let Some(bytes) = writer_rx.recv().await {
                let mut guard = writer.lock().await;
                use std::io::Write;
                let _ = guard.write_all(&bytes);
                let _ = guard.flush();
            }
        }
    });

    let (exit_tx, exit_rx) = oneshot::channel::<i32>();
    let exit_status = Arc::new(AtomicBool::new(false));
    let wait_exit_status = Arc::clone(&exit_status);
    let exit_code = Arc::new(StdMutex::new(None));
    let wait_exit_code = Arc::clone(&exit_code);
    let wait_handle: JoinHandle<()> = tokio::task::spawn_blocking(move || {
        let code = match child.wait() {
            Ok(status) => status.exit_code() as i32,
            Err(_) => -1,
        };
        wait_exit_status.store(true, std::sync::atomic::Ordering::SeqCst);
        if let Ok(mut guard) = wait_exit_code.lock() {
            *guard = Some(code);
        }
        let _ = exit_tx.send(code);
    });

    let (session, output_rx) = ExecCommandSession::new(
        writer_tx,
        output_tx,
        killer,
        reader_handle,
        writer_handle,
        wait_handle,
        exit_status,
        exit_code,
    );

    Ok(SpawnedPty {
        session,
        output_rx,
        exit_rx,
    })
}
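A hedged usage sketch (not part of the diff; assumes the crate is importable as `llmx_utils_pty` and a multi-thread Tokio runtime, since the exit watcher uses `spawn_blocking`):

use std::collections::HashMap;
use std::path::Path;

use llmx_utils_pty::SpawnedPty;
use llmx_utils_pty::spawn_pty_process;

async fn run_echo() -> anyhow::Result<()> {
    let SpawnedPty { session, mut output_rx, exit_rx } =
        spawn_pty_process("echo", &["hello".to_string()], Path::new("."), &HashMap::new(), &None)
            .await?;
    // Read the first chunk of PTY output (echo writes once, then exits).
    if let Ok(chunk) = output_rx.recv().await {
        print!("{}", String::from_utf8_lossy(&chunk));
    }
    let code = exit_rx.await.unwrap_or(-1);
    // Dropping the session kills the child and aborts the I/O tasks, so keep
    // it alive until we are done.
    drop(session);
    anyhow::ensure!(code == 0, "echo exited with {code}");
    Ok(())
}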
llmx-rs/utils/readiness/Cargo.toml (new file, 17 lines)
@@ -0,0 +1,17 @@
[package]
name = "llmx-utils-readiness"
version.workspace = true
edition.workspace = true

[dependencies]
async-trait = { workspace = true }
thiserror = { workspace = true }
time = { workspace = true }
tokio = { workspace = true, features = ["sync", "time"] }

[dev-dependencies]
assert_matches = { workspace = true }
tokio = { workspace = true, features = ["macros", "rt", "rt-multi-thread"] }

[lints]
workspace = true
llmx-rs/utils/readiness/src/lib.rs (new file, 292 lines)
@@ -0,0 +1,292 @@
//! Readiness flag with token-based authorization and async waiting (Tokio).

use std::collections::HashSet;
use std::fmt;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::AtomicI32;
use std::sync::atomic::Ordering;
use std::time::Duration;

use tokio::sync::Mutex;
use tokio::sync::watch;
use tokio::time;

/// Opaque subscription token returned by `subscribe()`.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Token(i32);

const LOCK_TIMEOUT: Duration = Duration::from_millis(1000);

#[async_trait::async_trait]
pub trait Readiness: Send + Sync + 'static {
    /// Returns true once the flag has been marked ready; readiness is not
    /// reversible. As a special case, a flag with no active subscriptions is
    /// lazily marked ready on first query.
    fn is_ready(&self) -> bool;

    /// Subscribe to readiness and receive an authorization token.
    ///
    /// If the flag is already ready, returns `FlagAlreadyReady`.
    async fn subscribe(&self) -> Result<Token, errors::ReadinessError>;

    /// Attempt to mark the flag ready, validated by the provided token.
    ///
    /// Returns `true` iff:
    /// - `token` is currently subscribed, and
    /// - the flag was not already ready.
    async fn mark_ready(&self, token: Token) -> Result<bool, errors::ReadinessError>;

    /// Asynchronously wait until the flag becomes ready.
    async fn wait_ready(&self);
}

pub struct ReadinessFlag {
    /// Atomic for cheap reads.
    ready: AtomicBool,
    /// Used to generate the next i32 token.
    next_id: AtomicI32,
    /// Set of active subscriptions.
    tokens: Mutex<HashSet<Token>>,
    /// Broadcasts readiness to async waiters.
    tx: watch::Sender<bool>,
}

impl ReadinessFlag {
    /// Create a new, not-yet-ready flag.
    pub fn new() -> Self {
        let (tx, _rx) = watch::channel(false);
        Self {
            ready: AtomicBool::new(false),
            next_id: AtomicI32::new(1), // Reserve 0.
            tokens: Mutex::new(HashSet::new()),
            tx,
        }
    }

    async fn with_tokens<R>(
        &self,
        f: impl FnOnce(&mut HashSet<Token>) -> R,
    ) -> Result<R, errors::ReadinessError> {
        let mut guard = time::timeout(LOCK_TIMEOUT, self.tokens.lock())
            .await
            .map_err(|_| errors::ReadinessError::TokenLockFailed)?;
        Ok(f(&mut guard))
    }

    fn load_ready(&self) -> bool {
        self.ready.load(Ordering::Acquire)
    }
}

impl Default for ReadinessFlag {
    fn default() -> Self {
        Self::new()
    }
}

impl fmt::Debug for ReadinessFlag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("ReadinessFlag")
            .field("ready", &self.load_ready())
            .finish()
    }
}

#[async_trait::async_trait]
impl Readiness for ReadinessFlag {
    fn is_ready(&self) -> bool {
        if self.load_ready() {
            return true;
        }

        if let Ok(tokens) = self.tokens.try_lock()
            && tokens.is_empty()
        {
            let was_ready = self.ready.swap(true, Ordering::AcqRel);
            drop(tokens);
            if !was_ready {
                let _ = self.tx.send(true);
            }
            return true;
        }

        self.load_ready()
    }

    async fn subscribe(&self) -> Result<Token, errors::ReadinessError> {
        if self.load_ready() {
            return Err(errors::ReadinessError::FlagAlreadyReady);
        }

        // Generate a token; `next_id` starts at 1, so 0 is never handed out.
        let token = Token(self.next_id.fetch_add(1, Ordering::Relaxed));

        // Recheck readiness while holding the lock so mark_ready can't flip the flag between the
        // check above and inserting the token.
        let inserted = self
            .with_tokens(|tokens| {
                if self.load_ready() {
                    return false;
                }
                tokens.insert(token);
                true
            })
            .await?;

        if !inserted {
            return Err(errors::ReadinessError::FlagAlreadyReady);
        }

        Ok(token)
    }

    async fn mark_ready(&self, token: Token) -> Result<bool, errors::ReadinessError> {
        if self.load_ready() {
            return Ok(false);
        }
        if token.0 == 0 {
            return Ok(false); // Never authorize.
        }

        let marked = self
            .with_tokens(|set| {
                if !set.remove(&token) {
                    return false; // invalid or already used
                }
                self.ready.store(true, Ordering::Release);
                set.clear(); // no further tokens needed once ready
                true
            })
            .await?;
        if !marked {
            return Ok(false);
        }
        // Best-effort broadcast; ignore error if there are no receivers.
        let _ = self.tx.send(true);
        Ok(true)
    }

    async fn wait_ready(&self) {
        if self.is_ready() {
            return;
        }
        let mut rx = self.tx.subscribe();
        // Fast-path check before awaiting.
        if *rx.borrow() {
            return;
        }
        // Await changes until true is observed.
        while rx.changed().await.is_ok() {
            if *rx.borrow() {
                break;
            }
        }
    }
}

mod errors {
    use thiserror::Error;

    #[derive(Debug, Error)]
    pub enum ReadinessError {
        #[error("Failed to acquire readiness token lock")]
        TokenLockFailed,
        #[error("Flag is already ready; subscribing is no longer possible")]
        FlagAlreadyReady,
    }
}

#[cfg(test)]
mod tests {
    use std::sync::Arc;

    use super::Readiness;
    use super::ReadinessFlag;
    use super::Token;
    use super::errors::ReadinessError;
    use assert_matches::assert_matches;

    #[tokio::test]
    async fn subscribe_and_mark_ready_roundtrip() -> Result<(), ReadinessError> {
        let flag = ReadinessFlag::new();
        let token = flag.subscribe().await?;

        assert!(flag.mark_ready(token).await?);
        assert!(flag.is_ready());
        Ok(())
    }

    #[tokio::test]
    async fn subscribe_after_ready_returns_err() -> Result<(), ReadinessError> {
        let flag = ReadinessFlag::new();
        let token = flag.subscribe().await?;
        assert!(flag.mark_ready(token).await?);

        assert!(flag.subscribe().await.is_err());
        Ok(())
    }

    #[tokio::test]
    async fn mark_ready_rejects_unknown_token() -> Result<(), ReadinessError> {
        let flag = ReadinessFlag::new();
        assert!(!flag.mark_ready(Token(42)).await?);
        assert!(!flag.load_ready());
        // With no subscribers, the first readiness query lazily marks the flag ready.
        assert!(flag.is_ready());
        Ok(())
    }

    #[tokio::test]
    async fn wait_ready_unblocks_after_mark_ready() -> Result<(), ReadinessError> {
        let flag = Arc::new(ReadinessFlag::new());
        let token = flag.subscribe().await?;

        let waiter = {
            let flag = Arc::clone(&flag);
            tokio::spawn(async move {
                flag.wait_ready().await;
            })
        };

        assert!(flag.mark_ready(token).await?);
        waiter.await.expect("waiting task should not panic");
        Ok(())
    }

    #[tokio::test]
    async fn mark_ready_twice_uses_single_token() -> Result<(), ReadinessError> {
        let flag = ReadinessFlag::new();
        let token = flag.subscribe().await?;

        assert!(flag.mark_ready(token).await?);
        assert!(!flag.mark_ready(token).await?);
        Ok(())
    }

    #[tokio::test]
    async fn is_ready_without_subscribers_marks_flag_ready() -> Result<(), ReadinessError> {
        let flag = ReadinessFlag::new();

        assert!(flag.is_ready());
        assert!(flag.is_ready());
        assert_matches!(
            flag.subscribe().await,
            Err(ReadinessError::FlagAlreadyReady)
        );
        Ok(())
    }

    #[tokio::test]
    async fn subscribe_returns_error_when_lock_is_held() {
        let flag = ReadinessFlag::new();
        let _guard = flag
            .tokens
            .try_lock()
            .expect("initial lock acquisition should succeed");

        let err = flag
            .subscribe()
            .await
            .expect_err("contended subscribe should report a lock failure");
        assert_matches!(err, ReadinessError::TokenLockFailed);
    }
}
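One nuance the tests above don't spell out: with several subscribers, the first successful `mark_ready` clears the whole token set, so every later token is silently rejected and the return value identifies the winner. A sketch, written as in-crate test code since the error module is private (not part of the diff):

    #[tokio::test]
    async fn first_marker_wins_sketch() -> Result<(), ReadinessError> {
        let flag = ReadinessFlag::new();
        let a = flag.subscribe().await?;
        let b = flag.subscribe().await?;

        assert!(flag.mark_ready(a).await?); // the first marker flips the flag...
        assert!(!flag.mark_ready(b).await?); // ...later markers are no-ops
        Ok(())
    }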
llmx-rs/utils/string/Cargo.toml (new file, 7 lines)
@@ -0,0 +1,7 @@
[package]
edition.workspace = true
name = "llmx-utils-string"
version.workspace = true

[lints]
workspace = true
llmx-rs/utils/string/src/lib.rs (new file, 38 lines)
@@ -0,0 +1,38 @@
/// Truncate a `&str` to a byte budget at a char boundary (prefix).
#[inline]
pub fn take_bytes_at_char_boundary(s: &str, maxb: usize) -> &str {
    if s.len() <= maxb {
        return s;
    }
    let mut last_ok = 0;
    for (i, ch) in s.char_indices() {
        let nb = i + ch.len_utf8();
        if nb > maxb {
            break;
        }
        last_ok = nb;
    }
    &s[..last_ok]
}

/// Take a suffix of a `&str` within a byte budget at a char boundary.
#[inline]
pub fn take_last_bytes_at_char_boundary(s: &str, maxb: usize) -> &str {
    if s.len() <= maxb {
        return s;
    }
    let mut start = s.len();
    let mut used = 0usize;
    for (i, ch) in s.char_indices().rev() {
        let nb = ch.len_utf8();
        if used + nb > maxb {
            break;
        }
        start = i;
        used += nb;
        if start == 0 {
            break;
        }
    }
    &s[start..]
}
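A quick illustration of the boundary behaviour with a multi-byte character ("é" is two bytes in UTF-8); a sketch, assuming the crate is importable as `llmx_utils_string`:

use llmx_utils_string::take_bytes_at_char_boundary;
use llmx_utils_string::take_last_bytes_at_char_boundary;

fn demo() {
    // "héllo" is 6 bytes: h(1) é(2) l(1) l(1) o(1).
    assert_eq!(take_bytes_at_char_boundary("héllo", 3), "hé"); // exactly 3 bytes
    assert_eq!(take_bytes_at_char_boundary("héllo", 2), "h"); // never splits "é"
    assert_eq!(take_last_bytes_at_char_boundary("héllo", 3), "llo"); // suffix within budget
}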
llmx-rs/utils/tokenizer/Cargo.toml (new file, 15 lines)
@@ -0,0 +1,15 @@
[package]
edition.workspace = true
name = "llmx-utils-tokenizer"
version.workspace = true

[lints]
workspace = true

[dependencies]
anyhow = { workspace = true }
thiserror = { workspace = true }
tiktoken-rs = "0.7"

[dev-dependencies]
pretty_assertions = { workspace = true }
llmx-rs/utils/tokenizer/src/lib.rs (new file, 161 lines)
@@ -0,0 +1,161 @@
use std::fmt;

use anyhow::Context;
use anyhow::Error as AnyhowError;
use thiserror::Error;
use tiktoken_rs::CoreBPE;

/// Supported local encodings.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum EncodingKind {
    O200kBase,
    Cl100kBase,
}

impl fmt::Display for EncodingKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::O200kBase => f.write_str("o200k_base"),
            Self::Cl100kBase => f.write_str("cl100k_base"),
        }
    }
}

/// Tokenizer error type.
#[derive(Debug, Error)]
pub enum TokenizerError {
    #[error("failed to load encoding {kind}")]
    LoadEncoding {
        kind: EncodingKind,
        #[source]
        source: AnyhowError,
    },
    #[error("failed to decode tokens")]
    Decode {
        #[source]
        source: AnyhowError,
    },
}

/// Thin wrapper around a `tiktoken_rs::CoreBPE` tokenizer.
#[derive(Clone)]
pub struct Tokenizer {
    inner: CoreBPE,
}

impl Tokenizer {
    /// Build a tokenizer for a specific encoding.
    pub fn new(kind: EncodingKind) -> Result<Self, TokenizerError> {
        let loader: fn() -> anyhow::Result<CoreBPE> = match kind {
            EncodingKind::O200kBase => tiktoken_rs::o200k_base,
            EncodingKind::Cl100kBase => tiktoken_rs::cl100k_base,
        };

        let inner = loader().map_err(|source| TokenizerError::LoadEncoding { kind, source })?;
        Ok(Self { inner })
    }

    /// Default to the `O200kBase` encoding.
    pub fn try_default() -> Result<Self, TokenizerError> {
        Self::new(EncodingKind::O200kBase)
    }

    /// Build a tokenizer using an `OpenAI` model name (maps to an encoding).
    /// Falls back to the `O200kBase` encoding when the model is unknown.
    pub fn for_model(model: &str) -> Result<Self, TokenizerError> {
        match tiktoken_rs::get_bpe_from_model(model) {
            Ok(inner) => Ok(Self { inner }),
            Err(model_error) => {
                let inner = tiktoken_rs::o200k_base()
                    .with_context(|| {
                        format!("fallback after model lookup failure for {model}: {model_error}")
                    })
                    .map_err(|source| TokenizerError::LoadEncoding {
                        kind: EncodingKind::O200kBase,
                        source,
                    })?;
                Ok(Self { inner })
            }
        }
    }

    /// Encode text to token IDs. If `with_special_tokens` is true, special
    /// tokens are allowed and may appear in the result.
    #[must_use]
    pub fn encode(&self, text: &str, with_special_tokens: bool) -> Vec<i32> {
        let raw = if with_special_tokens {
            self.inner.encode_with_special_tokens(text)
        } else {
            self.inner.encode_ordinary(text)
        };
        raw.into_iter().map(|t| t as i32).collect()
    }

    /// Count tokens in `text` as a signed integer.
    #[must_use]
    pub fn count(&self, text: &str) -> i64 {
        // Signed length to satisfy our style preference.
        i64::try_from(self.inner.encode_ordinary(text).len()).unwrap_or(i64::MAX)
    }

    /// Decode token IDs back to text.
    pub fn decode(&self, tokens: &[i32]) -> Result<String, TokenizerError> {
        let raw: Vec<u32> = tokens.iter().map(|t| *t as u32).collect();
        self.inner
            .decode(raw)
            .map_err(|source| TokenizerError::Decode { source })
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use pretty_assertions::assert_eq;

    #[test]
    fn cl100k_base_roundtrip_simple() -> Result<(), TokenizerError> {
        let tok = Tokenizer::new(EncodingKind::Cl100kBase)?;
        let s = "hello world";
        let ids = tok.encode(s, false);
        // Stable expectation for cl100k_base.
        assert_eq!(ids, vec![15339, 1917]);
        let back = tok.decode(&ids)?;
        assert_eq!(back, s);
        Ok(())
    }

    #[test]
    fn preserves_whitespace_and_special_tokens_flag() -> Result<(), TokenizerError> {
        let tok = Tokenizer::new(EncodingKind::Cl100kBase)?;
        let s = "This has multiple spaces";
        let ids_no_special = tok.encode(s, false);
        let round = tok.decode(&ids_no_special)?;
        assert_eq!(round, s);

        // With special tokens allowed, result may be identical for normal text,
        // but the API should still function.
        let ids_with_special = tok.encode(s, true);
        let round2 = tok.decode(&ids_with_special)?;
        assert_eq!(round2, s);
        Ok(())
    }

    #[test]
    fn model_mapping_builds_tokenizer() -> Result<(), TokenizerError> {
        // Any model name yields a working tokenizer: `for_model` falls back to
        // o200k_base when the lookup fails, so the lookup itself cannot error.
        let tok = Tokenizer::for_model("gpt-5")?;
        let ids = tok.encode("ok", false);
        let back = tok.decode(&ids)?;
        assert_eq!(back, "ok");
        Ok(())
    }

    #[test]
    fn unknown_model_defaults_to_o200k_base() -> Result<(), TokenizerError> {
        let fallback = Tokenizer::new(EncodingKind::O200kBase)?;
        let tok = Tokenizer::for_model("does-not-exist")?;
        let text = "fallback please";
        assert_eq!(tok.encode(text, false), fallback.encode(text, false));
        Ok(())
    }
}
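A short usage sketch of the wrapper (not in the diff; the crate import name `llmx_utils_tokenizer` is an assumption):

use llmx_utils_tokenizer::Tokenizer;
use llmx_utils_tokenizer::TokenizerError;

fn demo() -> Result<(), TokenizerError> {
    let tok = Tokenizer::try_default()?; // o200k_base
    let ids = tok.encode("hello world", false);
    assert_eq!(tok.decode(&ids)?, "hello world");
    // `count` uses ordinary encoding, so it matches `encode(_, false)`.
    assert_eq!(tok.count("hello world"), ids.len() as i64);
    Ok(())
}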