feat: Complete LLMX v0.1.0 - Rebrand from Codex with LiteLLM Integration

This release represents a comprehensive transformation of the codebase from Codex to LLMX,
enhanced with LiteLLM integration to support 100+ LLM providers through a unified API.

## Major Changes

### Phase 1: Repository & Infrastructure Setup
- Established new repository structure and branching strategy
- Created comprehensive project documentation (CLAUDE.md, LITELLM-SETUP.md)
- Set up development environment and tooling configuration

### Phase 2: Rust Workspace Transformation
- Renamed all Rust crates from `codex-*` to `llmx-*` (30+ crates)
- Updated package names, binary names, and workspace members
- Renamed core modules: codex.rs → llmx.rs, codex_delegate.rs → llmx_delegate.rs
- Updated all internal references, imports, and type names
- Renamed directories: codex-rs/ → llmx-rs/, codex-backend-openapi-models/ → llmx-backend-openapi-models/
- Fixed all Rust compilation errors after mass rename

### Phase 3: LiteLLM Integration
- Integrated LiteLLM for multi-provider LLM support (Anthropic, OpenAI, Azure, Google AI, AWS Bedrock, etc.)
- Implemented OpenAI-compatible Chat Completions API support
- Added model family detection and provider-specific handling
- Updated authentication to support LiteLLM API keys
- Renamed environment variables: OPENAI_BASE_URL → LLMX_BASE_URL
- Added LLMX_API_KEY for unified authentication
- Enhanced error handling for Chat Completions API responses
- Implemented fallback mechanisms between the Responses API and the Chat Completions API (sketched below)
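
A minimal sketch of that fallback flow, with illustrative names only (these are not the actual llmx-rs APIs):

```rust
// Hypothetical stand-ins for the two transports; real code would issue
// HTTP requests and distinguish more error kinds.
#[derive(Debug)]
enum ApiError {
    UnsupportedEndpoint,
}

fn send_responses(_prompt: &str) -> Result<String, ApiError> {
    // Pretend this provider does not implement the Responses API.
    Err(ApiError::UnsupportedEndpoint)
}

fn send_chat_completions(prompt: &str) -> Result<String, ApiError> {
    Ok(format!("chat-completions answer to: {prompt}"))
}

// Try the Responses API first; fall back to Chat Completions when the
// provider does not support it.
fn complete(prompt: &str) -> Result<String, ApiError> {
    match send_responses(prompt) {
        Err(ApiError::UnsupportedEndpoint) => send_chat_completions(prompt),
        other => other,
    }
}

fn main() {
    println!("{:?}", complete("hello"));
}
```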

### Phase 4: TypeScript/Node.js Components
- Renamed npm package: @codex/codex-cli → @valknar/llmx
- Updated TypeScript SDK to use new LLMX APIs and endpoints
- Fixed all TypeScript compilation and linting errors
- Updated SDK tests to support both API backends
- Enhanced mock server to handle multiple API formats
- Updated build scripts for cross-platform packaging

### Phase 5: Configuration & Documentation
- Updated all configuration files to use LLMX naming
- Rewrote README and documentation for LLMX branding
- Updated config paths: ~/.codex/ → ~/.llmx/
- Added comprehensive LiteLLM setup guide
- Updated all user-facing strings and help text
- Created release plan and migration documentation

### Phase 6: Testing & Validation
- Fixed all Rust tests for new naming scheme
- Updated snapshot tests in TUI (36 frame files)
- Fixed authentication storage tests
- Updated Chat Completions payload and SSE tests
- Fixed SDK tests for new API endpoints
- Ensured compatibility with the Claude Sonnet 4.5 model
- Fixed test environment variables (LLMX_API_KEY, LLMX_BASE_URL)

### Phase 7: Build & Release Pipeline
- Updated GitHub Actions workflows for LLMX binary names
- Fixed rust-release.yml to reference llmx-rs/ instead of codex-rs/
- Updated CI/CD pipelines for new package names
- Made Apple code signing optional in release workflow
- Enhanced npm packaging resilience for partial platform builds
- Added Windows sandbox support to workspace
- Updated dotslash configuration for new binary names

### Phase 8: Final Polish
- Renamed all assets (.github images, labels, templates)
- Updated VSCode and DevContainer configurations
- Fixed all clippy warnings and formatting issues
- Applied cargo fmt and prettier formatting across codebase
- Updated issue templates and pull request templates
- Fixed all remaining UI text references

## Technical Details

**Breaking Changes:**
- Binary name changed from `codex` to `llmx`
- Config directory changed from `~/.codex/` to `~/.llmx/`
- Environment variables renamed (CODEX_* → LLMX_*)
- npm package renamed to `@valknar/llmx`
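
For setups that still export the old names, a compatibility shim along these lines can ease migration; this is a hypothetical sketch, not code shipped in this release:

```rust
use std::env;
use std::path::{Path, PathBuf};

// Hypothetical shim: prefer the new LLMX names, falling back to the
// legacy OpenAI/Codex ones so existing environments keep working.
fn base_url() -> Option<String> {
    env::var("LLMX_BASE_URL")
        .or_else(|_| env::var("OPENAI_BASE_URL"))
        .ok()
}

fn config_dir(home: &Path) -> PathBuf {
    let new = home.join(".llmx");
    if new.exists() {
        new
    } else {
        home.join(".codex")
    }
}

fn main() {
    println!("base URL: {:?}", base_url());
    println!("config dir: {:?}", config_dir(Path::new("/home/user")));
}
```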

**New Features:**
- Support for 100+ LLM providers via LiteLLM
- Unified authentication with LLMX_API_KEY
- Enhanced model provider detection and handling
- Improved error handling and fallback mechanisms

**Files Changed:**
- 578 files modified across Rust, TypeScript, and documentation
- 30+ Rust crates renamed and updated
- Complete rebrand of UI, CLI, and documentation
- All tests updated and passing

**Dependencies:**
- Updated Cargo.lock with new package names
- Updated npm dependencies in llmx-cli
- Enhanced OpenAPI models for LLMX backend

This release establishes LLMX as a standalone project with comprehensive LiteLLM
integration. Aside from the renames listed under Breaking Changes, all existing
functionality is preserved, and support now extends to a wide ecosystem of LLM providers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Sebastian Krüger <support@pivoine.art>
Author: Sebastian Krüger
Date:   2025-11-12 20:40:44 +01:00
Parent: 052b052832
Commit: 3c7efc58c8

1248 changed files with 10085 additions and 9580 deletions

llmx-rs/utils/cache/Cargo.toml (new file)

@@ -0,0 +1,15 @@
[package]
name = "llmx-utils-cache"
version.workspace = true
edition.workspace = true
[lints]
workspace = true
[dependencies]
lru = { workspace = true }
sha1 = { workspace = true }
tokio = { workspace = true, features = ["sync", "rt"] }
[dev-dependencies]
tokio = { workspace = true, features = ["macros", "rt", "rt-multi-thread"] }

llmx-rs/utils/cache/src/lib.rs (new file)

@@ -0,0 +1,159 @@
use std::borrow::Borrow;
use std::hash::Hash;
use std::num::NonZeroUsize;
use lru::LruCache;
use sha1::Digest;
use sha1::Sha1;
use tokio::sync::Mutex;
use tokio::sync::MutexGuard;
/// A minimal LRU cache protected by a Tokio mutex.
pub struct BlockingLruCache<K, V> {
inner: Mutex<LruCache<K, V>>,
}
impl<K, V> BlockingLruCache<K, V>
where
K: Eq + Hash,
{
/// Creates a cache with the provided non-zero capacity.
#[must_use]
pub fn new(capacity: NonZeroUsize) -> Self {
Self {
inner: Mutex::new(LruCache::new(capacity)),
}
}
/// Returns a clone of the cached value for `key`, or computes and inserts it.
pub fn get_or_insert_with(&self, key: K, value: impl FnOnce() -> V) -> V
where
V: Clone,
{
let mut guard = lock_blocking(&self.inner);
if let Some(v) = guard.get(&key) {
return v.clone();
}
let v = value();
// Insert and return a clone to keep ownership in the cache.
guard.put(key, v.clone());
v
}
/// Like `get_or_insert_with`, but the value factory may fail.
pub fn get_or_try_insert_with<E>(
&self,
key: K,
value: impl FnOnce() -> Result<V, E>,
) -> Result<V, E>
where
V: Clone,
{
let mut guard = lock_blocking(&self.inner);
if let Some(v) = guard.get(&key) {
return Ok(v.clone());
}
let v = value()?;
guard.put(key, v.clone());
Ok(v)
}
/// Builds a cache if `capacity` is non-zero, returning `None` otherwise.
#[must_use]
pub fn try_with_capacity(capacity: usize) -> Option<Self> {
NonZeroUsize::new(capacity).map(Self::new)
}
/// Returns a clone of the cached value corresponding to `key`, if present.
pub fn get<Q>(&self, key: &Q) -> Option<V>
where
K: Borrow<Q>,
Q: Hash + Eq + ?Sized,
V: Clone,
{
lock_blocking(&self.inner).get(key).cloned()
}
/// Inserts `value` for `key`, returning the previous entry if it existed.
pub fn insert(&self, key: K, value: V) -> Option<V> {
lock_blocking(&self.inner).put(key, value)
}
/// Removes the entry for `key` if it exists, returning it.
pub fn remove<Q>(&self, key: &Q) -> Option<V>
where
K: Borrow<Q>,
Q: Hash + Eq + ?Sized,
{
lock_blocking(&self.inner).pop(key)
}
/// Clears all entries from the cache.
pub fn clear(&self) {
lock_blocking(&self.inner).clear();
}
/// Executes `callback` with a mutable reference to the underlying cache.
pub fn with_mut<R>(&self, callback: impl FnOnce(&mut LruCache<K, V>) -> R) -> R {
let mut guard = lock_blocking(&self.inner);
callback(&mut guard)
}
/// Provides direct access to the cache guard for advanced use cases.
pub fn blocking_lock(&self) -> MutexGuard<'_, LruCache<K, V>> {
lock_blocking(&self.inner)
}
}
fn lock_blocking<K, V>(m: &Mutex<LruCache<K, V>>) -> MutexGuard<'_, LruCache<K, V>>
where
K: Eq + Hash,
{
match tokio::runtime::Handle::try_current() {
Ok(_) => tokio::task::block_in_place(|| m.blocking_lock()),
Err(_) => m.blocking_lock(),
}
}
/// Computes the SHA-1 digest of `bytes`.
///
/// Useful for content-based cache keys when you want to avoid staleness
/// caused by path-only keys.
#[must_use]
pub fn sha1_digest(bytes: &[u8]) -> [u8; 20] {
let mut hasher = Sha1::new();
hasher.update(bytes);
let result = hasher.finalize();
let mut out = [0; 20];
out.copy_from_slice(&result);
out
}
#[cfg(test)]
mod tests {
use super::BlockingLruCache;
use std::num::NonZeroUsize;
#[tokio::test(flavor = "multi_thread")]
async fn stores_and_retrieves_values() {
let cache = BlockingLruCache::new(NonZeroUsize::new(2).expect("capacity"));
assert!(cache.get(&"first").is_none());
cache.insert("first", 1);
assert_eq!(cache.get(&"first"), Some(1));
}
#[tokio::test(flavor = "multi_thread")]
async fn evicts_least_recently_used() {
let cache = BlockingLruCache::new(NonZeroUsize::new(2).expect("capacity"));
cache.insert("a", 1);
cache.insert("b", 2);
assert_eq!(cache.get(&"a"), Some(1));
cache.insert("c", 3);
assert!(cache.get(&"b").is_none());
assert_eq!(cache.get(&"a"), Some(1));
assert_eq!(cache.get(&"c"), Some(3));
}
}
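
A minimal usage sketch of the pieces above, pairing `sha1_digest` content keys with `get_or_insert_with` (assuming the crate is linked as `llmx_utils_cache`):

```rust
use std::num::NonZeroUsize;

use llmx_utils_cache::{BlockingLruCache, sha1_digest};

fn main() {
    // Key the cache by the SHA-1 of the content rather than a path, so a
    // stale entry is never served after the content changes.
    let cache: BlockingLruCache<[u8; 20], String> =
        BlockingLruCache::new(NonZeroUsize::new(64).expect("non-zero capacity"));

    let bytes = b"fn main() {}";
    let key = sha1_digest(bytes);

    // First call computes and stores; later calls with the same content
    // return a clone without recomputing.
    let rendered =
        cache.get_or_insert_with(key, || String::from_utf8_lossy(bytes).into_owned());
    assert_eq!(rendered, "fn main() {}");
}
```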


@@ -0,0 +1,26 @@
[package]
name = "llmx-git"
version.workspace = true
edition.workspace = true
readme = "README.md"
[lints]
workspace = true
[dependencies]
once_cell = "1"
regex = "1"
schemars = { workspace = true }
serde = { workspace = true, features = ["derive"] }
tempfile = { workspace = true }
thiserror = { workspace = true }
ts-rs = { workspace = true, features = [
"uuid-impl",
"serde-json-impl",
"no-serde-warnings",
] }
walkdir = { workspace = true }
[dev-dependencies]
assert_matches = { workspace = true }
pretty_assertions = { workspace = true }


@@ -0,0 +1,33 @@
# llmx-git
Helpers for interacting with git, including patch application and worktree
snapshot utilities.
```rust,no_run
use std::path::Path;
use llmx_git::{
    apply_git_patch, create_ghost_commit, restore_ghost_commit, ApplyGitRequest,
    CreateGhostCommitOptions,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let repo = Path::new("/path/to/repo");

    // Apply a patch (contents omitted here) to the repository.
    let request = ApplyGitRequest {
        cwd: repo.to_path_buf(),
        diff: String::from("...diff contents..."),
        revert: false,
        preflight: false,
    };
    let result = apply_git_patch(&request)?;
    println!("git apply exited with {}", result.exit_code);

    // Capture the current working tree as an unreferenced commit.
    let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo))?;

    // Later, undo back to that state.
    restore_ghost_commit(repo, &ghost)?;
    Ok(())
}
```
Pass a custom message with `.message("…")` or force-include ignored files with
`.force_include(["ignored.log".into()])`.


@@ -0,0 +1,715 @@
//! Helpers for applying unified diffs using the system `git` binary.
//!
//! The entry point is [`apply_git_patch`], which writes a diff to a temporary
//! file, shells out to `git apply` with the right flags, and then parses the
//! command's output into structured details. Callers can opt into dry-run
//! mode via [`ApplyGitRequest::preflight`] and inspect the resulting paths to
//! learn what would change before applying for real.
use once_cell::sync::Lazy;
use regex::Regex;
use std::ffi::OsStr;
use std::io;
use std::path::Path;
use std::path::PathBuf;
/// Parameters for invoking [`apply_git_patch`].
#[derive(Debug, Clone)]
pub struct ApplyGitRequest {
pub cwd: PathBuf,
pub diff: String,
pub revert: bool,
pub preflight: bool,
}
/// Result of running [`apply_git_patch`], including paths gleaned from stdout/stderr.
#[derive(Debug, Clone)]
pub struct ApplyGitResult {
pub exit_code: i32,
pub applied_paths: Vec<String>,
pub skipped_paths: Vec<String>,
pub conflicted_paths: Vec<String>,
pub stdout: String,
pub stderr: String,
pub cmd_for_log: String,
}
/// Apply a unified diff to the target repository by shelling out to `git apply`.
///
/// When [`ApplyGitRequest::preflight`] is `true`, this behaves like `git apply --check` and
/// leaves the working tree untouched while still parsing the command output for diagnostics.
pub fn apply_git_patch(req: &ApplyGitRequest) -> io::Result<ApplyGitResult> {
let git_root = resolve_git_root(&req.cwd)?;
// Write unified diff into a temporary file
let (tmpdir, patch_path) = write_temp_patch(&req.diff)?;
// Keep tmpdir alive until function end to ensure the file exists
let _guard = tmpdir;
if req.revert && !req.preflight {
// Stage WT paths first to avoid index mismatch on revert.
stage_paths(&git_root, &req.diff)?;
}
// Build git args
let mut args: Vec<String> = vec!["apply".into(), "--3way".into()];
if req.revert {
args.push("-R".into());
}
// Optional: additional git config via env knob (defaults OFF)
let mut cfg_parts: Vec<String> = Vec::new();
if let Ok(cfg) = std::env::var("LLMX_APPLY_GIT_CFG") {
for pair in cfg.split(',') {
let p = pair.trim();
if p.is_empty() || !p.contains('=') {
continue;
}
cfg_parts.push("-c".into());
cfg_parts.push(p.to_string());
}
}
args.push(patch_path.to_string_lossy().to_string());
// Optional preflight: dry-run only; do not modify working tree
if req.preflight {
let mut check_args = vec!["apply".to_string(), "--check".to_string()];
if req.revert {
check_args.push("-R".to_string());
}
check_args.push(patch_path.to_string_lossy().to_string());
let rendered = render_command_for_log(&git_root, &cfg_parts, &check_args);
let (c_code, c_out, c_err) = run_git(&git_root, &cfg_parts, &check_args)?;
let (mut applied_paths, mut skipped_paths, mut conflicted_paths) =
parse_git_apply_output(&c_out, &c_err);
applied_paths.sort();
applied_paths.dedup();
skipped_paths.sort();
skipped_paths.dedup();
conflicted_paths.sort();
conflicted_paths.dedup();
return Ok(ApplyGitResult {
exit_code: c_code,
applied_paths,
skipped_paths,
conflicted_paths,
stdout: c_out,
stderr: c_err,
cmd_for_log: rendered,
});
}
let cmd_for_log = render_command_for_log(&git_root, &cfg_parts, &args);
let (code, stdout, stderr) = run_git(&git_root, &cfg_parts, &args)?;
let (mut applied_paths, mut skipped_paths, mut conflicted_paths) =
parse_git_apply_output(&stdout, &stderr);
applied_paths.sort();
applied_paths.dedup();
skipped_paths.sort();
skipped_paths.dedup();
conflicted_paths.sort();
conflicted_paths.dedup();
Ok(ApplyGitResult {
exit_code: code,
applied_paths,
skipped_paths,
conflicted_paths,
stdout,
stderr,
cmd_for_log,
})
}
fn resolve_git_root(cwd: &Path) -> io::Result<PathBuf> {
let out = std::process::Command::new("git")
.arg("rev-parse")
.arg("--show-toplevel")
.current_dir(cwd)
.output()?;
let code = out.status.code().unwrap_or(-1);
if code != 0 {
return Err(io::Error::other(format!(
"not a git repository (exit {}): {}",
code,
String::from_utf8_lossy(&out.stderr)
)));
}
let root = String::from_utf8_lossy(&out.stdout).trim().to_string();
Ok(PathBuf::from(root))
}
fn write_temp_patch(diff: &str) -> io::Result<(tempfile::TempDir, PathBuf)> {
let dir = tempfile::tempdir()?;
let path = dir.path().join("patch.diff");
std::fs::write(&path, diff)?;
Ok((dir, path))
}
fn run_git(cwd: &Path, git_cfg: &[String], args: &[String]) -> io::Result<(i32, String, String)> {
let mut cmd = std::process::Command::new("git");
for p in git_cfg {
cmd.arg(p);
}
for a in args {
cmd.arg(a);
}
let out = cmd.current_dir(cwd).output()?;
let code = out.status.code().unwrap_or(-1);
let stdout = String::from_utf8_lossy(&out.stdout).into_owned();
let stderr = String::from_utf8_lossy(&out.stderr).into_owned();
Ok((code, stdout, stderr))
}
fn quote_shell(s: &str) -> String {
let simple = s
.chars()
.all(|c| c.is_ascii_alphanumeric() || "-_.:/@%+".contains(c));
if simple {
s.to_string()
} else {
format!("'{}'", s.replace('\'', "'\\''"))
}
}
fn render_command_for_log(cwd: &Path, git_cfg: &[String], args: &[String]) -> String {
let mut parts: Vec<String> = Vec::new();
parts.push("git".to_string());
for a in git_cfg {
parts.push(quote_shell(a));
}
for a in args {
parts.push(quote_shell(a));
}
format!(
"(cd {} && {})",
quote_shell(&cwd.display().to_string()),
parts.join(" ")
)
}
/// Collect every path referenced by the diff headers inside `diff --git` sections.
pub fn extract_paths_from_patch(diff_text: &str) -> Vec<String> {
static RE: Lazy<Regex> = Lazy::new(|| {
Regex::new(r"(?m)^diff --git a/(.*?) b/(.*)$")
.unwrap_or_else(|e| panic!("invalid regex: {e}"))
});
let mut set = std::collections::BTreeSet::new();
for caps in RE.captures_iter(diff_text) {
if let Some(a) = caps.get(1).map(|m| m.as_str())
&& a != "/dev/null"
&& !a.trim().is_empty()
{
set.insert(a.to_string());
}
if let Some(b) = caps.get(2).map(|m| m.as_str())
&& b != "/dev/null"
&& !b.trim().is_empty()
{
set.insert(b.to_string());
}
}
set.into_iter().collect()
}
/// Stage only the files that actually exist on disk for the given diff.
pub fn stage_paths(git_root: &Path, diff: &str) -> io::Result<()> {
let paths = extract_paths_from_patch(diff);
let mut existing: Vec<String> = Vec::new();
for p in paths {
let joined = git_root.join(&p);
if std::fs::symlink_metadata(&joined).is_ok() {
existing.push(p);
}
}
if existing.is_empty() {
return Ok(());
}
let mut cmd = std::process::Command::new("git");
cmd.arg("add");
cmd.arg("--");
for p in &existing {
cmd.arg(OsStr::new(p));
}
let out = cmd.current_dir(git_root).output()?;
let _code = out.status.code().unwrap_or(-1);
// We do not hard fail staging; best-effort is OK. Return Ok even on non-zero.
Ok(())
}
// ============ Parser ported from VS Code (TS) ============
/// Parse `git apply` output into applied/skipped/conflicted path groupings.
pub fn parse_git_apply_output(
stdout: &str,
stderr: &str,
) -> (Vec<String>, Vec<String>, Vec<String>) {
let combined = [stdout, stderr]
.iter()
.filter(|s| !s.is_empty())
.cloned()
.collect::<Vec<&str>>()
.join("\n");
let mut applied = std::collections::BTreeSet::new();
let mut skipped = std::collections::BTreeSet::new();
let mut conflicted = std::collections::BTreeSet::new();
let mut last_seen_path: Option<String> = None;
fn add(set: &mut std::collections::BTreeSet<String>, raw: &str) {
let trimmed = raw.trim();
if trimmed.is_empty() {
return;
}
let first = trimmed.chars().next().unwrap_or('\0');
let last = trimmed.chars().last().unwrap_or('\0');
let unquoted = if (first == '"' || first == '\'') && last == first && trimmed.len() >= 2 {
&trimmed[1..trimmed.len() - 1]
} else {
trimmed
};
if !unquoted.is_empty() {
set.insert(unquoted.to_string());
}
}
static APPLIED_CLEAN: Lazy<Regex> =
Lazy::new(|| regex_ci("^Applied patch(?: to)?\\s+(?P<path>.+?)\\s+cleanly\\.?$"));
static APPLIED_CONFLICTS: Lazy<Regex> =
Lazy::new(|| regex_ci("^Applied patch(?: to)?\\s+(?P<path>.+?)\\s+with conflicts\\.?$"));
static APPLYING_WITH_REJECTS: Lazy<Regex> = Lazy::new(|| {
regex_ci("^Applying patch\\s+(?P<path>.+?)\\s+with\\s+\\d+\\s+rejects?\\.{0,3}$")
});
static CHECKING_PATCH: Lazy<Regex> =
Lazy::new(|| regex_ci("^Checking patch\\s+(?P<path>.+?)\\.\\.\\.$"));
static UNMERGED_LINE: Lazy<Regex> = Lazy::new(|| regex_ci("^U\\s+(?P<path>.+)$"));
static PATCH_FAILED: Lazy<Regex> =
Lazy::new(|| regex_ci("^error:\\s+patch failed:\\s+(?P<path>.+?)(?::\\d+)?(?:\\s|$)"));
static DOES_NOT_APPLY: Lazy<Regex> =
Lazy::new(|| regex_ci("^error:\\s+(?P<path>.+?):\\s+patch does not apply$"));
static THREE_WAY_START: Lazy<Regex> = Lazy::new(|| {
regex_ci("^(?:Performing three-way merge|Falling back to three-way merge)\\.\\.\\.$")
});
static THREE_WAY_FAILED: Lazy<Regex> =
Lazy::new(|| regex_ci("^Failed to perform three-way merge\\.\\.\\.$"));
static FALLBACK_DIRECT: Lazy<Regex> =
Lazy::new(|| regex_ci("^Falling back to direct application\\.\\.\\.$"));
static LACKS_BLOB: Lazy<Regex> = Lazy::new(|| {
regex_ci(
"^(?:error: )?repository lacks the necessary blob to (?:perform|fall back on) 3-?way merge\\.?$",
)
});
static INDEX_MISMATCH: Lazy<Regex> =
Lazy::new(|| regex_ci("^error:\\s+(?P<path>.+?):\\s+does not match index\\b"));
static NOT_IN_INDEX: Lazy<Regex> =
Lazy::new(|| regex_ci("^error:\\s+(?P<path>.+?):\\s+does not exist in index\\b"));
static ALREADY_EXISTS_WT: Lazy<Regex> = Lazy::new(|| {
regex_ci("^error:\\s+(?P<path>.+?)\\s+already exists in (?:the )?working directory\\b")
});
static FILE_EXISTS: Lazy<Regex> =
Lazy::new(|| regex_ci("^error:\\s+patch failed:\\s+(?P<path>.+?)\\s+File exists"));
static RENAMED_DELETED: Lazy<Regex> =
Lazy::new(|| regex_ci("^error:\\s+path\\s+(?P<path>.+?)\\s+has been renamed\\/deleted"));
static CANNOT_APPLY_BINARY: Lazy<Regex> = Lazy::new(|| {
regex_ci(
"^error:\\s+cannot apply binary patch to\\s+['\\\"]?(?P<path>.+?)['\\\"]?\\s+without full index line$",
)
});
static BINARY_DOES_NOT_APPLY: Lazy<Regex> = Lazy::new(|| {
regex_ci("^error:\\s+binary patch does not apply to\\s+['\\\"]?(?P<path>.+?)['\\\"]?$")
});
static BINARY_INCORRECT_RESULT: Lazy<Regex> = Lazy::new(|| {
regex_ci(
"^error:\\s+binary patch to\\s+['\\\"]?(?P<path>.+?)['\\\"]?\\s+creates incorrect result\\b",
)
});
static CANNOT_READ_CURRENT: Lazy<Regex> = Lazy::new(|| {
regex_ci("^error:\\s+cannot read the current contents of\\s+['\\\"]?(?P<path>.+?)['\\\"]?$")
});
static SKIPPED_PATCH: Lazy<Regex> =
Lazy::new(|| regex_ci("^Skipped patch\\s+['\\\"]?(?P<path>.+?)['\\\"]\\.$"));
static CANNOT_MERGE_BINARY_WARN: Lazy<Regex> = Lazy::new(|| {
regex_ci(
"^warning:\\s*Cannot merge binary files:\\s+(?P<path>.+?)\\s+\\(ours\\s+vs\\.\\s+theirs\\)",
)
});
for raw_line in combined.lines() {
let line = raw_line.trim();
if line.is_empty() {
continue;
}
// === "Checking patch <path>..." tracking ===
if let Some(c) = CHECKING_PATCH.captures(line) {
if let Some(m) = c.name("path") {
last_seen_path = Some(m.as_str().to_string());
}
continue;
}
// === Status lines ===
if let Some(c) = APPLIED_CLEAN.captures(line) {
if let Some(m) = c.name("path") {
add(&mut applied, m.as_str());
let p = applied.iter().next_back().cloned();
if let Some(p) = p {
conflicted.remove(&p);
skipped.remove(&p);
last_seen_path = Some(p);
}
}
continue;
}
if let Some(c) = APPLIED_CONFLICTS.captures(line) {
if let Some(m) = c.name("path") {
add(&mut conflicted, m.as_str());
let p = conflicted.iter().next_back().cloned();
if let Some(p) = p {
applied.remove(&p);
skipped.remove(&p);
last_seen_path = Some(p);
}
}
continue;
}
if let Some(c) = APPLYING_WITH_REJECTS.captures(line) {
if let Some(m) = c.name("path") {
add(&mut conflicted, m.as_str());
let p = conflicted.iter().next_back().cloned();
if let Some(p) = p {
applied.remove(&p);
skipped.remove(&p);
last_seen_path = Some(p);
}
}
continue;
}
// === "U <path>" after conflicts ===
if let Some(c) = UNMERGED_LINE.captures(line) {
if let Some(m) = c.name("path") {
add(&mut conflicted, m.as_str());
let p = conflicted.iter().next_back().cloned();
if let Some(p) = p {
applied.remove(&p);
skipped.remove(&p);
last_seen_path = Some(p);
}
}
continue;
}
// === Early hints ===
if PATCH_FAILED.is_match(line) || DOES_NOT_APPLY.is_match(line) {
if let Some(c) = PATCH_FAILED
.captures(line)
.or_else(|| DOES_NOT_APPLY.captures(line))
&& let Some(m) = c.name("path")
{
add(&mut skipped, m.as_str());
last_seen_path = Some(m.as_str().to_string());
}
continue;
}
// === Ignore narration ===
if THREE_WAY_START.is_match(line) || FALLBACK_DIRECT.is_match(line) {
continue;
}
// === 3-way failed entirely; attribute to last_seen_path ===
if THREE_WAY_FAILED.is_match(line) || LACKS_BLOB.is_match(line) {
if let Some(p) = last_seen_path.clone() {
add(&mut skipped, &p);
applied.remove(&p);
conflicted.remove(&p);
}
continue;
}
// === Skips / I/O problems ===
if let Some(c) = INDEX_MISMATCH
.captures(line)
.or_else(|| NOT_IN_INDEX.captures(line))
.or_else(|| ALREADY_EXISTS_WT.captures(line))
.or_else(|| FILE_EXISTS.captures(line))
.or_else(|| RENAMED_DELETED.captures(line))
.or_else(|| CANNOT_APPLY_BINARY.captures(line))
.or_else(|| BINARY_DOES_NOT_APPLY.captures(line))
.or_else(|| BINARY_INCORRECT_RESULT.captures(line))
.or_else(|| CANNOT_READ_CURRENT.captures(line))
.or_else(|| SKIPPED_PATCH.captures(line))
{
if let Some(m) = c.name("path") {
add(&mut skipped, m.as_str());
let p_now = skipped.iter().next_back().cloned();
if let Some(p) = p_now {
applied.remove(&p);
conflicted.remove(&p);
last_seen_path = Some(p);
}
}
continue;
}
// === Warnings that imply conflicts ===
if let Some(c) = CANNOT_MERGE_BINARY_WARN.captures(line) {
if let Some(m) = c.name("path") {
add(&mut conflicted, m.as_str());
let p = conflicted.iter().next_back().cloned();
if let Some(p) = p {
applied.remove(&p);
skipped.remove(&p);
last_seen_path = Some(p);
}
}
continue;
}
}
// Final precedence: conflicts > applied > skipped
for p in conflicted.iter() {
applied.remove(p);
skipped.remove(p);
}
for p in applied.iter() {
skipped.remove(p);
}
(
applied.into_iter().collect(),
skipped.into_iter().collect(),
conflicted.into_iter().collect(),
)
}
fn regex_ci(pat: &str) -> Regex {
Regex::new(&format!("(?i){pat}")).unwrap_or_else(|e| panic!("invalid regex: {e}"))
}
#[cfg(test)]
mod tests {
use super::*;
use std::path::Path;
use std::sync::Mutex;
use std::sync::OnceLock;
fn env_lock() -> &'static Mutex<()> {
static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
LOCK.get_or_init(|| Mutex::new(()))
}
fn run(cwd: &Path, args: &[&str]) -> (i32, String, String) {
let out = std::process::Command::new(args[0])
.args(&args[1..])
.current_dir(cwd)
.output()
.expect("spawn ok");
(
out.status.code().unwrap_or(-1),
String::from_utf8_lossy(&out.stdout).into_owned(),
String::from_utf8_lossy(&out.stderr).into_owned(),
)
}
fn init_repo() -> tempfile::TempDir {
let dir = tempfile::tempdir().expect("tempdir");
let root = dir.path();
// git init and minimal identity
let _ = run(root, &["git", "init"]);
let _ = run(root, &["git", "config", "user.email", "llmx@example.com"]);
let _ = run(root, &["git", "config", "user.name", "Llmx"]);
dir
}
fn read_file_normalized(path: &Path) -> String {
std::fs::read_to_string(path)
.expect("read file")
.replace("\r\n", "\n")
}
#[test]
fn apply_add_success() {
let _g = env_lock().lock().unwrap();
let repo = init_repo();
let root = repo.path();
let diff = "diff --git a/hello.txt b/hello.txt\nnew file mode 100644\n--- /dev/null\n+++ b/hello.txt\n@@ -0,0 +1,2 @@\n+hello\n+world\n";
let req = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: false,
preflight: false,
};
let r = apply_git_patch(&req).expect("run apply");
assert_eq!(r.exit_code, 0, "exit code 0");
// File exists now
assert!(root.join("hello.txt").exists());
}
#[test]
fn apply_modify_conflict() {
let _g = env_lock().lock().unwrap();
let repo = init_repo();
let root = repo.path();
// seed file and commit
std::fs::write(root.join("file.txt"), "line1\nline2\nline3\n").unwrap();
let _ = run(root, &["git", "add", "file.txt"]);
let _ = run(root, &["git", "commit", "-m", "seed"]);
// local edit (unstaged)
std::fs::write(root.join("file.txt"), "line1\nlocal2\nline3\n").unwrap();
// patch wants to change the same line differently
let diff = "diff --git a/file.txt b/file.txt\n--- a/file.txt\n+++ b/file.txt\n@@ -1,3 +1,3 @@\n line1\n-line2\n+remote2\n line3\n";
let req = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: false,
preflight: false,
};
let r = apply_git_patch(&req).expect("run apply");
assert_ne!(r.exit_code, 0, "non-zero exit on conflict");
}
#[test]
fn apply_modify_skipped_missing_index() {
let _g = env_lock().lock().unwrap();
let repo = init_repo();
let root = repo.path();
// Try to modify a file that is not in the index
let diff = "diff --git a/ghost.txt b/ghost.txt\n--- a/ghost.txt\n+++ b/ghost.txt\n@@ -1,1 +1,1 @@\n-old\n+new\n";
let req = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: false,
preflight: false,
};
let r = apply_git_patch(&req).expect("run apply");
assert_ne!(r.exit_code, 0, "non-zero exit on missing index");
}
#[test]
fn apply_then_revert_success() {
let _g = env_lock().lock().unwrap();
let repo = init_repo();
let root = repo.path();
// Seed file and commit original content
std::fs::write(root.join("file.txt"), "orig\n").unwrap();
let _ = run(root, &["git", "add", "file.txt"]);
let _ = run(root, &["git", "commit", "-m", "seed"]);
// Forward patch: orig -> ORIG
let diff = "diff --git a/file.txt b/file.txt\n--- a/file.txt\n+++ b/file.txt\n@@ -1,1 +1,1 @@\n-orig\n+ORIG\n";
let apply_req = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: false,
preflight: false,
};
let res_apply = apply_git_patch(&apply_req).expect("apply ok");
assert_eq!(res_apply.exit_code, 0, "forward apply succeeded");
let after_apply = read_file_normalized(&root.join("file.txt"));
assert_eq!(after_apply, "ORIG\n");
// Revert patch: ORIG -> orig (stage paths first; engine handles it)
let revert_req = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: true,
preflight: false,
};
let res_revert = apply_git_patch(&revert_req).expect("revert ok");
assert_eq!(res_revert.exit_code, 0, "revert apply succeeded");
let after_revert = read_file_normalized(&root.join("file.txt"));
assert_eq!(after_revert, "orig\n");
}
#[test]
fn revert_preflight_does_not_stage_index() {
let _g = env_lock().lock().unwrap();
let repo = init_repo();
let root = repo.path();
// Seed repo and apply forward patch so the working tree reflects the change.
std::fs::write(root.join("file.txt"), "orig\n").unwrap();
let _ = run(root, &["git", "add", "file.txt"]);
let _ = run(root, &["git", "commit", "-m", "seed"]);
let diff = "diff --git a/file.txt b/file.txt\n--- a/file.txt\n+++ b/file.txt\n@@ -1,1 +1,1 @@\n-orig\n+ORIG\n";
let apply_req = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: false,
preflight: false,
};
let res_apply = apply_git_patch(&apply_req).expect("apply ok");
assert_eq!(res_apply.exit_code, 0, "forward apply succeeded");
let (commit_code, _, commit_err) = run(root, &["git", "commit", "-am", "apply change"]);
assert_eq!(commit_code, 0, "commit applied change: {commit_err}");
let (_code_before, staged_before, _stderr_before) =
run(root, &["git", "diff", "--cached", "--name-only"]);
let preflight_req = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: true,
preflight: true,
};
let res_preflight = apply_git_patch(&preflight_req).expect("preflight ok");
assert_eq!(res_preflight.exit_code, 0, "revert preflight succeeded");
let (_code_after, staged_after, _stderr_after) =
run(root, &["git", "diff", "--cached", "--name-only"]);
assert_eq!(
staged_after.trim(),
staged_before.trim(),
"preflight should not stage new paths",
);
let after_preflight = read_file_normalized(&root.join("file.txt"));
assert_eq!(after_preflight, "ORIG\n");
}
#[test]
fn preflight_blocks_partial_changes() {
let _g = env_lock().lock().unwrap();
let repo = init_repo();
let root = repo.path();
// Build a multi-file diff: one valid add (ok.txt) and one invalid modify (ghost.txt)
let diff = "diff --git a/ok.txt b/ok.txt\nnew file mode 100644\n--- /dev/null\n+++ b/ok.txt\n@@ -0,0 +1,2 @@\n+alpha\n+beta\n\n\
diff --git a/ghost.txt b/ghost.txt\n--- a/ghost.txt\n+++ b/ghost.txt\n@@ -1,1 +1,1 @@\n-old\n+new\n";
// 1) With preflight enabled, nothing should be changed (even though ok.txt could be added)
let req1 = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: false,
preflight: true,
};
let r1 = apply_git_patch(&req1).expect("preflight apply");
assert_ne!(r1.exit_code, 0, "preflight reports failure");
assert!(
!root.join("ok.txt").exists(),
"preflight must prevent adding ok.txt"
);
assert!(
r1.cmd_for_log.contains("--check"),
"preflight path recorded --check"
);
// 2) Without preflight, we should see no --check in the executed command
let req2 = ApplyGitRequest {
cwd: root.to_path_buf(),
diff: diff.to_string(),
revert: false,
preflight: false,
};
let r2 = apply_git_patch(&req2).expect("direct apply");
assert_ne!(r2.exit_code, 0, "apply is expected to fail overall");
assert!(
!r2.cmd_for_log.contains("--check"),
"non-preflight path should not use --check"
);
}
}
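
A preflight-then-apply flow built on the API above; a sketch intended to run from inside a git repository:

```rust
use std::path::PathBuf;

use llmx_git::{ApplyGitRequest, apply_git_patch};

fn main() -> std::io::Result<()> {
    let diff = String::from(
        "diff --git a/hello.txt b/hello.txt\n\
         new file mode 100644\n\
         --- /dev/null\n\
         +++ b/hello.txt\n\
         @@ -0,0 +1,1 @@\n\
         +hello\n",
    );
    // Dry-run first: `preflight: true` maps to `git apply --check` and
    // leaves the working tree untouched.
    let check = apply_git_patch(&ApplyGitRequest {
        cwd: PathBuf::from("."),
        diff: diff.clone(),
        revert: false,
        preflight: true,
    })?;
    if check.exit_code == 0 {
        let applied = apply_git_patch(&ApplyGitRequest {
            cwd: PathBuf::from("."),
            diff,
            revert: false,
            preflight: false,
        })?;
        println!("applied: {:?}", applied.applied_paths);
    } else {
        eprintln!("patch would not apply cleanly:\n{}", check.stderr);
    }
    Ok(())
}
```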


@@ -0,0 +1,35 @@
use std::path::PathBuf;
use std::process::ExitStatus;
use std::string::FromUtf8Error;
use thiserror::Error;
use walkdir::Error as WalkdirError;
/// Errors returned while managing git worktree snapshots.
#[derive(Debug, Error)]
pub enum GitToolingError {
#[error("git command `{command}` failed with status {status}: {stderr}")]
GitCommand {
command: String,
status: ExitStatus,
stderr: String,
},
#[error("git command `{command}` produced non-UTF-8 output")]
GitOutputUtf8 {
command: String,
#[source]
source: FromUtf8Error,
},
#[error("{path:?} is not a git repository")]
NotAGitRepository { path: PathBuf },
#[error("path {path:?} must be relative to the repository root")]
NonRelativePath { path: PathBuf },
#[error("path {path:?} escapes the repository root")]
PathEscapesRepository { path: PathBuf },
#[error("failed to process path inside worktree")]
PathPrefix(#[from] std::path::StripPrefixError),
#[error(transparent)]
Walkdir(#[from] WalkdirError),
#[error(transparent)]
Io(#[from] std::io::Error),
}
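
A short sketch of matching on these variants, using the crate's `restore_to_commit` helper (shown later in this commit); run it outside a git repository to hit the first arm:

```rust
use std::path::Path;

use llmx_git::{GitToolingError, restore_to_commit};

fn main() {
    // A directory that is not a git repository surfaces the dedicated
    // NotAGitRepository variant instead of a raw exit status.
    match restore_to_commit(Path::new("/tmp"), "deadbeef") {
        Err(GitToolingError::NotAGitRepository { path }) => {
            eprintln!("{} is not a git repository", path.display());
        }
        Err(other) => eprintln!("git error: {other}"),
        Ok(()) => println!("restored"),
    }
}
```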


@@ -0,0 +1,709 @@
use std::collections::HashSet;
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;
use std::path::PathBuf;
use tempfile::Builder;
use crate::GhostCommit;
use crate::GitToolingError;
use crate::operations::apply_repo_prefix_to_force_include;
use crate::operations::ensure_git_repository;
use crate::operations::normalize_relative_path;
use crate::operations::repo_subdir;
use crate::operations::resolve_head;
use crate::operations::resolve_repository_root;
use crate::operations::run_git_for_status;
use crate::operations::run_git_for_stdout;
use crate::operations::run_git_for_stdout_all;
/// Default commit message used for ghost commits when none is provided.
const DEFAULT_COMMIT_MESSAGE: &str = "llmx snapshot";
/// Options to control ghost commit creation.
pub struct CreateGhostCommitOptions<'a> {
pub repo_path: &'a Path,
pub message: Option<&'a str>,
pub force_include: Vec<PathBuf>,
}
impl<'a> CreateGhostCommitOptions<'a> {
/// Creates options scoped to the provided repository path.
pub fn new(repo_path: &'a Path) -> Self {
Self {
repo_path,
message: None,
force_include: Vec::new(),
}
}
/// Sets a custom commit message for the ghost commit.
pub fn message(mut self, message: &'a str) -> Self {
self.message = Some(message);
self
}
/// Supplies the entire force-include path list at once.
pub fn force_include<I>(mut self, paths: I) -> Self
where
I: IntoIterator<Item = PathBuf>,
{
self.force_include = paths.into_iter().collect();
self
}
/// Adds a single path to the force-include list.
pub fn push_force_include<P>(mut self, path: P) -> Self
where
P: Into<PathBuf>,
{
self.force_include.push(path.into());
self
}
}
/// Create a ghost commit capturing the current state of the repository's working tree.
pub fn create_ghost_commit(
options: &CreateGhostCommitOptions<'_>,
) -> Result<GhostCommit, GitToolingError> {
ensure_git_repository(options.repo_path)?;
let repo_root = resolve_repository_root(options.repo_path)?;
let repo_prefix = repo_subdir(repo_root.as_path(), options.repo_path);
let parent = resolve_head(repo_root.as_path())?;
let existing_untracked =
capture_existing_untracked(repo_root.as_path(), repo_prefix.as_deref())?;
let normalized_force = options
.force_include
.iter()
.map(|path| normalize_relative_path(path))
.collect::<Result<Vec<_>, _>>()?;
let force_include =
apply_repo_prefix_to_force_include(repo_prefix.as_deref(), &normalized_force);
let index_tempdir = Builder::new().prefix("llmx-git-index-").tempdir()?;
let index_path = index_tempdir.path().join("index");
let base_env = vec![(
OsString::from("GIT_INDEX_FILE"),
OsString::from(index_path.as_os_str()),
)];
// Pre-populate the temporary index with HEAD so unchanged tracked files
// are included in the snapshot tree.
if let Some(parent_sha) = parent.as_deref() {
run_git_for_status(
repo_root.as_path(),
vec![OsString::from("read-tree"), OsString::from(parent_sha)],
Some(base_env.as_slice()),
)?;
}
let mut add_args = vec![OsString::from("add"), OsString::from("--all")];
if let Some(prefix) = repo_prefix.as_deref() {
add_args.extend([OsString::from("--"), prefix.as_os_str().to_os_string()]);
}
run_git_for_status(repo_root.as_path(), add_args, Some(base_env.as_slice()))?;
if !force_include.is_empty() {
let mut args = Vec::with_capacity(force_include.len() + 2);
args.push(OsString::from("add"));
args.push(OsString::from("--force"));
args.extend(
force_include
.iter()
.map(|path| OsString::from(path.as_os_str())),
);
run_git_for_status(repo_root.as_path(), args, Some(base_env.as_slice()))?;
}
let tree_id = run_git_for_stdout(
repo_root.as_path(),
vec![OsString::from("write-tree")],
Some(base_env.as_slice()),
)?;
let mut commit_env = base_env;
commit_env.extend(default_commit_identity());
let message = options.message.unwrap_or(DEFAULT_COMMIT_MESSAGE);
let commit_args = {
let mut result = vec![OsString::from("commit-tree"), OsString::from(&tree_id)];
if let Some(parent) = parent.as_deref() {
result.extend([OsString::from("-p"), OsString::from(parent)]);
}
result.extend([OsString::from("-m"), OsString::from(message)]);
result
};
// Retrieve commit ID.
let commit_id = run_git_for_stdout(
repo_root.as_path(),
commit_args,
Some(commit_env.as_slice()),
)?;
Ok(GhostCommit::new(
commit_id,
parent,
existing_untracked.files,
existing_untracked.dirs,
))
}
/// Restore the working tree to match the provided ghost commit.
pub fn restore_ghost_commit(repo_path: &Path, commit: &GhostCommit) -> Result<(), GitToolingError> {
ensure_git_repository(repo_path)?;
let repo_root = resolve_repository_root(repo_path)?;
let repo_prefix = repo_subdir(repo_root.as_path(), repo_path);
let current_untracked =
capture_existing_untracked(repo_root.as_path(), repo_prefix.as_deref())?;
restore_to_commit_inner(repo_root.as_path(), repo_prefix.as_deref(), commit.id())?;
remove_new_untracked(
repo_root.as_path(),
commit.preexisting_untracked_files(),
commit.preexisting_untracked_dirs(),
current_untracked,
)
}
/// Restore the working tree to match the given commit ID.
pub fn restore_to_commit(repo_path: &Path, commit_id: &str) -> Result<(), GitToolingError> {
ensure_git_repository(repo_path)?;
let repo_root = resolve_repository_root(repo_path)?;
let repo_prefix = repo_subdir(repo_root.as_path(), repo_path);
restore_to_commit_inner(repo_root.as_path(), repo_prefix.as_deref(), commit_id)
}
/// Restores the working tree and index to the given commit using `git restore`.
/// The repository root and optional repository-relative prefix limit the restore scope.
fn restore_to_commit_inner(
repo_root: &Path,
repo_prefix: Option<&Path>,
commit_id: &str,
) -> Result<(), GitToolingError> {
let mut restore_args = vec![
OsString::from("restore"),
OsString::from("--source"),
OsString::from(commit_id),
OsString::from("--worktree"),
OsString::from("--staged"),
OsString::from("--"),
];
if let Some(prefix) = repo_prefix {
restore_args.push(prefix.as_os_str().to_os_string());
} else {
restore_args.push(OsString::from("."));
}
run_git_for_status(repo_root, restore_args, None)?;
Ok(())
}
#[derive(Default)]
struct UntrackedSnapshot {
files: Vec<PathBuf>,
dirs: Vec<PathBuf>,
}
/// Captures the untracked and ignored entries under `repo_root`, optionally limited by `repo_prefix`.
/// Returns the result as an `UntrackedSnapshot`.
fn capture_existing_untracked(
repo_root: &Path,
repo_prefix: Option<&Path>,
) -> Result<UntrackedSnapshot, GitToolingError> {
// Ask git for the zero-delimited porcelain status so we can enumerate
// every untracked or ignored path (including ones filtered by prefix).
let mut args = vec![
OsString::from("status"),
OsString::from("--porcelain=2"),
OsString::from("-z"),
OsString::from("--ignored=matching"),
OsString::from("--untracked-files=all"),
];
if let Some(prefix) = repo_prefix {
args.push(OsString::from("--"));
args.push(prefix.as_os_str().to_os_string());
}
let output = run_git_for_stdout_all(repo_root, args, None)?;
if output.is_empty() {
return Ok(UntrackedSnapshot::default());
}
let mut snapshot = UntrackedSnapshot::default();
// Each entry is of the form "<code> <path>" where code is '?' (untracked)
// or '!' (ignored); everything else is irrelevant to this snapshot.
for entry in output.split('\0') {
if entry.is_empty() {
continue;
}
let mut parts = entry.splitn(2, ' ');
let code = parts.next();
let path_part = parts.next();
let (Some(code), Some(path_part)) = (code, path_part) else {
continue;
};
if code != "?" && code != "!" {
continue;
}
if path_part.is_empty() {
continue;
}
let normalized = normalize_relative_path(Path::new(path_part))?;
let absolute = repo_root.join(&normalized);
let is_dir = absolute.is_dir();
if is_dir {
snapshot.dirs.push(normalized);
} else {
snapshot.files.push(normalized);
}
}
Ok(snapshot)
}
/// Removes untracked files and directories that were not present when the snapshot was captured.
fn remove_new_untracked(
repo_root: &Path,
preserved_files: &[PathBuf],
preserved_dirs: &[PathBuf],
current: UntrackedSnapshot,
) -> Result<(), GitToolingError> {
if current.files.is_empty() && current.dirs.is_empty() {
return Ok(());
}
let preserved_file_set: HashSet<PathBuf> = preserved_files.iter().cloned().collect();
let preserved_dirs_vec: Vec<PathBuf> = preserved_dirs.to_vec();
for path in current.files {
if should_preserve(&path, &preserved_file_set, &preserved_dirs_vec) {
continue;
}
remove_path(&repo_root.join(&path))?;
}
for dir in current.dirs {
if should_preserve(&dir, &preserved_file_set, &preserved_dirs_vec) {
continue;
}
remove_path(&repo_root.join(&dir))?;
}
Ok(())
}
/// Determines whether an untracked path should be kept because it existed in the snapshot.
fn should_preserve(
path: &Path,
preserved_files: &HashSet<PathBuf>,
preserved_dirs: &[PathBuf],
) -> bool {
if preserved_files.contains(path) {
return true;
}
preserved_dirs
.iter()
.any(|dir| path.starts_with(dir.as_path()))
}
/// Deletes the file or directory at the provided path, ignoring if it is already absent.
fn remove_path(path: &Path) -> Result<(), GitToolingError> {
match fs::symlink_metadata(path) {
Ok(metadata) => {
if metadata.is_dir() {
fs::remove_dir_all(path)?;
} else {
fs::remove_file(path)?;
}
}
Err(err) => {
if err.kind() == io::ErrorKind::NotFound {
return Ok(());
}
return Err(err.into());
}
}
Ok(())
}
/// Returns the default author and committer identity for ghost commits.
fn default_commit_identity() -> Vec<(OsString, OsString)> {
vec![
(
OsString::from("GIT_AUTHOR_NAME"),
OsString::from("Llmx Snapshot"),
),
(
OsString::from("GIT_AUTHOR_EMAIL"),
OsString::from("snapshot@llmx.local"),
),
(
OsString::from("GIT_COMMITTER_NAME"),
OsString::from("Llmx Snapshot"),
),
(
OsString::from("GIT_COMMITTER_EMAIL"),
OsString::from("snapshot@llmx.local"),
),
]
}
#[cfg(test)]
mod tests {
use super::*;
use crate::operations::run_git_for_stdout;
use assert_matches::assert_matches;
use pretty_assertions::assert_eq;
use std::process::Command;
/// Runs a git command in the test repository and asserts success.
fn run_git_in(repo_path: &Path, args: &[&str]) {
let status = Command::new("git")
.current_dir(repo_path)
.args(args)
.status()
.expect("git command");
assert!(status.success(), "git command failed: {args:?}");
}
/// Runs a git command and returns its trimmed stdout output.
fn run_git_stdout(repo_path: &Path, args: &[&str]) -> String {
let output = Command::new("git")
.current_dir(repo_path)
.args(args)
.output()
.expect("git command");
assert!(output.status.success(), "git command failed: {args:?}");
String::from_utf8_lossy(&output.stdout).trim().to_string()
}
/// Initializes a repository with consistent settings for cross-platform tests.
fn init_test_repo(repo: &Path) {
run_git_in(repo, &["init", "--initial-branch=main"]);
run_git_in(repo, &["config", "core.autocrlf", "false"]);
}
#[test]
/// Verifies a ghost commit can be created and restored end to end.
fn create_and_restore_roundtrip() -> Result<(), GitToolingError> {
let temp = tempfile::tempdir()?;
let repo = temp.path();
init_test_repo(repo);
std::fs::write(repo.join("tracked.txt"), "initial\n")?;
std::fs::write(repo.join("delete-me.txt"), "to be removed\n")?;
run_git_in(repo, &["add", "tracked.txt", "delete-me.txt"]);
run_git_in(
repo,
&[
"-c",
"user.name=Tester",
"-c",
"user.email=test@example.com",
"commit",
"-m",
"init",
],
);
let preexisting_untracked = repo.join("notes.txt");
std::fs::write(&preexisting_untracked, "notes before\n")?;
let tracked_contents = "modified contents\n";
std::fs::write(repo.join("tracked.txt"), tracked_contents)?;
std::fs::remove_file(repo.join("delete-me.txt"))?;
let new_file_contents = "hello ghost\n";
std::fs::write(repo.join("new-file.txt"), new_file_contents)?;
std::fs::write(repo.join(".gitignore"), "ignored.txt\n")?;
let ignored_contents = "ignored but captured\n";
std::fs::write(repo.join("ignored.txt"), ignored_contents)?;
let options =
CreateGhostCommitOptions::new(repo).force_include(vec![PathBuf::from("ignored.txt")]);
let ghost = create_ghost_commit(&options)?;
assert!(ghost.parent().is_some());
let cat = run_git_for_stdout(
repo,
vec![
OsString::from("show"),
OsString::from(format!("{}:ignored.txt", ghost.id())),
],
None,
)?;
assert_eq!(cat, ignored_contents.trim());
std::fs::write(repo.join("tracked.txt"), "other state\n")?;
std::fs::write(repo.join("ignored.txt"), "changed\n")?;
std::fs::remove_file(repo.join("new-file.txt"))?;
std::fs::write(repo.join("ephemeral.txt"), "temp data\n")?;
std::fs::write(&preexisting_untracked, "notes after\n")?;
restore_ghost_commit(repo, &ghost)?;
let tracked_after = std::fs::read_to_string(repo.join("tracked.txt"))?;
assert_eq!(tracked_after, tracked_contents);
let ignored_after = std::fs::read_to_string(repo.join("ignored.txt"))?;
assert_eq!(ignored_after, ignored_contents);
let new_file_after = std::fs::read_to_string(repo.join("new-file.txt"))?;
assert_eq!(new_file_after, new_file_contents);
assert_eq!(repo.join("delete-me.txt").exists(), false);
assert!(!repo.join("ephemeral.txt").exists());
let notes_after = std::fs::read_to_string(&preexisting_untracked)?;
assert_eq!(notes_after, "notes before\n");
Ok(())
}
#[test]
/// Ensures ghost commits succeed in repositories without an existing HEAD.
fn create_snapshot_without_existing_head() -> Result<(), GitToolingError> {
let temp = tempfile::tempdir()?;
let repo = temp.path();
init_test_repo(repo);
let tracked_contents = "first contents\n";
std::fs::write(repo.join("tracked.txt"), tracked_contents)?;
let ignored_contents = "ignored but captured\n";
std::fs::write(repo.join(".gitignore"), "ignored.txt\n")?;
std::fs::write(repo.join("ignored.txt"), ignored_contents)?;
let options =
CreateGhostCommitOptions::new(repo).force_include(vec![PathBuf::from("ignored.txt")]);
let ghost = create_ghost_commit(&options)?;
assert!(ghost.parent().is_none());
let message = run_git_stdout(repo, &["log", "-1", "--format=%s", ghost.id()]);
assert_eq!(message, DEFAULT_COMMIT_MESSAGE);
let ignored = run_git_stdout(repo, &["show", &format!("{}:ignored.txt", ghost.id())]);
assert_eq!(ignored, ignored_contents.trim());
Ok(())
}
#[test]
/// Confirms custom messages are used when creating ghost commits.
fn create_ghost_commit_uses_custom_message() -> Result<(), GitToolingError> {
let temp = tempfile::tempdir()?;
let repo = temp.path();
init_test_repo(repo);
std::fs::write(repo.join("tracked.txt"), "contents\n")?;
run_git_in(repo, &["add", "tracked.txt"]);
run_git_in(
repo,
&[
"-c",
"user.name=Tester",
"-c",
"user.email=test@example.com",
"commit",
"-m",
"initial",
],
);
let message = "custom message";
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo).message(message))?;
let commit_message = run_git_stdout(repo, &["log", "-1", "--format=%s", ghost.id()]);
assert_eq!(commit_message, message);
Ok(())
}
#[test]
/// Rejects force-included paths that escape the repository.
fn create_ghost_commit_rejects_force_include_parent_path() {
let temp = tempfile::tempdir().expect("tempdir");
let repo = temp.path();
init_test_repo(repo);
let options = CreateGhostCommitOptions::new(repo)
.force_include(vec![PathBuf::from("../outside.txt")]);
let err = create_ghost_commit(&options).unwrap_err();
assert_matches!(err, GitToolingError::PathEscapesRepository { .. });
}
#[test]
/// Restoring a ghost commit from a non-git directory fails.
fn restore_requires_git_repository() {
let temp = tempfile::tempdir().expect("tempdir");
let err = restore_to_commit(temp.path(), "deadbeef").unwrap_err();
assert_matches!(err, GitToolingError::NotAGitRepository { .. });
}
#[test]
/// Restoring from a subdirectory affects only that subdirectory.
fn restore_from_subdirectory_restores_files_relatively() -> Result<(), GitToolingError> {
let temp = tempfile::tempdir()?;
let repo = temp.path();
init_test_repo(repo);
std::fs::create_dir_all(repo.join("workspace"))?;
let workspace = repo.join("workspace");
std::fs::write(repo.join("root.txt"), "root contents\n")?;
std::fs::write(workspace.join("nested.txt"), "nested contents\n")?;
run_git_in(repo, &["add", "."]);
run_git_in(
repo,
&[
"-c",
"user.name=Tester",
"-c",
"user.email=test@example.com",
"commit",
"-m",
"initial",
],
);
std::fs::write(repo.join("root.txt"), "root modified\n")?;
std::fs::write(workspace.join("nested.txt"), "nested modified\n")?;
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(&workspace))?;
std::fs::write(repo.join("root.txt"), "root after\n")?;
std::fs::write(workspace.join("nested.txt"), "nested after\n")?;
restore_ghost_commit(&workspace, &ghost)?;
let root_after = std::fs::read_to_string(repo.join("root.txt"))?;
assert_eq!(root_after, "root after\n");
let nested_after = std::fs::read_to_string(workspace.join("nested.txt"))?;
assert_eq!(nested_after, "nested modified\n");
assert!(!workspace.join("llmx-rs").exists());
Ok(())
}
#[test]
/// Restoring from a subdirectory preserves ignored files in parent folders.
fn restore_from_subdirectory_preserves_parent_vscode() -> Result<(), GitToolingError> {
let temp = tempfile::tempdir()?;
let repo = temp.path();
init_test_repo(repo);
let workspace = repo.join("llmx-rs");
std::fs::create_dir_all(&workspace)?;
std::fs::write(repo.join(".gitignore"), ".vscode/\n")?;
std::fs::write(workspace.join("tracked.txt"), "snapshot version\n")?;
run_git_in(repo, &["add", "."]);
run_git_in(
repo,
&[
"-c",
"user.name=Tester",
"-c",
"user.email=test@example.com",
"commit",
"-m",
"initial",
],
);
std::fs::write(workspace.join("tracked.txt"), "snapshot delta\n")?;
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(&workspace))?;
std::fs::write(workspace.join("tracked.txt"), "post-snapshot\n")?;
let vscode = repo.join(".vscode");
std::fs::create_dir_all(&vscode)?;
std::fs::write(vscode.join("settings.json"), "{\n \"after\": true\n}\n")?;
restore_ghost_commit(&workspace, &ghost)?;
let tracked_after = std::fs::read_to_string(workspace.join("tracked.txt"))?;
assert_eq!(tracked_after, "snapshot delta\n");
assert!(vscode.join("settings.json").exists());
let settings_after = std::fs::read_to_string(vscode.join("settings.json"))?;
assert_eq!(settings_after, "{\n \"after\": true\n}\n");
Ok(())
}
#[test]
/// Restoring from the repository root keeps ignored files intact.
fn restore_preserves_ignored_files() -> Result<(), GitToolingError> {
let temp = tempfile::tempdir()?;
let repo = temp.path();
init_test_repo(repo);
std::fs::write(repo.join(".gitignore"), ".vscode/\n")?;
std::fs::write(repo.join("tracked.txt"), "snapshot version\n")?;
let vscode = repo.join(".vscode");
std::fs::create_dir_all(&vscode)?;
std::fs::write(vscode.join("settings.json"), "{\n \"before\": true\n}\n")?;
run_git_in(repo, &["add", ".gitignore", "tracked.txt"]);
run_git_in(
repo,
&[
"-c",
"user.name=Tester",
"-c",
"user.email=test@example.com",
"commit",
"-m",
"initial",
],
);
std::fs::write(repo.join("tracked.txt"), "snapshot delta\n")?;
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo))?;
std::fs::write(repo.join("tracked.txt"), "post-snapshot\n")?;
std::fs::write(vscode.join("settings.json"), "{\n \"after\": true\n}\n")?;
std::fs::write(repo.join("temp.txt"), "new file\n")?;
restore_ghost_commit(repo, &ghost)?;
let tracked_after = std::fs::read_to_string(repo.join("tracked.txt"))?;
assert_eq!(tracked_after, "snapshot delta\n");
assert!(vscode.join("settings.json").exists());
let settings_after = std::fs::read_to_string(vscode.join("settings.json"))?;
assert_eq!(settings_after, "{\n \"after\": true\n}\n");
assert!(!repo.join("temp.txt").exists());
Ok(())
}
#[test]
/// Restoring removes ignored directories created after the snapshot.
fn restore_removes_new_ignored_directory() -> Result<(), GitToolingError> {
let temp = tempfile::tempdir()?;
let repo = temp.path();
init_test_repo(repo);
std::fs::write(repo.join(".gitignore"), ".vscode/\n")?;
std::fs::write(repo.join("tracked.txt"), "snapshot version\n")?;
run_git_in(repo, &["add", ".gitignore", "tracked.txt"]);
run_git_in(
repo,
&[
"-c",
"user.name=Tester",
"-c",
"user.email=test@example.com",
"commit",
"-m",
"initial",
],
);
let ghost = create_ghost_commit(&CreateGhostCommitOptions::new(repo))?;
let vscode = repo.join(".vscode");
std::fs::create_dir_all(&vscode)?;
std::fs::write(vscode.join("settings.json"), "{\n \"after\": true\n}\n")?;
restore_ghost_commit(repo, &ghost)?;
assert!(!vscode.exists());
Ok(())
}
}


@@ -0,0 +1,79 @@
use std::fmt;
use std::path::PathBuf;
mod apply;
mod errors;
mod ghost_commits;
mod operations;
mod platform;
pub use apply::ApplyGitRequest;
pub use apply::ApplyGitResult;
pub use apply::apply_git_patch;
pub use apply::extract_paths_from_patch;
pub use apply::parse_git_apply_output;
pub use apply::stage_paths;
pub use errors::GitToolingError;
pub use ghost_commits::CreateGhostCommitOptions;
pub use ghost_commits::create_ghost_commit;
pub use ghost_commits::restore_ghost_commit;
pub use ghost_commits::restore_to_commit;
pub use platform::create_symlink;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
use ts_rs::TS;
type CommitID = String;
/// Details of a ghost commit created from a repository state.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, JsonSchema, TS)]
pub struct GhostCommit {
id: CommitID,
parent: Option<CommitID>,
preexisting_untracked_files: Vec<PathBuf>,
preexisting_untracked_dirs: Vec<PathBuf>,
}
impl GhostCommit {
/// Create a new ghost commit wrapper from a raw commit ID and optional parent.
pub fn new(
id: CommitID,
parent: Option<CommitID>,
preexisting_untracked_files: Vec<PathBuf>,
preexisting_untracked_dirs: Vec<PathBuf>,
) -> Self {
Self {
id,
parent,
preexisting_untracked_files,
preexisting_untracked_dirs,
}
}
/// Commit ID for the snapshot.
pub fn id(&self) -> &str {
&self.id
}
/// Parent commit ID, if the repository had a `HEAD` at creation time.
pub fn parent(&self) -> Option<&str> {
self.parent.as_deref()
}
/// Untracked or ignored files that already existed when the snapshot was captured.
pub fn preexisting_untracked_files(&self) -> &[PathBuf] {
&self.preexisting_untracked_files
}
/// Untracked or ignored directories that already existed when the snapshot was captured.
pub fn preexisting_untracked_dirs(&self) -> &[PathBuf] {
&self.preexisting_untracked_dirs
}
}
impl fmt::Display for GhostCommit {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}", self.id)
}
}
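
Because `GhostCommit` derives `Serialize` and `Deserialize`, a snapshot reference can be persisted and reloaded later; a sketch assuming `serde_json` is available (the commit IDs are illustrative):

```rust
use llmx_git::GhostCommit;

fn main() -> Result<(), serde_json::Error> {
    let ghost = GhostCommit::new(
        "3c7efc58c8".to_string(),       // commit ID of the snapshot
        Some("052b052832".to_string()), // parent HEAD at creation time
        Vec::new(),                     // preexisting untracked files
        Vec::new(),                     // preexisting untracked dirs
    );
    // Round-trip through JSON, e.g. for storing in a session file.
    let json = serde_json::to_string(&ghost)?;
    let restored: GhostCommit = serde_json::from_str(&json)?;
    assert_eq!(ghost, restored);
    println!("{json}");
    Ok(())
}
```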


@@ -0,0 +1,239 @@
use std::ffi::OsStr;
use std::ffi::OsString;
use std::path::Component;
use std::path::Path;
use std::path::PathBuf;
use std::process::Command;
use crate::GitToolingError;
pub(crate) fn ensure_git_repository(path: &Path) -> Result<(), GitToolingError> {
match run_git_for_stdout(
path,
vec![
OsString::from("rev-parse"),
OsString::from("--is-inside-work-tree"),
],
None,
) {
Ok(output) if output.trim() == "true" => Ok(()),
Ok(_) => Err(GitToolingError::NotAGitRepository {
path: path.to_path_buf(),
}),
Err(GitToolingError::GitCommand { status, .. }) if status.code() == Some(128) => {
Err(GitToolingError::NotAGitRepository {
path: path.to_path_buf(),
})
}
Err(err) => Err(err),
}
}
pub(crate) fn resolve_head(path: &Path) -> Result<Option<String>, GitToolingError> {
match run_git_for_stdout(
path,
vec![
OsString::from("rev-parse"),
OsString::from("--verify"),
OsString::from("HEAD"),
],
None,
) {
Ok(sha) => Ok(Some(sha)),
Err(GitToolingError::GitCommand { status, .. }) if status.code() == Some(128) => Ok(None),
Err(other) => Err(other),
}
}
pub(crate) fn normalize_relative_path(path: &Path) -> Result<PathBuf, GitToolingError> {
let mut result = PathBuf::new();
let mut saw_component = false;
for component in path.components() {
saw_component = true;
match component {
Component::Normal(part) => result.push(part),
Component::CurDir => {}
Component::ParentDir => {
if !result.pop() {
return Err(GitToolingError::PathEscapesRepository {
path: path.to_path_buf(),
});
}
}
Component::RootDir | Component::Prefix(_) => {
return Err(GitToolingError::NonRelativePath {
path: path.to_path_buf(),
});
}
}
}
if !saw_component {
return Err(GitToolingError::NonRelativePath {
path: path.to_path_buf(),
});
}
Ok(result)
}
pub(crate) fn resolve_repository_root(path: &Path) -> Result<PathBuf, GitToolingError> {
let root = run_git_for_stdout(
path,
vec![
OsString::from("rev-parse"),
OsString::from("--show-toplevel"),
],
None,
)?;
Ok(PathBuf::from(root))
}
pub(crate) fn apply_repo_prefix_to_force_include(
prefix: Option<&Path>,
paths: &[PathBuf],
) -> Vec<PathBuf> {
if paths.is_empty() {
return Vec::new();
}
match prefix {
Some(prefix) => paths.iter().map(|path| prefix.join(path)).collect(),
None => paths.to_vec(),
}
}
pub(crate) fn repo_subdir(repo_root: &Path, repo_path: &Path) -> Option<PathBuf> {
if repo_root == repo_path {
return None;
}
repo_path
.strip_prefix(repo_root)
.ok()
.and_then(non_empty_path)
.or_else(|| {
let repo_root_canon = repo_root.canonicalize().ok()?;
let repo_path_canon = repo_path.canonicalize().ok()?;
repo_path_canon
.strip_prefix(&repo_root_canon)
.ok()
.and_then(non_empty_path)
})
}
fn non_empty_path(path: &Path) -> Option<PathBuf> {
if path.as_os_str().is_empty() {
None
} else {
Some(path.to_path_buf())
}
}
pub(crate) fn run_git_for_status<I, S>(
dir: &Path,
args: I,
env: Option<&[(OsString, OsString)]>,
) -> Result<(), GitToolingError>
where
I: IntoIterator<Item = S>,
S: AsRef<OsStr>,
{
run_git(dir, args, env)?;
Ok(())
}
pub(crate) fn run_git_for_stdout<I, S>(
dir: &Path,
args: I,
env: Option<&[(OsString, OsString)]>,
) -> Result<String, GitToolingError>
where
I: IntoIterator<Item = S>,
S: AsRef<OsStr>,
{
let run = run_git(dir, args, env)?;
String::from_utf8(run.output.stdout)
.map(|value| value.trim().to_string())
.map_err(|source| GitToolingError::GitOutputUtf8 {
command: run.command,
source,
})
}
/// Executes `git` and returns the full stdout without trimming so callers
/// can parse delimiter-sensitive output, propagating UTF-8 errors with context.
pub(crate) fn run_git_for_stdout_all<I, S>(
dir: &Path,
args: I,
env: Option<&[(OsString, OsString)]>,
) -> Result<String, GitToolingError>
where
I: IntoIterator<Item = S>,
S: AsRef<OsStr>,
{
// Keep the raw stdout untouched so callers can parse delimiter-sensitive
// output (e.g. NUL-separated paths) without trimming artefacts.
let run = run_git(dir, args, env)?;
// Propagate UTF-8 conversion failures with the command context for debugging.
String::from_utf8(run.output.stdout).map_err(|source| GitToolingError::GitOutputUtf8 {
command: run.command,
source,
})
}
fn run_git<I, S>(
dir: &Path,
args: I,
env: Option<&[(OsString, OsString)]>,
) -> Result<GitRun, GitToolingError>
where
I: IntoIterator<Item = S>,
S: AsRef<OsStr>,
{
let iterator = args.into_iter();
let (lower, upper) = iterator.size_hint();
let mut args_vec = Vec::with_capacity(upper.unwrap_or(lower));
for arg in iterator {
args_vec.push(OsString::from(arg.as_ref()));
}
let command_string = build_command_string(&args_vec);
let mut command = Command::new("git");
command.current_dir(dir);
if let Some(envs) = env {
for (key, value) in envs {
command.env(key, value);
}
}
command.args(&args_vec);
let output = command.output()?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr).trim().to_string();
return Err(GitToolingError::GitCommand {
command: command_string,
status: output.status,
stderr,
});
}
Ok(GitRun {
command: command_string,
output,
})
}
fn build_command_string(args: &[OsString]) -> String {
if args.is_empty() {
return "git".to_string();
}
let joined = args
.iter()
.map(|arg| arg.to_string_lossy().into_owned())
.collect::<Vec<_>>()
.join(" ");
format!("git {joined}")
}
struct GitRun {
command: String,
output: std::process::Output,
}
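
For orientation, a sketch of how these helpers compose. They are `pub(crate)`, so a caller like this would live inside the same crate; `snapshot_base` is a hypothetical name:

```rust
use std::path::Path;

/// Verify `path` is inside a work tree, then resolve the base commit for a
/// snapshot. `Ok(None)` means the repository exists but has no commits yet.
fn snapshot_base(path: &Path) -> Result<Option<String>, GitToolingError> {
    ensure_git_repository(path)?;
    resolve_head(path)
}
```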

View File

@@ -0,0 +1,37 @@
use std::path::Path;
use crate::GitToolingError;
#[cfg(unix)]
pub fn create_symlink(
_source: &Path,
link_target: &Path,
destination: &Path,
) -> Result<(), GitToolingError> {
use std::os::unix::fs::symlink;
symlink(link_target, destination)?;
Ok(())
}
#[cfg(windows)]
pub fn create_symlink(
source: &Path,
link_target: &Path,
destination: &Path,
) -> Result<(), GitToolingError> {
use std::os::windows::fs::FileTypeExt;
use std::os::windows::fs::symlink_dir;
use std::os::windows::fs::symlink_file;
let metadata = std::fs::symlink_metadata(source)?;
if metadata.file_type().is_symlink_dir() {
symlink_dir(link_target, destination)?;
} else {
symlink_file(link_target, destination)?;
}
Ok(())
}
#[cfg(not(any(unix, windows)))]
compile_error!("llmx-git symlink support is only implemented for Unix and Windows");

View File

@@ -0,0 +1,18 @@
[package]
name = "llmx-utils-image"
version.workspace = true
edition.workspace = true
[lints]
workspace = true
[dependencies]
base64 = { workspace = true }
image = { workspace = true, features = ["jpeg", "png"] }
llmx-utils-cache = { workspace = true }
thiserror = { workspace = true }
tokio = { workspace = true, features = ["fs", "rt", "rt-multi-thread", "macros"] }
[dev-dependencies]
image = { workspace = true, features = ["jpeg", "png"] }
tempfile = { workspace = true }

View File

@@ -0,0 +1,25 @@
use image::ImageFormat;
use std::path::PathBuf;
use thiserror::Error;
#[derive(Debug, Error)]
pub enum ImageProcessingError {
#[error("failed to read image at {path}: {source}")]
Read {
path: PathBuf,
#[source]
source: std::io::Error,
},
#[error("failed to decode image at {path}: {source}")]
Decode {
path: PathBuf,
#[source]
source: image::ImageError,
},
#[error("failed to encode image as {format:?}: {source}")]
Encode {
format: ImageFormat,
#[source]
source: image::ImageError,
},
}

View File

@@ -0,0 +1,252 @@
use std::num::NonZeroUsize;
use std::path::Path;
use std::sync::LazyLock;
use crate::error::ImageProcessingError;
use base64::Engine;
use base64::engine::general_purpose::STANDARD as BASE64_STANDARD;
use image::ColorType;
use image::DynamicImage;
use image::GenericImageView;
use image::ImageEncoder;
use image::ImageFormat;
use image::codecs::jpeg::JpegEncoder;
use image::codecs::png::PngEncoder;
use image::imageops::FilterType;
use llmx_utils_cache::BlockingLruCache;
use llmx_utils_cache::sha1_digest;
/// Maximum width used when resizing images before uploading.
pub const MAX_WIDTH: u32 = 2048;
/// Maximum height used when resizing images before uploading.
pub const MAX_HEIGHT: u32 = 768;
pub mod error;
#[derive(Debug, Clone)]
pub struct EncodedImage {
pub bytes: Vec<u8>,
pub mime: String,
pub width: u32,
pub height: u32,
}
impl EncodedImage {
pub fn into_data_url(self) -> String {
let encoded = BASE64_STANDARD.encode(&self.bytes);
format!("data:{};base64,{}", self.mime, encoded)
}
}
static IMAGE_CACHE: LazyLock<BlockingLruCache<[u8; 20], EncodedImage>> =
LazyLock::new(|| BlockingLruCache::new(NonZeroUsize::new(32).unwrap_or(NonZeroUsize::MIN)));
pub fn load_and_resize_to_fit(path: &Path) -> Result<EncodedImage, ImageProcessingError> {
let path_buf = path.to_path_buf();
let file_bytes = read_file_bytes(path, &path_buf)?;
let key = sha1_digest(&file_bytes);
IMAGE_CACHE.get_or_try_insert_with(key, move || {
let format = match image::guess_format(&file_bytes) {
Ok(ImageFormat::Png) => Some(ImageFormat::Png),
Ok(ImageFormat::Jpeg) => Some(ImageFormat::Jpeg),
_ => None,
};
let dynamic = image::load_from_memory(&file_bytes).map_err(|source| {
ImageProcessingError::Decode {
path: path_buf.clone(),
source,
}
})?;
let (width, height) = dynamic.dimensions();
let encoded = if width <= MAX_WIDTH && height <= MAX_HEIGHT {
if let Some(format) = format {
let mime = format_to_mime(format);
EncodedImage {
bytes: file_bytes,
mime,
width,
height,
}
} else {
let (bytes, output_format) = encode_image(&dynamic, ImageFormat::Png)?;
let mime = format_to_mime(output_format);
EncodedImage {
bytes,
mime,
width,
height,
}
}
} else {
let resized = dynamic.resize(MAX_WIDTH, MAX_HEIGHT, FilterType::Triangle);
let target_format = format.unwrap_or(ImageFormat::Png);
let (bytes, output_format) = encode_image(&resized, target_format)?;
let mime = format_to_mime(output_format);
EncodedImage {
bytes,
mime,
width: resized.width(),
height: resized.height(),
}
};
Ok(encoded)
})
}
fn read_file_bytes(path: &Path, path_for_error: &Path) -> Result<Vec<u8>, ImageProcessingError> {
match tokio::runtime::Handle::try_current() {
// If we're inside a Tokio runtime, avoid block_on (it panics on worker threads).
// Use block_in_place and do a standard blocking read safely.
Ok(_) => tokio::task::block_in_place(|| std::fs::read(path)).map_err(|source| {
ImageProcessingError::Read {
path: path_for_error.to_path_buf(),
source,
}
}),
// Outside a runtime, just read synchronously.
Err(_) => std::fs::read(path).map_err(|source| ImageProcessingError::Read {
path: path_for_error.to_path_buf(),
source,
}),
}
}
fn encode_image(
image: &DynamicImage,
preferred_format: ImageFormat,
) -> Result<(Vec<u8>, ImageFormat), ImageProcessingError> {
let target_format = match preferred_format {
ImageFormat::Jpeg => ImageFormat::Jpeg,
_ => ImageFormat::Png,
};
let mut buffer = Vec::new();
match target_format {
ImageFormat::Png => {
let rgba = image.to_rgba8();
let encoder = PngEncoder::new(&mut buffer);
encoder
.write_image(
rgba.as_raw(),
image.width(),
image.height(),
ColorType::Rgba8.into(),
)
.map_err(|source| ImageProcessingError::Encode {
format: target_format,
source,
})?;
}
ImageFormat::Jpeg => {
let mut encoder = JpegEncoder::new_with_quality(&mut buffer, 85);
encoder
.encode_image(image)
.map_err(|source| ImageProcessingError::Encode {
format: target_format,
source,
})?;
}
_ => unreachable!("unsupported target_format should have been handled earlier"),
}
Ok((buffer, target_format))
}
fn format_to_mime(format: ImageFormat) -> String {
match format {
ImageFormat::Jpeg => "image/jpeg".to_string(),
_ => "image/png".to_string(),
}
}
#[cfg(test)]
mod tests {
use super::*;
use image::GenericImageView;
use image::ImageBuffer;
use image::Rgba;
use tempfile::NamedTempFile;
#[tokio::test(flavor = "multi_thread")]
async fn returns_original_image_when_within_bounds() {
let temp_file = NamedTempFile::new().expect("temp file");
let image = ImageBuffer::from_pixel(64, 32, Rgba([10u8, 20, 30, 255]));
image
.save_with_format(temp_file.path(), ImageFormat::Png)
.expect("write png to temp file");
let original_bytes = std::fs::read(temp_file.path()).expect("read written image");
let encoded = load_and_resize_to_fit(temp_file.path()).expect("process image");
assert_eq!(encoded.width, 64);
assert_eq!(encoded.height, 32);
assert_eq!(encoded.mime, "image/png");
assert_eq!(encoded.bytes, original_bytes);
}
#[tokio::test(flavor = "multi_thread")]
async fn downscales_large_image() {
let temp_file = NamedTempFile::new().expect("temp file");
let image = ImageBuffer::from_pixel(4096, 2048, Rgba([200u8, 10, 10, 255]));
image
.save_with_format(temp_file.path(), ImageFormat::Png)
.expect("write png to temp file");
let processed = load_and_resize_to_fit(temp_file.path()).expect("process image");
assert!(processed.width <= MAX_WIDTH);
assert!(processed.height <= MAX_HEIGHT);
let loaded =
image::load_from_memory(&processed.bytes).expect("read resized bytes back into image");
assert_eq!(loaded.dimensions(), (processed.width, processed.height));
}
#[tokio::test(flavor = "multi_thread")]
async fn fails_cleanly_for_invalid_images() {
let temp_file = NamedTempFile::new().expect("temp file");
std::fs::write(temp_file.path(), b"not an image").expect("write bytes");
let err = load_and_resize_to_fit(temp_file.path()).expect_err("invalid image should fail");
match err {
ImageProcessingError::Decode { .. } => {}
_ => panic!("unexpected error variant"),
}
}
#[tokio::test(flavor = "multi_thread")]
async fn reprocesses_updated_file_contents() {
// Reset the process-wide cache so this test re-reads the file from disk.
IMAGE_CACHE.clear();
let temp_file = NamedTempFile::new().expect("temp file");
let first_image = ImageBuffer::from_pixel(32, 16, Rgba([20u8, 120, 220, 255]));
first_image
.save_with_format(temp_file.path(), ImageFormat::Png)
.expect("write initial image");
let first = load_and_resize_to_fit(temp_file.path()).expect("process first image");
let second_image = ImageBuffer::from_pixel(96, 48, Rgba([50u8, 60, 70, 255]));
second_image
.save_with_format(temp_file.path(), ImageFormat::Png)
.expect("write updated image");
let second = load_and_resize_to_fit(temp_file.path()).expect("process updated image");
assert_eq!(first.width, 32);
assert_eq!(first.height, 16);
assert_eq!(second.width, 96);
assert_eq!(second.height, 48);
assert_ne!(second.bytes, first.bytes);
}
}
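
Putting the pieces together, a sketch of producing an upload-ready data URL; the `llmx_utils_image` import path is inferred from the package name above:

```rust
use std::path::Path;

use llmx_utils_image::load_and_resize_to_fit;

fn image_payload(path: &Path) -> Option<String> {
    // Results are cached by content hash, so repeated calls on unchanged
    // files skip the decode/resize work.
    load_and_resize_to_fit(path)
        .ok()
        .map(|encoded| encoded.into_data_url()) // "data:image/png;base64,..."
}
```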

View File

@@ -0,0 +1,14 @@
[package]
edition.workspace = true
name = "llmx-utils-json-to-toml"
version.workspace = true
[dependencies]
serde_json = { workspace = true }
toml = { workspace = true }
[dev-dependencies]
pretty_assertions = { workspace = true }
[lints]
workspace = true

View File

@@ -0,0 +1,83 @@
use serde_json::Value as JsonValue;
use toml::Value as TomlValue;
/// Convert a `serde_json::Value` into a semantically equivalent `toml::Value`.
pub fn json_to_toml(v: JsonValue) -> TomlValue {
match v {
JsonValue::Null => TomlValue::String(String::new()),
JsonValue::Bool(b) => TomlValue::Boolean(b),
JsonValue::Number(n) => {
if let Some(i) = n.as_i64() {
TomlValue::Integer(i)
} else if let Some(f) = n.as_f64() {
TomlValue::Float(f)
} else {
TomlValue::String(n.to_string())
}
}
JsonValue::String(s) => TomlValue::String(s),
JsonValue::Array(arr) => TomlValue::Array(arr.into_iter().map(json_to_toml).collect()),
JsonValue::Object(map) => {
let tbl = map
.into_iter()
.map(|(k, v)| (k, json_to_toml(v)))
.collect::<toml::value::Table>();
TomlValue::Table(tbl)
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
use serde_json::json;
#[test]
fn json_number_to_toml() {
let json_value = json!(123);
assert_eq!(TomlValue::Integer(123), json_to_toml(json_value));
}
#[test]
fn json_array_to_toml() {
let json_value = json!([true, 1]);
assert_eq!(
TomlValue::Array(vec![TomlValue::Boolean(true), TomlValue::Integer(1)]),
json_to_toml(json_value)
);
}
#[test]
fn json_bool_to_toml() {
let json_value = json!(false);
assert_eq!(TomlValue::Boolean(false), json_to_toml(json_value));
}
#[test]
fn json_float_to_toml() {
let json_value = json!(1.25);
assert_eq!(TomlValue::Float(1.25), json_to_toml(json_value));
}
#[test]
fn json_null_to_toml() {
let json_value = serde_json::Value::Null;
assert_eq!(TomlValue::String(String::new()), json_to_toml(json_value));
}
#[test]
fn json_object_nested() {
let json_value = json!({ "outer": { "inner": 2 } });
let expected = {
let mut inner = toml::value::Table::new();
inner.insert("inner".into(), TomlValue::Integer(2));
let mut outer = toml::value::Table::new();
outer.insert("outer".into(), TomlValue::Table(inner));
TomlValue::Table(outer)
};
assert_eq!(json_to_toml(json_value), expected);
}
}
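
A quick sketch of the conversion in use, rendering the resulting table with `toml::to_string`; the import path is inferred from the package name above:

```rust
use llmx_utils_json_to_toml::json_to_toml;
use serde_json::json;

fn main() {
    let value = json_to_toml(json!({ "model": "claude-sonnet-4.5", "retries": 3 }));
    // Objects become tables and nulls become empty strings, per the mapping
    // above, so the top-level value serializes as a TOML document.
    println!("{}", toml::to_string(&value).expect("top-level table serializes"));
}
```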

View File

@@ -0,0 +1,16 @@
[package]
edition.workspace = true
name = "llmx-utils-pty"
version.workspace = true
[lints]
workspace = true
[dependencies]
anyhow = { workspace = true }
portable-pty = { workspace = true }
tokio = { workspace = true, features = [
"macros",
"rt-multi-thread",
"sync",
] }

View File

@@ -0,0 +1,211 @@
use std::collections::HashMap;
use std::io::ErrorKind;
use std::path::Path;
use std::sync::atomic::AtomicBool;
use std::sync::Arc;
use std::sync::Mutex as StdMutex;
use std::time::Duration;
use anyhow::Result;
use portable_pty::native_pty_system;
use portable_pty::CommandBuilder;
use portable_pty::PtySize;
use tokio::sync::broadcast;
use tokio::sync::mpsc;
use tokio::sync::oneshot;
use tokio::sync::Mutex as TokioMutex;
use tokio::task::JoinHandle;
#[derive(Debug)]
pub struct ExecCommandSession {
writer_tx: mpsc::Sender<Vec<u8>>,
output_tx: broadcast::Sender<Vec<u8>>,
killer: StdMutex<Option<Box<dyn portable_pty::ChildKiller + Send + Sync>>>,
reader_handle: StdMutex<Option<JoinHandle<()>>>,
writer_handle: StdMutex<Option<JoinHandle<()>>>,
wait_handle: StdMutex<Option<JoinHandle<()>>>,
exit_status: Arc<AtomicBool>,
exit_code: Arc<StdMutex<Option<i32>>>,
}
impl ExecCommandSession {
#[allow(clippy::too_many_arguments)]
pub fn new(
writer_tx: mpsc::Sender<Vec<u8>>,
output_tx: broadcast::Sender<Vec<u8>>,
killer: Box<dyn portable_pty::ChildKiller + Send + Sync>,
reader_handle: JoinHandle<()>,
writer_handle: JoinHandle<()>,
wait_handle: JoinHandle<()>,
exit_status: Arc<AtomicBool>,
exit_code: Arc<StdMutex<Option<i32>>>,
) -> (Self, broadcast::Receiver<Vec<u8>>) {
let initial_output_rx = output_tx.subscribe();
(
Self {
writer_tx,
output_tx,
killer: StdMutex::new(Some(killer)),
reader_handle: StdMutex::new(Some(reader_handle)),
writer_handle: StdMutex::new(Some(writer_handle)),
wait_handle: StdMutex::new(Some(wait_handle)),
exit_status,
exit_code,
},
initial_output_rx,
)
}
pub fn writer_sender(&self) -> mpsc::Sender<Vec<u8>> {
self.writer_tx.clone()
}
pub fn output_receiver(&self) -> broadcast::Receiver<Vec<u8>> {
self.output_tx.subscribe()
}
pub fn has_exited(&self) -> bool {
self.exit_status.load(std::sync::atomic::Ordering::SeqCst)
}
pub fn exit_code(&self) -> Option<i32> {
self.exit_code.lock().ok().and_then(|guard| *guard)
}
}
impl Drop for ExecCommandSession {
fn drop(&mut self) {
if let Ok(mut killer_opt) = self.killer.lock() {
if let Some(mut killer) = killer_opt.take() {
let _ = killer.kill();
}
}
if let Ok(mut h) = self.reader_handle.lock() {
if let Some(handle) = h.take() {
handle.abort();
}
}
if let Ok(mut h) = self.writer_handle.lock() {
if let Some(handle) = h.take() {
handle.abort();
}
}
if let Ok(mut h) = self.wait_handle.lock() {
if let Some(handle) = h.take() {
handle.abort();
}
}
}
}
#[derive(Debug)]
pub struct SpawnedPty {
pub session: ExecCommandSession,
pub output_rx: broadcast::Receiver<Vec<u8>>,
pub exit_rx: oneshot::Receiver<i32>,
}
pub async fn spawn_pty_process(
program: &str,
args: &[String],
cwd: &Path,
env: &HashMap<String, String>,
arg0: &Option<String>,
) -> Result<SpawnedPty> {
if program.is_empty() {
anyhow::bail!("missing program for PTY spawn");
}
let pty_system = native_pty_system();
let pair = pty_system.openpty(PtySize {
rows: 24,
cols: 80,
pixel_width: 0,
pixel_height: 0,
})?;
let mut command_builder = CommandBuilder::new(arg0.as_deref().unwrap_or(program));
command_builder.cwd(cwd);
command_builder.env_clear();
for arg in args {
command_builder.arg(arg);
}
for (key, value) in env {
command_builder.env(key, value);
}
let mut child = pair.slave.spawn_command(command_builder)?;
let killer = child.clone_killer();
let (writer_tx, mut writer_rx) = mpsc::channel::<Vec<u8>>(128);
let (output_tx, _) = broadcast::channel::<Vec<u8>>(256);
let mut reader = pair.master.try_clone_reader()?;
let output_tx_clone = output_tx.clone();
let reader_handle: JoinHandle<()> = tokio::task::spawn_blocking(move || {
let mut buf = [0u8; 8_192];
loop {
match reader.read(&mut buf) {
Ok(0) => break,
Ok(n) => {
let _ = output_tx_clone.send(buf[..n].to_vec());
}
Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
Err(ref e) if e.kind() == ErrorKind::WouldBlock => {
std::thread::sleep(Duration::from_millis(5));
continue;
}
Err(_) => break,
}
}
});
let writer = pair.master.take_writer()?;
let writer = Arc::new(TokioMutex::new(writer));
let writer_handle: JoinHandle<()> = tokio::spawn({
let writer = Arc::clone(&writer);
async move {
while let Some(bytes) = writer_rx.recv().await {
let mut guard = writer.lock().await;
use std::io::Write;
let _ = guard.write_all(&bytes);
let _ = guard.flush();
}
}
});
let (exit_tx, exit_rx) = oneshot::channel::<i32>();
let exit_status = Arc::new(AtomicBool::new(false));
let wait_exit_status = Arc::clone(&exit_status);
let exit_code = Arc::new(StdMutex::new(None));
let wait_exit_code = Arc::clone(&exit_code);
let wait_handle: JoinHandle<()> = tokio::task::spawn_blocking(move || {
let code = match child.wait() {
Ok(status) => status.exit_code() as i32,
Err(_) => -1,
};
wait_exit_status.store(true, std::sync::atomic::Ordering::SeqCst);
if let Ok(mut guard) = wait_exit_code.lock() {
*guard = Some(code);
}
let _ = exit_tx.send(code);
});
let (session, output_rx) = ExecCommandSession::new(
writer_tx,
output_tx,
killer,
reader_handle,
writer_handle,
wait_handle,
exit_status,
exit_code,
);
Ok(SpawnedPty {
session,
output_rx,
exit_rx,
})
}
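
For reference, a sketch of driving `spawn_pty_process` end to end. The command and working directory are placeholders, and output collection is best-effort: a broadcast receiver only sees chunks that are still buffered and may drop some under load:

```rust
use std::collections::HashMap;
use std::path::Path;

async fn run_once() -> anyhow::Result<(i32, String)> {
    let SpawnedPty { session, mut output_rx, exit_rx } = spawn_pty_process(
        "echo",
        &["hello".to_string()],
        Path::new("."),
        &HashMap::new(),
        &None,
    )
    .await?;
    // Wait for the child to exit, then drain whatever was broadcast before.
    let code = exit_rx.await.unwrap_or(-1);
    let mut bytes = Vec::new();
    while let Ok(chunk) = output_rx.try_recv() {
        bytes.extend_from_slice(&chunk);
    }
    drop(session); // kills the child and aborts the I/O tasks
    Ok((code, String::from_utf8_lossy(&bytes).into_owned()))
}
```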

View File

@@ -0,0 +1,17 @@
[package]
name = "llmx-utils-readiness"
version.workspace = true
edition.workspace = true
[dependencies]
async-trait = { workspace = true }
thiserror = { workspace = true }
time = { workspace = true }
tokio = { workspace = true, features = ["sync", "time"] }
[dev-dependencies]
assert_matches = { workspace = true }
tokio = { workspace = true, features = ["macros", "rt", "rt-multi-thread"] }
[lints]
workspace = true

View File

@@ -0,0 +1,292 @@
//! Readiness flag with token-based authorization and async waiting (Tokio).
use std::collections::HashSet;
use std::fmt;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::AtomicI32;
use std::sync::atomic::Ordering;
use std::time::Duration;
use tokio::sync::Mutex;
use tokio::sync::watch;
use tokio::time;
/// Opaque subscription token returned by `subscribe()`.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Token(i32);
const LOCK_TIMEOUT: Duration = Duration::from_millis(1000);
#[async_trait::async_trait]
pub trait Readiness: Send + Sync + 'static {
/// Returns `true` if the flag is currently marked ready.
///
/// If no tokens are currently subscribed, the call itself marks the flag
/// ready. Once ready, the flag never reverts to not-ready.
fn is_ready(&self) -> bool;
/// Subscribe to readiness and receive an authorization token.
///
/// If the flag is already ready, returns `FlagAlreadyReady`.
async fn subscribe(&self) -> Result<Token, errors::ReadinessError>;
/// Attempt to mark the flag ready, validated by the provided token.
///
/// Returns `true` iff:
/// - `token` is currently subscribed, and
/// - the flag was not already ready.
async fn mark_ready(&self, token: Token) -> Result<bool, errors::ReadinessError>;
/// Asynchronously wait until the flag becomes ready.
async fn wait_ready(&self);
}
pub struct ReadinessFlag {
/// Atomic for cheap reads.
ready: AtomicBool,
/// Used to generate the next i32 token.
next_id: AtomicI32,
/// Set of active subscriptions.
tokens: Mutex<HashSet<Token>>,
/// Broadcasts readiness to async waiters.
tx: watch::Sender<bool>,
}
impl ReadinessFlag {
/// Create a new, not-yet-ready flag.
pub fn new() -> Self {
let (tx, _rx) = watch::channel(false);
Self {
ready: AtomicBool::new(false),
next_id: AtomicI32::new(1), // Reserve 0.
tokens: Mutex::new(HashSet::new()),
tx,
}
}
async fn with_tokens<R>(
&self,
f: impl FnOnce(&mut HashSet<Token>) -> R,
) -> Result<R, errors::ReadinessError> {
let mut guard = time::timeout(LOCK_TIMEOUT, self.tokens.lock())
.await
.map_err(|_| errors::ReadinessError::TokenLockFailed)?;
Ok(f(&mut guard))
}
fn load_ready(&self) -> bool {
self.ready.load(Ordering::Acquire)
}
}
impl Default for ReadinessFlag {
fn default() -> Self {
Self::new()
}
}
impl fmt::Debug for ReadinessFlag {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("ReadinessFlag")
.field("ready", &self.load_ready())
.finish()
}
}
#[async_trait::async_trait]
impl Readiness for ReadinessFlag {
fn is_ready(&self) -> bool {
if self.load_ready() {
return true;
}
if let Ok(tokens) = self.tokens.try_lock()
&& tokens.is_empty()
{
let was_ready = self.ready.swap(true, Ordering::AcqRel);
drop(tokens);
if !was_ready {
let _ = self.tx.send(true);
}
return true;
}
self.load_ready()
}
async fn subscribe(&self) -> Result<Token, errors::ReadinessError> {
if self.load_ready() {
return Err(errors::ReadinessError::FlagAlreadyReady);
}
// Generate a token; ensure it's not 0.
let token = Token(self.next_id.fetch_add(1, Ordering::Relaxed));
// Recheck readiness while holding the lock so mark_ready can't flip the flag between the
// check above and inserting the token.
let inserted = self
.with_tokens(|tokens| {
if self.load_ready() {
return false;
}
tokens.insert(token);
true
})
.await?;
if !inserted {
return Err(errors::ReadinessError::FlagAlreadyReady);
}
Ok(token)
}
async fn mark_ready(&self, token: Token) -> Result<bool, errors::ReadinessError> {
if self.load_ready() {
return Ok(false);
}
if token.0 == 0 {
return Ok(false); // Never authorize.
}
let marked = self
.with_tokens(|set| {
if !set.remove(&token) {
return false; // invalid or already used
}
self.ready.store(true, Ordering::Release);
set.clear(); // no further tokens needed once ready
true
})
.await?;
if !marked {
return Ok(false);
}
// Best-effort broadcast; ignore error if there are no receivers.
let _ = self.tx.send(true);
Ok(true)
}
async fn wait_ready(&self) {
if self.is_ready() {
return;
}
let mut rx = self.tx.subscribe();
// Fast-path check before awaiting.
if *rx.borrow() {
return;
}
// Await changes until true is observed.
while rx.changed().await.is_ok() {
if *rx.borrow() {
break;
}
}
}
}
pub mod errors {
use thiserror::Error;
#[derive(Debug, Error)]
pub enum ReadinessError {
#[error("Failed to acquire readiness token lock")]
TokenLockFailed,
#[error("Flag is already ready. Impossible to subscribe")]
FlagAlreadyReady,
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use super::Readiness;
use super::ReadinessFlag;
use super::Token;
use super::errors::ReadinessError;
use assert_matches::assert_matches;
#[tokio::test]
async fn subscribe_and_mark_ready_roundtrip() -> Result<(), ReadinessError> {
let flag = ReadinessFlag::new();
let token = flag.subscribe().await?;
assert!(flag.mark_ready(token).await?);
assert!(flag.is_ready());
Ok(())
}
#[tokio::test]
async fn subscribe_after_ready_returns_err() -> Result<(), ReadinessError> {
let flag = ReadinessFlag::new();
let token = flag.subscribe().await?;
assert!(flag.mark_ready(token).await?);
assert!(flag.subscribe().await.is_err());
Ok(())
}
#[tokio::test]
async fn mark_ready_rejects_unknown_token() -> Result<(), ReadinessError> {
let flag = ReadinessFlag::new();
assert!(!flag.mark_ready(Token(42)).await?);
assert!(!flag.load_ready());
assert!(flag.is_ready());
Ok(())
}
#[tokio::test]
async fn wait_ready_unblocks_after_mark_ready() -> Result<(), ReadinessError> {
let flag = Arc::new(ReadinessFlag::new());
let token = flag.subscribe().await?;
let waiter = {
let flag = Arc::clone(&flag);
tokio::spawn(async move {
flag.wait_ready().await;
})
};
assert!(flag.mark_ready(token).await?);
waiter.await.expect("waiting task should not panic");
Ok(())
}
#[tokio::test]
async fn mark_ready_twice_uses_single_token() -> Result<(), ReadinessError> {
let flag = ReadinessFlag::new();
let token = flag.subscribe().await?;
assert!(flag.mark_ready(token).await?);
assert!(!flag.mark_ready(token).await?);
Ok(())
}
#[tokio::test]
async fn is_ready_without_subscribers_marks_flag_ready() -> Result<(), ReadinessError> {
let flag = ReadinessFlag::new();
assert!(flag.is_ready());
assert!(flag.is_ready());
assert_matches!(
flag.subscribe().await,
Err(ReadinessError::FlagAlreadyReady)
);
Ok(())
}
#[tokio::test]
async fn subscribe_returns_error_when_lock_is_held() {
let flag = ReadinessFlag::new();
let _guard = flag
.tokens
.try_lock()
.expect("initial lock acquisition should succeed");
let err = flag
.subscribe()
.await
.expect_err("contended subscribe should report a lock failure");
assert_matches!(err, ReadinessError::TokenLockFailed);
}
}
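
Outside the tests, the flag is typically shared across tasks behind an `Arc`; a minimal sketch (import path inferred from the package name):

```rust
use std::sync::Arc;

use llmx_utils_readiness::Readiness;
use llmx_utils_readiness::ReadinessFlag;

#[tokio::main]
async fn main() {
    let flag = Arc::new(ReadinessFlag::new());
    // Subscribe before spawning waiters so is_ready() cannot short-circuit
    // via the empty-subscriber fast path.
    let token = flag.subscribe().await.expect("flag is not ready yet");

    let waiter = {
        let flag = Arc::clone(&flag);
        tokio::spawn(async move { flag.wait_ready().await })
    };

    assert!(flag.mark_ready(token).await.expect("token lock is uncontended"));
    waiter.await.expect("waiter task completes");
}
```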

View File

@@ -0,0 +1,7 @@
[package]
edition.workspace = true
name = "llmx-utils-string"
version.workspace = true
[lints]
workspace = true

View File

@@ -0,0 +1,38 @@
/// Truncate a `&str` to a byte budget at a `char` boundary, keeping the prefix.
#[inline]
pub fn take_bytes_at_char_boundary(s: &str, maxb: usize) -> &str {
if s.len() <= maxb {
return s;
}
let mut last_ok = 0;
for (i, ch) in s.char_indices() {
let nb = i + ch.len_utf8();
if nb > maxb {
break;
}
last_ok = nb;
}
&s[..last_ok]
}
/// Take the longest suffix of a `&str` that fits a byte budget at a `char` boundary.
#[inline]
pub fn take_last_bytes_at_char_boundary(s: &str, maxb: usize) -> &str {
if s.len() <= maxb {
return s;
}
let mut start = s.len();
let mut used = 0usize;
for (i, ch) in s.char_indices().rev() {
let nb = ch.len_utf8();
if used + nb > maxb {
break;
}
start = i;
used += nb;
if start == 0 {
break;
}
}
&s[start..]
}
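
Both helpers clip on UTF-8 character boundaries, so multi-byte characters are dropped whole rather than split. A small illustration:

```rust
fn main() {
    let s = "héllo"; // 'é' is two bytes; six bytes total
    assert_eq!(take_bytes_at_char_boundary(s, 2), "h"); // refuses to split 'é'
    assert_eq!(take_bytes_at_char_boundary(s, 3), "hé");
    assert_eq!(take_last_bytes_at_char_boundary(s, 3), "llo");
    assert_eq!(take_last_bytes_at_char_boundary(s, 4), "llo"); // "éllo" needs 5 bytes
}
```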

View File

@@ -0,0 +1,15 @@
[package]
edition.workspace = true
name = "llmx-utils-tokenizer"
version.workspace = true
[lints]
workspace = true
[dependencies]
anyhow = { workspace = true }
thiserror = { workspace = true }
tiktoken-rs = "0.7"
[dev-dependencies]
pretty_assertions = { workspace = true }

View File

@@ -0,0 +1,161 @@
use std::fmt;
use anyhow::Context;
use anyhow::Error as AnyhowError;
use thiserror::Error;
use tiktoken_rs::CoreBPE;
/// Supported local encodings.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum EncodingKind {
O200kBase,
Cl100kBase,
}
impl fmt::Display for EncodingKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::O200kBase => f.write_str("o200k_base"),
Self::Cl100kBase => f.write_str("cl100k_base"),
}
}
}
/// Tokenizer error type.
#[derive(Debug, Error)]
pub enum TokenizerError {
#[error("failed to load encoding {kind}")]
LoadEncoding {
kind: EncodingKind,
#[source]
source: AnyhowError,
},
#[error("failed to decode tokens")]
Decode {
#[source]
source: AnyhowError,
},
}
/// Thin wrapper around a `tiktoken_rs::CoreBPE` tokenizer.
#[derive(Clone)]
pub struct Tokenizer {
inner: CoreBPE,
}
impl Tokenizer {
/// Build a tokenizer for a specific encoding.
pub fn new(kind: EncodingKind) -> Result<Self, TokenizerError> {
let loader: fn() -> anyhow::Result<CoreBPE> = match kind {
EncodingKind::O200kBase => tiktoken_rs::o200k_base,
EncodingKind::Cl100kBase => tiktoken_rs::cl100k_base,
};
let inner = loader().map_err(|source| TokenizerError::LoadEncoding { kind, source })?;
Ok(Self { inner })
}
/// Build a tokenizer with the default encoding, `O200kBase`.
pub fn try_default() -> Result<Self, TokenizerError> {
Self::new(EncodingKind::O200kBase)
}
/// Build a tokenizer using an `OpenAI` model name (maps to an encoding).
/// Falls back to the `O200kBase` encoding when the model is unknown.
pub fn for_model(model: &str) -> Result<Self, TokenizerError> {
match tiktoken_rs::get_bpe_from_model(model) {
Ok(inner) => Ok(Self { inner }),
Err(model_error) => {
let inner = tiktoken_rs::o200k_base()
.with_context(|| {
format!("fallback after model lookup failure for {model}: {model_error}")
})
.map_err(|source| TokenizerError::LoadEncoding {
kind: EncodingKind::O200kBase,
source,
})?;
Ok(Self { inner })
}
}
}
/// Encode text to token IDs. If `with_special_tokens` is true, special
/// tokens are allowed and may appear in the result.
#[must_use]
pub fn encode(&self, text: &str, with_special_tokens: bool) -> Vec<i32> {
let raw = if with_special_tokens {
self.inner.encode_with_special_tokens(text)
} else {
self.inner.encode_ordinary(text)
};
raw.into_iter().map(|t| t as i32).collect()
}
/// Count tokens in `text` as a signed integer.
#[must_use]
pub fn count(&self, text: &str) -> i64 {
// Signed length to satisfy our style preference.
i64::try_from(self.inner.encode_ordinary(text).len()).unwrap_or(i64::MAX)
}
/// Decode token IDs back to text.
pub fn decode(&self, tokens: &[i32]) -> Result<String, TokenizerError> {
let raw: Vec<u32> = tokens.iter().map(|t| *t as u32).collect();
self.inner
.decode(raw)
.map_err(|source| TokenizerError::Decode { source })
}
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
#[test]
fn cl100k_base_roundtrip_simple() -> Result<(), TokenizerError> {
let tok = Tokenizer::new(EncodingKind::Cl100kBase)?;
let s = "hello world";
let ids = tok.encode(s, false);
// Stable expectation for cl100k_base
assert_eq!(ids, vec![15339, 1917]);
let back = tok.decode(&ids)?;
assert_eq!(back, s);
Ok(())
}
#[test]
fn preserves_whitespace_and_special_tokens_flag() -> Result<(), TokenizerError> {
let tok = Tokenizer::new(EncodingKind::Cl100kBase)?;
let s = "This has multiple spaces";
let ids_no_special = tok.encode(s, false);
let round = tok.decode(&ids_no_special)?;
assert_eq!(round, s);
// With special tokens allowed, result may be identical for normal text,
// but the API should still function.
let ids_with_special = tok.encode(s, true);
let round2 = tok.decode(&ids_with_special)?;
assert_eq!(round2, s);
Ok(())
}
#[test]
fn model_mapping_builds_tokenizer() -> Result<(), TokenizerError> {
// Any model name tiktoken-rs recognizes maps to its encoding; unknown
// names take for_model's o200k_base fallback path instead.
let tok = Tokenizer::for_model("gpt-5")?;
let ids = tok.encode("ok", false);
let back = tok.decode(&ids)?;
assert_eq!(back, "ok");
Ok(())
}
#[test]
fn unknown_model_defaults_to_o200k_base() -> Result<(), TokenizerError> {
let fallback = Tokenizer::new(EncodingKind::O200kBase)?;
let tok = Tokenizer::for_model("does-not-exist")?;
let text = "fallback please";
assert_eq!(tok.encode(text, false), fallback.encode(text, false));
Ok(())
}
}
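
Finally, a sketch of the intended call pattern for budgeting prompts with this crate:

```rust
fn main() -> Result<(), TokenizerError> {
    // Known model names map to their encoding; unknown names fall back to
    // o200k_base via for_model's fallback path.
    let tok = Tokenizer::for_model("gpt-4o")?;
    let prompt = "Summarize the diff in one sentence.";
    println!("{} tokens", tok.count(prompt));

    // Round-trip: encode without special tokens, then decode.
    let ids = tok.encode(prompt, false);
    assert_eq!(tok.decode(&ids)?, prompt);
    Ok(())
}
```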