Add three major features to artifact_huggingface_download.sh:
1. Category filtering (--category flag):
- Filter models by category (single or comma-separated multiple)
- Validates categories against YAML configuration
- Works with all commands (download, link, both, verify)
2. Cleanup mode (--cleanup flag):
- Removes unreferenced files from HuggingFace cache
- Only deletes files not referenced by symlinks
- Per-category cleanup (safe and isolated)
- Works with link and both commands only
3. Dry-run mode (--dry-run/-n flag):
- Preview operations without making changes
- Shows what would be downloaded, linked, or cleaned
- Includes file counts and size estimates
- Warning banner to indicate dry-run mode
Additional improvements:
- Updated help text with comprehensive examples
- Added validation for invalid flag combinations
- Enhanced user feedback with detailed preview information
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updated find_model_files() to include .index.json files in the file
discovery logic. These files are essential for sharded models like
CogVideoX which split safetensors into multiple parts.
Changes:
- Modified JSON file filtering to allow files ending with .index.json
- Updated comment to document support for sharded model index files
Fixes automatic linking for:
- CogVideoX-5b (3 files including index.json)
- CogVideoX-5b-I2V (4 files including index.json)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
HuggingFace models use symlinks pointing to blobs in the cache. The stat
command was returning the size of the symlink itself (76 bytes) instead of
the actual file it points to.
Fixed by:
- Adding -L flag to stat commands to follow symlinks
- Testing which stat format works (Linux -c vs macOS -f) before executing
- Using proper spacing and quotes in stat commands
- Checking for both regular files and symlinks in the condition
This fixes disk usage showing as 1 KB instead of the actual ~50+ GB for models
like FLUX.
The stat command was producing multi-line output which broke the pipe-delimited
return format of verify_model_download(). Fixed by:
- Testing which stat format works (Linux vs macOS) before executing
- Using proper format flags to get single-line date output
- Defaulting to 'Unknown' instead of relying on fallback chain
This fixes the issue where all models showed as 'NOT DOWNLOADED' even when
present in the cache.
The verify command was showing all models as "NOT DOWNLOADED" because find_model_files()
was exiting silently without diagnostic output. This made debugging impossible.
Changes:
- Added detailed error messages to Python script in find_model_files()
- Reports which directories were checked and why they failed
- Shows actual vs expected paths for model/snapshots directories
- Includes DEBUG messages showing which snapshot is being used
- Warns when no files match the filter
- Modified verify_model_download() to capture and display stderr
- Changed from suppressing stderr (2>/dev/null) to capturing it (2>&1)
- Filters ERROR/WARN/DEBUG prefixes from file paths
- Logs diagnostic messages to stderr for visibility
This will help identify the actual cache structure mismatch causing verification failures.
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added new 'verify' command to artifact_huggingface_download.sh that performs comprehensive health checks on all downloaded models and their symlinks.
Features:
- Download status verification (existence, size, location, timestamps)
- Link status verification (valid, broken, or missing symlinks)
- Size mismatch detection (warns if actual differs >10% from expected)
- Per-model detailed logging with beautiful formatting
- Category-level and global statistics summaries
- Actionable fix suggestions for detected issues
- Disk space usage analysis
New Functions:
- get_model_disk_usage() - Calculate actual model file sizes
- format_bytes() - Human-readable size formatting
- verify_model_download() - Check model download status
- verify_model_links() - Verify symlink integrity
- verify_category() - Process category with verification
- display_verification_summary() - Show global results
Usage:
artifact_huggingface_download.sh verify -c models.yaml
Output includes:
✓ Downloaded/Missing model counts
✓ Properly linked/Broken link statistics
✓ File sizes and locations
✓ Last modified timestamps
⚠️ Size mismatch warnings
📊 Disk space usage per category
💡 Fix suggestions with exact commands
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>