219 Commits

Author SHA1 Message Date
2189697734 refactor: remove type field from models_huggingface.yaml and include type in dest paths
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
- Prepended ComfyUI model type folder (checkpoints/, clip/, vae/, etc.) to all dest paths
- Removed separate 'type' field from all model entries
- Consolidated SD3.5 duplicate entries (5 → 1)
- Simplified model configuration by embedding directory structure directly in destination paths

This change eliminates the need to parse the 'type' field separately in artifact_huggingface_download.sh,
making the configuration more explicit and easier to understand.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 19:19:42 +01:00
ff6c1369ae feat: add three new SDXL/SD1.5 image generation models
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Add the following models to models_huggingface.yaml:
- John6666/diving-illustrious-real-asian-v50-sdxl: SDXL fine-tune for photorealistic Asian subjects
- playgroundai/playground-v2.5-1024px-aesthetic: High-aesthetic 1024px SDXL-based model
- Lykon/dreamshaper-8: Versatile SD1.5 fine-tune with multi-style support

All models marked as non-essential and will download .safetensors files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 17:29:04 +01:00
aa2cc5973b fix: update CogVideoX models to link sharded transformer files
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
- CogVideoX-5b: Link 2 shards + index.json instead of single non-existent file
- CogVideoX-5b-I2V: Link 3 shards + index.json instead of single non-existent file  - Fixes link command failures for these video generation models
- Shards are now properly symlinked to ComfyUI diffusion_models directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 15:47:09 +01:00
3c6904a253 feat: add comfyui-workspace-manager custom node
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Added comfyui-workspace-manager plugin to arty.yml for ComfyUI workflow and model management.

Repository: https://github.com/11cafe/comfyui-workspace-manager

Features:
- Workflow management with version history
- Model browser with one-click CivitAI downloads
- Image gallery per workflow
- Auto-save and keyboard shortcuts

Note: Plugin is marked as obsolete (April 2025) as ComfyUI now has built-in workspace features, but added per user request.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 12:33:00 +01:00
6efb55c59f feat: add complete HunyuanVideo and Wan2.2 video generation integration
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
Integrated 35+ video generation models and 13 production workflows from ComfyUI docs tutorials for state-of-the-art text-to-video and image-to-video generation.

Models Added (models_huggingface.yaml):
- HunyuanVideo (5 models): Original T2V/I2V (720p), v1.5 (720p/1080p) with Qwen 2.5 VL
- Wan2.2 diffusion models (18 models):
  - 5B TI2V hybrid (8GB VRAM, efficient)
  - 14B variants: T2V, I2V (high/low noise), Animate, S2V (FP8/BF16), Fun Camera/Control (high/low noise)
- Support models (12): VAEs, UMT5-XXL, CLIP Vision H, Wav2Vec2, LLaVA encoders
- LoRA accelerators (4): Lightx2v 4-step distillation for 5x speedup

Workflows Added (comfyui/workflows/image-to-video/):
- HunyuanVideo (5 workflows): T2V original, I2V v1/v2 (webp embedded), v1.5 T2V/I2V (JSON)
- Wan2.2 (8 workflows): 5B TI2V, 14B T2V/I2V/FLF2V/Animate/S2V/Fun Camera/Fun Control
- Asset files (10): Reference images, videos, audio for workflow testing

Custom Nodes Added (arty.yml):
- ComfyUI-KJNodes: Kijai optimizations for HunyuanVideo/Wan2.2 (FP8 scaling, video helpers)
- comfyui_controlnet_aux: ControlNet preprocessors (Canny, Depth, OpenPose, MLSD) for Fun Control
- ComfyUI-GGUF: GGUF quantization support for memory optimization

VRAM Requirements:
- HunyuanVideo original: 24GB (720p T2V/I2V, 129 frames, 5s generation)
- HunyuanVideo 1.5: 30-60GB (720p/1080p, improved quality with Qwen 2.5 VL)
- Wan2.2 5B: 8GB (efficient dual-expert architecture with native offloading)
- Wan2.2 14B: 24GB (high-quality video generation, all modes)

Note: Wan2.2 Fun Inpaint workflow not available in official templates repository (404).

Tutorial Sources:
- https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video
- https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video-1-5
- https://docs.comfy.org/tutorials/video/wan/wan2_2
- https://docs.comfy.org/tutorials/video/wan/wan2-2-animate
- https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-camera
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-control

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 10:43:39 +01:00
06b8ec0064 refactor: remove simple ACE Step workflow in favor of official workflows
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Removed acestep-simple-t2m-v1.json as the official Comfy-Org workflows provide better quality:
- acestep-official-t2m-v1.json - Advanced T2M with specialized nodes
- acestep-m2m-editing-v1.json - Music-to-music editing capability

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 09:46:24 +01:00
e610330b91 feat: add official Comfy-Org ACE Step workflows and example assets
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Added 2 official workflows from Comfy-Org/example_workflows:
- acestep-official-t2m-v1.json - Advanced T2M with specialized nodes (50 steps, multiple formats)
- acestep-m2m-editing-v1.json - Music-to-music editing with denoise control

Added 3 audio example assets:
- acestep-m2m-input.mp3 (973 KB) - Example input for M2M editing
- acestep-t2m-output.flac (3.4 MB) - T2M output reference
- acestep-m2m-output.mp3 (998 KB) - M2M output reference

Total: 3 workflows (simple + official T2M + M2M editing) with audio examples

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 09:34:57 +01:00
55b37894b1 fix: remove empty ACE Step workflow placeholders
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 13s
Removed 3 empty placeholder workflows that only contained metadata:
- acestep-multilang-t2m-v1.json
- acestep-remix-m2m-v1.json
- acestep-chinese-rap-v1.json

Kept only the functional workflow:
- acestep-simple-t2m-v1.json (6 nodes, fully operational)

Users can use the simple workflow and modify the prompt for different use cases:
- Multi-language: prefix lyrics with language tags like [zh], [ja], [ko]
- Remixing: load audio input and adjust denoise strength (0.1-0.7)
- Chinese RAP: use Chinese RAP LoRA with strength 0.8-1.0

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 08:51:37 +01:00
513062623c feat: integrate ACE Step music generation with 19-language support
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Added ACE Step v1 3.5B model for state-of-the-art music generation:
- 15x faster than LLM baselines with superior structural coherence
- Supports 19 languages (en, zh, ja, ko, fr, es, de, it, pt, ru + 9 more)
- Voice cloning, lyric alignment, and multi-genre capabilities

Changes:
- Added ACE Step models to models_huggingface.yaml (checkpoint + Chinese RAP LoRA)
- Added ComfyUI_ACE-Step custom node to arty.yml with installation script
- Created 4 comprehensive workflows in comfyui/workflows/text-to-music/:
  * acestep-simple-t2m-v1.json - Basic 60s text-to-music generation
  * acestep-multilang-t2m-v1.json - 19-language music generation
  * acestep-remix-m2m-v1.json - Music-to-music remixing with style transfer
  * acestep-chinese-rap-v1.json - Chinese hip-hop with specialized LoRA

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 08:40:17 +01:00
5af3eeb333 feat: add BGE embedding service and reorganize supervisor groups
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
- Add vLLM embedding server for BAAI/bge-large-en-v1.5 (port 8002)
- Reorganize supervisor into two logical groups:
  - comfyui-services: comfyui, webdav-sync
  - vllm-services: vllm-qwen, vllm-llama, vllm-embedding
- Update arty.yml service management scripts for new group structure
- Add individual service control scripts for all vLLM models

Note: Embedding server currently uses placeholder implementation
For production use, switch to sentence-transformers or native vLLM embedding mode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:32:01 +01:00
e12a8add61 feat: add vLLM models configuration file
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 35s
Add models_huggingface_vllm.yaml with three vLLM models:
- Qwen/Qwen2.5-7B-Instruct (14GB) - Advanced multilingual reasoning
- meta-llama/Llama-3.1-8B-Instruct (17GB) - Extended 128K context
- BAAI/bge-large-en-v1.5 (1.3GB) - High-quality text embeddings

Total storage: ~32GB

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:12:18 +01:00
6ce989dd91 Remove unused diffrhythm-random-generation workflow
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
Removed diffrhythm-random-generation-v1.json as it's no longer needed.
Keeping only the essential DiffRhythm workflows:
- simple text-to-music (95s)
- full-length generation (4m45s)
- reference-based style transfer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 20:34:53 +01:00
d74a7cb7cb fix: replace custom Pivoine node with direct DiffRhythm patch
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
- Remove custom PivoineDiffRhythmRun wrapper node
- Add git patch file for ComfyUI_DiffRhythm __init__.py
- Patch adds LlamaConfig fix at import time
- Add arty script 'fix/diffrhythm-patch' to apply patch
- Revert all workflows to use original DiffRhythmRun
- Remove startup_patch.py and revert start.sh

This approach is cleaner and more maintainable than wrapping the node.
The patch directly fixes the tensor dimension mismatch (32 vs 64) in
DiffRhythm's rotary position embeddings by ensuring num_attention_heads
and num_key_value_heads are properly set based on hidden_size.

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48
2025-11-24 19:27:18 +01:00
f74457b049 fix: apply LlamaConfig patch globally at import time
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Previous approach patched DiT.__init__ at runtime, but models were already
instantiated and cached. This version patches LlamaConfig globally BEFORE
any DiffRhythm imports, ensuring all model instances use the correct config.

Key changes:
- Created PatchedLlamaConfig subclass that auto-calculates attention heads
- Replaced LlamaConfig in transformers.models.llama module at import time
- Patch applies to all LlamaConfig instances, including pre-loaded models

This should finally fix the tensor dimension mismatch error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 19:00:29 +01:00
91f6e9bd59 fix: patch DiffRhythm DIT to add missing LlamaConfig attention head parameters
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
Adds monkey-patch for DiT.__init__() to properly configure LlamaConfig with
num_attention_heads and num_key_value_heads parameters, which are missing
in the upstream DiffRhythm code.

Root cause: transformers 4.49.0+ requires these parameters but DiffRhythm's
dit.py only specifies hidden_size, causing the library to incorrectly infer
head_dim as 32 instead of 64, leading to tensor dimension mismatches.

Solution:
- Sets num_attention_heads = hidden_size // 64 (standard Llama architecture)
- Sets num_key_value_heads = num_attention_heads // 4 (GQA configuration)
- Ensures head_dim = 64, fixing the "tensor a (32) vs tensor b (64)" error

This is a proper fix rather than just downgrading transformers version.

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:53:18 +01:00
60ca8b08d0 feat: add arty script to fix DiffRhythm transformers version
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Adds 'fix/diffrhythm-transformers' command to quickly downgrade
transformers library to 4.49.0 for DiffRhythm compatibility.

Usage: arty fix/diffrhythm-transformers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:15:30 +01:00
8c4eb8c3f1 fix: pin transformers to 4.49.0 for DiffRhythm compatibility
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 13s
Resolves tensor dimension mismatch error in rotary position embeddings.
DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause
"The size of tensor a (32) must match the size of tensor b (64)" error
due to transformer block initialization changes.

Updated pivoine_diffrhythm.py documentation to reflect actual root cause
and link to upstream GitHub issues #44 and #48.

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:14:40 +01:00
67d41c3923 fix: patch infer_utils.decode_audio instead of DiffRhythmNode.infer
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 13s
The correct function to patch is decode_audio from infer_utils module,
which is where chunked VAE decoding actually happens. This intercepts
the call at the right level to force chunked=False.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 17:28:30 +01:00
1981b7b256 fix: monkey-patch DiffRhythm infer function to force chunked=False
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
The previous approach of overriding diffrhythmgen wasn't working because
ComfyUI doesn't pass the chunked parameter when it's not in INPUT_TYPES.
This fix monkey-patches the infer() function at module level to always
force chunked=False, preventing the tensor dimension mismatch error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 17:24:22 +01:00
5096e3ffb5 feat: add Pivoine custom ComfyUI nodes for DiffRhythm
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension
mismatch error by disabling chunked VAE decoding. The original DiffRhythm
node's overlap=32 parameter conflicts with the VAE's 64-channel architecture.

Changes:
- Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper
- Add comfyui/nodes/__init__.py: Package initialization
- Add arty.yml setup/pivoine-nodes: Deployment script for symlink
- Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun

Technical details:
- Inherits from DiffRhythmRun to avoid upstream patching
- Forces chunked=False in diffrhythmgen() override
- Requires more VRAM (~12-16GB) but RTX 4090 has 24GB
- Category: 🌸Pivoine/Audio for easy identification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 16:28:54 +01:00
073711c017 fix: use correct DiffRhythm parameter order from UI testing
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Correct widgets_values order (11 parameters):
0: model (string)
1: prompt/style_prompt (text)
2: unload_model (boolean)
3: odeint_method (enum)
4: steps (int)
5: cfg (int)
6: quality_or_speed (enum)
7: seed (int)
8: control_after_generate (string)
9: edit (boolean)
10: segments/edit_segments (text)

Updated all four workflows:
- diffrhythm-simple-t2m-v1.json
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 15:57:25 +01:00
279f703591 fix: correct DiffRhythm workflow parameter order to match function signature
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
The parameters must match the diffrhythmgen() function signature order,
not the INPUT_TYPES order. The function has 'edit' as the first parameter.

Correct widgets_values order (11 parameters):
0: edit (boolean)
1: model (string)
2: style_prompt (string)
3: lyrics_or_edit_lyrics (string)
4: edit_segments (string)
5: odeint_method (enum)
6: steps (int)
7: cfg (int)
8: quality_or_speed (enum)
9: unload_model (boolean)
10: seed (int)

Note: style_audio_or_edit_song comes from input connection (not in widgets)
Note: chunked parameter is hidden (not in widgets)

Updated workflows:
- diffrhythm-simple-t2m-v1.json
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 15:53:15 +01:00
64db634ab5 fix: correct DiffRhythm workflow parameter order for all three workflows
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
Changed edit_segments from "[-1, 20], [60, -1]" to empty string "" at position 11.
This fixes validation errors where parameters were being interpreted as wrong types.

The correct 12-parameter structure is:
0: model (string)
1: style_prompt (string)
2: unload_model (boolean)
3: odeint_method (enum)
4: steps (int)
5: cfg (int)
6: quality_or_speed (enum)
7: seed (int)
8: edit (boolean)
9: edit_lyrics (string, empty)
10: edit_song (string, empty)
11: edit_segments (string, empty)

Updated workflows:
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 15:48:56 +01:00
56476f4230 fix: add missing edit_song and edit_lyrics parameters to DiffRhythm workflows
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Fix "edit song, edit lyrics, edit segments must be provided" error by adding
the two missing parameters to all three DiffRhythm workflow files:

- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

Added empty string parameters at positions 9 and 10 in widgets_values array:
- edit_song: "" (empty when edit=false)
- edit_lyrics: "" (empty when edit=false)

The DiffRhythmRun node requires 12 parameters total, not 10. These workflows
use edit=false (no editing), so the edit parameters should be empty strings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:55:58 +01:00
744bbd0190 fix: correct FFmpeg version extraction in verification
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
2025-11-24 12:49:41 +01:00
b011c192f8 feat: add FFmpeg dependencies to system packages
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Add FFmpeg and its development libraries to setup/system-packages script:
- ffmpeg: Main FFmpeg executable
- libavcodec-dev: Audio/video codec library
- libavformat-dev: Audio/video format library
- libavutil-dev: Utility library for FFmpeg
- libswscale-dev: Video scaling library

These libraries are required for torchcodec to function properly with
DiffRhythm audio generation. Also added FFmpeg version verification
after installation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:48:57 +01:00
a249dfc941 feat: add torchcodec dependency for DiffRhythm audio caching
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Add torchcodec to ComfyUI requirements.txt to fix audio tensor caching
error in DiffRhythm. This package is required for save_with_torchcodec
function used by DiffRhythm audio nodes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:44:05 +01:00
19376d90a7 feat: add DiffRhythm eval-model download script
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
Add arty script to download eval.yaml and eval.safetensors files from
HuggingFace space for DiffRhythm node support. These files are required
for DiffRhythm evaluation model functionality.

- Add models/diffrhythm-eval script to download eval-model files
- Update setup/comfyui-nodes to create eval-model directory
- Files downloaded from ASLP-lab/DiffRhythm HuggingFace space
- Script includes file verification and size reporting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:38:44 +01:00
cf3fcafbae feat: add DiffRhythm music generation support
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
- Add DiffRhythm dependencies to requirements.txt (19 packages)
- Add reference audio placeholder for style transfer workflow
- DiffRhythm nodes now loading in ComfyUI
- All four workflows ready for music generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:17:46 +01:00
8fe87064f8 feat: add DiffRhythm dependencies to ComfyUI requirements
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Added all required packages for ComfyUI_DiffRhythm extension:
- torchdiffeq: ODE solvers for diffusion models
- x-transformers: Transformer architecture components
- librosa: Audio analysis and feature extraction
- pandas, pyarrow: Data handling
- ema-pytorch, prefigure: Training utilities
- muq: Music quality model
- mutagen: Audio metadata handling
- pykakasi, jieba, cn2an, pypinyin: Chinese/Japanese text processing
- Unidecode, phonemizer, inflect: Text normalization and phonetic conversion
- py3langid: Language identification

These dependencies enable the DiffRhythm node to load and function properly in ComfyUI, fixing the "ModuleNotFoundError: No module named 'infer_utils'" error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 10:51:50 +01:00
44762a063c fix: update DiffRhythm workflows with correct node names and parameters
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Updated all 4 DiffRhythm workflow JSON files to use actual node class names from ComfyUI_DiffRhythm:

**Node Name Changes:**
- DiffRhythmTextToMusic → DiffRhythmRun
- DiffRhythmRandomGeneration → DiffRhythmRun (with empty style_prompt)
- DiffRhythmReferenceBasedGeneration → DiffRhythmRun (with audio input)

**Corrected Parameter Structure:**
All workflows now use proper widgets_values array matching DiffRhythmRun INPUT_TYPES:
1. model (string: "cfm_model_v1_2.pt", "cfm_model.pt", or "cfm_full_model.pt")
2. style_prompt (string: multiline text or empty for random)
3. unload_model (boolean: default true)
4. odeint_method (string: "euler", "midpoint", "rk4", "implicit_adams")
5. steps (int: 1-100, default 30)
6. cfg (int: 1-10, default 4)
7. quality_or_speed (string: "quality" or "speed")
8. seed (int: -1 for random, or specific number)
9. edit (boolean: default false)
10. edit_segments (string: "[-1, 20], [60, -1]")

**Workflow-Specific Updates:**

**diffrhythm-simple-t2m-v1.json:**
- Text-to-music workflow for 95s generation
- Uses cfm_model_v1_2.pt with text prompt guidance
- Default settings: steps=30, cfg=4, speed mode, seed=42

**diffrhythm-full-length-t2m-v1.json:**
- Full-length 4m45s (285s) generation
- Uses cfm_full_model.pt for extended compositions
- Quality mode enabled for better results
- Default seed=123

**diffrhythm-reference-based-v1.json:**
- Reference audio + text prompt workflow
- Uses LoadAudio node connected to style_audio_or_edit_song input
- Higher cfg=5 for stronger prompt adherence
- Demonstrates optional audio input connection

**diffrhythm-random-generation-v1.json:**
- Pure random generation (no prompt/guidance)
- Empty style_prompt string
- Minimal cfg=1 for maximum randomness
- Random seed=-1 for unique output each time

**Documentation Updates:**
- Removed PLACEHOLDER notes
- Updated usage sections with correct parameter descriptions
- Added notes about optional MultiLineLyricsDR node for lyrics
- Clarified parameter behavior and recommendations

These workflows are now ready to use in ComfyUI with the installed DiffRhythm extension.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 10:46:31 +01:00
e9a1536f1d chore: clean up arty.yml - remove unused scripts and envs
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
- Remove deprecated legacy setup scripts
- Remove unused environment definitions (prod, dev, minimal)
- Remove WebDAV setup script
- Remove redundant model linking script
- Streamline configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 10:26:52 +01:00
f2186db78e feat: integrate ComfyUI_DiffRhythm extension with 7 models and 4 workflows
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
- Add DiffRhythm to arty.yml references and setup/comfyui-nodes
- Install espeak-ng system dependency for phoneme processing
- Add 7 DiffRhythm models to models_huggingface.yaml with file mappings:
  * ASLP-lab/DiffRhythm-1_2 (95s generation)
  * ASLP-lab/DiffRhythm-full (4m45s generation)
  * ASLP-lab/DiffRhythm-base
  * ASLP-lab/DiffRhythm-vae
  * OpenMuQ/MuQ-MuLan-large
  * OpenMuQ/MuQ-large-msd-iter
  * FacebookAI/xlm-roberta-base
- Create 4 comprehensive workflows:
  * diffrhythm-simple-t2m-v1.json (basic 95s text-to-music)
  * diffrhythm-full-length-t2m-v1.json (4m45s full-length)
  * diffrhythm-reference-based-v1.json (style transfer with reference audio)
  * diffrhythm-random-generation-v1.json (no-prompt random generation)
- Update storage requirements: 90GB essential, 149GB total

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:50:45 +01:00
9439185b3d fix: update Docker registry from Docker Hub to dev.pivoine.art
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 2m8s
- Use Gitea container registry instead of Docker Hub
- Update workflow to use gitea.actor and REGISTRY_TOKEN
- Update documentation to reflect correct registry URL
- Match supervisor-ui workflow configuration
2025-11-23 21:57:14 +01:00
571431955d feat: add RunPod Docker template with automated build workflow
- Add Dockerfile with minimal setup (supervisor, tailscale)
- Add start.sh bootstrap script for container initialization
- Add Gitea workflow for automated Docker image builds
- Add comprehensive RUNPOD_TEMPLATE.md documentation
- Add bootstrap-venvs.sh for Python venv health checks

This enables deployment of the AI orchestrator on RunPod using:
- Minimal Docker image (~2-3GB) for fast deployment
- Network volume for models and data persistence (~80-200GB)
- Automated builds on push to main or version tags
- Full Tailscale VPN integration
- Supervisor process management
2025-11-23 21:53:56 +01:00
0e3150e26c fix: correct Pony Diffusion workflow checkpoint reference
- Changed checkpoint from waiIllustriousSDXL_v150.safetensors to ponyDiffusionV6XL_v6StartWithThisOne.safetensors
- Fixed metadata model reference (was incorrectly referencing LoRA)
- Added files field to models_civitai.yaml for explicit filename mapping
- Aligns workflow with actual Pony Diffusion V6 XL model
2025-11-23 19:57:45 +01:00
f6de19bec1 feat: add files field for embeddings with different filenames
- Add files field to badx-sdxl, pony-pdxl-hq-v3, pony-pdxl-xxx
- Specifies actual downloaded filenames (BadX-neg.pt, zPDXL3.safetensors, zPDXLxxx.pt)
- Allows script to properly link embeddings where YAML name != filename
2025-11-23 19:54:41 +01:00
5770563d9a feat: add comprehensive negative embeddings support (SD 1.5, SDXL, Pony)
- Add 3 new embedding categories to models_civitai.yaml:
  - embeddings_sd15: 6 embeddings (BadDream, UnrealisticDream, badhandv4, EasyNegative, FastNegativeV2, BadNegAnatomyV1-neg)
  - embeddings_sdxl: 1 embedding (BadX v1.1)
  - embeddings_pony: 2 embeddings (zPDXL3, zPDXLxxx)
- Total storage: ~1.1 MB (9 embeddings)
- Add comprehensive embeddings documentation to NSFW README
- Include usage examples, compatibility notes, and syntax guide
- Document embedding weights and recommended combinations
2025-11-23 19:39:18 +01:00
68d3606cab fix: use WAI-NSFW-Illustrious checkpoint instead of non-existent Pony model
Changed checkpoint from 'add-detail-xl.safetensors' (which is a LoRA) to
'waiIllustriousSDXL_v150.safetensors' which is the downloaded anime NSFW model
2025-11-23 19:13:22 +01:00
9ca62724d0 feat: add LoRA models category to CivitAI config with add-detail-xl and siesta 2025-11-23 19:04:19 +01:00
5e7c65a95c feat: add NSFW workflow linking to arty workflow manager
Updated arty.yml workflow linking script to include NSFW workflows:
- Added nsfw_ prefix for NSFW workflow category
- Links 4 NSFW workflows (LUSTIFY, Pony, RealVisXL, Ultimate Upscale)
- Updated workflow count from 20 to 25 total production workflows
- Updated documentation to list all 7 workflow categories

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 18:48:46 +01:00
1d851bb11c feat: add NSFW ComfyUI workflow suite with LoRA fusion and upscaling
Added 5 production-ready workflows to leverage downloaded CivitAI NSFW models:

**NSFW Text-to-Image Workflows (3):**
- lustify-realistic-t2i-production-v1.json - Photorealistic NSFW with LUSTIFY v7.0
  - DPM++ 2M SDE, Exponential scheduler, 30 steps, CFG 6.0
  - Optimized for women in realistic scenarios with professional photography quality
- pony-anime-t2i-production-v1.json - Anime/cartoon/furry with Pony Diffusion V6 XL
  - Euler Ancestral, Normal scheduler, 35 steps, CFG 7.5
  - Danbooru tag support, balanced safe/questionable/explicit content
- realvisxl-lightning-t2i-production-v1.json - Ultra-fast photorealistic with RealVisXL V5.0 Lightning
  - DPM++ SDE Karras, 6 steps (vs 30+), CFG 2.0
  - 4-6 step generation for rapid high-quality output

**Enhancement Workflows (2):**
- lora-fusion-t2i-production-v1.json - Multi-LoRA stacking (text-to-image directory)
  - Stack up to 3 LoRAs with adjustable weights (0.2-1.0)
  - Compatible with all SDXL checkpoints including NSFW models
  - Hierarchical strength control for style mixing and enhancement
- nsfw-ultimate-upscale-production-v1.json - Professional 2x upscaling with LUSTIFY
  - RealESRGAN_x2 + diffusion refinement via Ultimate SD Upscale
  - Tiled processing, optimized for detailed skin texture
  - Denoise 0.25 preserves original composition

**Documentation:**
- Comprehensive README.md with usage examples, API integration, model comparison
- Optimized settings for each workflow based on model recommendations
- Advanced usage guide for LoRA stacking and upscaling pipelines
- Version history tracking

**Total additions:** 1,768 lines across 6 files

These workflows complement the 27GB of CivitAI NSFW models downloaded in previous commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 18:46:22 +01:00
5944767d3f fix: update CivitAI model version IDs to latest versions
- RealVisXL: 904692 → 798204 (V5.0 Lightning)
- WAI-NSFW-illustrious: 1239648 → 2167369 (v15.0)
- Big Lust: 649023 → 1081768 (v1.6)

Fixes 404 errors when downloading these models.
2025-11-23 18:16:32 +01:00
e29f77c90b feat: add dedicated CivitAI NSFW model downloader
- Add models_civitai.yaml with 6 NSFW SDXL checkpoints
- Create artifact_civitai_download.sh with beautiful purple/magenta CLI
- Update .env.example with CIVITAI_API_KEY documentation
- Update CLAUDE.md with CivitAI usage instructions
- Rename comfyui_models.yaml to models_huggingface.yaml for clarity

Features:
- Dedicated config and downloader for CivitAI models
- Same elegant architecture as HuggingFace downloader
- Retry logic, rate limiting, progress bars
- Models: LUSTIFY, Pony Diffusion V6, RealVisXL, etc.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 17:58:25 +01:00
76cf5b5e31 docs: update CLAUDE.md to reflect direct vLLM architecture
- Remove all orchestrator references
- Update to dedicated vLLM server model
- Update service management commands
- Update LiteLLM integration details
- Update testing examples
2025-11-23 16:26:59 +01:00
479201d338 chore: remove orchestrator - replaced with dedicated vLLM servers 2025-11-23 16:24:43 +01:00
1ad99cdb53 refactor: replace orchestrator with dedicated vLLM servers for Qwen and Llama 2025-11-23 16:00:03 +01:00
cc0f55df38 fix: reduce max_model_len to 20000 to fit in 24GB VRAM 2025-11-23 15:43:37 +01:00
5cfd03f1ef fix: improve streaming with proper delta format and increase max_model_len to 32768 2025-11-23 15:38:18 +01:00
3f812704a2 fix: use venv python for vLLM service startup 2025-11-23 15:21:52 +01:00