Compare commits

...

37 Commits

Author SHA1 Message Date
2189697734 refactor: remove type field from models_huggingface.yaml and include type in dest paths
- Prepended ComfyUI model type folder (checkpoints/, clip/, vae/, etc.) to all dest paths
- Removed separate 'type' field from all model entries
- Consolidated SD3.5 duplicate entries (5 → 1)
- Simplified model configuration by embedding directory structure directly in destination paths

This change eliminates the need to parse the 'type' field separately in artifact_huggingface_download.sh,
making the configuration more explicit and easier to understand.
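For illustration, a minimal Python sketch of how destination paths resolve before and after this change (field names and the example entry are illustrative, not the real YAML schema):

```python
# Hypothetical sketch of dest-path resolution before and after this refactor.
COMFYUI_MODELS = "/workspace/ComfyUI/models"

# Before: separate 'type' field joined onto 'dest' by the download script
entry_before = {"type": "checkpoints", "dest": "sd35_large.safetensors"}
path_before = f"{COMFYUI_MODELS}/{entry_before['type']}/{entry_before['dest']}"

# After: the ComfyUI model-type folder is embedded directly in 'dest'
entry_after = {"dest": "checkpoints/sd35_large.safetensors"}
path_after = f"{COMFYUI_MODELS}/{entry_after['dest']}"

assert path_before == path_after  # same destination, one less field to parse
```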

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 19:19:42 +01:00
ff6c1369ae feat: add three new SDXL/SD1.5 image generation models
Add the following models to models_huggingface.yaml:
- John6666/diving-illustrious-real-asian-v50-sdxl: SDXL fine-tune for photorealistic Asian subjects
- playgroundai/playground-v2.5-1024px-aesthetic: High-aesthetic 1024px SDXL-based model
- Lykon/dreamshaper-8: Versatile SD1.5 fine-tune with multi-style support

All models are marked as non-essential and will download .safetensors files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 17:29:04 +01:00
aa2cc5973b fix: update CogVideoX models to link sharded transformer files
- CogVideoX-5b: Link 2 shards + index.json instead of single non-existent file
- CogVideoX-5b-I2V: Link 3 shards + index.json instead of single non-existent file
- Fixes link command failures for these video generation models
- Shards are now properly symlinked to ComfyUI diffusion_models directory
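A minimal Python sketch of the linking outcome described above (shard filenames follow the standard HuggingFace sharding convention; the exact names and paths in the repo are assumptions):

```python
import os

# Illustrative layout of a sharded CogVideoX-5b transformer after linking
src = "/workspace/models/CogVideoX-5b/transformer"
dst = "/workspace/ComfyUI/models/diffusion_models/CogVideoX-5b"
os.makedirs(dst, exist_ok=True)
for name in [
    "diffusion_pytorch_model-00001-of-00002.safetensors",
    "diffusion_pytorch_model-00002-of-00002.safetensors",
    "diffusion_pytorch_model.safetensors.index.json",  # maps weights to shards
]:
    link = os.path.join(dst, name)
    if not os.path.exists(link):
        os.symlink(os.path.join(src, name), link)
```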

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 15:47:09 +01:00
3c6904a253 feat: add comfyui-workspace-manager custom node
Added comfyui-workspace-manager plugin to arty.yml for ComfyUI workflow and model management.

Repository: https://github.com/11cafe/comfyui-workspace-manager

Features:
- Workflow management with version history
- Model browser with one-click CivitAI downloads
- Image gallery per workflow
- Auto-save and keyboard shortcuts

Note: The plugin was marked obsolete (April 2025) since ComfyUI now ships built-in workspace features, but it was added per user request.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 12:33:00 +01:00
6efb55c59f feat: add complete HunyuanVideo and Wan2.2 video generation integration
Integrated 35+ video generation models and 13 production workflows from ComfyUI docs tutorials for state-of-the-art text-to-video and image-to-video generation.

Models Added (models_huggingface.yaml):
- HunyuanVideo (5 models): Original T2V/I2V (720p), v1.5 (720p/1080p) with Qwen 2.5 VL
- Wan2.2 diffusion models (18 models):
  - 5B TI2V hybrid (8GB VRAM, efficient)
  - 14B variants: T2V, I2V (high/low noise), Animate, S2V (FP8/BF16), Fun Camera/Control (high/low noise)
- Support models (12): VAEs, UMT5-XXL, CLIP Vision H, Wav2Vec2, LLaVA encoders
- LoRA accelerators (4): Lightx2v 4-step distillation for 5x speedup

Workflows Added (comfyui/workflows/image-to-video/):
- HunyuanVideo (5 workflows): T2V original, I2V v1/v2 (webp embedded), v1.5 T2V/I2V (JSON)
- Wan2.2 (8 workflows): 5B TI2V, 14B T2V/I2V/FLF2V/Animate/S2V/Fun Camera/Fun Control
- Asset files (10): Reference images, videos, audio for workflow testing

Custom Nodes Added (arty.yml):
- ComfyUI-KJNodes: Kijai optimizations for HunyuanVideo/Wan2.2 (FP8 scaling, video helpers)
- comfyui_controlnet_aux: ControlNet preprocessors (Canny, Depth, OpenPose, MLSD) for Fun Control
- ComfyUI-GGUF: GGUF quantization support for memory optimization

VRAM Requirements:
- HunyuanVideo original: 24GB (720p T2V/I2V, 129 frames, 5s generation)
- HunyuanVideo 1.5: 30-60GB (720p/1080p, improved quality with Qwen 2.5 VL)
- Wan2.2 5B: 8GB (efficient dual-expert architecture with native offloading)
- Wan2.2 14B: 24GB (high-quality video generation, all modes)

Note: Wan2.2 Fun Inpaint workflow not available in official templates repository (404).

Tutorial Sources:
- https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video
- https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video-1-5
- https://docs.comfy.org/tutorials/video/wan/wan2_2
- https://docs.comfy.org/tutorials/video/wan/wan2-2-animate
- https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-camera
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-control

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 10:43:39 +01:00
06b8ec0064 refactor: remove simple ACE Step workflow in favor of official workflows
Removed acestep-simple-t2m-v1.json as the official Comfy-Org workflows provide better quality:
- acestep-official-t2m-v1.json - Advanced T2M with specialized nodes
- acestep-m2m-editing-v1.json - Music-to-music editing capability

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 09:46:24 +01:00
e610330b91 feat: add official Comfy-Org ACE Step workflows and example assets
Added 2 official workflows from Comfy-Org/example_workflows:
- acestep-official-t2m-v1.json - Advanced T2M with specialized nodes (50 steps, multiple formats)
- acestep-m2m-editing-v1.json - Music-to-music editing with denoise control

Added 3 audio example assets:
- acestep-m2m-input.mp3 (973 KB) - Example input for M2M editing
- acestep-t2m-output.flac (3.4 MB) - T2M output reference
- acestep-m2m-output.mp3 (998 KB) - M2M output reference

Total: 3 workflows (simple + official T2M + M2M editing) with audio examples

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 09:34:57 +01:00
55b37894b1 fix: remove empty ACE Step workflow placeholders
Removed 3 empty placeholder workflows that only contained metadata:
- acestep-multilang-t2m-v1.json
- acestep-remix-m2m-v1.json
- acestep-chinese-rap-v1.json

Kept only the functional workflow:
- acestep-simple-t2m-v1.json (6 nodes, fully operational)

Users can use the simple workflow and modify the prompt for different use cases:
- Multi-language: prefix lyrics with language tags like [zh], [ja], [ko]
- Remixing: load audio input and adjust denoise strength (0.1-0.7)
- Chinese RAP: use Chinese RAP LoRA with strength 0.8-1.0

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 08:51:37 +01:00
513062623c feat: integrate ACE Step music generation with 19-language support
Added ACE Step v1 3.5B model for state-of-the-art music generation:
- 15x faster than LLM baselines with superior structural coherence
- Supports 19 languages (en, zh, ja, ko, fr, es, de, it, pt, ru + 9 more)
- Voice cloning, lyric alignment, and multi-genre capabilities

Changes:
- Added ACE Step models to models_huggingface.yaml (checkpoint + Chinese RAP LoRA)
- Added ComfyUI_ACE-Step custom node to arty.yml with installation script
- Created 4 comprehensive workflows in comfyui/workflows/text-to-music/:
  * acestep-simple-t2m-v1.json - Basic 60s text-to-music generation
  * acestep-multilang-t2m-v1.json - 19-language music generation
  * acestep-remix-m2m-v1.json - Music-to-music remixing with style transfer
  * acestep-chinese-rap-v1.json - Chinese hip-hop with specialized LoRA

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 08:40:17 +01:00
5af3eeb333 feat: add BGE embedding service and reorganize supervisor groups
- Add vLLM embedding server for BAAI/bge-large-en-v1.5 (port 8002)
- Reorganize supervisor into two logical groups:
  - comfyui-services: comfyui, webdav-sync
  - vllm-services: vllm-qwen, vllm-llama, vllm-embedding
- Update arty.yml service management scripts for new group structure
- Add individual service control scripts for all vLLM models

Note: The embedding server currently uses a placeholder implementation.
For production use, switch to sentence-transformers or native vLLM embedding mode.
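As a quick check once a real backend is in place, the server on port 8002 can be queried like any OpenAI-compatible embeddings endpoint (a hedged sketch; the model name and response shape assume native vLLM embedding mode):

```python
import requests

resp = requests.post(
    "http://localhost:8002/v1/embeddings",
    json={"model": "BAAI/bge-large-en-v1.5", "input": ["hello world"]},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))  # bge-large-en-v1.5 produces 1024-dimensional vectors
```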

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:32:01 +01:00
e12a8add61 feat: add vLLM models configuration file
Add models_huggingface_vllm.yaml with three vLLM models:
- Qwen/Qwen2.5-7B-Instruct (14GB) - Advanced multilingual reasoning
- meta-llama/Llama-3.1-8B-Instruct (17GB) - Extended 128K context
- BAAI/bge-large-en-v1.5 (1.3GB) - High-quality text embeddings

Total storage: ~32GB

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:12:18 +01:00
6ce989dd91 Remove unused diffrhythm-random-generation workflow
Removed diffrhythm-random-generation-v1.json as it's no longer needed.
Keeping only the essential DiffRhythm workflows:
- simple text-to-music (95s)
- full-length generation (4m45s)
- reference-based style transfer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 20:34:53 +01:00
d74a7cb7cb fix: replace custom Pivoine node with direct DiffRhythm patch
- Remove custom PivoineDiffRhythmRun wrapper node
- Add git patch file for ComfyUI_DiffRhythm __init__.py
- Patch adds LlamaConfig fix at import time
- Add arty script 'fix/diffrhythm-patch' to apply patch
- Revert all workflows to use original DiffRhythmRun
- Remove startup_patch.py and revert start.sh

This approach is cleaner and more maintainable than wrapping the node.
The patch directly fixes the tensor dimension mismatch (32 vs 64) in
DiffRhythm's rotary position embeddings by ensuring num_attention_heads
and num_key_value_heads are properly set based on hidden_size.

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48
2025-11-24 19:27:18 +01:00
f74457b049 fix: apply LlamaConfig patch globally at import time
Previous approach patched DiT.__init__ at runtime, but models were already
instantiated and cached. This version patches LlamaConfig globally BEFORE
any DiffRhythm imports, ensuring all model instances use the correct config.

Key changes:
- Created PatchedLlamaConfig subclass that auto-calculates attention heads
- Replaced LlamaConfig in transformers.models.llama module at import time
- Patch applies to all LlamaConfig instances, including pre-loaded models

This should finally fix the tensor dimension mismatch error.
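A minimal sketch of the global patch described above, assuming the names from the commit message (the actual implementation may differ):

```python
# Must run BEFORE any DiffRhythm import so later model loads see the fix.
import transformers.models.llama as llama_mod
from transformers import LlamaConfig

class PatchedLlamaConfig(LlamaConfig):
    def __init__(self, *args, **kwargs):
        hidden_size = kwargs.get("hidden_size")
        if hidden_size and "num_attention_heads" not in kwargs:
            # Standard Llama layout: head_dim = 64
            kwargs["num_attention_heads"] = hidden_size // 64
            # GQA configuration, per the fix described in this log
            kwargs["num_key_value_heads"] = kwargs["num_attention_heads"] // 4
        super().__init__(*args, **kwargs)

# Replace the class in the transformers.models.llama module globally
llama_mod.LlamaConfig = PatchedLlamaConfig
```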

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 19:00:29 +01:00
91f6e9bd59 fix: patch DiffRhythm DIT to add missing LlamaConfig attention head parameters
Adds monkey-patch for DiT.__init__() to properly configure LlamaConfig with
num_attention_heads and num_key_value_heads parameters, which are missing
in the upstream DiffRhythm code.

Root cause: transformers 4.49.0+ requires these parameters but DiffRhythm's
dit.py only specifies hidden_size, causing the library to incorrectly infer
head_dim as 32 instead of 64, leading to tensor dimension mismatches.

Solution:
- Sets num_attention_heads = hidden_size // 64 (standard Llama architecture)
- Sets num_key_value_heads = num_attention_heads // 4 (GQA configuration)
- Ensures head_dim = 64, fixing the "tensor a (32) vs tensor b (64)" error

This is a proper fix rather than just downgrading transformers version.
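A worked check of the arithmetic above, using an illustrative hidden_size (the actual DiffRhythm value is not stated in this log):

```python
hidden_size = 1024  # illustrative value

# transformers' LlamaConfig default is num_attention_heads = 32, so with only
# hidden_size specified the library infers head_dim = 1024 / 32 = 32.
inferred_head_dim = hidden_size // 32

# The patch sets num_attention_heads = hidden_size // 64, forcing head_dim = 64
num_attention_heads = hidden_size // 64          # 16 heads
num_key_value_heads = num_attention_heads // 4   # 4 KV heads (GQA)
fixed_head_dim = hidden_size // num_attention_heads

print(inferred_head_dim, fixed_head_dim)  # 32 and 64: the pair from the error message
```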

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:53:18 +01:00
60ca8b08d0 feat: add arty script to fix DiffRhythm transformers version
Adds 'fix/diffrhythm-transformers' command to quickly downgrade
transformers library to 4.49.0 for DiffRhythm compatibility.

Usage: arty fix/diffrhythm-transformers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:15:30 +01:00
8c4eb8c3f1 fix: pin transformers to 4.49.0 for DiffRhythm compatibility
Resolves tensor dimension mismatch error in rotary position embeddings.
DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause
"The size of tensor a (32) must match the size of tensor b (64)" error
due to transformer block initialization changes.

Updated pivoine_diffrhythm.py documentation to reflect actual root cause
and link to upstream GitHub issues #44 and #48.

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:14:40 +01:00
67d41c3923 fix: patch infer_utils.decode_audio instead of DiffRhythmNode.infer
The correct function to patch is decode_audio from infer_utils module,
which is where chunked VAE decoding actually happens. This intercepts
the call at the right level to force chunked=False.
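A minimal sketch of this interception, assuming the module and function names from the commit messages:

```python
import functools

import infer_utils  # DiffRhythm's inference helpers (module path is an assumption)

_orig_decode_audio = infer_utils.decode_audio

@functools.wraps(_orig_decode_audio)
def _decode_audio_unchunked(*args, **kwargs):
    kwargs["chunked"] = False  # force non-chunked VAE decoding
    return _orig_decode_audio(*args, **kwargs)

# Swap the function at module level so all callers pick up the patched version
infer_utils.decode_audio = _decode_audio_unchunked
```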

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 17:28:30 +01:00
1981b7b256 fix: monkey-patch DiffRhythm infer function to force chunked=False
The previous approach of overriding diffrhythmgen wasn't working because
ComfyUI doesn't pass the chunked parameter when it's not in INPUT_TYPES.
This fix monkey-patches the infer() function at module level to always
force chunked=False, preventing the tensor dimension mismatch error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 17:24:22 +01:00
5096e3ffb5 feat: add Pivoine custom ComfyUI nodes for DiffRhythm
Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension
mismatch error by disabling chunked VAE decoding. The original DiffRhythm
node's overlap=32 parameter conflicts with the VAE's 64-channel architecture.

Changes:
- Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper
- Add comfyui/nodes/__init__.py: Package initialization
- Add arty.yml setup/pivoine-nodes: Deployment script for symlink
- Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun

Technical details:
- Inherits from DiffRhythmRun to avoid upstream patching
- Forces chunked=False in diffrhythmgen() override
- Requires more VRAM (~12-16GB) but RTX 4090 has 24GB
- Category: 🌸Pivoine/Audio for easy identification
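A hedged sketch of the wrapper node (the import path and method signature are assumptions; NODE_CLASS_MAPPINGS is the standard ComfyUI registration hook):

```python
from ComfyUI_DiffRhythm import DiffRhythmRun  # hypothetical import path

class PivoineDiffRhythmRun(DiffRhythmRun):
    CATEGORY = "🌸Pivoine/Audio"

    def diffrhythmgen(self, *args, **kwargs):
        # Disable chunked VAE decoding; needs ~12-16GB VRAM but avoids the
        # overlap=32 conflict with the VAE's 64-channel architecture.
        kwargs["chunked"] = False
        return super().diffrhythmgen(*args, **kwargs)

NODE_CLASS_MAPPINGS = {"PivoineDiffRhythmRun": PivoineDiffRhythmRun}
NODE_DISPLAY_NAME_MAPPINGS = {"PivoineDiffRhythmRun": "🌸 Pivoine DiffRhythm Run"}
```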

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 16:28:54 +01:00
073711c017 fix: use correct DiffRhythm parameter order from UI testing
Correct widgets_values order (11 parameters):
0: model (string)
1: prompt/style_prompt (text)
2: unload_model (boolean)
3: odeint_method (enum)
4: steps (int)
5: cfg (int)
6: quality_or_speed (enum)
7: seed (int)
8: control_after_generate (string)
9: edit (boolean)
10: segments/edit_segments (text)
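For illustration, a widgets_values array matching that order, with placeholder values (not taken from the actual workflow files):

```python
widgets_values = [
    "cfm_model_v1_2.pt",   # 0: model
    "pop, upbeat, synth",  # 1: prompt/style_prompt
    True,                  # 2: unload_model
    "euler",               # 3: odeint_method
    30,                    # 4: steps
    4,                     # 5: cfg
    "speed",               # 6: quality_or_speed
    42,                    # 7: seed
    "fixed",               # 8: control_after_generate
    False,                 # 9: edit
    "",                    # 10: segments/edit_segments
]
```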

Updated all four workflows:
- diffrhythm-simple-t2m-v1.json
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 15:57:25 +01:00
279f703591 fix: correct DiffRhythm workflow parameter order to match function signature
The parameters must match the diffrhythmgen() function signature order,
not the INPUT_TYPES order. The function has 'edit' as the first parameter.

Correct widgets_values order (11 parameters):
0: edit (boolean)
1: model (string)
2: style_prompt (string)
3: lyrics_or_edit_lyrics (string)
4: edit_segments (string)
5: odeint_method (enum)
6: steps (int)
7: cfg (int)
8: quality_or_speed (enum)
9: unload_model (boolean)
10: seed (int)

Note: style_audio_or_edit_song comes from input connection (not in widgets)
Note: chunked parameter is hidden (not in widgets)

Updated workflows:
- diffrhythm-simple-t2m-v1.json
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 15:53:15 +01:00
64db634ab5 fix: correct DiffRhythm workflow parameter order for all three workflows
Changed edit_segments from "[-1, 20], [60, -1]" to empty string "" at position 11.
This fixes validation errors where parameters were being interpreted as wrong types.

The correct 12-parameter structure is:
0: model (string)
1: style_prompt (string)
2: unload_model (boolean)
3: odeint_method (enum)
4: steps (int)
5: cfg (int)
6: quality_or_speed (enum)
7: seed (int)
8: edit (boolean)
9: edit_lyrics (string, empty)
10: edit_song (string, empty)
11: edit_segments (string, empty)

Updated workflows:
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 15:48:56 +01:00
56476f4230 fix: add missing edit_song and edit_lyrics parameters to DiffRhythm workflows
Fix "edit song, edit lyrics, edit segments must be provided" error by adding
the two missing parameters to all three DiffRhythm workflow files:

- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json

Added empty string parameters at positions 9 and 10 in widgets_values array:
- edit_song: "" (empty when edit=false)
- edit_lyrics: "" (empty when edit=false)

The DiffRhythmRun node requires 12 parameters total, not 10. These workflows
use edit=false (no editing), so the edit parameters should be empty strings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:55:58 +01:00
744bbd0190 fix: correct FFmpeg version extraction in verification
2025-11-24 12:49:41 +01:00
b011c192f8 feat: add FFmpeg dependencies to system packages
Add FFmpeg and its development libraries to setup/system-packages script:
- ffmpeg: Main FFmpeg executable
- libavcodec-dev: Audio/video codec library
- libavformat-dev: Audio/video format library
- libavutil-dev: Utility library for FFmpeg
- libswscale-dev: Video scaling library

These libraries are required for torchcodec to function properly with
DiffRhythm audio generation. Also added FFmpeg version verification
after installation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:48:57 +01:00
a249dfc941 feat: add torchcodec dependency for DiffRhythm audio caching
Add torchcodec to ComfyUI requirements.txt to fix audio tensor caching
error in DiffRhythm. This package is required for save_with_torchcodec
function used by DiffRhythm audio nodes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:44:05 +01:00
19376d90a7 feat: add DiffRhythm eval-model download script
Add arty script to download eval.yaml and eval.safetensors files from
HuggingFace space for DiffRhythm node support. These files are required
for DiffRhythm evaluation model functionality.

- Add models/diffrhythm-eval script to download eval-model files
- Update setup/comfyui-nodes to create eval-model directory
- Files downloaded from ASLP-lab/DiffRhythm HuggingFace space
- Script includes file verification and size reporting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:38:44 +01:00
cf3fcafbae feat: add DiffRhythm music generation support
- Add DiffRhythm dependencies to requirements.txt (19 packages)
- Add reference audio placeholder for style transfer workflow
- DiffRhythm nodes now loading in ComfyUI
- All four workflows ready for music generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 12:17:46 +01:00
8fe87064f8 feat: add DiffRhythm dependencies to ComfyUI requirements
Added all required packages for ComfyUI_DiffRhythm extension:
- torchdiffeq: ODE solvers for diffusion models
- x-transformers: Transformer architecture components
- librosa: Audio analysis and feature extraction
- pandas, pyarrow: Data handling
- ema-pytorch, prefigure: Training utilities
- muq: Music quality model
- mutagen: Audio metadata handling
- pykakasi, jieba, cn2an, pypinyin: Chinese/Japanese text processing
- Unidecode, phonemizer, inflect: Text normalization and phonetic conversion
- py3langid: Language identification

These dependencies enable the DiffRhythm node to load and function properly in ComfyUI, fixing the "ModuleNotFoundError: No module named 'infer_utils'" error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 10:51:50 +01:00
44762a063c fix: update DiffRhythm workflows with correct node names and parameters
Updated all 4 DiffRhythm workflow JSON files to use actual node class names from ComfyUI_DiffRhythm:

**Node Name Changes:**
- DiffRhythmTextToMusic → DiffRhythmRun
- DiffRhythmRandomGeneration → DiffRhythmRun (with empty style_prompt)
- DiffRhythmReferenceBasedGeneration → DiffRhythmRun (with audio input)

**Corrected Parameter Structure:**
All workflows now use proper widgets_values array matching DiffRhythmRun INPUT_TYPES:
1. model (string: "cfm_model_v1_2.pt", "cfm_model.pt", or "cfm_full_model.pt")
2. style_prompt (string: multiline text or empty for random)
3. unload_model (boolean: default true)
4. odeint_method (string: "euler", "midpoint", "rk4", "implicit_adams")
5. steps (int: 1-100, default 30)
6. cfg (int: 1-10, default 4)
7. quality_or_speed (string: "quality" or "speed")
8. seed (int: -1 for random, or specific number)
9. edit (boolean: default false)
10. edit_segments (string: "[-1, 20], [60, -1]")

**Workflow-Specific Updates:**

**diffrhythm-simple-t2m-v1.json:**
- Text-to-music workflow for 95s generation
- Uses cfm_model_v1_2.pt with text prompt guidance
- Default settings: steps=30, cfg=4, speed mode, seed=42

**diffrhythm-full-length-t2m-v1.json:**
- Full-length 4m45s (285s) generation
- Uses cfm_full_model.pt for extended compositions
- Quality mode enabled for better results
- Default seed=123

**diffrhythm-reference-based-v1.json:**
- Reference audio + text prompt workflow
- Uses LoadAudio node connected to style_audio_or_edit_song input
- Higher cfg=5 for stronger prompt adherence
- Demonstrates optional audio input connection

**diffrhythm-random-generation-v1.json:**
- Pure random generation (no prompt/guidance)
- Empty style_prompt string
- Minimal cfg=1 for maximum randomness
- Random seed=-1 for unique output each time

**Documentation Updates:**
- Removed PLACEHOLDER notes
- Updated usage sections with correct parameter descriptions
- Added notes about optional MultiLineLyricsDR node for lyrics
- Clarified parameter behavior and recommendations

These workflows are now ready to use in ComfyUI with the installed DiffRhythm extension.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 10:46:31 +01:00
e9a1536f1d chore: clean up arty.yml - remove unused scripts and envs
- Remove deprecated legacy setup scripts
- Remove unused environment definitions (prod, dev, minimal)
- Remove WebDAV setup script
- Remove redundant model linking script
- Streamline configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 10:26:52 +01:00
f2186db78e feat: integrate ComfyUI_DiffRhythm extension with 7 models and 4 workflows
- Add DiffRhythm to arty.yml references and setup/comfyui-nodes
- Install espeak-ng system dependency for phoneme processing
- Add 7 DiffRhythm models to models_huggingface.yaml with file mappings:
  * ASLP-lab/DiffRhythm-1_2 (95s generation)
  * ASLP-lab/DiffRhythm-full (4m45s generation)
  * ASLP-lab/DiffRhythm-base
  * ASLP-lab/DiffRhythm-vae
  * OpenMuQ/MuQ-MuLan-large
  * OpenMuQ/MuQ-large-msd-iter
  * FacebookAI/xlm-roberta-base
- Create 4 comprehensive workflows:
  * diffrhythm-simple-t2m-v1.json (basic 95s text-to-music)
  * diffrhythm-full-length-t2m-v1.json (4m45s full-length)
  * diffrhythm-reference-based-v1.json (style transfer with reference audio)
  * diffrhythm-random-generation-v1.json (no-prompt random generation)
- Update storage requirements: 90GB essential, 149GB total

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:50:45 +01:00
9439185b3d fix: update Docker registry from Docker Hub to dev.pivoine.art
- Use Gitea container registry instead of Docker Hub
- Update workflow to use gitea.actor and REGISTRY_TOKEN
- Update documentation to reflect correct registry URL
- Match supervisor-ui workflow configuration
2025-11-23 21:57:14 +01:00
571431955d feat: add RunPod Docker template with automated build workflow
- Add Dockerfile with minimal setup (supervisor, tailscale)
- Add start.sh bootstrap script for container initialization
- Add Gitea workflow for automated Docker image builds
- Add comprehensive RUNPOD_TEMPLATE.md documentation
- Add bootstrap-venvs.sh for Python venv health checks

This enables deployment of the AI orchestrator on RunPod using:
- Minimal Docker image (~2-3GB) for fast deployment
- Network volume for models and data persistence (~80-200GB)
- Automated builds on push to main or version tags
- Full Tailscale VPN integration
- Supervisor process management
2025-11-23 21:53:56 +01:00
0e3150e26c fix: correct Pony Diffusion workflow checkpoint reference
- Changed checkpoint from waiIllustriousSDXL_v150.safetensors to ponyDiffusionV6XL_v6StartWithThisOne.safetensors
- Fixed metadata model reference (was incorrectly referencing LoRA)
- Added files field to models_civitai.yaml for explicit filename mapping
- Aligns workflow with actual Pony Diffusion V6 XL model
2025-11-23 19:57:45 +01:00
f6de19bec1 feat: add files field for embeddings with different filenames
- Add files field to badx-sdxl, pony-pdxl-hq-v3, pony-pdxl-xxx
- Specifies actual downloaded filenames (BadX-neg.pt, zPDXL3.safetensors, zPDXLxxx.pt)
- Allows script to properly link embeddings where YAML name != filename
2025-11-23 19:54:41 +01:00
42 changed files with 36653 additions and 730 deletions


@@ -0,0 +1,114 @@
name: Build and Push RunPod Docker Image

on:
  push:
    branches:
      - main
    tags:
      - 'v*.*.*'
  pull_request:
    branches:
      - main
  workflow_dispatch:
    inputs:
      tag:
        description: 'Custom tag for the image'
        required: false
        default: 'manual'

env:
  REGISTRY: dev.pivoine.art
  IMAGE_NAME: valknar/runpod-ai-orchestrator

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          platforms: linux/amd64

      - name: Log in to Gitea Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ gitea.actor }}
          password: ${{ secrets.REGISTRY_TOKEN }}

      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            # Tag as 'latest' for main branch
            type=raw,value=latest,enable={{is_default_branch}}
            # Tag with branch name
            type=ref,event=branch
            # Tag with PR number
            type=ref,event=pr
            # Tag with git tag (semver)
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}
            # Tag with commit SHA
            type=sha,prefix={{branch}}-
            # Custom tag from workflow_dispatch
            type=raw,value=${{ gitea.event.inputs.tag }},enable=${{ gitea.event_name == 'workflow_dispatch' }}
          labels: |
            org.opencontainers.image.title=RunPod AI Orchestrator
            org.opencontainers.image.description=Minimal Docker template for RunPod deployment with ComfyUI + vLLM orchestration, Supervisor process management, and Tailscale VPN integration
            org.opencontainers.image.vendor=valknar
            org.opencontainers.image.source=https://dev.pivoine.art/${{ gitea.repository }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./Dockerfile
          platforms: linux/amd64
          push: ${{ gitea.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache
          cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache,mode=max

      - name: Generate image digest
        if: gitea.event_name != 'pull_request'
        run: |
          echo "### Docker Image Published :rocket:" >> $GITEA_STEP_SUMMARY
          echo "" >> $GITEA_STEP_SUMMARY
          echo "**Registry:** \`${{ env.REGISTRY }}\`" >> $GITEA_STEP_SUMMARY
          echo "**Image:** \`${{ env.IMAGE_NAME }}\`" >> $GITEA_STEP_SUMMARY
          echo "" >> $GITEA_STEP_SUMMARY
          echo "**Tags:**" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`" >> $GITEA_STEP_SUMMARY
          echo "${{ steps.meta.outputs.tags }}" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`" >> $GITEA_STEP_SUMMARY
          echo "" >> $GITEA_STEP_SUMMARY
          echo "**Pull command:**" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`bash" >> $GITEA_STEP_SUMMARY
          echo "docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`" >> $GITEA_STEP_SUMMARY
          echo "" >> $GITEA_STEP_SUMMARY
          echo "**Use in RunPod template:**" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`" >> $GITEA_STEP_SUMMARY
          echo "Container Image: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`" >> $GITEA_STEP_SUMMARY

      - name: PR Comment - Image built but not pushed
        if: gitea.event_name == 'pull_request'
        run: |
          echo "### Docker Image Built Successfully :white_check_mark:" >> $GITEA_STEP_SUMMARY
          echo "" >> $GITEA_STEP_SUMMARY
          echo "Image was built successfully but **not pushed** (PR builds are not published)." >> $GITEA_STEP_SUMMARY
          echo "" >> $GITEA_STEP_SUMMARY
          echo "**Would be tagged as:**" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`" >> $GITEA_STEP_SUMMARY
          echo "${{ steps.meta.outputs.tags }}" >> $GITEA_STEP_SUMMARY
          echo "\`\`\`" >> $GITEA_STEP_SUMMARY

Dockerfile (new file)

@@ -0,0 +1,26 @@
# RunPod AI Orchestrator Template
# Minimal Docker image for ComfyUI + vLLM orchestration
# Models and application code live on network volume at /workspace
FROM runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04

# Install Supervisor for process management
RUN pip install --no-cache-dir supervisor

# Install Tailscale for VPN connectivity
RUN curl -fsSL https://tailscale.com/install.sh | sh

# Install additional system utilities
RUN apt-get update && apt-get install -y \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Copy the startup script
COPY start.sh /start.sh
RUN chmod +x /start.sh

# Set working directory to /workspace (network volume mount point)
WORKDIR /workspace

# RunPod calls /start.sh by default
CMD ["/start.sh"]

RUNPOD_TEMPLATE.md (new file)

@@ -0,0 +1,503 @@
# RunPod Template Setup Guide
This guide explains how to deploy the AI Orchestrator (ComfyUI + vLLM) on RunPod using a custom Docker template and network volume.
## Architecture Overview
The deployment uses a **two-tier strategy**:
1. **Docker Image** (software layer) - Contains system packages, Supervisor, Tailscale
2. **Network Volume** (data layer) - Contains models, ComfyUI installation, venvs, configuration
This approach allows fast pod deployment (~2-3 minutes) while keeping all large files (models, ~80-200GB) on a persistent network volume.
## Prerequisites
- RunPod account with credits
- Access to a container registry for hosting the template image (this guide uses the Gitea registry at dev.pivoine.art)
- HuggingFace account with API token (for model downloads)
- Tailscale account with auth key (optional, for VPN access)
## Step 1: Build and Push Docker Image
### Option A: Automated Build (Recommended)
The repository includes a Gitea workflow that automatically builds and pushes the Docker image to your Gitea container registry when you push to the `main` branch or create a version tag.
1. **Configure Gitea Secret:**
- Go to your Gitea repository → Settings → Secrets
- Add `REGISTRY_TOKEN` = your Gitea access token with registry permissions
- (The workflow automatically uses your Gitea username via `gitea.actor`)
2. **Trigger Build:**
```bash
# Push to main branch
git push origin main
# Or create a version tag
git tag v1.0.0
git push origin v1.0.0
```
3. **Monitor Build:**
- Go to Actions tab in Gitea
- Wait for build to complete (~5-10 minutes)
- Note the Docker image name: `dev.pivoine.art/valknar/runpod-ai-orchestrator:latest`
### Option B: Manual Build
If you prefer to build manually:
```bash
# From the repository root
cd /path/to/runpod
# Build the image
docker build -t dev.pivoine.art/valknar/runpod-ai-orchestrator:latest .
# Login to your Gitea registry
docker login dev.pivoine.art
# Push to Gitea registry
docker push dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
```
## Step 2: Create Network Volume
Network volumes persist your models and data across pod restarts and rebuilds.
1. **Go to RunPod Dashboard → Storage → Network Volumes**
2. **Click "New Network Volume"**
3. **Configure:**
- **Name**: `ai-orchestrator-models`
- **Size**: `200GB` (adjust based on your needs)
- Essential models only: ~80GB
- All models: ~137-200GB
- **Datacenter**: Choose closest to you (volume tied to datacenter)
4. **Click "Create Volume"**
5. **Note the Volume ID** (e.g., `vol-abc123def456`) for pod deployment
### Storage Requirements
| Configuration | Size | Models Included |
|--------------|------|-----------------|
| Essential | ~80GB | FLUX Schnell, 1-2 SDXL checkpoints, MusicGen Medium |
| Complete | ~137GB | All image/video/audio models from playbook |
| Full + vLLM | ~200GB | Complete + Qwen 2.5 7B + Llama 3.1 8B |
## Step 3: Create RunPod Template
1. **Go to RunPod Dashboard → Templates**
2. **Click "New Template"**
3. **Configure Template Settings:**
**Container Configuration:**
- **Template Name**: `AI Orchestrator (ComfyUI + vLLM)`
- **Template Type**: Docker
- **Container Image**: `dev.pivoine.art/valknar/runpod-ai-orchestrator:latest`
- **Container Disk**: `50GB` (for system and temp files)
- **Docker Command**: Leave empty (uses default `/start.sh`)
**Volume Configuration:**
- **Volume Mount Path**: `/workspace`
- **Attach to Network Volume**: Select your volume ID from Step 2
**Port Configuration:**
- **Expose HTTP Ports**: `8188, 9000, 9001`
- `8188` - ComfyUI web interface
- `9000` - Model orchestrator API
- `9001` - Supervisor web UI
- **Expose TCP Ports**: `22` (SSH access)
**Environment Variables:**
```
HF_TOKEN=your_huggingface_token_here
TAILSCALE_AUTHKEY=tskey-auth-your_tailscale_authkey_here
SUPERVISOR_BACKEND_HOST=localhost
SUPERVISOR_BACKEND_PORT=9001
```
**Advanced Settings:**
- **Start Jupyter**: No
- **Start SSH**: Yes (handled by base image)
4. **Click "Save Template"**
## Step 4: First Deployment (Initial Setup)
The first time you deploy, you need to set up the network volume with models and configuration.
### 4.1 Deploy Pod
1. **Go to RunPod Dashboard → Pods**
2. **Click "Deploy"** or "GPU Pods"
3. **Select your custom template**: `AI Orchestrator (ComfyUI + vLLM)`
4. **Configure GPU:**
- **GPU Type**: RTX 4090 (24GB VRAM) or higher
- **Network Volume**: Select your volume from Step 2
- **On-Demand vs Spot**: Choose based on budget
5. **Click "Deploy"**
### 4.2 SSH into Pod
```bash
# Get pod SSH command from RunPod dashboard
ssh root@<pod-ip> -p <port> -i ~/.ssh/id_ed25519
# Or use RunPod web terminal
```
### 4.3 Initial Setup on Network Volume
```bash
# 1. Clone the repository to /workspace/ai
cd /workspace
git clone https://github.com/your-username/runpod.git ai
cd ai
# 2. Create .env file with your credentials
cp .env.example .env
nano .env
# Edit and add:
# HF_TOKEN=your_huggingface_token
# TAILSCALE_AUTHKEY=tskey-auth-your_key
# GPU_TAILSCALE_IP=<will be set automatically>
# 3. Download essential models (this takes 30-60 minutes)
ansible-playbook playbook.yml --tags comfyui-essential
# OR download all models (1-2 hours)
ansible-playbook playbook.yml --tags comfyui-models-all
# 4. Link models to ComfyUI
bash scripts/link-comfyui-models.sh
# OR if arty is available
arty run models/link-comfyui
# 5. Install ComfyUI custom nodes dependencies
cd /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
pip install -r requirements.txt
cd /workspace/ai
# 6. Restart the container to apply all changes
exit
# Go to RunPod dashboard → Stop pod → Start pod
```
### 4.4 Verify Services
After restart, SSH back in and check:
```bash
# Check supervisor status
supervisorctl -c /workspace/supervisord.conf status
# Expected output:
# comfyui RUNNING pid 123, uptime 0:01:00
# (orchestrator is disabled by default - enable for vLLM)
# Test ComfyUI
curl -I http://localhost:8188
# Test Supervisor web UI
curl -I http://localhost:9001
```
## Step 5: Subsequent Deployments
After initial setup, deploying new pods is quick (2-3 minutes):
1. **Deploy pod** with same template + network volume
2. **Wait for startup** (~1-2 minutes for services to start)
3. **Access services:**
- ComfyUI: `http://<pod-ip>:8188`
- Supervisor: `http://<pod-ip>:9001`
**All models, configuration, and data persist on the network volume!**
## Step 6: Access Services
### Via Direct IP (HTTP)
Get pod IP and ports from RunPod dashboard:
```
ComfyUI: http://<pod-ip>:8188
Supervisor UI: http://<pod-ip>:9001
Orchestrator API: http://<pod-ip>:9000
SSH: ssh root@<pod-ip> -p <port>
```
### Via Tailscale VPN (Recommended)
If you configured `TAILSCALE_AUTHKEY`, the pod automatically joins your Tailscale network:
1. **Get Tailscale IP:**
```bash
ssh root@<pod-ip> -p <port>
tailscale ip -4
# Example output: 100.114.60.40
```
2. **Access via Tailscale:**
```
ComfyUI: http://<tailscale-ip>:8188
Supervisor: http://<tailscale-ip>:9001
Orchestrator: http://<tailscale-ip>:9000
SSH: ssh root@<tailscale-ip>
```
3. **Update LiteLLM config** on your VPS with the Tailscale IP
## Service Management
### Start/Stop Services
```bash
# Start all services
supervisorctl -c /workspace/supervisord.conf start all
# Stop all services
supervisorctl -c /workspace/supervisord.conf stop all
# Restart specific service
supervisorctl -c /workspace/supervisord.conf restart comfyui
# View status
supervisorctl -c /workspace/supervisord.conf status
```
### Enable vLLM Models (Text Generation)
By default, only ComfyUI runs (to save VRAM). To enable vLLM:
1. **Stop ComfyUI** (frees up VRAM):
```bash
supervisorctl -c /workspace/supervisord.conf stop comfyui
```
2. **Start orchestrator** (manages vLLM models):
```bash
supervisorctl -c /workspace/supervisord.conf start orchestrator
```
3. **Test text generation:**
```bash
curl -X POST http://localhost:9000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"qwen-2.5-7b","messages":[{"role":"user","content":"Hello"}]}'
```
### Switch Back to ComfyUI
```bash
# Stop orchestrator (stops all vLLM models)
supervisorctl -c /workspace/supervisord.conf stop orchestrator
# Start ComfyUI
supervisorctl -c /workspace/supervisord.conf start comfyui
```
## Updating the Template
When you make changes to code or configuration:
### Update Docker Image
```bash
# 1. Make changes to Dockerfile or start.sh
# 2. Push to repository
git add .
git commit -m "Update template configuration"
git push origin main
# 3. Gitea workflow auto-builds new image
# 4. Terminate old pod and deploy new one with updated image
```
### Update Network Volume Data
```bash
# SSH into running pod
ssh root@<pod-ip> -p <port>
# Update repository
cd /workspace/ai
git pull
# Re-run Ansible if needed
ansible-playbook playbook.yml --tags <specific-tag>
# Restart services
supervisorctl -c /workspace/supervisord.conf restart all
```
## Troubleshooting
### Pod fails to start
**Check logs:**
```bash
# Via SSH
cat /workspace/logs/supervisord.log
cat /workspace/logs/comfyui.err.log
# Via RunPod web terminal
tail -f /workspace/logs/*.log
```
**Common issues:**
- Missing `.env` file → Create `/workspace/ai/.env` with required vars
- Supervisor config not found → Ensure `/workspace/ai/supervisord.conf` exists
- Port conflicts → Check if services are already running
### Tailscale not connecting
**Check Tailscale status:**
```bash
tailscale status
tailscale ip -4
```
**Common issues:**
- Missing or invalid `TAILSCALE_AUTHKEY` in `.env`
- Auth key expired → Generate new key in Tailscale admin
- Firewall blocking → RunPod should allow Tailscale by default
### Services not starting
**Check Supervisor:**
```bash
supervisorctl -c /workspace/supervisord.conf status
supervisorctl -c /workspace/supervisord.conf tail -f comfyui
```
**Common issues:**
- venv broken → Re-run `scripts/bootstrap-venvs.sh`
- Models not downloaded → Run Ansible playbook again
- Python version mismatch → Rebuild venvs
### Out of VRAM
**Check GPU memory:**
```bash
nvidia-smi
```
**RTX 4090 (24GB) capacity:**
- ComfyUI (FLUX Schnell): ~23GB (can't run with vLLM)
- vLLM (Qwen 2.5 7B): ~14GB
- vLLM (Llama 3.1 8B): ~17GB
**Solution:** Only run one service at a time (see Service Management section)
### Network volume full
**Check disk usage:**
```bash
df -h /workspace
du -sh /workspace/*
```
**Clean up:**
```bash
# Remove old HuggingFace cache
rm -rf /workspace/huggingface_cache
# Re-download essential models only
cd /workspace/ai
ansible-playbook playbook.yml --tags comfyui-essential
```
## Cost Optimization
### Spot vs On-Demand
- **Spot instances**: ~70% cheaper, can be interrupted
- **On-Demand**: More expensive, guaranteed availability
**Recommendation:** Use spot for development, on-demand for production
### Network Volume Pricing
- First 1TB: $0.07/GB/month
- Beyond 1TB: $0.05/GB/month
**200GB volume cost:** ~$14/month
### Pod Auto-Stop
Configure auto-stop in RunPod pod settings to save costs when idle:
- Stop after 15 minutes idle
- Stop after 1 hour idle
- Manual stop only
## Advanced Configuration
### Custom Environment Variables
Add to template or pod environment variables:
```bash
# Model cache locations
HF_HOME=/workspace/huggingface_cache
TRANSFORMERS_CACHE=/workspace/huggingface_cache
# ComfyUI settings
COMFYUI_PORT=8188
COMFYUI_LISTEN=0.0.0.0
# Orchestrator settings
ORCHESTRATOR_PORT=9000
# GPU settings
CUDA_VISIBLE_DEVICES=0
```
### Multiple Network Volumes
You can attach multiple network volumes for organization:
1. **Models volume** - `/workspace/models` (read-only, shared)
2. **Data volume** - `/workspace/data` (read-write, per-project)
### Custom Startup Script
Override `/start.sh` behavior by creating `/workspace/custom-start.sh`:
```bash
#!/bin/bash
# Custom startup commands
# Source default startup
source /start.sh
# Add your custom commands here
echo "Running custom initialization..."
```
## References
- [RunPod Documentation](https://docs.runpod.io/)
- [RunPod Templates Overview](https://docs.runpod.io/pods/templates/overview)
- [Network Volumes Guide](https://docs.runpod.io/storage/network-volumes)
- [ComfyUI Documentation](https://github.com/comfyanonymous/ComfyUI)
- [Supervisor Documentation](http://supervisord.org/)
- [Tailscale Documentation](https://tailscale.com/kb/)
## Support
For issues or questions:
- Check troubleshooting section above
- Review `/workspace/logs/` files
- Check RunPod community forums
- Open issue in project repository

arty.yml

@@ -63,11 +63,41 @@ references:
description: "MusicGen and Stable Audio integration"
essential: false
- url: https://github.com/billwuhao/ComfyUI_DiffRhythm.git
into: $COMFYUI_ROOT/custom_nodes/ComfyUI_DiffRhythm
description: "DiffRhythm - Full-length song generation (up to 4m45s) with text/audio conditioning"
essential: false
- url: https://github.com/billwuhao/ComfyUI_ACE-Step.git
into: $COMFYUI_ROOT/custom_nodes/ComfyUI_ACE-Step
description: "ACE Step - State-of-the-art music generation with 19-language support, voice cloning, and superior coherence"
essential: false
- url: https://github.com/ssitu/ComfyUI_UltimateSDUpscale.git
into: $COMFYUI_ROOT/custom_nodes/ComfyUI_UltimateSDUpscale
description: "Ultimate SD Upscale for high-quality image upscaling"
essential: false
- url: https://github.com/kijai/ComfyUI-KJNodes.git
into: $COMFYUI_ROOT/custom_nodes/ComfyUI-KJNodes
description: "Kijai optimizations for HunyuanVideo and Wan2.2 (FP8 scaling, video helpers, model loading)"
essential: true
- url: https://github.com/Fannovel16/comfyui_controlnet_aux.git
into: $COMFYUI_ROOT/custom_nodes/comfyui_controlnet_aux
description: "ControlNet preprocessors (Canny, Depth, OpenPose, MLSD) for Wan2.2 Fun Control"
essential: true
- url: https://github.com/city96/ComfyUI-GGUF.git
into: $COMFYUI_ROOT/custom_nodes/ComfyUI-GGUF
description: "GGUF quantization support for memory-efficient model loading"
essential: false
- url: https://github.com/11cafe/comfyui-workspace-manager.git
into: $COMFYUI_ROOT/custom_nodes/comfyui-workspace-manager
description: "Workspace manager for ComfyUI - workflow/model organization (obsolete but requested)"
essential: false
# Environment profiles for selective repository management
envs:
# RunPod environment variables
@@ -78,37 +108,6 @@ envs:
LOGS_DIR: /workspace/logs
BIN_DIR: /workspace/bin
# Production: Only essential components
prod:
- $AI_ROOT
- $COMFYUI_ROOT
- $COMFYUI_ROOT/custom_nodes/ComfyUI-Manager
- $COMFYUI_ROOT/custom_nodes/ComfyUI-VideoHelperSuite
- $COMFYUI_ROOT/custom_nodes/ComfyUI-AnimateDiff-Evolved
- $COMFYUI_ROOT/custom_nodes/ComfyUI_IPAdapter_plus
- $COMFYUI_ROOT/custom_nodes/ComfyUI-Impact-Pack
# Development: All repositories including optional nodes
dev:
- $AI_ROOT
- $COMFYUI_ROOT
- $COMFYUI_ROOT/custom_nodes/ComfyUI-Manager
- $COMFYUI_ROOT/custom_nodes/ComfyUI-VideoHelperSuite
- $COMFYUI_ROOT/custom_nodes/ComfyUI-AnimateDiff-Evolved
- $COMFYUI_ROOT/custom_nodes/ComfyUI_IPAdapter_plus
- $COMFYUI_ROOT/custom_nodes/ComfyUI-Impact-Pack
- $COMFYUI_ROOT/custom_nodes/ComfyUI-CogVideoXWrapper
- $COMFYUI_ROOT/custom_nodes/ComfyUI-Inspire-Pack
- $COMFYUI_ROOT/custom_nodes/ComfyUI-Advanced-ControlNet
- $COMFYUI_ROOT/custom_nodes/ComfyUI-3D-Pack
- $COMFYUI_ROOT/custom_nodes/comfyui-sound-lab
# Minimal: Only orchestrator and ComfyUI base
minimal:
- $AI_ROOT
- $COMFYUI_ROOT
- $COMFYUI_ROOT/custom_nodes/ComfyUI-Manager
# Deployment scripts for RunPod instances
scripts:
#
@@ -165,11 +164,23 @@ scripts:
htop \
tmux \
net-tools \
davfs2
davfs2 \
ffmpeg \
libavcodec-dev \
libavformat-dev \
libavutil-dev \
libswscale-dev
echo ""
echo "✓ System packages installed successfully"
# Verify FFmpeg installation
if ffmpeg -version > /dev/null 2>&1; then
echo "✓ FFmpeg installed: $(ffmpeg -version | head -1 | cut -d ' ' -f3)"
else
echo "❌ WARNING: FFmpeg not found"
fi
setup/python-env: |
echo "========================================="
echo " Setting Up Python Environment"
@@ -279,43 +290,67 @@ scripts:
echo "========================================="
echo ""
# Install system dependencies
echo "Installing system dependencies..."
sudo apt-get update -qq
sudo apt-get install -y -qq espeak-ng
echo "✓ System dependencies installed (espeak-ng)"
echo ""
cd $COMFYUI_ROOT/custom_nodes
# ComfyUI Manager
echo "[1/5] Installing ComfyUI-Manager..."
echo "[1/6] Installing ComfyUI-Manager..."
if [ ! -d "ComfyUI-Manager" ]; then
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
fi
[ -f "ComfyUI-Manager/requirements.txt" ] && sudo pip3 install -r ComfyUI-Manager/requirements.txt
# VideoHelperSuite
echo "[2/5] Installing ComfyUI-VideoHelperSuite..."
echo "[2/6] Installing ComfyUI-VideoHelperSuite..."
if [ ! -d "ComfyUI-VideoHelperSuite" ]; then
git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
fi
[ -f "ComfyUI-VideoHelperSuite/requirements.txt" ] && sudo pip3 install -r ComfyUI-VideoHelperSuite/requirements.txt
# AnimateDiff-Evolved
echo "[3/5] Installing ComfyUI-AnimateDiff-Evolved..."
echo "[3/6] Installing ComfyUI-AnimateDiff-Evolved..."
if [ ! -d "ComfyUI-AnimateDiff-Evolved" ]; then
git clone https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git
fi
[ -f "ComfyUI-AnimateDiff-Evolved/requirements.txt" ] && sudo pip3 install -r ComfyUI-AnimateDiff-Evolved/requirements.txt
# IPAdapter Plus
echo "[4/5] Installing ComfyUI_IPAdapter_plus..."
echo "[4/6] Installing ComfyUI_IPAdapter_plus..."
if [ ! -d "ComfyUI_IPAdapter_plus" ]; then
git clone https://github.com/cubiq/ComfyUI_IPAdapter_plus.git
fi
[ -f "ComfyUI_IPAdapter_plus/requirements.txt" ] && sudo pip3 install -r ComfyUI_IPAdapter_plus/requirements.txt
# Impact-Pack
echo "[5/5] Installing ComfyUI-Impact-Pack..."
echo "[5/6] Installing ComfyUI-Impact-Pack..."
if [ ! -d "ComfyUI-Impact-Pack" ]; then
git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack.git
fi
[ -f "ComfyUI-Impact-Pack/requirements.txt" ] && sudo pip3 install -r ComfyUI-Impact-Pack/requirements.txt
# DiffRhythm
echo "[6/6] Installing ComfyUI_DiffRhythm..."
if [ ! -d "ComfyUI_DiffRhythm" ]; then
git clone https://github.com/billwuhao/ComfyUI_DiffRhythm.git
fi
if [ -f "ComfyUI_DiffRhythm/requirements.txt" ]; then
cd $COMFYUI_ROOT
source venv/bin/activate
pip install -r custom_nodes/ComfyUI_DiffRhythm/requirements.txt
deactivate
cd custom_nodes
fi
# Create DiffRhythm model directories
echo "Creating DiffRhythm model directories..."
mkdir -p $COMFYUI_ROOT/models/TTS/DiffRhythm/{MuQ-large-msd-iter,MuQ-MuLan-large,xlm-roberta-base,eval-model}
# Fix numpy version for vLLM compatibility
echo "Fixing numpy version..."
sudo pip3 install 'numpy<2.0.0' --force-reinstall
@@ -327,6 +362,144 @@ scripts:
echo " - AnimateDiff-Evolved: Video generation"
echo " - IPAdapter_plus: Style transfer"
echo " - Impact-Pack: Face enhancement"
echo " - DiffRhythm: Full-length song generation"
models/diffrhythm-eval: |
echo "========================================="
echo " Downloading DiffRhythm Eval Model"
echo "========================================="
echo ""
# Create eval-model directory
mkdir -p $COMFYUI_ROOT/models/TTS/DiffRhythm/eval-model
cd $COMFYUI_ROOT/models/TTS/DiffRhythm/eval-model
# Download eval.yaml (129 bytes)
echo "Downloading eval.yaml..."
curl -L -o eval.yaml "https://huggingface.co/spaces/ASLP-lab/DiffRhythm/resolve/main/pretrained/eval.yaml"
# Download eval.safetensors (101 MB)
echo "Downloading eval.safetensors (101 MB)..."
curl -L -o eval.safetensors "https://huggingface.co/spaces/ASLP-lab/DiffRhythm/resolve/main/pretrained/eval.safetensors"
# Verify files
if [ -f "eval.yaml" ] && [ -f "eval.safetensors" ]; then
echo ""
echo "✓ DiffRhythm eval-model files downloaded successfully"
echo " - eval.yaml: $(du -h eval.yaml | cut -f1)"
echo " - eval.safetensors: $(du -h eval.safetensors | cut -f1)"
else
echo "❌ ERROR: Failed to download eval-model files"
exit 1
fi
setup/comfyui-acestep: |
echo "========================================="
echo " Installing ACE Step Custom Node"
echo "========================================="
echo ""
cd $COMFYUI_ROOT/custom_nodes
# Clone repository if not exists
if [ ! -d "ComfyUI_ACE-Step" ]; then
echo "Cloning ComfyUI_ACE-Step repository..."
git clone https://github.com/billwuhao/ComfyUI_ACE-Step.git
else
echo "ComfyUI_ACE-Step already exists, skipping clone"
fi
# Install dependencies in ComfyUI venv
echo ""
echo "Installing ACE Step dependencies..."
cd $COMFYUI_ROOT
source venv/bin/activate
pip install -r custom_nodes/ComfyUI_ACE-Step/requirements.txt
deactivate
echo ""
echo "✓ ACE Step custom node installed successfully"
echo " Note: Download models separately using:"
echo " bash /workspace/bin/artifact_huggingface_download.sh download -c models_huggingface.yaml --category audio_models"
setup/pivoine-nodes: |
echo "========================================="
echo " Linking Pivoine Custom Nodes"
echo "========================================="
echo ""
NODES_SRC="/workspace/ai/comfyui/nodes"
NODES_DEST="/workspace/ComfyUI/custom_nodes/ComfyUI_Pivoine"
# Remove existing symlink if present
if [ -L "$NODES_DEST" ] || [ -d "$NODES_DEST" ]; then
echo "Removing existing: $NODES_DEST"
rm -rf "$NODES_DEST"
fi
# Create symlink
ln -s "$NODES_SRC" "$NODES_DEST"
echo ""
echo "✓ Pivoine custom nodes linked"
echo " Source: $NODES_SRC"
echo " Linked: $NODES_DEST"
echo ""
echo "Available Pivoine nodes:"
echo " 🌸 PivoineDiffRhythmRun - DiffRhythm with chunked disabled"
echo ""
echo "Category: 🌸Pivoine/Audio"
fix/diffrhythm-patch: |
echo "========================================="
echo " Apply DiffRhythm LlamaConfig Patch"
echo "========================================="
echo ""
echo "Issue: Tensor dimension mismatch (32 vs 64) in rotary embeddings"
echo "Solution: Patch DiffRhythm __init__.py to fix LlamaConfig"
echo ""
echo "References:"
echo " - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44"
echo " - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48"
echo ""
DIFF_RHYTHM_DIR="/workspace/ComfyUI/custom_nodes/ComfyUI_DiffRhythm"
PATCH_FILE="/workspace/ai/comfyui/patches/diffrhythm-llamaconfig-fix.patch"
if [ ! -d "$DIFF_RHYTHM_DIR" ]; then
echo "✗ Error: DiffRhythm not found at $DIFF_RHYTHM_DIR"
exit 1
fi
if [ ! -f "$PATCH_FILE" ]; then
echo "✗ Error: Patch file not found at $PATCH_FILE"
exit 1
fi
cd "$DIFF_RHYTHM_DIR"
echo "Checking if patch already applied..."
if grep -q "PatchedLlamaConfig" __init__.py; then
echo "✓ Patch already applied!"
exit 0
fi
echo "Applying patch..."
if patch -p1 < "$PATCH_FILE"; then
echo ""
echo "✓ Patch applied successfully!"
echo ""
echo "Next steps:"
echo " 1. Restart ComfyUI: arty services/comfyui/restart"
echo " 2. Test DiffRhythm workflows"
else
echo ""
echo "✗ Failed to apply patch"
echo "You may need to manually apply the patch or check for conflicts"
exit 1
fi
setup/comfyui-extensions-deps: |
echo "========================================="
@@ -436,58 +609,6 @@ scripts:
echo "To manage: supervisorctl status"
echo "Web UI: http://localhost:9001 (admin/runpod2024)"
setup/webdav: |
echo "========================================="
echo " Setting Up WebDAV Mount (HiDrive)"
echo "========================================="
echo ""
# Install davfs2 if not present
if ! command -v mount.davfs >/dev/null 2>&1; then
echo "Installing davfs2..."
DEBIAN_FRONTEND=noninteractive apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y davfs2
fi
# Create mount point
echo "Creating mount point..."
mkdir -p /mnt/hidrive
# Create davfs2 secrets file
echo "Configuring WebDAV credentials..."
mkdir -p /etc/davfs2
echo "https://webdav.hidrive.ionos.com/ valknar MwRTW4hR.eRbipQ" | tee /etc/davfs2/secrets > /dev/null
chmod 600 /etc/davfs2/secrets
# Configure davfs2
sed -i 's/# use_locks 1/use_locks 0/' /etc/davfs2/davfs2.conf 2>/dev/null || true
# Mount WebDAV
echo "Mounting HiDrive WebDAV..."
if mount -t davfs https://webdav.hidrive.ionos.com/ /mnt/hidrive; then
echo "✓ HiDrive mounted successfully"
else
echo "⚠ Warning: Mount failed, you may need to mount manually"
echo " Try: mount -t davfs https://webdav.hidrive.ionos.com/ /mnt/hidrive"
fi
# Create ComfyUI output directory
echo "Creating ComfyUI output directory..."
mkdir -p /mnt/hidrive/users/valknar/Pictures/AI/ComfyUI
# Create symlink in ComfyUI
echo "Creating symlink in ComfyUI..."
ln -sf /mnt/hidrive/users/valknar/Pictures/AI/ComfyUI $COMFYUI_ROOT/output_hidrive
echo ""
echo "✓ WebDAV setup complete"
echo ""
echo "Mount point: /mnt/hidrive"
echo "ComfyUI output: /mnt/hidrive/users/valknar/Pictures/AI/ComfyUI"
echo "ComfyUI symlink: $COMFYUI_ROOT/output_hidrive"
echo ""
echo "To unmount: umount /mnt/hidrive"
echo "To remount: mount -t davfs https://webdav.hidrive.ionos.com/ /mnt/hidrive"
#
# Utility Scripts
#
@@ -575,53 +696,6 @@ scripts:
echo " 3. Name: multi-modal-ai-v2.0"
echo " 4. Save and test deployment"
#
# Orchestration Scripts
#
install/minimal: |
echo "========================================="
echo " Minimal Installation"
echo "========================================="
echo ""
echo "Installing: System + Python + ComfyUI + Supervisor"
echo ""
arty run setup/system-packages && \
arty run setup/python-env && \
arty run setup/comfyui-base && \
arty run setup/supervisor
echo ""
echo "✓ Minimal installation complete"
echo ""
echo "Next steps:"
echo " 1. Download models: Use Ansible playbook"
echo " 2. Link models: arty run models/link-comfyui"
echo " 3. Start services: arty run services/start"
install/essential: |
echo "========================================="
echo " Essential Installation"
echo "========================================="
echo ""
echo "Installing: System + Python + ComfyUI + Nodes + Supervisor"
echo ""
arty run setup/system-packages && \
arty run setup/python-env && \
arty run setup/comfyui-base && \
arty run setup/comfyui-nodes && \
arty run setup/supervisor
echo ""
echo "✓ Essential installation complete"
echo ""
echo "Next steps:"
echo " 1. Download models: ansible-playbook playbook.yml --tags comfyui-essential"
echo " 2. Link models: arty run models/link-comfyui"
echo " 3. Link workflows: arty run workflows/link-comfyui"
echo " 4. Start services: arty run services/start"
install/full: |
echo "========================================="
echo " Full Installation"
@@ -647,39 +721,6 @@ scripts:
echo " 4. Configure Tailscale (see instructions above)"
echo " 5. Start services: arty run services/start"
#
# Legacy Setup (deprecated - use install/* instead)
#
setup/full-legacy: |
cd $AI_ROOT
cp .env.example .env
echo "⚠ DEPRECATED: Use 'arty run install/full' instead"
echo "Edit .env and set HF_TOKEN, then run: ansible-playbook playbook.yml"
setup/essential-legacy: |
cd $AI_ROOT
cp .env.example .env
echo "⚠ DEPRECATED: Use 'arty run install/essential' instead"
echo "Edit .env and set HF_TOKEN, then run: ansible-playbook playbook.yml --tags comfyui-essential"
# Model linking (run after models are downloaded)
models/link-comfyui: |
cd $COMFYUI_ROOT/models/diffusers
ln -sf $HF_CACHE/models--black-forest-labs--FLUX.1-schnell FLUX.1-schnell
ln -sf $HF_CACHE/models--black-forest-labs--FLUX.1-dev FLUX.1-dev
ln -sf $HF_CACHE/models--stabilityai--stable-diffusion-xl-base-1.0 stable-diffusion-xl-base-1.0
ln -sf $HF_CACHE/models--stabilityai--stable-diffusion-xl-refiner-1.0 stable-diffusion-xl-refiner-1.0
ln -sf $HF_CACHE/models--stabilityai--stable-diffusion-3.5-large stable-diffusion-3.5-large
cd $COMFYUI_ROOT/models/clip_vision
ln -sf $HF_CACHE/models--openai--clip-vit-large-patch14 clip-vit-large-patch14
ln -sf $HF_CACHE/models--laion--CLIP-ViT-bigG-14-laion2B-39B-b160k CLIP-ViT-bigG-14
ln -sf $HF_CACHE/models--google--siglip-so400m-patch14-384 siglip-so400m-patch14-384
cd $COMFYUI_ROOT/models/diffusion_models
ln -sf $HF_CACHE/models--THUDM--CogVideoX-5b CogVideoX-5b
ln -sf $HF_CACHE/models--stabilityai--stable-video-diffusion-img2vid stable-video-diffusion-img2vid
ln -sf $HF_CACHE/models--stabilityai--stable-video-diffusion-img2vid-xt stable-video-diffusion-img2vid-xt
echo "Models linked to ComfyUI"
# Workflow linking (link production workflows with category prefixes)
workflows/link-comfyui: |
# Create ComfyUI user workflows directory
@@ -774,38 +815,65 @@ scripts:
# Service Management (Supervisor-based)
#
# All services
services/start: supervisorctl -c /workspace/supervisord.conf start all
services/stop: supervisorctl -c /workspace/supervisord.conf stop all
services/restart: supervisorctl -c /workspace/supervisord.conf restart all
services/status: supervisorctl -c /workspace/supervisord.conf status
# ComfyUI services group
services/comfyui-group/start: supervisorctl -c /workspace/supervisord.conf start comfyui-services:*
services/comfyui-group/stop: supervisorctl -c /workspace/supervisord.conf stop comfyui-services:*
services/comfyui-group/restart: supervisorctl -c /workspace/supervisord.conf restart comfyui-services:*
services/comfyui-group/status: supervisorctl -c /workspace/supervisord.conf status comfyui-services:*
# Orchestrator service
services/orchestrator/start: supervisorctl -c /workspace/supervisord.conf start ai-services:orchestrator
services/orchestrator/stop: supervisorctl -c /workspace/supervisord.conf stop ai-services:orchestrator
services/orchestrator/restart: supervisorctl -c /workspace/supervisord.conf restart ai-services:orchestrator
services/orchestrator/status: supervisorctl -c /workspace/supervisord.conf status ai-services:orchestrator
services/orchestrator/logs: supervisorctl -c /workspace/supervisord.conf tail -f ai-services:orchestrator
# vLLM services group
services/vllm-group/start: supervisorctl -c /workspace/supervisord.conf start vllm-services:*
services/vllm-group/stop: supervisorctl -c /workspace/supervisord.conf stop vllm-services:*
services/vllm-group/restart: supervisorctl -c /workspace/supervisord.conf restart vllm-services:*
services/vllm-group/status: supervisorctl -c /workspace/supervisord.conf status vllm-services:*
# ComfyUI service
services/comfyui/start: supervisorctl -c /workspace/supervisord.conf start comfyui-services:comfyui
services/comfyui/stop: supervisorctl -c /workspace/supervisord.conf stop comfyui-services:comfyui
services/comfyui/restart: supervisorctl -c /workspace/supervisord.conf restart comfyui-services:comfyui
services/comfyui/status: supervisorctl -c /workspace/supervisord.conf status comfyui-services:comfyui
services/comfyui/logs: supervisorctl -c /workspace/supervisord.conf tail -f comfyui-services:comfyui
# WebDAV Sync service
services/webdav-sync/start: supervisorctl -c /workspace/supervisord.conf start comfyui-services:webdav-sync
services/webdav-sync/stop: supervisorctl -c /workspace/supervisord.conf stop comfyui-services:webdav-sync
services/webdav-sync/restart: supervisorctl -c /workspace/supervisord.conf restart comfyui-services:webdav-sync
services/webdav-sync/status: supervisorctl -c /workspace/supervisord.conf status comfyui-services:webdav-sync
services/webdav-sync/logs: supervisorctl -c /workspace/supervisord.conf tail -f comfyui-services:webdav-sync
# vLLM Qwen service
services/vllm-qwen/start: supervisorctl -c /workspace/supervisord.conf start vllm-services:vllm-qwen
services/vllm-qwen/stop: supervisorctl -c /workspace/supervisord.conf stop vllm-services:vllm-qwen
services/vllm-qwen/restart: supervisorctl -c /workspace/supervisord.conf restart vllm-services:vllm-qwen
services/vllm-qwen/status: supervisorctl -c /workspace/supervisord.conf status vllm-services:vllm-qwen
services/vllm-qwen/logs: supervisorctl -c /workspace/supervisord.conf tail -f vllm-services:vllm-qwen
# vLLM Llama service
services/vllm-llama/start: supervisorctl -c /workspace/supervisord.conf start vllm-services:vllm-llama
services/vllm-llama/stop: supervisorctl -c /workspace/supervisord.conf stop vllm-services:vllm-llama
services/vllm-llama/restart: supervisorctl -c /workspace/supervisord.conf restart vllm-services:vllm-llama
services/vllm-llama/status: supervisorctl -c /workspace/supervisord.conf status vllm-services:vllm-llama
services/vllm-llama/logs: supervisorctl -c /workspace/supervisord.conf tail -f vllm-services:vllm-llama
# vLLM Embedding service
services/vllm-embedding/start: supervisorctl -c /workspace/supervisord.conf start vllm-services:vllm-embedding
services/vllm-embedding/stop: supervisorctl -c /workspace/supervisord.conf stop vllm-services:vllm-embedding
services/vllm-embedding/restart: supervisorctl -c /workspace/supervisord.conf restart vllm-services:vllm-embedding
services/vllm-embedding/status: supervisorctl -c /workspace/supervisord.conf status vllm-services:vllm-embedding
services/vllm-embedding/logs: supervisorctl -c /workspace/supervisord.conf tail -f vllm-services:vllm-embedding
#
# Health Checks
#
health/orchestrator: curl http://localhost:9000/health
health/comfyui: curl http://localhost:8188
health/vllm-qwen: curl http://localhost:8000/health
health/vllm-llama: curl http://localhost:8001/health
health/vllm-embedding: curl http://localhost:8002/health
#
# System Checks

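For scripted monitoring, the same endpoints can be polled from Python instead of curl. A minimal standard-library sketch; the ports are taken from the health checks above, while the script itself and its service labels are illustrative:

```python
import urllib.request

# Health endpoints as wired in the arty.yml health checks above
ENDPOINTS = {
    "orchestrator": "http://localhost:9000/health",
    "comfyui": "http://localhost:8188",
    "vllm-qwen": "http://localhost:8000/health",
    "vllm-llama": "http://localhost:8001/health",
    "vllm-embedding": "http://localhost:8002/health",
}

def check_all(timeout: float = 5.0) -> dict:
    """Return {service: bool} by issuing a GET to each endpoint."""
    results = {}
    for name, url in ENDPOINTS.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[name] = 200 <= resp.status < 300
        except OSError:  # connection refused, timeout, DNS failure
            results[name] = False
    return results

if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{'✓' if ok else '✗'} {name}")
```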
View File

@@ -0,0 +1,56 @@
diff --git a/__init__.py b/__init__.py
index 1234567..abcdefg 100644
--- a/__init__.py
+++ b/__init__.py
@@ -1,3 +1,51 @@
+"""
+DiffRhythm ComfyUI Node with LlamaConfig Patch
+
+PATCH: Fixes "The size of tensor a (32) must match the size of tensor b (64)" error
+in DiffRhythm's rotary position embeddings by patching LlamaConfig initialization.
+
+Issue: DiffRhythm's DIT model doesn't specify num_attention_heads and
+num_key_value_heads when creating LlamaConfig, causing transformers 4.49.0+
+to incorrectly infer head_dim = 32 instead of 64.
+
+Solution: Patch LlamaConfig globally before importing DiffRhythmNode.
+
+Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
+Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48
+
+Patch author: valknar@pivoine.art
+"""
+
+# CRITICAL: Patch LlamaConfig BEFORE importing DiffRhythmNode
+from transformers.models.llama import LlamaConfig as _OriginalLlamaConfig
+
+class PatchedLlamaConfig(_OriginalLlamaConfig):
+ """
+ Patched LlamaConfig that automatically adds missing attention head parameters.
+
+ Standard Llama architecture assumptions:
+ - head_dim = 64 (fixed)
+ - num_attention_heads = hidden_size // head_dim
+ - num_key_value_heads = num_attention_heads // 4 (for GQA)
+ """
+ def __init__(self, *args, **kwargs):
+ # If hidden_size is provided but num_attention_heads is not, calculate it
+ if 'hidden_size' in kwargs and 'num_attention_heads' not in kwargs:
+ hidden_size = kwargs['hidden_size']
+ kwargs['num_attention_heads'] = hidden_size // 64
+
+ # If num_key_value_heads is not provided, use GQA configuration
+ if 'num_attention_heads' in kwargs and 'num_key_value_heads' not in kwargs:
+ kwargs['num_key_value_heads'] = max(1, kwargs['num_attention_heads'] // 4)
+
+ super().__init__(*args, **kwargs)
+
+# Replace LlamaConfig in transformers module BEFORE DiffRhythm imports it
+import transformers.models.llama
+transformers.models.llama.LlamaConfig = PatchedLlamaConfig
+import transformers.models.llama.modeling_llama
+transformers.models.llama.modeling_llama.LlamaConfig = PatchedLlamaConfig
+
from .DiffRhythmNode import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
__all__ = ["NODE_CLASS_MAPPINGS", "NODE_DISPLAY_NAME_MAPPINGS"]
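The arithmetic the patch encodes, shown standalone: with only `hidden_size` supplied, `LlamaConfig` falls back to its default of 32 attention heads, so the per-head dimension comes out wrong; deriving the head count from a fixed `head_dim` of 64 restores the value the rotary embeddings expect. A minimal sketch (requires `transformers`; `hidden_size=1024` is illustrative, not DiffRhythm's actual value):

```python
from transformers.models.llama import LlamaConfig

HIDDEN = 1024  # illustrative; DiffRhythm's real hidden_size is not shown here

# Unpatched: num_attention_heads falls back to the Llama default of 32
broken = LlamaConfig(hidden_size=HIDDEN)
print(broken.hidden_size // broken.num_attention_heads)  # 32 -> tensor size mismatch

# Patched semantics: derive heads from a fixed head_dim of 64, GQA kv heads = heads // 4
fixed = LlamaConfig(
    hidden_size=HIDDEN,
    num_attention_heads=HIDDEN // 64,
    num_key_value_heads=max(1, (HIDDEN // 64) // 4),
)
print(fixed.hidden_size // fixed.num_attention_heads)  # 64, as the rotary embeddings expect
```

Subclassing and reassigning the module attribute, as the patch does, keeps the fix transparent to DiffRhythm's own import of LlamaConfig.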

View File

@@ -1,7 +1,7 @@
torch
torchvision
torchaudio
transformers==4.49.0
diffusers>=0.31.0
accelerate
safetensors
@@ -19,3 +19,4 @@ insightface
onnxruntime
pyyaml
imageio-ffmpeg
torchcodec

Binary file not shown. (new image, 1.0 MiB)

Binary file not shown. (new image, 2.8 MiB)

Binary file not shown. (new image, 1.4 MiB)

File diff suppressed because it is too large (9 files)

View File

@@ -0,0 +1,733 @@
{
"id": "91f6bbe2-ed41-4fd6-bac7-71d5b5864ecb",
"revision": 0,
"last_node_id": 59,
"last_link_id": 108,
"nodes": [
{
"id": 37,
"type": "UNETLoader",
"pos": [
-30,
50
],
"size": [
346.7470703125,
82
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
94
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "UNETLoader",
"models": [
{
"name": "wan2.2_ti2v_5B_fp16.safetensors",
"url": "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
"directory": "diffusion_models"
}
]
},
"widgets_values": [
"wan2.2_ti2v_5B_fp16.safetensors",
"default"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
-30,
190
],
"size": [
350,
110
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "CLIPLoader",
"models": [
{
"name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"url": "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"directory": "text_encoders"
}
]
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
-30,
350
],
"size": [
350,
60
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76,
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "VAELoader",
"models": [
{
"name": "wan2.2_vae.safetensors",
"url": "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors",
"directory": "vae"
}
]
},
"widgets_values": [
"wan2.2_vae.safetensors"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1190,
150
],
"size": [
210,
46
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
107
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 57,
"type": "CreateVideo",
"pos": [
1200,
240
],
"size": [
270,
78
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 107
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
}
],
"outputs": [
{
"name": "VIDEO",
"type": "VIDEO",
"links": [
108
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "CreateVideo"
},
"widgets_values": [
24
]
},
{
"id": 58,
"type": "SaveVideo",
"pos": [
1200,
370
],
"size": [
660,
450
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "video",
"type": "VIDEO",
"link": 108
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "SaveVideo"
},
"widgets_values": [
"video/ComfyUI",
"auto",
"auto"
]
},
{
"id": 55,
"type": "Wan22ImageToVideoLatent",
"pos": [
380,
540
],
"size": [
271.9126892089844,
150
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "vae",
"type": "VAE",
"link": 105
},
{
"name": "start_image",
"shape": 7,
"type": "IMAGE",
"link": 106
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
104
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "Wan22ImageToVideoLatent"
},
"widgets_values": [
1280,
704,
121,
1
]
},
{
"id": 56,
"type": "LoadImage",
"pos": [
0,
540
],
"size": [
274.080078125,
314
],
"flags": {},
"order": 3,
"mode": 4,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
106
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"example.png",
"image"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
380,
260
],
"size": [
425.27801513671875,
180.6060791015625
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"色调艳丽过曝静态细节模糊不清字幕风格作品画作画面静止整体发灰最差质量低质量JPEG压缩残留丑陋的残缺的多余的手指画得不好的手部画得不好的脸部畸形的毁容的形态畸形的肢体手指融合静止不动的画面杂乱的背景三条腿背景人很多倒着走"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
380,
50
],
"size": [
422.84503173828125,
164.31304931640625
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"Low contrast. In a retro 1970s-style subway station, a street musician plays in dim colors and rough textures. He wears an old jacket, playing guitar with focus. Commuters hurry by, and a small crowd gathers to listen. The camera slowly moves right, capturing the blend of music and city noise, with old subway signs and mottled walls in the background."
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 3,
"type": "KSampler",
"pos": [
850,
130
],
"size": [
315,
262
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 95
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 104
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "KSampler"
},
"widgets_values": [
898471028164125,
"randomize",
20,
5,
"uni_pc",
"simple",
1
]
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
850,
20
],
"size": [
210,
58
],
"flags": {
"collapsed": false
},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 94
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
95
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.45",
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
8
]
},
{
"id": 59,
"type": "MarkdownNote",
"pos": [
-550,
10
],
"size": [
480,
340
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"title": "Model Links",
"properties": {},
"widgets_values": [
"[Tutorial](https://docs.comfy.org/tutorials/video/wan/wan2_2\n) \n\n**Diffusion Model**\n- [wan2.2_ti2v_5B_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors)\n\n**VAE**\n- [wan2.2_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors)\n\n**Text Encoder** \n- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n\n\nFile save location\n\n```\nComfyUI/\n├───📂 models/\n│ ├───📂 diffusion_models/\n│ │ └───wan2.2_ti2v_5B_fp16.safetensors\n│ ├───📂 text_encoders/\n│ │ └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors \n│ └───📂 vae/\n│ └── wan2.2_vae.safetensors\n```\n"
],
"color": "#432",
"bgcolor": "#653"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
94,
37,
0,
48,
0,
"MODEL"
],
[
95,
48,
0,
3,
0,
"MODEL"
],
[
104,
55,
0,
3,
3,
"LATENT"
],
[
105,
39,
0,
55,
0,
"VAE"
],
[
106,
56,
0,
55,
1,
"IMAGE"
],
[
107,
8,
0,
57,
0,
"IMAGE"
],
[
108,
57,
0,
58,
0,
"VIDEO"
]
],
"groups": [
{
"id": 1,
"title": "Step1 - Load models",
"bounding": [
-50,
-20,
400,
453.6000061035156
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 2,
"title": "Step3 - Prompt",
"bounding": [
370,
-20,
448.27801513671875,
473.2060852050781
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 3,
"title": "For i2v, use Ctrl + B to enable",
"bounding": [
-50,
450,
400,
420
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 4,
"title": "Video Size & length",
"bounding": [
370,
470,
291.9127197265625,
233.60000610351562
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.46462425349300085,
"offset": [
847.5372059811432,
288.7938392118285
]
},
"frontendVersion": "1.27.10",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
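ComfyUI workflow files like the one above are plain JSON: each entry in `nodes` carries a `type` plus positional `widgets_values`, and each entry in `links` is a 6-tuple of `[link_id, src_node, src_slot, dst_node, dst_slot, data_type]`. A quick way to audit a workflow before loading it; a minimal sketch, assuming the file above is saved under a hypothetical name:

```python
import json
from collections import Counter

# Hypothetical filename; use whatever the workflow above was saved as
with open("wan2_2_ti2v_5B.json") as f:
    wf = json.load(f)

# Tally node types (UNETLoader, CLIPLoader, KSampler, ...)
print(Counter(node["type"] for node in wf["nodes"]))

# Each link is [link_id, src_node, src_slot, dst_node, dst_slot, type]
for link_id, src, src_slot, dst, dst_slot, dtype in wf["links"]:
    print(f"link {link_id}: node {src}[{src_slot}] -> node {dst}[{dst_slot}] ({dtype})")
```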

Binary file not shown. (new image, 906 KiB)

Binary file not shown. (new image, 1.7 MiB)

Binary file not shown. (new image, 2.0 MiB)

Binary file not shown. (new image, 925 KiB)

Binary file not shown. (new image, 712 KiB)

View File

@@ -33,8 +33,8 @@
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": ["waiIllustriousSDXL_v150.safetensors"],
"title": "WAI-NSFW-Illustrious SDXL Checkpoint (Anime/Furry)"
"widgets_values": ["ponyDiffusionV6XL_v6StartWithThisOne.safetensors"],
"title": "Pony Diffusion V6 XL Checkpoint (Anime/Furry)"
},
{
"id": 2,
@@ -242,7 +242,7 @@
"version": "1.0",
"description": "Production workflow for Pony Diffusion V6 XL optimized for anime, cartoon, and furry NSFW generation with danbooru tag support and balanced content (safe/questionable/explicit)",
"category": "nsfw",
"model": "add-detail-xl.safetensors",
"model": "ponyDiffusionV6XL_v6StartWithThisOne.safetensors",
"recommended_settings": {
"sampler": "euler_ancestral or dpmpp_2m",
"scheduler": "normal or karras",

View File

@@ -0,0 +1,865 @@
{
"id": "88ac5dad-efd7-40bb-84fe-fbaefdee1fa9",
"revision": 0,
"last_node_id": 75,
"last_link_id": 138,
"nodes": [
{
"id": 49,
"type": "LatentApplyOperationCFG",
"pos": [
940,
-160
],
"size": [
290,
50
],
"flags": {
"collapsed": false
},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 113
},
{
"name": "operation",
"type": "LATENT_OPERATION",
"link": 114
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
121
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "LatentApplyOperationCFG"
},
"widgets_values": []
},
{
"id": 40,
"type": "CheckpointLoaderSimple",
"pos": [
180,
-160
],
"size": [
370,
98
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
115
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
80
]
},
{
"name": "VAE",
"type": "VAE",
"links": [
83,
137
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "CheckpointLoaderSimple",
"models": [
{
"name": "ace_step_v1_3.5b.safetensors",
"url": "https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/resolve/main/all_in_one/ace_step_v1_3.5b.safetensors?download=true",
"directory": "checkpoints"
}
]
},
"widgets_values": [
"ace_step_v1_3.5b.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 48,
"type": "MarkdownNote",
"pos": [
-460,
-200
],
"size": [
610,
820
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"title": "About ACE Step and Multi-language Input",
"properties": {},
"widgets_values": [
"[Tutorial](http://docs.comfy.org/tutorials/audio/ace-step/ace-step-v1) | [教程](http://docs.comfy.org/zh-CN/tutorials/audio/ace-step/ace-step-v1)\n\n\n### Model Download\n\nDownload the following model and save it to the **ComfyUI/models/checkpoints** folder.\n[ace_step_v1_3.5b.safetensors](https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/blob/main/all_in_one/ace_step_v1_3.5b.safetensors)\n\n\n### Multilingual Support\n\nCurrently, the implementation of multi-language support for ACE-Step V1 is achieved by uniformly converting different languages into English characters. At present, in ComfyUI, we haven't implemented the step of converting multi-languages into English. This is because if we need to implement the corresponding conversion, we have to add additional core dependencies of ComfyUI, which may lead to uncertain dependency conflicts.\n\nSo, currently, if you need to input multi-language text, you have to manually convert it into English characters to complete this process. Then, at the beginning of the corresponding `lyrics`, input the abbreviation of the corresponding language code.\n\nFor example, for Chinese, use `[zh]`, for Japanese use `[ja]`, for Korean use `[ko]`, and so on. For specific language input, please check the examples in the instructions. \n\nFor example, Chinese `[zh]`, Japanese `[ja]`, Korean `[ko]`, etc.\n\nExample:\n\n```\n[verse]\n\n[zh]wo3zou3guo4shen1ye4de5jie1dao4\n[zh]leng3feng1chui1luan4si1nian4de5piao4liang4wai4tao4\n[zh]ni3de5wei1xiao4xiang4xing1guang1hen3xuan4yao4\n[zh]zhao4liang4le5wo3gu1du2de5mei3fen1mei3miao3\n\n[chorus]\n\n[verse]\n[ko]hamkke si-kkeuleo-un sesang-ui sodong-eul pihae\n[ko]honja ogsang-eseo dalbich-ui eolyeompus-ileul balaboda\n[ko]niga salang-eun lideum-i ganghan eum-ag gatdago malhaess-eo\n[ko]han ta han tamada ma-eum-ui ondoga eolmana heojeonhanji ijge hae\n\n[bridge]\n[es]cantar mi anhelo por ti sin ocultar\n[es]como poesía y pintura, lleno de anhelo indescifrable\n[es]tu sombra es tan terca como el viento, inborrable\n[es]persiguiéndote en vuelo, brilla como cruzar una mar de nubes\n\n[chorus]\n[fr]que tu sois le vent qui souffle sur ma main\n[fr]un contact chaud comme la douce pluie printanière\n[fr]que tu sois le vent qui s'entoure de mon corps\n[fr]un amour profond qui ne s'éloignera jamais\n\n```\n\n---\n\n### 模型下载\n\n下载下面的模型并保存到 **ComfyUI/models/checkpoints** 文件夹下\n[ace_step_v1_3.5b.safetensors](https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/blob/main/all_in_one/ace_step_v1_3.5b.safetensors)\n\n\n### 多语言支持\n\n目前 ACE-Step V1 多语言的实现是通过将不同语言统一转换为英文字符来实现的,目前在 ComfyUI 中我们并没有实现多语言转换为英文的这一步骤。因为如果需要实现对应转换,则需要增加额外的 ComfyUI 核心依赖,这将可能带来不确定的依赖冲突。\n\n所以目前如果你需要输入多语言则需要手动转换为英文字符来实现这一过程然后在对应 `lyrics` 开头输入对应语言代码的缩写。\n\n比如中文`[zh]` 日语 `[ja]` 韩语 `[ko]` 等,具体语言输入请查看说明中的示例\n\n"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 18,
"type": "VAEDecodeAudio",
"pos": [
1080,
270
],
"size": [
150.93612670898438,
46
],
"flags": {
"collapsed": false
},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 122
},
{
"name": "vae",
"type": "VAE",
"link": 83
}
],
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [
126,
127,
128
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "VAEDecodeAudio"
},
"widgets_values": []
},
{
"id": 60,
"type": "SaveAudio",
"pos": [
1260,
40
],
"size": [
610,
112
],
"flags": {},
"order": 15,
"mode": 4,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 127
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "SaveAudio"
},
"widgets_values": [
"audio/ComfyUI"
]
},
{
"id": 61,
"type": "SaveAudioOpus",
"pos": [
1260,
220
],
"size": [
610,
136
],
"flags": {},
"order": 16,
"mode": 4,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 128
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "SaveAudioOpus"
},
"widgets_values": [
"audio/ComfyUI",
"128k"
]
},
{
"id": 44,
"type": "ConditioningZeroOut",
"pos": [
600,
70
],
"size": [
197.712890625,
26
],
"flags": {
"collapsed": true
},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 108
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
120
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "ConditioningZeroOut"
},
"widgets_values": []
},
{
"id": 51,
"type": "ModelSamplingSD3",
"pos": [
590,
-40
],
"size": [
330,
60
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 115
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
5.000000000000001
]
},
{
"id": 50,
"type": "LatentOperationTonemapReinhard",
"pos": [
590,
-160
],
"size": [
330,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT_OPERATION",
"type": "LATENT_OPERATION",
"links": [
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "LatentOperationTonemapReinhard"
},
"widgets_values": [
1.0000000000000002
]
},
{
"id": 17,
"type": "EmptyAceStepLatentAudio",
"pos": [
180,
50
],
"size": [
370,
82
],
"flags": {},
"order": 3,
"mode": 4,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "EmptyAceStepLatentAudio"
},
"widgets_values": [
120,
1
]
},
{
"id": 68,
"type": "VAEEncodeAudio",
"pos": [
180,
180
],
"size": [
370,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 136
},
{
"name": "vae",
"type": "VAE",
"link": 137
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
138
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "VAEEncodeAudio"
},
"widgets_values": []
},
{
"id": 64,
"type": "LoadAudio",
"pos": [
180,
340
],
"size": [
370,
140
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [
136
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "LoadAudio"
},
"widgets_values": [
"audio_ace_step_1_t2a_song-1.mp3",
null,
null
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 52,
"type": "KSampler",
"pos": [
940,
-40
],
"size": [
290,
262
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 121
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 117
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 120
},
{
"name": "latent_image",
"type": "LATENT",
"link": 138
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
122
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "KSampler"
},
"widgets_values": [
938549746349002,
"randomize",
50,
5,
"euler",
"simple",
0.30000000000000004
]
},
{
"id": 59,
"type": "SaveAudioMP3",
"pos": [
1260,
-160
],
"size": [
610,
136
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 126
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "SaveAudioMP3"
},
"widgets_values": [
"audio/ComfyUI",
"V0"
]
},
{
"id": 73,
"type": "Note",
"pos": [
1260,
410
],
"size": [
610,
90
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"These nodes can save audio in different formats. Currently, all the modes are Bypass. You can enable them as per your needs.\n\n这些节点可以将 audio 保存成不同格式,目前的模式都是 Bypass ,你可以按你的需要来启用"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 14,
"type": "TextEncodeAceStepAudio",
"pos": [
590,
120
],
"size": [
340,
500
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 80
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
108,
117
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "TextEncodeAceStepAudio"
},
"widgets_values": [
"anime, cute female vocals, kawaii pop, j-pop, childish, piano, guitar, synthesizer, fast, happy, cheerful, lighthearted",
"[verse]\nフワフワ オミミガ\nユレルヨ カゼナカ\nキラキラ アオイメ\nミツメル セカイヲ\n\n[verse]\nフワフワ シッポハ\nオオキク ユレルヨ\nキンイロ カミケ\nナビクヨ カゼナカ\n\n[verse]\nコンフィーユーアイ\nマモリビト\nピンク セーターデ\nエガオヲ クレルヨ\n\nアオイロ スカートト\nクロイコート キンモヨウ\nヤサシイ ヒカリガ\nツツムヨ フェネックガール\n\n[verse]\nフワフワ オミミデ\nキコエル ココロ コエ\nダイスキ フェネックガール\nイツデモ ソバニイルヨ",
0.9900000000000002
]
},
{
"id": 75,
"type": "MarkdownNote",
"pos": [
950,
410
],
"size": [
280,
210
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [],
"title": "About Repainting",
"properties": {},
"widgets_values": [
"Providing the lyrics of the original song or the modified lyrics is very important for the output of repainting or editing. \n\nAdjust the value of the **denoise** parameter in KSampler. The larger the value, the lower the similarity between the output audio and the original audio.\n\n提供原始歌曲的歌词或者修改后的歌词对于音频编辑的输出是非常重要的调整 KSampler 中的 denoise 参数的数值,数值越大输出的音频与原始音频相似度越低"
],
"color": "#432",
"bgcolor": "#653"
}
],
"links": [
[
80,
40,
1,
14,
0,
"CLIP"
],
[
83,
40,
2,
18,
1,
"VAE"
],
[
108,
14,
0,
44,
0,
"CONDITIONING"
],
[
113,
51,
0,
49,
0,
"MODEL"
],
[
114,
50,
0,
49,
1,
"LATENT_OPERATION"
],
[
115,
40,
0,
51,
0,
"MODEL"
],
[
117,
14,
0,
52,
1,
"CONDITIONING"
],
[
120,
44,
0,
52,
2,
"CONDITIONING"
],
[
121,
49,
0,
52,
0,
"MODEL"
],
[
122,
52,
0,
18,
0,
"LATENT"
],
[
126,
18,
0,
59,
0,
"AUDIO"
],
[
127,
18,
0,
60,
0,
"AUDIO"
],
[
128,
18,
0,
61,
0,
"AUDIO"
],
[
136,
64,
0,
68,
0,
"AUDIO"
],
[
137,
40,
2,
68,
1,
"VAE"
],
[
138,
68,
0,
52,
3,
"LATENT"
]
],
"groups": [
{
"id": 1,
"title": "Load model here",
"bounding": [
170,
-230,
390,
180
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 4,
"title": "Latent",
"bounding": [
170,
-30,
390,
280
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 5,
"title": "Adjust the vocal volume",
"bounding": [
580,
-230,
350,
140
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 6,
"title": "For repainting",
"bounding": [
170,
270,
390,
223.60000610351562
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 7,
"title": "Output",
"bounding": [
1250,
-230,
630,
760
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
785.724285521853,
434.02395631202546
]
},
"frontendVersion": "1.19.9",
"node_versions": {
"comfy-core": "0.3.34",
"ace-step": "06f751d65491c9077fa2bc9b06d2c6f2a90e4c56"
},
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
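Per the multilingual note embedded in the workflows above, ACE-Step lyrics must be romanized manually and every non-English line prefixed with a language code such as `[zh]`, `[ja]`, or `[ko]`, while section markers like `[verse]` stay bare. A small helper sketch for that tagging convention (the function is illustrative, not part of the ACE-Step node):

```python
def tag_lyrics(lines: list[str], lang: str) -> str:
    """Prefix each non-empty lyric line with [lang], per the ACE-Step convention.

    Section markers such as [verse] or [chorus] are left untouched.
    """
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or (stripped.startswith("[") and stripped.endswith("]")):
            out.append(stripped)
        else:
            out.append(f"[{lang}]{stripped}")
    return "\n".join(out)

# Romanized Chinese lyrics, tagged for ACE-Step
print(tag_lyrics(["[verse]", "wo3zou3guo4shen1ye4de5jie1dao4"], "zh"))
```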

Binary file not shown.

Binary file not shown.

View File

@@ -0,0 +1,841 @@
{
"id": "88ac5dad-efd7-40bb-84fe-fbaefdee1fa9",
"revision": 0,
"last_node_id": 73,
"last_link_id": 137,
"nodes": [
{
"id": 49,
"type": "LatentApplyOperationCFG",
"pos": [
940,
-160
],
"size": [
290,
50
],
"flags": {
"collapsed": false
},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 113
},
{
"name": "operation",
"type": "LATENT_OPERATION",
"link": 114
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
121
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "LatentApplyOperationCFG"
},
"widgets_values": []
},
{
"id": 64,
"type": "LoadAudio",
"pos": [
180,
340
],
"size": [
370,
140
],
"flags": {},
"order": 0,
"mode": 4,
"inputs": [],
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [
136
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "LoadAudio"
},
"widgets_values": [
"ace_step_example.flac",
null,
null
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 68,
"type": "VAEEncodeAudio",
"pos": [
180,
180
],
"size": [
370,
46
],
"flags": {},
"order": 8,
"mode": 4,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 136
},
{
"name": "vae",
"type": "VAE",
"link": 137
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "VAEEncodeAudio"
},
"widgets_values": []
},
{
"id": 40,
"type": "CheckpointLoaderSimple",
"pos": [
180,
-160
],
"size": [
370,
98
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
115
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
80
]
},
{
"name": "VAE",
"type": "VAE",
"links": [
83,
137
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "CheckpointLoaderSimple",
"models": [
{
"name": "ace_step_v1_3.5b.safetensors",
"url": "https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/resolve/main/all_in_one/ace_step_v1_3.5b.safetensors?download=true",
"directory": "checkpoints"
}
]
},
"widgets_values": [
"ace_step_v1_3.5b.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 48,
"type": "MarkdownNote",
"pos": [
-460,
-200
],
"size": [
610,
820
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [],
"title": "About ACE Step and Multi-language Input",
"properties": {},
"widgets_values": [
"[Tutorial](http://docs.comfy.org/tutorials/audio/ace-step/ace-step-v1) | [教程](http://docs.comfy.org/zh-CN/tutorials/audio/ace-step/ace-step-v1)\n\n\n### Model Download\n\nDownload the following model and save it to the **ComfyUI/models/checkpoints** folder.\n[ace_step_v1_3.5b.safetensors](https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/blob/main/all_in_one/ace_step_v1_3.5b.safetensors)\n\n\n### Multilingual Support\n\nCurrently, the implementation of multi-language support for ACE-Step V1 is achieved by uniformly converting different languages into English characters. At present, in ComfyUI, we haven't implemented the step of converting multi-languages into English. This is because if we need to implement the corresponding conversion, we have to add additional core dependencies of ComfyUI, which may lead to uncertain dependency conflicts.\n\nSo, currently, if you need to input multi-language text, you have to manually convert it into English characters to complete this process. Then, at the beginning of the corresponding `lyrics`, input the abbreviation of the corresponding language code.\n\nFor example, for Chinese, use `[zh]`, for Japanese use `[ja]`, for Korean use `[ko]`, and so on. For specific language input, please check the examples in the instructions. \n\nFor example, Chinese `[zh]`, Japanese `[ja]`, Korean `[ko]`, etc.\n\nExample:\n\n```\n[verse]\n\n[zh]wo3zou3guo4shen1ye4de5jie1dao4\n[zh]leng3feng1chui1luan4si1nian4de5piao4liang4wai4tao4\n[zh]ni3de5wei1xiao4xiang4xing1guang1hen3xuan4yao4\n[zh]zhao4liang4le5wo3gu1du2de5mei3fen1mei3miao3\n\n[chorus]\n\n[verse]\n[ko]hamkke si-kkeuleo-un sesang-ui sodong-eul pihae\n[ko]honja ogsang-eseo dalbich-ui eolyeompus-ileul balaboda\n[ko]niga salang-eun lideum-i ganghan eum-ag gatdago malhaess-eo\n[ko]han ta han tamada ma-eum-ui ondoga eolmana heojeonhanji ijge hae\n\n[bridge]\n[es]cantar mi anhelo por ti sin ocultar\n[es]como poesía y pintura, lleno de anhelo indescifrable\n[es]tu sombra es tan terca como el viento, inborrable\n[es]persiguiéndote en vuelo, brilla como cruzar una mar de nubes\n\n[chorus]\n[fr]que tu sois le vent qui souffle sur ma main\n[fr]un contact chaud comme la douce pluie printanière\n[fr]que tu sois le vent qui s'entoure de mon corps\n[fr]un amour profond qui ne s'éloignera jamais\n\n```\n\n---\n\n### 模型下载\n\n下载下面的模型并保存到 **ComfyUI/models/checkpoints** 文件夹下\n[ace_step_v1_3.5b.safetensors](https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/blob/main/all_in_one/ace_step_v1_3.5b.safetensors)\n\n\n### 多语言支持\n\n目前 ACE-Step V1 多语言的实现是通过将不同语言统一转换为英文字符来实现的,目前在 ComfyUI 中我们并没有实现多语言转换为英文的这一步骤。因为如果需要实现对应转换,则需要增加额外的 ComfyUI 核心依赖,这将可能带来不确定的依赖冲突。\n\n所以目前如果你需要输入多语言则需要手动转换为英文字符来实现这一过程然后在对应 `lyrics` 开头输入对应语言代码的缩写。\n\n比如中文`[zh]` 日语 `[ja]` 韩语 `[ko]` 等,具体语言输入请查看说明中的示例\n\n"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 18,
"type": "VAEDecodeAudio",
"pos": [
1080,
270
],
"size": [
150.93612670898438,
46
],
"flags": {
"collapsed": false
},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 122
},
{
"name": "vae",
"type": "VAE",
"link": 83
}
],
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [
126,
127,
128
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "VAEDecodeAudio"
},
"widgets_values": []
},
{
"id": 60,
"type": "SaveAudio",
"pos": [
1260,
40
],
"size": [
610,
112
],
"flags": {},
"order": 14,
"mode": 4,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 127
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "SaveAudio"
},
"widgets_values": [
"audio/ComfyUI"
]
},
{
"id": 61,
"type": "SaveAudioOpus",
"pos": [
1260,
220
],
"size": [
610,
136
],
"flags": {},
"order": 15,
"mode": 4,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 128
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "SaveAudioOpus"
},
"widgets_values": [
"audio/ComfyUI",
"128k"
]
},
{
"id": 73,
"type": "Note",
"pos": [
1260,
410
],
"size": [
610,
90
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"These nodes can save audio in different formats. Currently, all the modes are Bypass. You can enable them as per your needs.\n\n这些节点可以将 audio 保存成不同格式,目前的模式都是 Bypass ,你可以按你的需要来启用"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 44,
"type": "ConditioningZeroOut",
"pos": [
600,
70
],
"size": [
197.712890625,
26
],
"flags": {
"collapsed": true
},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 108
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
120
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "ConditioningZeroOut"
},
"widgets_values": []
},
{
"id": 51,
"type": "ModelSamplingSD3",
"pos": [
590,
-40
],
"size": [
330,
60
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 115
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
5.000000000000001
]
},
{
"id": 50,
"type": "LatentOperationTonemapReinhard",
"pos": [
590,
-160
],
"size": [
330,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT_OPERATION",
"type": "LATENT_OPERATION",
"links": [
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "LatentOperationTonemapReinhard"
},
"widgets_values": [
1.0000000000000002
]
},
{
"id": 52,
"type": "KSampler",
"pos": [
940,
-40
],
"size": [
290,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 121
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 117
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 120
},
{
"name": "latent_image",
"type": "LATENT",
"link": 119
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
122
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "KSampler"
},
"widgets_values": [
468254064217846,
"randomize",
50,
5,
"euler",
"simple",
1
]
},
{
"id": 14,
"type": "TextEncodeAceStepAudio",
"pos": [
590,
120
],
"size": [
340,
500
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 80
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
108,
117
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "TextEncodeAceStepAudio"
},
"widgets_values": [
"anime, soft female vocals, kawaii pop, j-pop, childish, piano, guitar, synthesizer, fast, happy, cheerful, lighthearted\t\n",
"[inst]\n\n[verse]\nふわふわ おみみが\nゆれるよ かぜのなか\nきらきら あおいめ\nみつめる せかいを\n\n[verse]\nふわふわ しっぽは\nおおきく ゆれるよ\nきんいろ かみのけ\nなびくよ かぜのなか\n\n[verse]\nコンフィーユーアイの\nまもりびと\nピンクの セーターで\nえがおを くれるよ\n\nあおいろ スカートと\nくろいコート きんのもよう\nやさしい ひかりが\nつつむよ フェネックガール\n\n[verse]\nふわふわ おみみで\nきこえる こころの こえ\nだいすき フェネックガール\nいつでも そばにいるよ\n\n\n",
0.9900000000000002
]
},
{
"id": 17,
"type": "EmptyAceStepLatentAudio",
"pos": [
180,
50
],
"size": [
370,
82
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
119
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.32",
"Node name for S&R": "EmptyAceStepLatentAudio"
},
"widgets_values": [
120,
1
]
},
{
"id": 59,
"type": "SaveAudioMP3",
"pos": [
1260,
-160
],
"size": [
610,
136
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 126
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.34",
"Node name for S&R": "SaveAudioMP3"
},
"widgets_values": [
"audio/ComfyUI",
"V0"
]
}
],
"links": [
[
80,
40,
1,
14,
0,
"CLIP"
],
[
83,
40,
2,
18,
1,
"VAE"
],
[
108,
14,
0,
44,
0,
"CONDITIONING"
],
[
113,
51,
0,
49,
0,
"MODEL"
],
[
114,
50,
0,
49,
1,
"LATENT_OPERATION"
],
[
115,
40,
0,
51,
0,
"MODEL"
],
[
117,
14,
0,
52,
1,
"CONDITIONING"
],
[
119,
17,
0,
52,
3,
"LATENT"
],
[
120,
44,
0,
52,
2,
"CONDITIONING"
],
[
121,
49,
0,
52,
0,
"MODEL"
],
[
122,
52,
0,
18,
0,
"LATENT"
],
[
126,
18,
0,
59,
0,
"AUDIO"
],
[
127,
18,
0,
60,
0,
"AUDIO"
],
[
128,
18,
0,
61,
0,
"AUDIO"
],
[
136,
64,
0,
68,
0,
"AUDIO"
],
[
137,
40,
2,
68,
1,
"VAE"
]
],
"groups": [
{
"id": 1,
"title": "Load model here",
"bounding": [
170,
-230,
390,
180
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 4,
"title": "Latent",
"bounding": [
170,
-30,
390,
280
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 5,
"title": "Adjust the vocal volume",
"bounding": [
580,
-230,
350,
140
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 6,
"title": "For repainting",
"bounding": [
170,
270,
390,
223.60000610351562
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
},
{
"id": 7,
"title": "Output",
"bounding": [
1250,
-230,
630,
760
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 1,
"offset": [
-147.02717343600432,
384.62272311479
]
},
"frontendVersion": "1.19.9",
"node_versions": {
"comfy-core": "0.3.34",
"ace-step": "06f751d65491c9077fa2bc9b06d2c6f2a90e4c56"
},
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}

View File

@@ -0,0 +1,130 @@
{
"last_node_id": 3,
"last_link_id": 2,
"nodes": [
{
"id": 1,
"type": "DiffRhythmRun",
"pos": [100, 100],
"size": [400, 400],
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [1, 2]
}
],
"properties": {
"Node name for S&R": "DiffRhythmRun"
},
"widgets_values": [
"cfm_full_model.pt",
"Cinematic orchestral piece with soaring strings, powerful brass, and emotional piano melodies building to an epic crescendo",
true,
"euler",
30,
4,
"quality",
123,
"randomize",
false,
"[-1, 20], [60, -1]"
],
"title": "DiffRhythm Full-Length Text-to-Music (4m45s)"
},
{
"id": 2,
"type": "PreviewAudio",
"pos": [600, 100],
"size": [300, 100],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 1
}
],
"properties": {
"Node name for S&R": "PreviewAudio"
},
"title": "Preview Audio"
},
{
"id": 3,
"type": "SaveAudio",
"pos": [600, 250],
"size": [300, 100],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 2
}
],
"properties": {
"Node name for S&R": "SaveAudio"
},
"widgets_values": [
"diffrhythm_full_output"
],
"title": "Save Audio"
}
],
"links": [
[1, 1, 0, 2, 0, "AUDIO"],
[2, 1, 0, 3, 0, "AUDIO"]
],
"groups": [],
"config": {},
"extra": {
"workflow_info": {
"name": "DiffRhythm Full-Length Text-to-Music v1",
"description": "Full-length music generation using DiffRhythm Full (4 minutes 45 seconds)",
"version": "1.0.0",
"author": "valknar@pivoine.art",
"category": "text-to-music",
"tags": ["diffrhythm", "music-generation", "text-to-music", "full-length", "4m45s"],
"requirements": {
"custom_nodes": ["ComfyUI_DiffRhythm"],
"models": ["ASLP-lab/DiffRhythm-full", "ASLP-lab/DiffRhythm-vae", "OpenMuQ/MuQ-MuLan-large", "OpenMuQ/MuQ-large-msd-iter", "FacebookAI/xlm-roberta-base"],
"vram_min": "16GB",
"vram_recommended": "20GB",
"system_deps": ["espeak-ng"]
},
"usage": {
"model": "cfm_full_model.pt (DiffRhythm Full - 4m45s/285s generation)",
"style_prompt": "Detailed text description of the desired full-length music composition",
"unload_model": "Boolean to unload model after generation (default: true)",
"odeint_method": "ODE solver: euler, midpoint, rk4, implicit_adams (default: euler)",
"steps": "Number of diffusion steps: 1-100 (default: 30)",
"cfg": "Classifier-free guidance scale: 1-10 (default: 4)",
"quality_or_speed": "Generation mode: quality or speed (default: quality for full-length)",
"seed": "Random seed for reproducibility (default: 123)",
"edit": "Enable segment editing mode (default: false)",
"edit_segments": "Segments to edit when edit=true"
},
"performance": {
"generation_time": "~60-90 seconds on RTX 4090",
"vram_usage": "~16GB during generation",
"note": "Significantly faster than real-time music generation"
},
"notes": [
"This workflow uses DiffRhythm Full for 4 minute 45 second music generation",
"Best for complete song compositions with intro, development, and outro",
"All parameters except model and style_prompt are optional",
"Supports complex, multi-part compositions",
"Can optionally connect MultiLineLyricsDR node for lyrics input"
]
}
},
"version": 0.4
}

View File

@@ -0,0 +1,164 @@
{
"last_node_id": 4,
"last_link_id": 3,
"nodes": [
{
"id": 1,
"type": "LoadAudio",
"pos": [100, 100],
"size": [300, 100],
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [1]
}
],
"properties": {
"Node name for S&R": "LoadAudio"
},
"widgets_values": [
"reference_audio.wav"
],
"title": "Load Reference Audio"
},
{
"id": 2,
"type": "DiffRhythmRun",
"pos": [500, 100],
"size": [400, 450],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "style_audio_or_edit_song",
"type": "AUDIO",
"link": 1
}
],
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [2, 3]
}
],
"properties": {
"Node name for S&R": "DiffRhythmRun"
},
"widgets_values": [
"cfm_model_v1_2.pt",
"Energetic rock music with driving guitar riffs and powerful drums",
true,
"euler",
30,
5,
"speed",
456,
"randomize",
false,
"[-1, 20], [60, -1]"
],
"title": "DiffRhythm Reference-Based Generation"
},
{
"id": 3,
"type": "PreviewAudio",
"pos": [1000, 100],
"size": [300, 100],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 2
}
],
"properties": {
"Node name for S&R": "PreviewAudio"
},
"title": "Preview Generated Audio"
},
{
"id": 4,
"type": "SaveAudio",
"pos": [1000, 250],
"size": [300, 100],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 3
}
],
"properties": {
"Node name for S&R": "SaveAudio"
},
"widgets_values": [
"diffrhythm_reference_output"
],
"title": "Save Audio"
}
],
"links": [
[1, 1, 0, 2, 0, "AUDIO"],
[2, 2, 0, 3, 0, "AUDIO"],
[3, 2, 0, 4, 0, "AUDIO"]
],
"groups": [],
"config": {},
"extra": {
"workflow_info": {
"name": "DiffRhythm Reference-Based Generation v1",
"description": "Generate new music based on a reference audio file while following text prompt guidance",
"version": "1.0.0",
"author": "valknar@pivoine.art",
"category": "text-to-music",
"tags": ["diffrhythm", "music-generation", "reference-based", "style-transfer"],
"requirements": {
"custom_nodes": ["ComfyUI_DiffRhythm"],
"models": ["ASLP-lab/DiffRhythm-1_2", "ASLP-lab/DiffRhythm-vae", "OpenMuQ/MuQ-MuLan-large", "OpenMuQ/MuQ-large-msd-iter", "FacebookAI/xlm-roberta-base"],
"vram_min": "14GB",
"vram_recommended": "18GB",
"system_deps": ["espeak-ng"]
},
"usage": {
"reference_audio": "Path to reference audio file (WAV, MP3, or other supported formats)",
"model": "cfm_model_v1_2.pt (DiffRhythm 1.2)",
"style_prompt": "Text description guiding the style and characteristics of generated music",
"unload_model": "Boolean to unload model after generation (default: true)",
"odeint_method": "ODE solver: euler, midpoint, rk4, implicit_adams (default: euler)",
"steps": "Number of diffusion steps: 1-100 (default: 30)",
"cfg": "Classifier-free guidance scale: 1-10 (default: 5 for reference-based)",
"quality_or_speed": "Generation mode: quality or speed (default: speed)",
"seed": "Random seed for reproducibility (default: 456)",
"edit": "Enable segment editing mode (default: false)",
"edit_segments": "Segments to edit when edit=true"
},
"use_cases": [
"Style transfer: Apply the style of reference music to new prompt",
"Variations: Create variations of existing compositions",
"Genre transformation: Transform music to different genre while keeping structure",
"Mood adaptation: Change the mood/emotion while maintaining musical elements"
],
"notes": [
"This workflow combines reference audio with text prompt guidance",
"The reference audio is connected to the style_audio_or_edit_song input",
"Higher cfg values (7-10) = closer adherence to both prompt and reference",
"Lower cfg values (2-4) = more creative interpretation",
"Reference audio should ideally be similar duration to target (95s for cfm_model_v1_2.pt)",
"Can use any format supported by ComfyUI's LoadAudio node"
]
}
},
"version": 0.4
}

View File

@@ -0,0 +1,125 @@
{
"last_node_id": 3,
"last_link_id": 2,
"nodes": [
{
"id": 1,
"type": "DiffRhythmRun",
"pos": [100, 100],
"size": [400, 400],
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "AUDIO",
"type": "AUDIO",
"links": [1, 2]
}
],
"properties": {
"Node name for S&R": "DiffRhythmRun"
},
"widgets_values": [
"cfm_model_v1_2.pt",
"Upbeat electronic dance music with energetic beats and synthesizer melodies",
true,
"euler",
30,
4,
"speed",
42,
"randomize",
false,
"[-1, 20], [60, -1]"
],
"title": "DiffRhythm Text-to-Music (95s)"
},
{
"id": 2,
"type": "PreviewAudio",
"pos": [600, 100],
"size": [300, 100],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 1
}
],
"properties": {
"Node name for S&R": "PreviewAudio"
},
"title": "Preview Audio"
},
{
"id": 3,
"type": "SaveAudio",
"pos": [600, 250],
"size": [300, 100],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "audio",
"type": "AUDIO",
"link": 2
}
],
"properties": {
"Node name for S&R": "SaveAudio"
},
"widgets_values": [
"diffrhythm_output"
],
"title": "Save Audio"
}
],
"links": [
[1, 1, 0, 2, 0, "AUDIO"],
[2, 1, 0, 3, 0, "AUDIO"]
],
"groups": [],
"config": {},
"extra": {
"workflow_info": {
"name": "DiffRhythm Simple Text-to-Music v1",
"description": "Basic text-to-music generation using DiffRhythm 1.2 (95 seconds)",
"version": "1.0.0",
"author": "valknar@pivoine.art",
"category": "text-to-music",
"tags": ["diffrhythm", "music-generation", "text-to-music", "95s"],
"requirements": {
"custom_nodes": ["ComfyUI_DiffRhythm"],
"models": ["ASLP-lab/DiffRhythm-1_2", "ASLP-lab/DiffRhythm-vae", "OpenMuQ/MuQ-MuLan-large", "OpenMuQ/MuQ-large-msd-iter", "FacebookAI/xlm-roberta-base"],
"vram_min": "12GB",
"vram_recommended": "16GB",
"system_deps": ["espeak-ng"]
},
"usage": {
"model": "cfm_model_v1_2.pt (DiffRhythm 1.2 - 95s generation)",
"style_prompt": "Text description of the desired music style, mood, and instruments",
"unload_model": "Boolean to unload model after generation (default: true)",
"odeint_method": "ODE solver: euler, midpoint, rk4, implicit_adams (default: euler)",
"steps": "Number of diffusion steps: 1-100 (default: 30)",
"cfg": "Classifier-free guidance scale: 1-10 (default: 4)",
"quality_or_speed": "Generation mode: quality or speed (default: speed)",
"seed": "Random seed for reproducibility (default: 42)",
"edit": "Enable segment editing mode (default: false)",
"edit_segments": "Segments to edit when edit=true (default: [-1, 20], [60, -1])"
},
"notes": [
"This workflow uses DiffRhythm 1.2 for 95-second music generation",
"All parameters except model and style_prompt are optional",
"Supports English and Chinese text prompts",
"Generation time: ~30-60 seconds on RTX 4090",
"Can optionally connect MultiLineLyricsDR node for lyrics input"
]
}
},
"version": 0.4
}
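Parameters in the UI-format JSON above can also be tweaked before loading it into ComfyUI. A minimal sketch, assuming the widget order matches the widgets_values array shown in the workflow (model, prompt, unload, odeint_method, steps, cfg, quality_or_speed, seed, seed control, edit, edit_segments) — verify the indices against your own export before relying on them; file names are hypothetical:

# Minimal sketch: patch generation parameters in the exported workflow JSON.
import json

with open("diffrhythm_simple.json") as f:  # hypothetical export of the workflow above
    wf = json.load(f)

for node in wf["nodes"]:
    if node["type"] == "DiffRhythmRun":
        node["widgets_values"][4] = 50       # steps: trade speed for quality
        node["widgets_values"][5] = 6        # cfg: stronger prompt adherence
        node["widgets_values"][8] = "fixed"  # keep the seed for reproducibility

with open("diffrhythm_simple_tuned.json", "w") as f:
    json.dump(wf, f, indent=2)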


@@ -61,6 +61,9 @@ model_categories:
base_model: SDXL 1.0
vram_gb: 12
tags: [nsfw, anime, furry, cartoon, versatile]
files:
- source: "ponyDiffusionV6XL_v6StartWithThisOne.safetensors"
dest: "ponyDiffusionV6XL_v6StartWithThisOne.safetensors"
training_info:
images: "2.6M aesthetically ranked"
ratio: "1:1:1 safe/questionable/explicit"
@@ -226,6 +229,9 @@ model_categories:
tags: [negative, quality, anatomy, sdxl]
trigger_word: "BadX"
usage: "embedding:BadX"
files:
- source: "BadX-neg.pt"
dest: "BadX-neg.pt"
notes: "Use in negative prompt with LUSTIFY, RealVisXL, or other SDXL checkpoints. Fixes facial/hand artifacts."
# ==========================================================================
@@ -244,6 +250,9 @@ model_categories:
tags: [negative, quality, pony, nsfw]
trigger_word: "zPDXL3"
usage: "embedding:zPDXL3"
files:
- source: "zPDXL3.safetensors"
dest: "zPDXL3.safetensors"
recommended_settings:
strength: "1.0-2.0"
notes: "ONLY works with Pony Diffusion models. Removes censoring and improves quality."
@@ -260,6 +269,9 @@ model_categories:
tags: [negative, nsfw, pony]
trigger_word: "zPDXLxxx"
usage: "embedding:zPDXLxxx"
files:
- source: "zPDXLxxx.pt"
dest: "zPDXLxxx.pt"
recommended_settings:
strength: "1.0-2.0"
notes: "ONLY for Pony Diffusion models. Enables explicit NSFW content generation."

File diff suppressed because it is too large.


@@ -0,0 +1,126 @@
# ============================================================================
# vLLM Model Configuration
# ============================================================================
#
# This configuration file defines all available vLLM models for download.
# Models are organized by category: text generation and text embeddings.
#
# Each model entry contains:
# - repo_id: HuggingFace repository identifier
# - description: Human-readable description
# - size_gb: Approximate size in gigabytes
# - essential: Whether this is an essential model (true/false)
# - category: Model category (text_generation/embedding)
#
# ============================================================================
# Global settings
settings:
cache_dir: /workspace/huggingface_cache
parallel_downloads: 1
retry_attempts: 3
timeout_seconds: 3600
# Model categories
model_categories:
# ==========================================================================
# TEXT GENERATION MODELS (vLLM)
# ==========================================================================
text_generation_models:
- repo_id: Qwen/Qwen2.5-7B-Instruct
description: Qwen 2.5 7B Instruct - Advanced multilingual reasoning
size_gb: 14
essential: true
category: text_generation
type: vllm
format: safetensors
vram_gb: 14
context_length: 32768
notes: Latest Qwen 2.5 model with enhanced reasoning capabilities
files:
- source: "model.safetensors"
dest: "model.safetensors"
- repo_id: meta-llama/Llama-3.1-8B-Instruct
description: Llama 3.1 8B Instruct - Meta's latest instruction-tuned model
size_gb: 17
essential: true
category: text_generation
type: vllm
format: safetensors
vram_gb: 17
context_length: 131072
notes: Extended 128K context length, excellent for long-form tasks
files:
- source: "model.safetensors"
dest: "model.safetensors"
# ==========================================================================
# TEXT EMBEDDING MODELS (vLLM)
# ==========================================================================
embedding_models:
- repo_id: BAAI/bge-large-en-v1.5
description: BGE Large English v1.5 - High-quality embeddings for RAG
size_gb: 1.3
essential: true
category: embedding
type: vllm_embedding
format: safetensors
vram_gb: 3
embedding_dimensions: 1024
max_tokens: 512
notes: Top-tier MTEB scores, excellent for semantic search and RAG applications
files:
- source: "model.safetensors"
dest: "model.safetensors"
# ============================================================================
# STORAGE & VRAM SUMMARIES
# ============================================================================
storage_requirements:
text_generation: 31 # Qwen 2.5 7B + Llama 3.1 8B
embedding: 1.3 # BGE Large
total: 32.3 # Total essential storage
vram_requirements:
# For 24GB GPU (RTX 4090)
simultaneous_loadable:
- name: Qwen 2.5 7B Only
models: [Qwen 2.5 7B Instruct]
vram_used: 14
remaining: 10
- name: Llama 3.1 8B Only
models: [Llama 3.1 8B Instruct]
vram_used: 17
remaining: 7
- name: BGE Large Only
models: [BGE Large]
vram_used: 3
remaining: 21
- name: Qwen + BGE Embedding
models: [Qwen 2.5 7B, BGE Large]
vram_used: 17
remaining: 7
- name: Llama + BGE Embedding
models: [Llama 3.1 8B, BGE Large]
vram_used: 20
remaining: 4
# ============================================================================
# METADATA
# ============================================================================
metadata:
version: 1.0.0
last_updated: 2025-11-25
compatible_with:
- vLLM >= 0.6.0
- Python >= 3.10
- HuggingFace Hub >= 0.20.0
maintainer: Valknar
repository: https://github.com/yourusername/runpod
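A minimal sketch of how a downloader might consume this configuration, assuming the file is saved as models_vllm.yaml next to the script; the actual download script used in this repo may differ:

# Minimal sketch: download all essential models listed in the config.
import yaml
from huggingface_hub import snapshot_download

with open("models_vllm.yaml") as f:  # hypothetical file name
    cfg = yaml.safe_load(f)

cache_dir = cfg["settings"]["cache_dir"]
for category in cfg["model_categories"].values():
    for model in category:
        if not model.get("essential", False):
            continue  # skip optional models
        print(f"Downloading {model['repo_id']} (~{model['size_gb']} GB)")
        snapshot_download(repo_id=model["repo_id"], cache_dir=cache_dir)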

scripts/bootstrap-venvs.sh Executable file

@@ -0,0 +1,108 @@
#!/bin/bash
# Virtual Environment Health Check and Bootstrap Script
# Checks if Python venvs are compatible with current Python version
# Rebuilds venvs if needed
set -e
echo "=== Python Virtual Environment Health Check ==="
# Get current system Python version
SYSTEM_PYTHON=$(python3 --version | awk '{print $2}')
SYSTEM_PYTHON_MAJOR_MINOR=$(echo "$SYSTEM_PYTHON" | cut -d'.' -f1,2)
echo "System Python: $SYSTEM_PYTHON ($SYSTEM_PYTHON_MAJOR_MINOR)"
# List of venvs to check
VENVS=(
"/workspace/ai/vllm/venv"
"/workspace/ai/webdav-sync/venv"
"/workspace/ComfyUI/venv"
)
REBUILD_NEEDED=0
# Check each venv
for VENV_PATH in "${VENVS[@]}"; do
if [ ! -d "$VENV_PATH" ]; then
echo "⚠ venv not found: $VENV_PATH (will be created on first service start)"
continue
fi
VENV_NAME=$(basename "$(dirname "$VENV_PATH")")
echo ""
echo "Checking venv: $VENV_NAME ($VENV_PATH)"
# Check if venv Python executable works
if ! "$VENV_PATH/bin/python" --version >/dev/null 2>&1; then
echo " ❌ BROKEN - Python executable not working"
REBUILD_NEEDED=1
continue
fi
# Get venv Python version
VENV_PYTHON=$("$VENV_PATH/bin/python" --version 2>&1 | awk '{print $2}')
VENV_PYTHON_MAJOR_MINOR=$(echo "$VENV_PYTHON" | cut -d'.' -f1,2)
echo " venv Python: $VENV_PYTHON ($VENV_PYTHON_MAJOR_MINOR)"
# Compare major.minor versions
if [ "$SYSTEM_PYTHON_MAJOR_MINOR" != "$VENV_PYTHON_MAJOR_MINOR" ]; then
echo " ⚠ VERSION MISMATCH - System is $SYSTEM_PYTHON_MAJOR_MINOR, venv is $VENV_PYTHON_MAJOR_MINOR"
REBUILD_NEEDED=1
else
# Check if pip works
if ! "$VENV_PATH/bin/pip" --version >/dev/null 2>&1; then
echo " ❌ BROKEN - pip not working"
REBUILD_NEEDED=1
else
echo " ✓ HEALTHY"
fi
fi
done
# If any venv needs rebuild, warn the user
if [ $REBUILD_NEEDED -eq 1 ]; then
echo ""
echo "========================================"
echo " ⚠ WARNING: Some venvs need rebuilding"
echo "========================================"
echo ""
echo "One or more Python virtual environments are incompatible with the current"
echo "Python version or are broken. This can happen when:"
echo " - Docker image Python version changed"
echo " - venv files were corrupted"
echo " - Binary dependencies are incompatible"
echo ""
echo "RECOMMENDED ACTIONS:"
echo ""
echo "1. vLLM venv rebuild:"
echo " cd /workspace/ai/vllm"
echo " rm -rf venv"
echo " python3 -m venv venv"
echo " source venv/bin/activate"
echo " pip install -r requirements.txt"
echo ""
echo "2. ComfyUI venv rebuild:"
echo " cd /workspace/ComfyUI"
echo " rm -rf venv"
echo " python3 -m venv venv"
echo " source venv/bin/activate"
echo " pip install -r requirements.txt"
echo ""
echo "3. WebDAV sync venv rebuild (if used):"
echo " cd /workspace/ai/webdav-sync"
echo " rm -rf venv"
echo " python3 -m venv venv"
echo " source venv/bin/activate"
echo " pip install -r requirements.txt"
echo ""
echo "Services may fail to start until venvs are rebuilt!"
echo "========================================"
echo ""
else
echo ""
echo "✓ All virtual environments are healthy"
fi
exit 0

start.sh Normal file

@@ -0,0 +1,141 @@
#!/bin/bash
# RunPod container startup script
# This script initializes the container environment and starts all services
set -e
echo "========================================"
echo " RunPod AI Orchestrator - Starting"
echo "========================================"
# [1/7] Start SSH server (required by RunPod)
echo "[1/7] Starting SSH server..."
service ssh start
echo " ✓ SSH server started"
# [2/7] Add /workspace/bin to PATH for arty and custom scripts
echo "[2/7] Configuring PATH..."
export PATH="/workspace/bin:$PATH"
echo " ✓ PATH updated: /workspace/bin added"
# [3/7] Source environment variables from network volume
echo "[3/7] Loading environment from network volume..."
if [ -f /workspace/ai/.env ]; then
set -a
source /workspace/ai/.env
set +a
echo " ✓ Environment loaded from /workspace/ai/.env"
else
echo " ⚠ No .env file found at /workspace/ai/.env"
echo " Some services may not function correctly without environment variables"
fi
# [4/7] Configure and start Tailscale VPN
echo "[4/7] Configuring Tailscale VPN..."
if [ -n "${TAILSCALE_AUTHKEY:-}" ]; then
echo " Starting Tailscale daemon..."
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
sleep 3
echo " Connecting to Tailscale network..."
HOSTNAME="runpod-$(hostname)"
tailscale up --authkey="$TAILSCALE_AUTHKEY" --advertise-tags=tag:gpu --hostname="$HOSTNAME" || {
echo " ⚠ Tailscale connection failed, continuing without VPN"
}
# Get Tailscale IP if connected
TAILSCALE_IP=$(tailscale ip -4 2>/dev/null || echo "not connected")
if [ "$TAILSCALE_IP" != "not connected" ]; then
echo " ✓ Tailscale connected"
echo " Hostname: $HOSTNAME"
echo " IP: $TAILSCALE_IP"
# Export for other services
export GPU_TAILSCALE_IP="$TAILSCALE_IP"
else
echo " ⚠ Tailscale failed to obtain IP"
fi
else
echo " ⚠ Tailscale disabled (no TAILSCALE_AUTHKEY in .env)"
echo " Services requiring VPN connectivity will not work"
fi
# [5/7] Check Python virtual environments health
echo "[5/7] Checking Python virtual environments..."
PYTHON_VERSION=$(python3 --version)
echo " System Python: $PYTHON_VERSION"
# Check if bootstrap script exists and run it
if [ -f /workspace/ai/scripts/bootstrap-venvs.sh ]; then
echo " Running venv health check..."
bash /workspace/ai/scripts/bootstrap-venvs.sh
else
echo " ⚠ No venv bootstrap script found (optional)"
fi
# [6/7] Configure Supervisor
echo "[6/7] Configuring Supervisor process manager..."
if [ -f /workspace/ai/supervisord.conf ]; then
# Supervisor expects config at /workspace/supervisord.conf (based on arty scripts)
if [ ! -f /workspace/supervisord.conf ]; then
cp /workspace/ai/supervisord.conf /workspace/supervisord.conf
echo " ✓ Supervisor config copied to /workspace/supervisord.conf"
fi
# Create logs directory if it doesn't exist
mkdir -p /workspace/logs
echo " ✓ Supervisor configured"
else
echo " ⚠ No supervisord.conf found at /workspace/ai/supervisord.conf"
echo " Supervisor will not be started"
fi
# [7/7] Start Supervisor to manage services
echo "[7/7] Starting Supervisor and managed services..."
if [ -f /workspace/supervisord.conf ]; then
# Start supervisor daemon
supervisord -c /workspace/supervisord.conf
echo " ✓ Supervisor daemon started"
# Wait a moment for services to initialize
sleep 3
# Display service status
echo ""
echo "Service Status:"
echo "---------------"
supervisorctl -c /workspace/supervisord.conf status || echo " ⚠ Could not query service status"
else
echo " ⚠ Skipping Supervisor startup (no config file)"
fi
# Display connection information
echo ""
echo "========================================"
echo " Container Ready"
echo "========================================"
echo "Services:"
echo " - SSH: port 22"
echo " - ComfyUI: http://localhost:8188"
echo " - Supervisor Web UI: http://localhost:9001"
echo " - Model Orchestrator: http://localhost:9000"
if [ -n "${TAILSCALE_IP:-}" ] && [ "$TAILSCALE_IP" != "not connected" ]; then
echo " - Tailscale IP: $TAILSCALE_IP"
fi
echo ""
echo "Network Volume: /workspace"
echo "Project Directory: /workspace/ai"
echo "Logs: /workspace/logs"
echo ""
echo "To view service logs:"
echo " supervisorctl -c /workspace/supervisord.conf tail -f <service_name>"
echo ""
echo "To manage services:"
echo " supervisorctl -c /workspace/supervisord.conf status"
echo " supervisorctl -c /workspace/supervisord.conf restart <service_name>"
echo "========================================"
# Keep container running
echo "Container is running. Press Ctrl+C to stop."
sleep infinity


@@ -73,6 +73,23 @@ environment=HF_HOME="../huggingface_cache",HF_TOKEN="%(ENV_HF_TOKEN)s"
priority=201
stopwaitsecs=30
# vLLM BGE Embedding Server (Port 8002)
[program:vllm-embedding]
command=vllm/venv/bin/python vllm/server_embedding.py
directory=.
autostart=false
autorestart=true
startretries=3
stderr_logfile=logs/vllm-embedding.err.log
stdout_logfile=logs/vllm-embedding.out.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=10
environment=HF_HOME="../huggingface_cache",HF_TOKEN="%(ENV_HF_TOKEN)s"
priority=202
stopwaitsecs=30
# ComfyUI WebDAV Sync Service
[program:webdav-sync]
command=webdav-sync/venv/bin/python webdav-sync/webdav_sync.py
@@ -90,6 +107,10 @@ environment=WEBDAV_URL="%(ENV_WEBDAV_URL)s",WEBDAV_USERNAME="%(ENV_WEBDAV_USERNA
priority=150
stopwaitsecs=10
[group:ai-services]
programs=comfyui,vllm-qwen,vllm-llama,webdav-sync
priority=999
[group:comfyui-services]
programs=comfyui,webdav-sync
priority=100
[group:vllm-services]
programs=vllm-qwen,vllm-llama,vllm-embedding
priority=200
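With these groups defined, whole service families can be managed together using supervisorctl's standard group:* syntax, e.g.:

supervisorctl -c /workspace/supervisord.conf start vllm-services:*
supervisorctl -c /workspace/supervisord.conf stop comfyui-services:*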

vllm/server_embedding.py Normal file

@@ -0,0 +1,201 @@
#!/usr/bin/env python3
"""
vLLM Embedding Server for BAAI/bge-large-en-v1.5
OpenAI-compatible /v1/embeddings endpoint
"""
import logging
from typing import List, Optional
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel, Field
from vllm import AsyncLLMEngine, AsyncEngineArgs
from vllm.utils import random_uuid
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# FastAPI app
app = FastAPI(title="vLLM Embedding Server", version="1.0.0")
# Global engine instance
engine: Optional[AsyncLLMEngine] = None
model_name: str = "BAAI/bge-large-en-v1.5" # Dedicated BGE embedding server
port = 8002 # Dedicated port for embeddings
# Request/Response models
class EmbeddingRequest(BaseModel):
"""OpenAI-compatible embedding request"""
model: str = Field(default="bge-large-en-v1.5")
input: str | List[str] = Field(..., description="Text input(s) to embed")
encoding_format: str = Field(default="float", description="float or base64")
user: Optional[str] = None
@app.on_event("startup")
async def startup_event():
"""Initialize vLLM embedding engine on startup"""
global engine, model_name
logger.info(f"Initializing vLLM embedding engine with model: {model_name}")
# Configure embedding engine
engine_args = AsyncEngineArgs(
model=model_name,
tensor_parallel_size=1, # Single GPU
gpu_memory_utilization=0.50, # Conservative for embedding model
dtype="auto", # Auto-detect dtype
download_dir="/workspace/huggingface_cache", # Large disk
trust_remote_code=True, # Some embedding models require this
enforce_eager=True, # Embedding models don't need streaming
max_model_len=512, # BGE max token length
# task="embed", # vLLM 0.6.3+ embedding mode
)
# Create async engine
engine = AsyncLLMEngine.from_engine_args(engine_args)
logger.info("vLLM embedding engine initialized successfully")
@app.get("/")
async def root():
"""Health check endpoint"""
return {"status": "ok", "model": model_name, "task": "embedding"}
@app.get("/health")
async def health():
"""Detailed health check"""
return {
"status": "healthy" if engine else "initializing",
"model": model_name,
"ready": engine is not None,
"task": "embedding"
}
@app.get("/v1/models")
async def list_models():
"""OpenAI-compatible models endpoint"""
return {
"object": "list",
"data": [
{
"id": "bge-large-en-v1.5",
"object": "model",
"created": 1234567890,
"owned_by": "pivoine-gpu",
"permission": [],
"root": model_name,
"parent": None,
}
]
}
@app.post("/v1/embeddings")
async def create_embeddings(request: EmbeddingRequest):
"""OpenAI-compatible embeddings endpoint"""
if not engine:
return JSONResponse(
status_code=503,
content={"error": "Engine not initialized"}
)
# Handle both single input and batch inputs
inputs = [request.input] if isinstance(request.input, str) else request.input
# For BGE embedding models, we use the model's encode functionality
# vLLM 0.6.3+ supports embedding models via the --task embed parameter
# For now, we'll use a workaround by generating with empty sampling
from vllm import SamplingParams
# Create minimal sampling params for embedding extraction
sampling_params = SamplingParams(
temperature=0.0,
max_tokens=1, # We only need the hidden states
n=1,
)
embeddings = []
total_tokens = 0
for idx, text in enumerate(inputs):
# For BGE models, prepend the query prefix for better performance
# This is model-specific - BGE models expect "Represent this sentence for searching relevant passages: "
# For now, we'll use the text as-is and let the model handle it
request_id = random_uuid()
# Generate to get embeddings
# Note: This is a workaround. Proper embedding support requires vLLM's --task embed mode
# which may not be available in all versions
try:
# Try to use embedding-specific generation
async for output in engine.generate(text, sampling_params, request_id):
final_output = output  # drained only to exercise the engine; the placeholder below ignores it
# Extract embedding from hidden states
# For proper embedding, we would need to access the model's pooler output
# This is a simplified version that may not work perfectly
# In production, use vLLM's native embedding mode with --task embed
# Placeholder: return a dummy embedding for now
# Real implementation would extract pooler_output from the model
embedding_dim = 1024 # BGE-large has 1024 dimensions
# For now, generate a deterministic pseudo-embedding from a text hash.
# This is NOT a real embedding: every component of the vector is identical,
# so any similarity computed between two of these vectors is meaningless.
# A real implementation requires accessing the model's pooler output.
import hashlib
text_hash = int(hashlib.sha256(text.encode()).hexdigest(), 16)
embedding = [(text_hash % 1000000) / 1000000.0] * embedding_dim
embeddings.append({
"object": "embedding",
"embedding": embedding,
"index": idx,
})
# Count tokens (rough estimate)
total_tokens += len(text.split())
except Exception as e:
logger.error(f"Error generating embedding: {e}")
return JSONResponse(
status_code=500,
content={"error": f"Failed to generate embedding: {str(e)}"}
)
return {
"object": "list",
"data": embeddings,
"model": request.model,
"usage": {
"prompt_tokens": total_tokens,
"total_tokens": total_tokens,
}
}
if __name__ == "__main__":
import uvicorn
# Dedicated embedding server configuration
host = "0.0.0.0"
# port already defined at top of file as 8002
logger.info(f"Starting vLLM embedding server on {host}:{port}")
logger.info("WARNING: This is a placeholder implementation.")
logger.info("For production use, vLLM needs --task embed support or use sentence-transformers directly.")
uvicorn.run(
app,
host=host,
port=port,
log_level="info",
access_log=True,
)
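
Until vLLM's native embedding task is wired in, the warnings above point at sentence-transformers as the straightforward way to produce real BGE embeddings. A minimal sketch of that alternative — an assumption about how the placeholder would eventually be replaced, not part of this repo:

# Minimal sketch: real BGE embeddings via sentence-transformers, bypassing vLLM.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    cache_folder="/workspace/huggingface_cache",  # reuse the shared HF cache
)

def embed(texts):
    # Normalize so dot products equal cosine similarity.
    return model.encode(texts, normalize_embeddings=True).tolist()

print(len(embed(["hello world"])[0]))  # 1024 dimensions for bge-large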