Removed acestep-simple-t2m-v1.json as the official Comfy-Org workflows provide better quality:
- acestep-official-t2m-v1.json - Advanced T2M with specialized nodes
- acestep-m2m-editing-v1.json - Music-to-music editing capability
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed 3 empty placeholder workflows that only contained metadata:
- acestep-multilang-t2m-v1.json
- acestep-remix-m2m-v1.json
- acestep-chinese-rap-v1.json
Kept only the functional workflow:
- acestep-simple-t2m-v1.json (6 nodes, fully operational)
Users can use the simple workflow and modify the prompt for different use cases:
- Multi-language: prefix lyrics with language tags like [zh], [ja], [ko]
- Remixing: load audio input and adjust denoise strength (0.1-0.7)
- Chinese RAP: use Chinese RAP LoRA with strength 0.8-1.0
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added ACE Step v1 3.5B model for state-of-the-art music generation:
- 15x faster than LLM baselines with superior structural coherence
- Supports 19 languages (en, zh, ja, ko, fr, es, de, it, pt, ru + 9 more)
- Voice cloning, lyric alignment, and multi-genre capabilities
Changes:
- Added ACE Step models to models_huggingface.yaml (checkpoint + Chinese RAP LoRA)
- Added ComfyUI_ACE-Step custom node to arty.yml with installation script
- Created 4 comprehensive workflows in comfyui/workflows/text-to-music/:
* acestep-simple-t2m-v1.json - Basic 60s text-to-music generation
* acestep-multilang-t2m-v1.json - 19-language music generation
* acestep-remix-m2m-v1.json - Music-to-music remixing with style transfer
* acestep-chinese-rap-v1.json - Chinese hip-hop with specialized LoRA
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed diffrhythm-random-generation-v1.json as it's no longer needed.
Keeping only the essential DiffRhythm workflows:
- simple text-to-music (95s)
- full-length generation (4m45s)
- reference-based style transfer
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove custom PivoineDiffRhythmRun wrapper node
- Add git patch file for ComfyUI_DiffRhythm __init__.py
- Patch adds LlamaConfig fix at import time
- Add arty script 'fix/diffrhythm-patch' to apply patch
- Revert all workflows to use original DiffRhythmRun
- Remove startup_patch.py and revert start.sh
This approach is cleaner and more maintainable than wrapping the node.
The patch directly fixes the tensor dimension mismatch (32 vs 64) in
DiffRhythm's rotary position embeddings by ensuring num_attention_heads
and num_key_value_heads are properly set based on hidden_size.
References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48
The parameters must match the diffrhythmgen() function signature order,
not the INPUT_TYPES order. The function has 'edit' as the first parameter.
Correct widgets_values order (11 parameters):
0: edit (boolean)
1: model (string)
2: style_prompt (string)
3: lyrics_or_edit_lyrics (string)
4: edit_segments (string)
5: odeint_method (enum)
6: steps (int)
7: cfg (int)
8: quality_or_speed (enum)
9: unload_model (boolean)
10: seed (int)
Note: style_audio_or_edit_song comes from input connection (not in widgets)
Note: chunked parameter is hidden (not in widgets)
Updated workflows:
- diffrhythm-simple-t2m-v1.json
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Fix "edit song, edit lyrics, edit segments must be provided" error by adding
the two missing parameters to all three DiffRhythm workflow files:
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json
Added empty string parameters at positions 9 and 10 in widgets_values array:
- edit_song: "" (empty when edit=false)
- edit_lyrics: "" (empty when edit=false)
The DiffRhythmRun node requires 12 parameters total, not 10. These workflows
use edit=false (no editing), so the edit parameters should be empty strings.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add DiffRhythm dependencies to requirements.txt (19 packages)
- Add reference audio placeholder for style transfer workflow
- DiffRhythm nodes now loading in ComfyUI
- All four workflows ready for music generation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed checkpoint from waiIllustriousSDXL_v150.safetensors to ponyDiffusionV6XL_v6StartWithThisOne.safetensors
- Fixed metadata model reference (was incorrectly referencing LoRA)
- Added files field to models_civitai.yaml for explicit filename mapping
- Aligns workflow with actual Pony Diffusion V6 XL model
Changed checkpoint from 'add-detail-xl.safetensors' (which is a LoRA) to
'waiIllustriousSDXL_v150.safetensors' which is the downloaded anime NSFW model
The upscale_model input was at index 5 instead of index 12, causing all
widget parameters to be misaligned. Fixed by:
- Updating link target index from 5 to 12 for upscale_model
- Adding explicit entries for widget parameters in inputs array
- Maintaining correct parameter order per custom node definition
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added example images for testing workflows:
- input_image.png (512x512) - for general upscaling workflows
- input_portrait.png (512x768) - for portrait/face upscaling workflows
Sound Lab's Musicgen_ node outputs AUDIO format that is only compatible with Sound Lab nodes like AudioPlay, not the built-in ComfyUI audio nodes (SaveAudio/PreviewAudio).
SaveAudio was erroring on 'waveform' key - the AUDIO output from
Musicgen_ node has a different internal structure than what SaveAudio
expects. PreviewAudio is more compatible with Sound Lab's AUDIO format.
Files are still saved to ComfyUI output directory, just through
PreviewAudio instead of SaveAudio.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed medium, small, and melody workflows:
- Replaced non-existent nodes with Musicgen_ from Sound Lab
- Added missing links arrays to connect nodes properly
- Updated all metadata and performance specs
Note: Melody workflow simplified to text-only as Sound Lab doesn't
currently support melody conditioning via audio input.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changed from non-existent nodes to actual Sound Lab nodes:
- Replaced MusicGenLoader/MusicGenTextEncode/MusicGenSampler with Musicgen_
- Replaced custom SaveAudio with standard SaveAudio node
- Added missing links array to connect nodes
- All parameters: prompt, duration, guidance_scale, seed, device
Node is called "Musicgen_" (with underscore) from comfyui-sound-lab.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
SD3.5 checkpoint doesn't contain CLIP encoders. Now using:
- CheckpointLoaderSimple for MODEL and VAE
- TripleCLIPLoader for CLIP-L, CLIP-G, and T5-XXL
- Standard CLIPTextEncode for prompts
This fixes the "clip input is invalid: None" error.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replaced CheckpointLoaderSimple with UNETLoader + DualCLIPLoader.
Replaced CLIPTextEncode with CLIPTextEncodeFlux.
Added proper VAELoader with ae.safetensors.
Added ConditioningZeroOut for empty negative conditioning.
Removed old negative prompt input (FLUX doesn't use it).
Changes match FLUX Dev workflow structure.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added FLUX VAE (ae.safetensors) to model configuration and updated
workflow to use it instead of non-existent pixel_space VAE.
This fixes the SaveImage data type error (1, 1, 16), |u1.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Node 3 (CLIPTextEncodeFlux) output feeds both KSampler (link 3) and
ConditioningZeroOut (link 8), so the output links array must include
both links.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
FLUX models require negative conditioning even though they don't use it.
Added ConditioningZeroOut node to create empty negative conditioning from
positive output, satisfying KSampler's required negative input.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace CheckpointLoaderSimple with UNETLoader
- Replace CLIPTextEncode with DualCLIPLoader + CLIPTextEncodeFlux
- Add VAELoader with pixel_space
- Remove negative prompt (FLUX uses guidance differently)
- Set CFG to 1.0, guidance in text encoder (3.5)
- Add all node connections in links array
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace DiffusersLoader with ImageOnlyCheckpointLoader
- Replace SVDSampler with SVD_img2vid_Conditioning + KSampler
- Add VideoLinearCFGGuidance for temporal consistency
- Add all node connections in links array
- Configure VHS_VideoCombine with correct parameters (25 frames)
- Increase steps to 30 for better quality with longer video
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove format-specific parameters from widgets_values array.
Only base parameters should be in widgets_values:
- frame_rate, loop_count, filename_prefix, format, pingpong, save_output
Format-specific params (pix_fmt, crf) are added dynamically by ComfyUI.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace DiffusersLoader with ImageOnlyCheckpointLoader
- Replace SVDSampler with SVD_img2vid_Conditioning + KSampler
- Add VideoLinearCFGGuidance for temporal consistency
- Add all node connections in links array
- Configure VHS_VideoCombine with H.264 parameters
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- CogVideoX-5b-I2V requires specific resolution (720x480)
- Cannot generate videos at different resolutions
- Update placeholder image to match model requirements