Fix "edit song, edit lyrics, edit segments must be provided" error by adding
the two missing parameters to all three DiffRhythm workflow files:
- diffrhythm-random-generation-v1.json
- diffrhythm-reference-based-v1.json
- diffrhythm-full-length-t2m-v1.json
Added empty-string parameters at positions 9 and 10 of the widgets_values array:
- edit_song: "" (empty when edit=false)
- edit_lyrics: "" (empty when edit=false)
The DiffRhythmRun node requires 12 parameters total, not 10. These workflows
use edit=false (no editing), so the edit parameters should be empty strings.
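
The padding described above can be sketched as a small patch function. This is only an illustration, not the script used for this commit: the helper name pad_edit_params is hypothetical, and it assumes that "positions 9 and 10" are zero-based indices into widgets_values and that the node's type field is DiffRhythmRun.

```python
NODE_TYPE = "DiffRhythmRun"  # node name taken from the commit message
EDIT_INDEX = 9               # assumption: zero-based position of edit_song
EXPECTED_LEN = 12            # total parameter count stated above


def pad_edit_params(workflow: dict) -> int:
    """Insert empty edit_song/edit_lyrics values into short widget lists.

    Returns the number of nodes that were patched. Nodes that already
    carry 12 values are left untouched, so the function is idempotent.
    """
    patched = 0
    for node in workflow.get("nodes", []):
        values = node.get("widgets_values")
        if node.get("type") != NODE_TYPE or not isinstance(values, list):
            continue
        if len(values) == EXPECTED_LEN - 2:
            # Two empty strings: edit_song and edit_lyrics (edit=false).
            values[EDIT_INDEX:EDIT_INDEX] = ["", ""]
            patched += 1
    return patched


if __name__ == "__main__":
    demo = {"nodes": [{"type": NODE_TYPE, "widgets_values": list(range(10))}]}
    print(pad_edit_params(demo), demo["nodes"][0]["widgets_values"])
```

If the positions turn out to be one-based, only EDIT_INDEX needs to change.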
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add torchcodec to ComfyUI requirements.txt to fix the audio tensor caching
error in DiffRhythm. This package is required by the save_with_torchcodec
function used by the DiffRhythm audio nodes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add DiffRhythm dependencies to requirements.txt (19 packages)
- Add reference audio placeholder for style transfer workflow
- DiffRhythm nodes now loading in ComfyUI
- All four workflows ready for music generation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added all required packages for ComfyUI_DiffRhythm extension:
- torchdiffeq: ODE solvers for diffusion models
- x-transformers: Transformer architecture components
- librosa: Audio analysis and feature extraction
- pandas, pyarrow: Data handling
- ema-pytorch, prefigure: Training utilities
- muq: MuQ music representation model
- mutagen: Audio metadata handling
- pykakasi, jieba, cn2an, pypinyin: Chinese/Japanese text processing
- Unidecode, phonemizer, inflect: Text normalization and phonetic conversion
- py3langid: Language identification
These dependencies enable the DiffRhythm node to load and function properly in ComfyUI, fixing the "ModuleNotFoundError: No module named 'infer_utils'" error.
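
A quick way to confirm that the packages listed above are present is to query the installed distribution metadata. The sketch below is an assumption-laden check, not part of the commit: the helper names are hypothetical, and it only verifies that each distribution is installed, not that its version is compatible.

```python
from importlib import metadata

# Distribution names copied from the list above.
DIFFRHYTHM_PACKAGES = [
    "torchdiffeq", "x-transformers", "librosa", "pandas", "pyarrow",
    "ema-pytorch", "prefigure", "muq", "mutagen", "pykakasi", "jieba",
    "cn2an", "pypinyin", "Unidecode", "phonemizer", "inflect", "py3langid",
]


def _is_installed(name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False


def missing_packages(names, is_installed=_is_installed):
    """Return the subset of names for which is_installed() is False."""
    return [name for name in names if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_packages(DIFFRHYTHM_PACKAGES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Running it inside the ComfyUI virtual environment lists any distribution that still needs to be installed.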
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed checkpoint from waiIllustriousSDXL_v150.safetensors to ponyDiffusionV6XL_v6StartWithThisOne.safetensors
- Fixed metadata model reference (was incorrectly referencing LoRA)
- Added files field to models_civitai.yaml for explicit filename mapping
- Aligns workflow with actual Pony Diffusion V6 XL model
Changed checkpoint from 'add-detail-xl.safetensors' (which is a LoRA, not a
checkpoint) to 'waiIllustriousSDXL_v150.safetensors', the downloaded anime
NSFW model.
The upscale_model input was at index 5 instead of index 12, causing all
widget parameters to be misaligned. Fixed by:
- Updating link target index from 5 to 12 for upscale_model
- Adding explicit entries for widget parameters in inputs array
- Maintaining correct parameter order per custom node definition
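
The index change above can be sketched as follows. This assumes the link layout used in exported ComfyUI workflow JSON, [link_id, origin_node, origin_slot, target_node, target_slot, type]; the helper name retarget_link and the node id in the demo are hypothetical.

```python
TARGET_SLOT = 4  # position of target_slot inside one link entry


def retarget_link(workflow: dict, node_id: int, input_name: str, new_slot: int) -> list:
    """Point the link that feeds input_name on node_id at new_slot.

    Returns the modified link entry; raises KeyError if the link id
    referenced by the input is not present in the links array.
    """
    node = next(n for n in workflow["nodes"] if n["id"] == node_id)
    entry = next(i for i in node["inputs"] if i["name"] == input_name)
    for link in workflow["links"]:
        if link[0] == entry["link"]:
            link[TARGET_SLOT] = new_slot
            return link
    raise KeyError(f"link {entry['link']} not found")


if __name__ == "__main__":
    wf = {
        "nodes": [{"id": 7, "inputs": [{"name": "upscale_model", "link": 4}]}],
        "links": [[4, 2, 0, 7, 5, "UPSCALE_MODEL"]],
    }
    print(retarget_link(wf, 7, "upscale_model", 12))
```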
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added example images for testing workflows:
- input_image.png (512x512) - for general upscaling workflows
- input_portrait.png (512x768) - for portrait/face upscaling workflows
Sound Lab's Musicgen_ node outputs an AUDIO format that is only compatible with Sound Lab nodes like AudioPlay, not the built-in ComfyUI audio nodes (SaveAudio/PreviewAudio).
SaveAudio raised an error on the 'waveform' key: the AUDIO output from the
Musicgen_ node has a different internal structure than the one SaveAudio
expects. PreviewAudio is more compatible with Sound Lab's AUDIO format.
Files are still saved to ComfyUI output directory, just through
PreviewAudio instead of SaveAudio.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed medium, small, and melody workflows:
- Replaced non-existent nodes with Musicgen_ from Sound Lab
- Added missing links arrays to connect nodes properly
- Updated all metadata and performance specs
Note: Melody workflow simplified to text-only as Sound Lab doesn't
currently support melody conditioning via audio input.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changed from non-existent nodes to actual Sound Lab nodes:
- Replaced MusicGenLoader/MusicGenTextEncode/MusicGenSampler with Musicgen_
- Replaced custom SaveAudio with standard SaveAudio node
- Added missing links array to connect nodes
- Set all Musicgen_ parameters: prompt, duration, guidance_scale, seed, device
Node is called "Musicgen_" (with underscore) from comfyui-sound-lab.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
SD3.5 checkpoint doesn't contain CLIP encoders. Now using:
- CheckpointLoaderSimple for MODEL and VAE
- TripleCLIPLoader for CLIP-L, CLIP-G, and T5-XXL
- Standard CLIPTextEncode for prompts
This fixes the "clip input is invalid: None" error.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replaced CheckpointLoaderSimple with UNETLoader + DualCLIPLoader.
Replaced CLIPTextEncode with CLIPTextEncodeFlux.
Added proper VAELoader with ae.safetensors.
Added ConditioningZeroOut for empty negative conditioning.
Removed old negative prompt input (FLUX doesn't use it).
Changes match FLUX Dev workflow structure.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added FLUX VAE (ae.safetensors) to model configuration and updated
workflow to use it instead of non-existent pixel_space VAE.
This fixes the SaveImage data type error reported for "(1, 1, 16), |u1".
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Node 3 (CLIPTextEncodeFlux) output feeds both KSampler (link 3) and
ConditioningZeroOut (link 8), so the output links array must include
both links.
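
A consistency check for this kind of mistake can be sketched as below. It assumes the link layout used in exported ComfyUI workflow JSON, [link_id, origin_node, origin_slot, target_node, target_slot, type], and that each node's outputs appear in slot order; the function names are hypothetical.

```python
from collections import defaultdict


def expected_output_links(workflow: dict) -> dict:
    """Map (node_id, output_slot) to the sorted link ids leaving that slot."""
    expected = defaultdict(list)
    for link_id, origin_id, origin_slot, *_ in workflow["links"]:
        expected[(origin_id, origin_slot)].append(link_id)
    return {key: sorted(ids) for key, ids in expected.items()}


def find_link_mismatches(workflow: dict) -> list:
    """List (node_id, slot, declared, wanted) for outputs that disagree
    with the top-level links array."""
    expected = expected_output_links(workflow)
    problems = []
    for node in workflow["nodes"]:
        for slot, output in enumerate(node.get("outputs", [])):
            declared = sorted(output.get("links") or [])
            wanted = expected.get((node["id"], slot), [])
            if declared != wanted:
                problems.append((node["id"], slot, declared, wanted))
    return problems


if __name__ == "__main__":
    wf = {
        "nodes": [{"id": 3, "outputs": [{"name": "CONDITIONING", "links": [3]}]}],
        "links": [[3, 3, 0, 5, 1, "CONDITIONING"], [8, 3, 0, 9, 0, "CONDITIONING"]],
    }
    print(find_link_mismatches(wf))
```

For the case described above, the check reports node 3, slot 0 as declaring only link 3 while links 3 and 8 both originate there.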
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
KSampler requires a negative conditioning input even though FLUX models
don't use it. Added a ConditioningZeroOut node that creates empty negative
conditioning from the positive output, satisfying KSampler's required
negative input.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace CheckpointLoaderSimple with UNETLoader
- Replace CLIPTextEncode with DualCLIPLoader + CLIPTextEncodeFlux
- Add VAELoader with pixel_space
- Remove negative prompt (FLUX uses guidance differently)
- Set CFG to 1.0 and set guidance to 3.5 in the text encoder
- Add all node connections in links array
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace DiffusersLoader with ImageOnlyCheckpointLoader
- Replace SVDSampler with SVD_img2vid_Conditioning + KSampler
- Add VideoLinearCFGGuidance for temporal consistency
- Add all node connections in links array
- Configure VHS_VideoCombine with correct parameters (25 frames)
- Increase steps to 30 for better quality with longer video
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove format-specific parameters from widgets_values array.
Only base parameters should be in widgets_values:
- frame_rate, loop_count, filename_prefix, format, pingpong, save_output
Format-specific params (pix_fmt, crf) are added dynamically by ComfyUI.
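
The filtering described above can be sketched like this. The base parameter order is taken from the list in this message; the helper name and the sample values in the demo are illustrative assumptions, not values from the workflow.

```python
# Base parameters, in the order given in the commit message.
BASE_PARAMS = (
    "frame_rate", "loop_count", "filename_prefix",
    "format", "pingpong", "save_output",
)


def base_widgets_values(params: dict) -> list:
    """Build widgets_values from base parameters only.

    Format-specific keys such as pix_fmt and crf are dropped; a missing
    base parameter raises KeyError so the gap is noticed early.
    """
    missing = [key for key in BASE_PARAMS if key not in params]
    if missing:
        raise KeyError(f"missing base parameters: {missing}")
    return [params[key] for key in BASE_PARAMS]


if __name__ == "__main__":
    sample = {
        "frame_rate": 8, "loop_count": 0, "filename_prefix": "svd",
        "format": "video/h264-mp4", "pingpong": False, "save_output": True,
        "pix_fmt": "yuv420p", "crf": 19,  # format-specific, dropped below
    }
    print(base_widgets_values(sample))
```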
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace DiffusersLoader with ImageOnlyCheckpointLoader
- Replace SVDSampler with SVD_img2vid_Conditioning + KSampler
- Add VideoLinearCFGGuidance for temporal consistency
- Add all node connections in links array
- Configure VHS_VideoCombine with H.264 parameters
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- CogVideoX-5b-I2V requires specific resolution (720x480)
- Cannot generate videos at different resolutions
- Update placeholder image to match model requirements
- Add enable_sequential_cpu_offload=true to DownloadAndLoadCogVideoModel
- Reduces VRAM from ~20GB to ~12GB at the cost of slower inference
- Widget values: [model, precision, quantization, cpu_offload] = ['THUDM/CogVideoX-5b-I2V', 'bf16', 'disabled', true]
- Necessary for 24GB GPU with other services running
- Create 1024x1024 white placeholder with 'Input Frame' text
- Allows workflow validation without external image upload
- Will be replaced by API input in production use
- Change checkpoint from 'diffusers/stable-diffusion-xl-base-1.0' to 'sd_xl_base_1.0.safetensors'
- Change IPAdapter weight_type from 'original' to 'style transfer' (valid option)
- Fixes validation errors: invalid checkpoint name and invalid weight_type
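
The DownloadAndLoadCogVideoModel widget order and the 720x480 requirement listed in the CogVideoX bullets above can be checked with a short sketch. The widget names come from the bracketed list in this message; the helper names are hypothetical, and nothing here calls the node itself.

```python
# Widget order as written in the commit message.
COGVIDEO_WIDGETS = ("model", "precision", "quantization", "cpu_offload")
REQUIRED_SIZE = (720, 480)  # resolution stated for CogVideoX-5b-I2V


def label_widgets(values: list) -> dict:
    """Pair positional widgets_values with their names."""
    if len(values) != len(COGVIDEO_WIDGETS):
        raise ValueError(
            f"expected {len(COGVIDEO_WIDGETS)} values, got {len(values)}"
        )
    return dict(zip(COGVIDEO_WIDGETS, values))


def size_ok(width: int, height: int) -> bool:
    """Return True if the input frame matches the required resolution."""
    return (width, height) == REQUIRED_SIZE


if __name__ == "__main__":
    labelled = label_widgets(["THUDM/CogVideoX-5b-I2V", "bf16", "disabled", True])
    print(labelled["cpu_offload"], size_ok(1024, 1024))
```

Labelling the values makes a misordered array easier to spot than reading positions by eye.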