Commit Graph

6 Commits

Author SHA1 Message Date
f74457b049 fix: apply LlamaConfig patch globally at import time
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Previous approach patched DiT.__init__ at runtime, but models were already
instantiated and cached. This version patches LlamaConfig globally BEFORE
any DiffRhythm imports, ensuring all model instances use the correct config.

Key changes:
- Created PatchedLlamaConfig subclass that auto-calculates attention heads
- Replaced LlamaConfig in transformers.models.llama module at import time
- Patch applies to all LlamaConfig instances, including pre-loaded models

This should finally fix the tensor dimension mismatch error.
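
A minimal sketch of the import-time replacement described above, using stand-ins rather than the real library: `llama_mod` plays the role of transformers.models.llama, the stand-in `LlamaConfig` only mimics the fields that matter here, and `build_config` is a hypothetical consumer. The 64 and 4 divisors come from the neighbouring commit message. The use of `setdefault` is an assumption, not necessarily what the actual patch does.

```python
import types

# Stand-in for the transformers.models.llama module (assumption: the real
# module is not imported in this sketch).
llama_mod = types.ModuleType("llama_standin")


class LlamaConfig:
    """Stand-in config that defaults to 32 attention heads."""

    def __init__(self, hidden_size=4096, num_attention_heads=32,
                 num_key_value_heads=None):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads or num_attention_heads
        self.head_dim = hidden_size // num_attention_heads


llama_mod.LlamaConfig = LlamaConfig


class PatchedLlamaConfig(LlamaConfig):
    """Fills in head counts from hidden_size when the caller omits them."""

    def __init__(self, hidden_size=4096, **kwargs):
        heads = kwargs.setdefault("num_attention_heads",
                                  max(1, hidden_size // 64))
        kwargs.setdefault("num_key_value_heads", max(1, heads // 4))
        super().__init__(hidden_size=hidden_size, **kwargs)


# The replacement must happen before any consumer resolves the name.
llama_mod.LlamaConfig = PatchedLlamaConfig


def build_config(hidden_size):
    # Hypothetical consumer: looks the class up on the module at call time.
    return llama_mod.LlamaConfig(hidden_size=hidden_size)


cfg = build_config(1024)
print(type(cfg).__name__, cfg.num_attention_heads,
      cfg.num_key_value_heads, cfg.head_dim)
# prints: PatchedLlamaConfig 16 4 64
```

Note that rebinding a module attribute only affects later lookups through the module. Code that already ran `from ... import LlamaConfig` keeps its reference to the original class, which is why the ordering relative to the DiffRhythm imports matters.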

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 19:00:29 +01:00
91f6e9bd59 fix: patch DiffRhythm DiT to add missing LlamaConfig attention head parameters
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 15s
Adds monkey-patch for DiT.__init__() to properly configure LlamaConfig with
num_attention_heads and num_key_value_heads parameters, which are missing
in the upstream DiffRhythm code.

Root cause: transformers 4.49.0+ requires these parameters but DiffRhythm's
dit.py only specifies hidden_size, causing the library to incorrectly infer
head_dim as 32 instead of 64, leading to tensor dimension mismatches.

Solution:
- Sets num_attention_heads = hidden_size // 64 (standard Llama architecture)
- Sets num_key_value_heads = num_attention_heads // 4 (GQA configuration)
- Ensures head_dim = 64, fixing the "tensor a (32) vs tensor b (64)" error

This is a proper fix rather than just downgrading transformers version.
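
The head arithmetic above can be checked with a few lines. This is an illustrative sketch only: `hidden_size = 1024` is a hypothetical value chosen so the numbers match the error message, and `DEFAULT_HEADS = 32` assumes the config falls back to 32 attention heads when none are given.

```python
def infer_heads(hidden_size, head_dim=64, gqa_ratio=4):
    """Derive head counts from hidden_size as described in the commit."""
    num_attention_heads = hidden_size // head_dim
    num_key_value_heads = max(1, num_attention_heads // gqa_ratio)
    return num_attention_heads, num_key_value_heads


DEFAULT_HEADS = 32   # assumed fallback when only hidden_size is given
hidden_size = 1024   # hypothetical value for illustration

# Without the patch: the head dimension follows from the default head count.
broken_head_dim = hidden_size // DEFAULT_HEADS

# With the patch: head counts follow from hidden_size instead.
heads, kv_heads = infer_heads(hidden_size)
fixed_head_dim = hidden_size // heads

print(broken_head_dim, heads, kv_heads, fixed_head_dim)
# prints: 32 16 4 64
```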

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:53:18 +01:00
8c4eb8c3f1 fix: pin transformers to 4.49.0 for DiffRhythm compatibility
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 13s
Resolves tensor dimension mismatch error in rotary position embeddings.
DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause
"The size of tensor a (32) must match the size of tensor b (64)" error
due to transformer block initialization changes.
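
A pin like this can also be guarded at startup. The sketch below is an assumption about how such a check might look, not part of the commit: `version_tuple` and `is_pinned_version` are hypothetical helpers, and in real use the version string would come from something like `importlib.metadata.version("transformers")`.

```python
from itertools import takewhile

PINNED = (4, 49, 0)  # the version this commit pins


def version_tuple(version):
    """Parse 'X.Y.Z' (ignoring suffixes such as 'rc1' or '.dev0')."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_pinned_version(version):
    """True when the installed version matches the pin exactly."""
    return version_tuple(version) == PINNED


print(is_pinned_version("4.49.0"), is_pinned_version("4.50.0"))
# prints: True False
```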

Updated pivoine_diffrhythm.py documentation to reflect actual root cause
and link to upstream GitHub issues #44 and #48.

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:14:40 +01:00
67d41c3923 fix: patch infer_utils.decode_audio instead of DiffRhythmNode.infer
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 13s
The correct function to patch is decode_audio from infer_utils module,
which is where chunked VAE decoding actually happens. This intercepts
the call at the right level to force chunked=False.
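
A rough sketch of this kind of module-level patch, using a stand-in module instead of the real infer_utils. The `decode_audio` signature shown here is hypothetical. Only the `chunked` parameter name comes from the commit message. `inspect.signature` is used so the override works whether the caller passes `chunked` positionally or by keyword.

```python
import functools
import inspect
import types

# Stand-in for the infer_utils module (assumption: not the real module).
infer_utils = types.ModuleType("infer_utils_standin")


def decode_audio(latents, vae_model, chunked=False, overlap=32):
    # Hypothetical signature; returns a label instead of decoding audio.
    return "chunked" if chunked else "full"


infer_utils.decode_audio = decode_audio

_original = infer_utils.decode_audio
_signature = inspect.signature(_original)


@functools.wraps(_original)
def decode_audio_unchunked(*args, **kwargs):
    # Bind the call to the original signature, then override one argument.
    bound = _signature.bind(*args, **kwargs)
    bound.arguments["chunked"] = False
    return _original(*bound.args, **bound.kwargs)


# Rebind the module attribute so later lookups get the wrapper.
infer_utils.decode_audio = decode_audio_unchunked

print(infer_utils.decode_audio("z", "vae", True))
print(infer_utils.decode_audio("z", "vae", chunked=True, overlap=16))
# prints: full
# prints: full
```

Callers that imported the function by name before the patch was applied keep the unpatched reference, so a patch like this has to run before those imports.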

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 17:28:30 +01:00
1981b7b256 fix: monkey-patch DiffRhythm infer function to force chunked=False
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
The previous approach of overriding diffrhythmgen wasn't working because
ComfyUI doesn't pass the chunked parameter when it's not in INPUT_TYPES.
This fix monkey-patches the infer() function at module level to always
force chunked=False, preventing the tensor dimension mismatch error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 17:24:22 +01:00
5096e3ffb5 feat: add Pivoine custom ComfyUI nodes for DiffRhythm
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension
mismatch error by disabling chunked VAE decoding. The original DiffRhythm
node's overlap=32 parameter conflicts with the VAE's 64-channel architecture.

Changes:
- Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper
- Add comfyui/nodes/__init__.py: Package initialization
- Add arty.yml setup/pivoine-nodes: Deployment script for symlink
- Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun

Technical details:
- Inherits from DiffRhythmRun to avoid upstream patching
- Forces chunked=False in diffrhythmgen() override
- Requires more VRAM (~12-16 GB), which fits within the RTX 4090's 24 GB
- Category: 🌸Pivoine/Audio for easy identification
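
The wrapper pattern listed above might look roughly like the following. `DiffRhythmRun` here is a stand-in base class with a hypothetical `diffrhythmgen` signature, since the upstream node is not reproduced in this log. `CATEGORY` and `NODE_CLASS_MAPPINGS` follow the usual ComfyUI custom-node conventions. The sketch shows the inheritance pattern only.

```python
class DiffRhythmRun:
    """Stand-in for the upstream node (hypothetical signature)."""

    def diffrhythmgen(self, prompt, *, chunked=True, **kwargs):
        # Returns the effective settings instead of generating audio.
        return {"prompt": prompt, "chunked": chunked}


class PivoineDiffRhythmRun(DiffRhythmRun):
    """Wrapper that inherits everything and forces chunked=False."""

    CATEGORY = "🌸Pivoine/Audio"

    def diffrhythmgen(self, *args, **kwargs):
        # Override whatever the caller passed (or did not pass).
        kwargs["chunked"] = False
        return super().diffrhythmgen(*args, **kwargs)


# ComfyUI discovers custom nodes through this mapping in the package.
NODE_CLASS_MAPPINGS = {"PivoineDiffRhythmRun": PivoineDiffRhythmRun}

result = PivoineDiffRhythmRun().diffrhythmgen("a calm piano piece",
                                               chunked=True)
print(result["chunked"])
# prints: False
```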

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 16:28:54 +01:00