fix: pin transformers to 4.49.0 for DiffRhythm compatibility

Resolves tensor dimension mismatch error in rotary position embeddings.
DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause
"The size of tensor a (32) must match the size of tensor b (64)" error
due to transformer block initialization changes.

Updated pivoine_diffrhythm.py documentation to reflect actual root cause
and link to upstream GitHub issues #44 and #48.

References:
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
- https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:14:40 +01:00
parent 67d41c3923
commit 8c4eb8c3f1
2 changed files with 17 additions and 6 deletions
--- a/comfyui/nodes/pivoine_diffrhythm.py
+++ b/comfyui/nodes/pivoine_diffrhythm.py
@@ -1,7 +1,14 @@
 """
 Pivoine DiffRhythm Node
-Custom wrapper for DiffRhythm that disables chunked decoding to prevent
-tensor dimension mismatch errors (32 vs 64) in VAE overlap logic.
+Custom wrapper for DiffRhythm that ensures correct transformer library version
+compatibility and provides fallback fixes for tensor dimension issues.
+
+Known Issue: DiffRhythm requires transformers==4.49.0. Newer versions (4.50+)
+cause "The size of tensor a (32) must match the size of tensor b (64)" error
+in rotary position embeddings due to transformer block initialization changes.
+
+Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
+Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

 Author: valknar@pivoine.art
 """
@@ -24,12 +31,16 @@ from DiffRhythmNode import DiffRhythmRun

 class PivoineDiffRhythmRun(DiffRhythmRun):
    """
-    Pivoine version of DiffRhythmRun with chunked decoding forcibly disabled.
+    Pivoine version of DiffRhythmRun with enhanced compatibility and error handling.

    Changes from original:
-    - Monkey-patches the infer() function to always use chunked=False
-    - Prevents tensor dimension mismatch in VAE (32 vs 64 channel error)
+    - Monkey-patches decode_audio to always use chunked=False for stability
+    - Ensures transformers library version compatibility (requires 4.49.0)
+    - Prevents tensor dimension mismatch in VAE decoding
    - Requires more VRAM (~12-16GB) but works reliably on RTX 4090
+
+    Note: If you encounter "tensor a (32) must match tensor b (64)" errors,
+    ensure transformers==4.49.0 is installed in your ComfyUI venv.
    """

    CATEGORY = "🌸Pivoine/Audio"
--- a/comfyui/requirements.txt
+++ b/comfyui/requirements.txt
@@ -1,7 +1,7 @@
 torch
 torchvision
 torchaudio
-transformers
+transformers==4.49.0
 diffusers>=0.31.0
 accelerate
 safetensors
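
The pin above only takes effect when the requirements are reinstalled, so an existing ComfyUI venv can still carry a newer transformers. A minimal sketch of a check that reports this before the tensor-size error appears is shown below. It is not part of this commit: the names `REQUIRED_TRANSFORMERS`, `pin_problem`, and `installed_transformers_version` are hypothetical, and the exact string comparison assumes the installed version carries no local suffix.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: mirrors the pin added to comfyui/requirements.txt in this commit.
REQUIRED_TRANSFORMERS = "4.49.0"


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


def pin_problem(installed, required=REQUIRED_TRANSFORMERS):
    """Return a description of the mismatch, or None if the pin is satisfied."""
    if installed is None:
        return f"transformers is not installed; expected transformers=={required}"
    if installed != required:
        return (
            f"transformers {installed} is installed, but DiffRhythm needs "
            f"transformers=={required}; other versions may raise the "
            f"'tensor a (32) must match tensor b (64)' error"
        )
    return None
```

Calling `pin_problem(installed_transformers_version())` at node import time and logging any non-None result would be one way to point users at the venv rather than at the VAE decoding path.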