comfyui/nodes/pivoine_diffrhythm.py

"""
Pivoine DiffRhythm Node
Custom wrapper for DiffRhythm that ensures correct transformer library version
compatibility and provides fallback fixes for tensor dimension issues.

Known Issue: DiffRhythm requires transformers==4.49.0. Newer versions (4.50+)
cause "The size of tensor a (32) must match the size of tensor b (64)" error
in rotary position embeddings due to transformer block initialization changes.

Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44
Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48

Author: valknar@pivoine.art
"""

import sys
sys.path.append('/workspace/ComfyUI/custom_nodes/ComfyUI_DiffRhythm')

# Monkey-patch decode_audio from infer_utils to force chunked=False
import infer_utils
_original_decode_audio = infer_utils.decode_audio

def patched_decode_audio(latent, vae_model, chunked=True):
    """Patched version that always uses chunked=False"""
    return _original_decode_audio(latent, vae_model, chunked=False)

# Apply the monkey patch
infer_utils.decode_audio = patched_decode_audio

from DiffRhythmNode import DiffRhythmRun

class PivoineDiffRhythmRun(DiffRhythmRun):
    """
    Pivoine version of DiffRhythmRun with enhanced compatibility and error handling.

    Changes from original:
    - Monkey-patches decode_audio to always use chunked=False for stability
    - Ensures transformers library version compatibility (requires 4.49.0)
    - Prevents tensor dimension mismatch in VAE decoding
    - Requires more VRAM (~12-16GB) but works reliably on RTX 4090

    Note: If you encounter "tensor a (32) must match tensor b (64)" errors,
    ensure transformers==4.49.0 is installed in your ComfyUI venv.
    """

    CATEGORY = "🌸Pivoine/Audio"

    @classmethod
    def INPUT_TYPES(cls):
        return super().INPUT_TYPES()

NODE_CLASS_MAPPINGS = {
    "PivoineDiffRhythmRun": PivoineDiffRhythmRun,
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "PivoineDiffRhythmRun": "Pivoine DiffRhythm Run",
}
feat: add Pivoine custom ComfyUI nodes for DiffRhythm Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension mismatch error by disabling chunked VAE decoding. The original DiffRhythm node's overlap=32 parameter conflicts with the VAE's 64-channel architecture. Changes: - Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper - Add comfyui/nodes/__init__.py: Package initialization - Add arty.yml setup/pivoine-nodes: Deployment script for symlink - Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun Technical details: - Inherits from DiffRhythmRun to avoid upstream patching - Forces chunked=False in diffrhythmgen() override - Requires more VRAM (~12-16GB) but RTX 4090 has 24GB - Category: 🌸Pivoine/Audio for easy identification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 16:28:54 +01:00			`"""`
			`Pivoine DiffRhythm Node`
fix: pin transformers to 4.49.0 for DiffRhythm compatibility Resolves tensor dimension mismatch error in rotary position embeddings. DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause "The size of tensor a (32) must match the size of tensor b (64)" error due to transformer block initialization changes. Updated pivoine_diffrhythm.py documentation to reflect actual root cause and link to upstream GitHub issues #44 and #48. References: - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44 - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 18:14:40 +01:00			`Custom wrapper for DiffRhythm that ensures correct transformer library version`
			`compatibility and provides fallback fixes for tensor dimension issues.`

			`Known Issue: DiffRhythm requires transformers==4.49.0. Newer versions (4.50+)`
			`cause "The size of tensor a (32) must match the size of tensor b (64)" error`
			`in rotary position embeddings due to transformer block initialization changes.`

			`Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44`
			`Reference: https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48`
feat: add Pivoine custom ComfyUI nodes for DiffRhythm Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension mismatch error by disabling chunked VAE decoding. The original DiffRhythm node's overlap=32 parameter conflicts with the VAE's 64-channel architecture. Changes: - Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper - Add comfyui/nodes/__init__.py: Package initialization - Add arty.yml setup/pivoine-nodes: Deployment script for symlink - Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun Technical details: - Inherits from DiffRhythmRun to avoid upstream patching - Forces chunked=False in diffrhythmgen() override - Requires more VRAM (~12-16GB) but RTX 4090 has 24GB - Category: 🌸Pivoine/Audio for easy identification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 16:28:54 +01:00
			`Author: valknar@pivoine.art`
			`"""`

			`import sys`
			`sys.path.append('/workspace/ComfyUI/custom_nodes/ComfyUI_DiffRhythm')`

fix: patch infer_utils.decode_audio instead of DiffRhythmNode.infer The correct function to patch is decode_audio from infer_utils module, which is where chunked VAE decoding actually happens. This intercepts the call at the right level to force chunked=False. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 17:28:30 +01:00			`# Monkey-patch decode_audio from infer_utils to force chunked=False`
			`import infer_utils`
			`_original_decode_audio = infer_utils.decode_audio`
fix: monkey-patch DiffRhythm infer function to force chunked=False The previous approach of overriding diffrhythmgen wasn't working because ComfyUI doesn't pass the chunked parameter when it's not in INPUT_TYPES. This fix monkey-patches the infer() function at module level to always force chunked=False, preventing the tensor dimension mismatch error. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 17:24:22 +01:00
fix: patch infer_utils.decode_audio instead of DiffRhythmNode.infer The correct function to patch is decode_audio from infer_utils module, which is where chunked VAE decoding actually happens. This intercepts the call at the right level to force chunked=False. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 17:28:30 +01:00			`def patched_decode_audio(latent, vae_model, chunked=True):`
			`"""Patched version that always uses chunked=False"""`
			`return _original_decode_audio(latent, vae_model, chunked=False)`
fix: monkey-patch DiffRhythm infer function to force chunked=False The previous approach of overriding diffrhythmgen wasn't working because ComfyUI doesn't pass the chunked parameter when it's not in INPUT_TYPES. This fix monkey-patches the infer() function at module level to always force chunked=False, preventing the tensor dimension mismatch error. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 17:24:22 +01:00
			`# Apply the monkey patch`
fix: patch infer_utils.decode_audio instead of DiffRhythmNode.infer The correct function to patch is decode_audio from infer_utils module, which is where chunked VAE decoding actually happens. This intercepts the call at the right level to force chunked=False. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 17:28:30 +01:00			`infer_utils.decode_audio = patched_decode_audio`
fix: monkey-patch DiffRhythm infer function to force chunked=False The previous approach of overriding diffrhythmgen wasn't working because ComfyUI doesn't pass the chunked parameter when it's not in INPUT_TYPES. This fix monkey-patches the infer() function at module level to always force chunked=False, preventing the tensor dimension mismatch error. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 17:24:22 +01:00
feat: add Pivoine custom ComfyUI nodes for DiffRhythm Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension mismatch error by disabling chunked VAE decoding. The original DiffRhythm node's overlap=32 parameter conflicts with the VAE's 64-channel architecture. Changes: - Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper - Add comfyui/nodes/__init__.py: Package initialization - Add arty.yml setup/pivoine-nodes: Deployment script for symlink - Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun Technical details: - Inherits from DiffRhythmRun to avoid upstream patching - Forces chunked=False in diffrhythmgen() override - Requires more VRAM (~12-16GB) but RTX 4090 has 24GB - Category: 🌸Pivoine/Audio for easy identification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 16:28:54 +01:00			`from DiffRhythmNode import DiffRhythmRun`

			`class PivoineDiffRhythmRun(DiffRhythmRun):`
			`"""`
fix: pin transformers to 4.49.0 for DiffRhythm compatibility Resolves tensor dimension mismatch error in rotary position embeddings. DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause "The size of tensor a (32) must match the size of tensor b (64)" error due to transformer block initialization changes. Updated pivoine_diffrhythm.py documentation to reflect actual root cause and link to upstream GitHub issues #44 and #48. References: - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44 - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 18:14:40 +01:00			`Pivoine version of DiffRhythmRun with enhanced compatibility and error handling.`
feat: add Pivoine custom ComfyUI nodes for DiffRhythm Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension mismatch error by disabling chunked VAE decoding. The original DiffRhythm node's overlap=32 parameter conflicts with the VAE's 64-channel architecture. Changes: - Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper - Add comfyui/nodes/__init__.py: Package initialization - Add arty.yml setup/pivoine-nodes: Deployment script for symlink - Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun Technical details: - Inherits from DiffRhythmRun to avoid upstream patching - Forces chunked=False in diffrhythmgen() override - Requires more VRAM (~12-16GB) but RTX 4090 has 24GB - Category: 🌸Pivoine/Audio for easy identification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 16:28:54 +01:00
			`Changes from original:`
fix: pin transformers to 4.49.0 for DiffRhythm compatibility Resolves tensor dimension mismatch error in rotary position embeddings. DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause "The size of tensor a (32) must match the size of tensor b (64)" error due to transformer block initialization changes. Updated pivoine_diffrhythm.py documentation to reflect actual root cause and link to upstream GitHub issues #44 and #48. References: - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44 - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 18:14:40 +01:00			`- Monkey-patches decode_audio to always use chunked=False for stability`
			`- Ensures transformers library version compatibility (requires 4.49.0)`
			`- Prevents tensor dimension mismatch in VAE decoding`
feat: add Pivoine custom ComfyUI nodes for DiffRhythm Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension mismatch error by disabling chunked VAE decoding. The original DiffRhythm node's overlap=32 parameter conflicts with the VAE's 64-channel architecture. Changes: - Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper - Add comfyui/nodes/__init__.py: Package initialization - Add arty.yml setup/pivoine-nodes: Deployment script for symlink - Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun Technical details: - Inherits from DiffRhythmRun to avoid upstream patching - Forces chunked=False in diffrhythmgen() override - Requires more VRAM (~12-16GB) but RTX 4090 has 24GB - Category: 🌸Pivoine/Audio for easy identification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 16:28:54 +01:00			`- Requires more VRAM (~12-16GB) but works reliably on RTX 4090`
fix: pin transformers to 4.49.0 for DiffRhythm compatibility Resolves tensor dimension mismatch error in rotary position embeddings. DiffRhythm requires transformers 4.49.0 - newer versions (4.50+) cause "The size of tensor a (32) must match the size of tensor b (64)" error due to transformer block initialization changes. Updated pivoine_diffrhythm.py documentation to reflect actual root cause and link to upstream GitHub issues #44 and #48. References: - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/44 - https://github.com/billwuhao/ComfyUI_DiffRhythm/issues/48 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 18:14:40 +01:00
			`Note: If you encounter "tensor a (32) must match tensor b (64)" errors,`
			`ensure transformers==4.49.0 is installed in your ComfyUI venv.`
feat: add Pivoine custom ComfyUI nodes for DiffRhythm Add custom node wrapper PivoineDiffRhythmRun that fixes tensor dimension mismatch error by disabling chunked VAE decoding. The original DiffRhythm node's overlap=32 parameter conflicts with the VAE's 64-channel architecture. Changes: - Add comfyui/nodes/pivoine_diffrhythm.py: Custom node wrapper - Add comfyui/nodes/__init__.py: Package initialization - Add arty.yml setup/pivoine-nodes: Deployment script for symlink - Update all 4 DiffRhythm workflows to use PivoineDiffRhythmRun Technical details: - Inherits from DiffRhythmRun to avoid upstream patching - Forces chunked=False in diffrhythmgen() override - Requires more VRAM (~12-16GB) but RTX 4090 has 24GB - Category: 🌸Pivoine/Audio for easy identification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-24 16:28:54 +01:00			`"""`

			`CATEGORY = "🌸Pivoine/Audio"`

			`@classmethod`
			`def INPUT_TYPES(cls):`
			`return super().INPUT_TYPES()`

			`NODE_CLASS_MAPPINGS = {`
			`"PivoineDiffRhythmRun": PivoineDiffRhythmRun,`
			`}`

			`NODE_DISPLAY_NAME_MAPPINGS = {`
			`"PivoineDiffRhythmRun": "Pivoine DiffRhythm Run",`
			`}`