feat: add complete HunyuanVideo and Wan2.2 video generation integration

Integrated 35+ video generation models and 13 production workflows from the ComfyUI docs tutorials for state-of-the-art text-to-video and image-to-video generation.

Models Added (models_huggingface.yaml):
- HunyuanVideo (5 models): original T2V/I2V (720p), v1.5 (720p/1080p) with Qwen 2.5 VL
- Wan2.2 diffusion models (18 models):
  - 5B TI2V hybrid (8GB VRAM, efficient)
  - 14B variants: T2V and I2V (high/low noise), Animate, S2V (FP8/BF16), Fun Inpaint/Camera/Control (high/low noise)
- Support models (12): VAEs, text encoders (UMT5-XXL, LLaVA LLaMA3, Qwen 2.5 VL, ByT5), CLIP Vision H, LLaVA vision encoder, Wav2Vec2
- LoRA accelerators (4): Lightx2v 4-step distillation for 5x speedup

Workflows Added (comfyui/workflows/image-to-video/):
- HunyuanVideo (5 workflows): original T2V, I2V v1/v2 (embedded in WebP), v1.5 T2V/I2V (JSON)
- Wan2.2 (8 workflows): 5B TI2V, 14B T2V/I2V/FLF2V/Animate/S2V/Fun Camera/Fun Control
- Asset files (10): reference images, videos, and audio for workflow testing

Custom Nodes Added (arty.yml):
- ComfyUI-KJNodes: Kijai's optimization nodes for HunyuanVideo/Wan2.2 (FP8 scaling, video helpers)
- comfyui_controlnet_aux: ControlNet preprocessors (Canny, Depth, OpenPose, MLSD) for Fun Control
- ComfyUI-GGUF: GGUF quantization support for memory optimization

VRAM Requirements:
- HunyuanVideo original: 24GB (720p T2V/I2V, 129 frames, ~5s clips)
- HunyuanVideo 1.5: 30-60GB (720p/1080p, improved quality with Qwen 2.5 VL)
- Wan2.2 5B: 8GB (efficient dense model, fits via ComfyUI native offloading)
- Wan2.2 14B: 24GB (high-quality video generation, all modes)

Note: the Wan2.2 Fun Inpaint workflow is not included because it is not available in the official templates repository (404); its model files are still added.
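
As a rough illustration of the VRAM Requirements list, the Python sketch below filters the four model stacks against a GPU's memory budget. The helper name `stacks_that_fit` is hypothetical. The sketch assumes the listed figures are hard minimums and uses the lower bound of the 30-60GB range for HunyuanVideo 1.5. Real usage also depends on resolution, frame count, and offloading.

```python
# Hypothetical helper: which video stacks from this commit fit a given GPU?
# Figures copied from the "VRAM Requirements" list; HunyuanVideo 1.5 uses the
# lower bound of its 30-60GB range (assumption: treated as a hard minimum).
VRAM_REQUIREMENTS_GB = {
    "HunyuanVideo original": 24,
    "HunyuanVideo 1.5": 30,
    "Wan2.2 5B": 8,
    "Wan2.2 14B": 24,
}


def stacks_that_fit(gpu_vram_gb: float) -> list[str]:
    """Return the stacks whose stated VRAM requirement is within the budget."""
    return [
        name
        for name, required_gb in VRAM_REQUIREMENTS_GB.items()
        if required_gb <= gpu_vram_gb
    ]


if __name__ == "__main__":
    for vram in (8, 12, 24, 48):
        print(f"{vram}GB:", stacks_that_fit(vram))
```

On a 24GB card this selects the original HunyuanVideo and both Wan2.2 stacks, but not HunyuanVideo 1.5.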

Tutorial Sources:
- https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video
- https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video-1-5
- https://docs.comfy.org/tutorials/video/wan/wan2_2
- https://docs.comfy.org/tutorials/video/wan/wan2-2-animate
- https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-camera
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-control

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 10:43:39 +01:00
parent 06b8ec0064
commit 6efb55c59f
21 changed files with 32794 additions and 0 deletions
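
The Wan2.2 14B entries in the models_huggingface.yaml diff come as high-noise/low-noise expert pairs. The high-noise expert handles the early, noisiest denoising steps, and the low-noise expert refines the rest. The minimal Python sketch below divides one sampling schedule between the two experts. It uses half-open step ranges in the spirit of the `start_at_step`/`end_at_step` inputs of ComfyUI's KSamplerAdvanced node. The helper names (`StepRange`, `split_schedule`) and the 0.5 switch fraction are illustrative assumptions, not values taken from these workflows.

```python
# Illustrative sketch: split one denoising schedule between the Wan2.2 14B
# high-noise and low-noise experts. The 0.5 switch fraction is an assumed
# default, not a value taken from this commit's workflows.
from typing import NamedTuple


class StepRange(NamedTuple):
    model: str
    start_at_step: int  # inclusive
    end_at_step: int    # exclusive


def split_schedule(total_steps: int, switch_fraction: float = 0.5) -> tuple[StepRange, StepRange]:
    """Give the first steps to the high-noise expert and the rest to the low-noise expert."""
    if total_steps < 2:
        raise ValueError("need at least 2 steps so each expert gets one")
    if not 0.0 < switch_fraction < 1.0:
        raise ValueError("switch_fraction must be strictly between 0 and 1")
    # Clamp so that both experts always receive at least one step.
    switch = min(max(round(total_steps * switch_fraction), 1), total_steps - 1)
    return (
        StepRange("high_noise", 0, switch),
        StepRange("low_noise", switch, total_steps),
    )


if __name__ == "__main__":
    print(split_schedule(20))  # illustrative schedule without an accelerator LoRA
    print(split_schedule(4))   # step count targeted by the Lightx2v 4-step LoRAs
```

With 4 total steps, the count the Lightx2v LoRAs are distilled for, each expert gets two steps.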
--- a/models_huggingface.yaml
+++ b/models_huggingface.yaml
@@ -169,6 +169,301 @@ model_categories:
        - source: "svd_xt.safetensors"
          dest: "svd_xt.safetensors"

+    # HunyuanVideo - Original (720p, T2V/I2V)
+    - repo_id: Comfy-Org/HunyuanVideo_repackaged
+      description: HunyuanVideo T2V - 720p text-to-video with MLLM encoders
+      size_gb: 20
+      essential: true
+      category: video
+      type: diffusion_models
+      format: bf16
+      vram_gb: 24
+      frames: 129
+      resolution: 720p
+      notes: 5-second T2V generation with Chinese/English support, DiT architecture with 3D VAE
+      files:
+        - source: "split_files/diffusion_models/hunyuan_video_t2v_720p_bf16.safetensors"
+          dest: "hunyuan_video_t2v_720p_bf16.safetensors"
+
+    - repo_id: Comfy-Org/HunyuanVideo_repackaged
+      description: HunyuanVideo I2V v1 - 720p image-to-video (concat method)
+      size_gb: 20
+      essential: true
+      category: video
+      type: diffusion_models
+      format: bf16
+      vram_gb: 24
+      frames: 129
+      resolution: 720p
+      notes: Static image to video with concat conditioning, better motion fluidity
+      files:
+        - source: "split_files/diffusion_models/hunyuan_video_image_to_video_720p_bf16.safetensors"
+          dest: "hunyuan_video_image_to_video_720p_bf16.safetensors"
+
+    - repo_id: Comfy-Org/HunyuanVideo_repackaged
+      description: HunyuanVideo I2V v2 - 720p image-to-video (replace method)
+      size_gb: 20
+      essential: true
+      category: video
+      type: diffusion_models
+      format: bf16
+      vram_gb: 24
+      frames: 129
+      resolution: 720p
+      notes: Updated I2V with replace conditioning, better image guidance adherence
+      files:
+        - source: "split_files/diffusion_models/hunyuan_video_v2_replace_image_to_video_720p_bf16.safetensors"
+          dest: "hunyuan_video_v2_replace_image_to_video_720p_bf16.safetensors"
+
+    # HunyuanVideo 1.5 - Latest generation (720p/1080p, T2V/I2V)
+    - repo_id: Comfy-Org/HunyuanVideo_1.5_repackaged
+      description: HunyuanVideo 1.5 T2V - 720p text-to-video (8.3B parameters)
+      size_gb: 18
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp16
+      vram_gb: 24
+      frames: 129-257
+      resolution: 720p
+      notes: 5-10 second T2V with Qwen 2.5 VL encoder, requires 24GB VRAM
+      files:
+        - source: "hunyuanvideo1.5_720p_t2v_fp16.safetensors"
+          dest: "hunyuanvideo1.5_720p_t2v_fp16.safetensors"
+
+    - repo_id: Comfy-Org/HunyuanVideo_1.5_repackaged
+      description: HunyuanVideo 1.5 SR - 1080p super-resolution (distilled)
+      size_gb: 18
+      essential: false
+      category: video
+      type: diffusion_models
+      format: fp16
+      vram_gb: 24
+      frames: 129-257
+      resolution: 1080p
+      notes: Upscales 720p to 1080p with distilled model for faster generation
+      files:
+        - source: "hunyuanvideo1.5_1080p_sr_distilled_fp16.safetensors"
+          dest: "hunyuanvideo1.5_1080p_sr_distilled_fp16.safetensors"
+
+    # Wan2.2 5B - Hybrid text+image to video (low VRAM)
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 TI2V 5B - Hybrid text+image to video (8GB VRAM)
+      size_gb: 10
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp16
+      vram_gb: 8
+      frames: 81
+      resolution: 640x640
+      notes: Efficient dense 5B model using the high-compression Wan2.2 VAE, fits in 8GB VRAM with ComfyUI native offloading
+      files:
+        - source: "wan2.2_ti2v_5B_fp16.safetensors"
+          dest: "wan2.2_ti2v_5B_fp16.safetensors"
+
+    # Wan2.2 14B T2V - Dual-expert text-to-video
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 T2V High Noise 14B - Text-to-video high noise expert (FP8)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Dual-expert T2V high noise denoising, FP8 quantized for 24GB GPU
+      files:
+        - source: "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 T2V Low Noise 14B - Text-to-video low noise expert (FP8)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Dual-expert T2V low noise refinement, FP8 quantized for 24GB GPU
+      files:
+        - source: "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors"
+
+    # Wan2.2 14B I2V - Image-to-video with content consistency
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 I2V High Noise 14B - Image-to-video high noise expert (FP16)
+      size_gb: 28
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp16
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Dual-expert I2V high noise denoising with content consistency
+      files:
+        - source: "wan2.2_i2v_high_noise_14B_fp16.safetensors"
+          dest: "wan2.2_i2v_high_noise_14B_fp16.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 I2V Low Noise 14B - Image-to-video low noise expert (FP16)
+      size_gb: 28
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp16
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Dual-expert I2V low noise refinement with content consistency
+      files:
+        - source: "wan2.2_i2v_low_noise_14B_fp16.safetensors"
+          dest: "wan2.2_i2v_low_noise_14B_fp16.safetensors"
+
+    # Wan2.2 14B Animate - Video-to-video character animation
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 Animate 14B - Video-to-video character animation (BF16)
+      size_gb: 28
+      essential: true
+      category: video
+      type: diffusion_models
+      format: bf16
+      vram_gb: 24
+      frames: 81
+      resolution: multiples of 16
+      notes: V2V animation with Mix/Move modes, requires CLIP Vision H for reference image
+      files:
+        - source: "wan2.2_animate_14B_bf16.safetensors"
+          dest: "wan2.2_animate_14B_bf16.safetensors"
+
+    # Wan2.2 14B S2V - Sound-to-video synchronization
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 S2V 14B - Sound-to-video with audio sync (FP8)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Transforms static images + audio into synchronized videos, uses Wav2Vec2 audio encoder
+      files:
+        - source: "wan2.2_s2v_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_s2v_14B_fp8_scaled.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 S2V 14B - Sound-to-video with audio sync (BF16)
+      size_gb: 28
+      essential: false
+      category: video
+      type: diffusion_models
+      format: bf16
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Higher-precision BF16 variant of S2V for better output quality than FP8
+      files:
+        - source: "wan2.2_s2v_14B_bf16.safetensors"
+          dest: "wan2.2_s2v_14B_bf16.safetensors"
+
+    # Wan2.2 14B Fun Inpaint - Start-end frame controlled generation
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 Fun Inpaint High Noise 14B - Start-end frame transition (FP8)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Generates a transition between start and end frames, high-noise expert (early denoising steps)
+      files:
+        - source: "wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 Fun Inpaint Low Noise 14B - Start-end frame transition (FP8)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Generates a transition between start and end frames, low-noise expert (final refinement steps)
+      files:
+        - source: "wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors"
+
+    # Wan2.2 14B Fun Control - ControlNet-style conditioning
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 Fun Control High Noise 14B - Control conditions (Canny/Depth/Pose/MLSD/trajectory)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: I2V with control conditions (Canny, Depth, OpenPose, MLSD, trajectory), requires comfyui_controlnet_aux preprocessors
+      files:
+        - source: "wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 Fun Control Low Noise 14B - Control conditions (Canny/Depth/Pose/MLSD/trajectory)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Low-noise refinement expert for control-conditioned I2V
+      files:
+        - source: "wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors"
+
+    # Wan2.2 14B Fun Camera - Camera motion control
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 Fun Camera High Noise 14B - Camera motion control (pan/zoom/static)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: I2V with camera motion control (pan, zoom, static), generation time 108s with 4-step LoRA vs 536s without
+      files:
+        - source: "wan2.2_fun_camera_high_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_fun_camera_high_noise_14B_fp8_scaled.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 Fun Camera Low Noise 14B - Camera motion control (pan/zoom/static)
+      size_gb: 14
+      essential: true
+      category: video
+      type: diffusion_models
+      format: fp8_scaled
+      vram_gb: 24
+      frames: 81
+      resolution: 640x640
+      notes: Low-noise refinement expert for camera-controlled I2V
+      files:
+        - source: "wan2.2_fun_camera_low_noise_14B_fp8_scaled.safetensors"
+          dest: "wan2.2_fun_camera_low_noise_14B_fp8_scaled.safetensors"
+
  # ==========================================================================
  # AUDIO GENERATION MODELS
  # ==========================================================================
@@ -383,6 +678,205 @@ model_categories:
        - source: "text_encoders/clip_g.safetensors"
          dest: "clip_g.safetensors"

+    # HunyuanVideo Support Models
+    - repo_id: Comfy-Org/HunyuanVideo_repackaged
+      description: HunyuanVideo VAE - 3D VAE for video encoding/decoding (BF16)
+      size_gb: 1
+      essential: true
+      category: support
+      type: vae
+      format: bf16
+      vram_gb: 2
+      notes: 3D VAE autoencoder for HunyuanVideo models
+      files:
+        - source: "split_files/vae/hunyuan_video_vae_bf16.safetensors"
+          dest: "hunyuan_video_vae_bf16.safetensors"
+
+    - repo_id: Comfy-Org/HunyuanVideo_repackaged
+      description: LLaVA LLaMA3 FP8 - Multimodal text encoder for HunyuanVideo
+      size_gb: 8
+      essential: true
+      category: support
+      type: text_encoders
+      format: fp8_scaled
+      vram_gb: 4
+      notes: LLaVA LLaMA3-based text encoder with FP8 quantization
+      files:
+        - source: "split_files/text_encoders/llava_llama3_fp8_scaled.safetensors"
+          dest: "llava_llama3_fp8_scaled.safetensors"
+
+    - repo_id: Comfy-Org/HunyuanVideo_repackaged
+      description: LLaVA LLaMA3 Vision - Vision encoder for HunyuanVideo I2V
+      size_gb: 2
+      essential: true
+      category: support
+      type: clip_vision
+      format: safetensors
+      vram_gb: 2
+      notes: Vision encoder for image-to-video conditioning
+      files:
+        - source: "split_files/clip_vision/llava_llama3_vision.safetensors"
+          dest: "llava_llama3_vision.safetensors"
+
+    # HunyuanVideo 1.5 Support Models
+    - repo_id: Comfy-Org/HunyuanVideo_1.5_repackaged
+      description: HunyuanVideo 1.5 VAE - VAE for v1.5 models (FP16)
+      size_gb: 1
+      essential: true
+      category: support
+      type: vae
+      format: fp16
+      vram_gb: 2
+      notes: VAE autoencoder for HunyuanVideo 1.5
+      files:
+        - source: "hunyuanvideo15_vae_fp16.safetensors"
+          dest: "hunyuanvideo15_vae_fp16.safetensors"
+
+    - repo_id: Comfy-Org/HunyuanVideo_1.5_repackaged
+      description: Qwen 2.5 VL 7B FP8 - Vision-language encoder for HunyuanVideo 1.5
+      size_gb: 14
+      essential: true
+      category: support
+      type: text_encoders
+      format: fp8_scaled
+      vram_gb: 8
+      notes: Qwen 2.5 VL 7B text encoder with FP8 quantization
+      files:
+        - source: "qwen_2.5_vl_7b_fp8_scaled.safetensors"
+          dest: "qwen_2.5_vl_7b_fp8_scaled.safetensors"
+
+    - repo_id: Comfy-Org/HunyuanVideo_1.5_repackaged
+      description: ByT5 Small GlyphXL FP16 - Glyph-aware text encoder for HunyuanVideo 1.5
+      size_gb: 0.5
+      essential: true
+      category: support
+      type: text_encoders
+      format: fp16
+      vram_gb: 1
+      notes: ByT5 small text encoder with glyph awareness
+      files:
+        - source: "byt5_small_glyphxl_fp16.safetensors"
+          dest: "byt5_small_glyphxl_fp16.safetensors"
+
+    # Wan2.2 Support Models
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan2.2 VAE - VAE for Wan2.2 5B models
+      size_gb: 0.5
+      essential: true
+      category: support
+      type: vae
+      format: safetensors
+      vram_gb: 1
+      notes: VAE autoencoder for Wan2.2 5B TI2V model
+      files:
+        - source: "wan2.2_vae.safetensors"
+          dest: "wan2.2_vae.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wan 2.1 VAE - VAE for Wan2.2 14B models
+      size_gb: 0.5
+      essential: true
+      category: support
+      type: vae
+      format: safetensors
+      vram_gb: 1
+      notes: VAE autoencoder for all Wan2.2 14B models (T2V, I2V, S2V, Animate, etc.)
+      files:
+        - source: "wan_2.1_vae.safetensors"
+          dest: "wan_2.1_vae.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.1_ComfyUI_repackaged
+      description: UMT5-XXL FP8 - Text encoder for all Wan2.2 models
+      size_gb: 10
+      essential: true
+      category: support
+      type: text_encoders
+      format: fp8_scaled
+      vram_gb: 5
+      notes: Shared text encoder for all Wan2.2 models (5B and 14B), FP8 quantized
+      files:
+        - source: "umt5_xxl_fp8_e4m3fn_scaled.safetensors"
+          dest: "umt5_xxl_fp8_e4m3fn_scaled.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: CLIP Vision H - Vision encoder for Wan2.2 Animate mode
+      size_gb: 4
+      essential: true
+      category: support
+      type: clip_vision
+      format: safetensors
+      vram_gb: 2
+      notes: CLIP Vision H for reference image in Wan2.2 Animate video-to-video
+      files:
+        - source: "clip_vision_h.safetensors"
+          dest: "clip_vision_h.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Wav2Vec2 Large English FP16 - Audio encoder for Wan2.2 S2V
+      size_gb: 1
+      essential: true
+      category: support
+      type: audio_models
+      format: fp16
+      vram_gb: 2
+      notes: Audio encoder for sound-to-video synchronization
+      files:
+        - source: "wav2vec2_large_english_fp16.safetensors"
+          dest: "wav2vec2_large_english_fp16.safetensors"
+
+    # Wan2.2 LoRA Accelerators (4-step distillation)
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Lightx2v I2V Animate LoRA - 4-step acceleration for Wan2.2 Animate
+      size_gb: 0.5
+      essential: true
+      category: support
+      type: loras
+      format: bf16
+      vram_gb: 1
+      notes: 4-step LoRA for Wan2.2 Animate (480p, CFG and step distilled), 5x speedup
+      files:
+        - source: "lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors"
+          dest: "lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Lightx2v T2V High Noise LoRA - 4-step acceleration for Wan2.2 T2V high noise
+      size_gb: 0.5
+      essential: true
+      category: support
+      type: loras
+      format: safetensors
+      vram_gb: 1
+      notes: 4-step LoRA for T2V high noise expert, v1.1
+      files:
+        - source: "wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors"
+          dest: "wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Lightx2v I2V High Noise LoRA - 4-step acceleration for Wan2.2 I2V high noise
+      size_gb: 0.5
+      essential: true
+      category: support
+      type: loras
+      format: safetensors
+      vram_gb: 1
+      notes: 4-step LoRA for I2V/Fun Inpaint/Fun Control/Fun Camera high noise expert
+      files:
+        - source: "wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors"
+          dest: "wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors"
+
+    - repo_id: Comfy-Org/Wan_2.2_ComfyUI_Repackaged
+      description: Lightx2v I2V Low Noise LoRA - 4-step acceleration for Wan2.2 I2V low noise
+      size_gb: 0.5
+      essential: true
+      category: support
+      type: loras
+      format: safetensors
+      vram_gb: 1
+      notes: 4-step LoRA for I2V/Fun Inpaint/Fun Control/Fun Camera low noise expert
+      files:
+        - source: "wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors"
+          dest: "wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors"
+
  # ==========================================================================
  # ANIMATEDIFF MODELS
  # ==========================================================================