feat: integrate ACE Step music generation with 19-language support

Added ACE Step v1 3.5B model for state-of-the-art music generation:
- 15x faster than LLM baselines with superior structural coherence
- Supports 19 languages (en, zh, ja, ko, fr, es, de, it, pt, ru + 9 more)
- Voice cloning, lyric alignment, and multi-genre capabilities

Changes:
- Added ACE Step models to models_huggingface.yaml (checkpoint + Chinese RAP LoRA)
- Added ComfyUI_ACE-Step custom node to arty.yml with installation script
- Created 4 comprehensive workflows in comfyui/workflows/text-to-music/:
  * acestep-simple-t2m-v1.json - Basic 60s text-to-music generation
  * acestep-multilang-t2m-v1.json - 19-language music generation
  * acestep-remix-m2m-v1.json - Music-to-music remixing with style transfer
  * acestep-chinese-rap-v1.json - Chinese hip-hop with specialized LoRA

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 08:40:17 +01:00
parent 5af3eeb333
commit 513062623c
6 changed files with 480 additions and 0 deletions
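For readers unfamiliar with the models_huggingface.yaml schema, here is a rough sketch of how the two entries added in this commit might be turned into a download plan. It is an illustration, not the repository's actual downloader. The entries are hand-copied into Python dicts so the block needs no YAML parser. The helper names (`plan_downloads`, `total_size_gb`, `PlannedDownload`) and the `models/<type>/<dest>` target layout are assumptions made for this example.

```python
from dataclasses import dataclass

# Hand-copied from the two entries this commit adds to models_huggingface.yaml.
ACE_STEP_ENTRIES = [
    {
        "repo_id": "Comfy-Org/ACE-Step_ComfyUI_repackaged",
        "size_gb": 7.7,
        "essential": True,
        "type": "checkpoints",
        "files": [
            {"source": "all_in_one/ace_step_v1_3.5b.safetensors",
             "dest": "ace_step_v1_3.5b.safetensors"},
        ],
    },
    {
        "repo_id": "ACE-Step/ACE-Step-v1-chinese-rap-LoRA",
        "size_gb": 0.3,
        "essential": False,
        "type": "loras",
        "files": [
            {"source": "pytorch_lora_weights.safetensors",
             "dest": "ace-step-chinese-rap-lora.safetensors"},
        ],
    },
]


@dataclass(frozen=True)
class PlannedDownload:
    """One file to fetch: which repo, which file in it, and where it lands."""
    repo_id: str
    source: str
    target: str


def plan_downloads(entries, models_root="models", essential_only=False):
    """Expand config entries into a flat list of planned downloads.

    ASSUMPTION: files land under <models_root>/<type>/<dest>; the real
    downloader in this repository may map `type` to directories differently.
    """
    plan = []
    for entry in entries:
        if essential_only and not entry.get("essential", False):
            continue  # skip optional models such as the LoRA
        for item in entry["files"]:
            target = f"{models_root}/{entry['type']}/{item['dest']}"
            plan.append(PlannedDownload(entry["repo_id"], item["source"], target))
    return plan


def total_size_gb(entries, essential_only=False):
    """Sum the declared size_gb values, rounded to one decimal place."""
    sizes = (
        e["size_gb"]
        for e in entries
        if not essential_only or e.get("essential", False)
    )
    return round(sum(sizes), 1)


if __name__ == "__main__":
    for item in plan_downloads(ACE_STEP_ENTRIES):
        print(f"{item.repo_id}:{item.source} -> {item.target}")
    print("declared size (GB):", total_size_gb(ACE_STEP_ENTRIES))
```

Passing `essential_only=True` drops the Chinese RAP LoRA, which the config marks `essential: false`, and leaves only the 7.7 GB checkpoint in the plan.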
--- a/models_huggingface.yaml
+++ b/models_huggingface.yaml
@@ -219,6 +219,34 @@ model_categories:
        - source: "pytorch_model.bin.index.json"
          dest: "musicgen-large-pytorch_model.bin.index.json"

+    # ACE Step v1 3.5B - State-of-the-art music generation
+    - repo_id: Comfy-Org/ACE-Step_ComfyUI_repackaged
+      description: ACE Step v1 3.5B - Fast coherent music generation with 19-language support
+      size_gb: 7.7
+      essential: true
+      category: audio
+      type: checkpoints
+      format: safetensors
+      vram_gb: 16
+      duration_seconds: 240
+      notes: 15x faster than LLM baselines, superior structural coherence, voice cloning, 19-language lyrics
+      files:
+        - source: "all_in_one/ace_step_v1_3.5b.safetensors"
+          dest: "ace_step_v1_3.5b.safetensors"
+
+    # ACE Step Chinese RAP LoRA (optional)
+    - repo_id: ACE-Step/ACE-Step-v1-chinese-rap-LoRA
+      description: ACE Step Chinese RAP LoRA - Enhanced Chinese pronunciation and hip-hop genre
+      size_gb: 0.3
+      essential: false
+      category: audio
+      type: loras
+      format: safetensors
+      notes: Improves Chinese pronunciation accuracy and hip-hop/electronic genre adherence
+      files:
+        - source: "pytorch_lora_weights.safetensors"
+          dest: "ace-step-chinese-rap-lora.safetensors"
+
  # ==========================================================================
  # SUPPORT MODELS (CLIP, IP-Adapter, etc.)
  # ==========================================================================