feat: add Arty configuration and Claude Code documentation

- Add arty.yml for repository management with environment profiles (prod/dev/minimal)
- Add CLAUDE.md with comprehensive architecture and usage documentation
- Add comfyui_models.yaml for ComfyUI model configuration
- Include deployment scripts for model linking and dependency installation
- Document all git repositories (ComfyUI + 10 custom nodes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:50:36 +01:00
parent c9b01eef68
commit 2207d60f98
3 changed files with 891 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,411 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+This is a lightweight, process-based AI model orchestrator designed for RunPod GPU instances (specifically RTX 4090 with 24GB VRAM). It manages sequential loading of multiple large AI models on a single GPU, providing OpenAI-compatible API endpoints for text, image, and audio generation.
+
+**Key Design Philosophy:**
+- **Sequential model loading** - Only one model active at a time to fit within GPU memory constraints
+- **Process-based architecture** - Uses Python subprocess instead of Docker-in-Docker for RunPod compatibility
+- **Automatic model switching** - Orchestrator detects request types and switches models on-demand
+- **OpenAI-compatible APIs** - Works seamlessly with LiteLLM proxy and other AI tools
+
+## Architecture
+
+### Core Components
+
+1. **Orchestrator** (`model-orchestrator/orchestrator_subprocess.py`)
+   - FastAPI proxy server listening on port 9000
+   - Manages model lifecycle via Python subprocesses
+   - Routes requests to appropriate model services
+   - Handles sequential model loading/unloading
+
+2. **Model Registry** (`model-orchestrator/models.yaml`)
+   - YAML configuration defining available models
+   - Specifies: type, framework, service script, port, VRAM requirements, startup time
+   - Easy to extend with new models
+
+3. **Model Services** (`models/*/`)
+   - Individual Python servers running specific AI models
+   - vLLM for text generation (Qwen 2.5 7B, Llama 3.1 8B)
+   - ComfyUI for image/video/audio generation (FLUX, SDXL, CogVideoX, MusicGen)
+
+4. **Ansible Provisioning** (`playbook.yml`)
+   - Complete infrastructure-as-code setup
+   - Installs dependencies, downloads models, configures services
+   - Supports selective installation via tags
+
+### Why Process-Based Instead of Docker?
+
+The subprocess implementation (`orchestrator_subprocess.py`) is preferred over the Docker version (`orchestrator.py`) because:
+- RunPod instances run in containers - Docker-in-Docker adds complexity
+- Faster model startup (direct Python process spawning)
+- Simpler debugging (single process tree)
+- Reduced overhead (no container management layer)
+
+**Note:** Always use `orchestrator_subprocess.py` for RunPod deployments.
+
+## Common Commands
+
+### Repository Management with Arty
+
+This project uses Arty for repository and deployment management. See `arty.yml` for full configuration.
+
+```bash
+# Clone all repositories (fresh deployment)
+arty sync --env prod          # Production: Essential nodes only
+arty sync --env dev           # Development: All nodes including optional
+arty sync --env minimal       # Minimal: Orchestrator + ComfyUI base + ComfyUI-Manager
+
+# Run deployment scripts
+arty run setup/full           # Show setup instructions
+arty run models/link-comfyui  # Link downloaded models to ComfyUI
+arty run deps/comfyui-nodes   # Install custom node dependencies
+arty run services/start       # Start orchestrator
+arty run services/stop        # Stop all services
+
+# Health checks
+arty run health/orchestrator  # Check orchestrator
+arty run health/comfyui      # Check ComfyUI
+arty run check/gpu           # nvidia-smi
+arty run check/models        # Show cache size
+```
+
+### Initial Setup
+
+```bash
+# 1. Clone repositories with Arty (fresh RunPod instance)
+arty sync --env prod
+
+# 2. Configure environment
+cd /workspace/ai
+cp .env.example .env
+# Edit .env and set HF_TOKEN=your_huggingface_token
+
+# 3. Full deployment with Ansible
+ansible-playbook playbook.yml
+
+# 4. Alternative to step 3: essential ComfyUI setup (faster, ~80GB instead of ~137GB)
+ansible-playbook playbook.yml --tags comfyui-essential
+
+# 5. Link models to ComfyUI
+arty run models/link-comfyui
+
+# 6. Install custom node dependencies
+arty run deps/comfyui-nodes
+
+# 7. Alternative to step 3: selective installation (base system + Python environment + Python packages only)
+ansible-playbook playbook.yml --tags base,python,dependencies
+```
+
+### Service Management
+
+```bash
+# Start orchestrator (runs in foreground)
+bash scripts/start-all.sh
+# Or directly:
+python3 model-orchestrator/orchestrator_subprocess.py
+
+# Stop all services
+bash scripts/stop-all.sh
+
+# Stop orchestrator only
+pkill -f orchestrator_subprocess.py
+
+# Stop specific model service
+pkill -f "models/vllm/server.py"
+```
+
+### Testing
+
+```bash
+# Health check
+curl http://localhost:9000/health
+
+# List available models
+curl http://localhost:9000/v1/models
+
+# Test text generation (streaming)
+curl -s -N -X POST http://localhost:9000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "qwen-2.5-7b",
+    "messages": [{"role": "user", "content": "Count to 5"}],
+    "max_tokens": 50,
+    "stream": true
+  }'
+
+# Test image generation
+curl -X POST http://localhost:9000/v1/images/generations \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "flux-schnell",
+    "prompt": "A serene mountain landscape at sunset",
+    "size": "1024x1024"
+  }'
+```
+
+### Ansible Tags Reference
+
+**System Setup:**
+- `base` - Base system packages
+- `python` - Python environment setup
+- `dependencies` - Install Python packages
+
+**Model Installation:**
+- `models` - Download vLLM/Flux/MusicGen models (legacy)
+- `comfyui` - Install ComfyUI base
+- `comfyui-essential` - Quick setup (ComfyUI + essential models only, ~80GB)
+- `comfyui-models-image` - Image generation models (FLUX, SDXL, SD3.5)
+- `comfyui-models-video` - Video generation models (CogVideoX, SVD)
+- `comfyui-models-audio` - Audio generation models (MusicGen variants)
+- `comfyui-models-support` - CLIP, IP-Adapter, ControlNet models
+- `comfyui-models-all` - All ComfyUI models (~137GB)
+- `comfyui-nodes` - Install essential custom nodes
+
+**Infrastructure:**
+- `tailscale` - Install Tailscale VPN client
+- `systemd` - Configure systemd services (tagged `never`; not for RunPod)
+- `validate` - Health checks (tagged `never`; runs only when requested explicitly with `--tags validate`)
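+
+For reference, Ansible's special `never` tag keeps a task out of normal runs unless one of its other tags is requested. A minimal sketch of how a validation task might be tagged, assuming a hypothetical task (the actual tasks in `playbook.yml` may differ):
+
+```yaml
+# Hypothetical task for illustration; not copied from playbook.yml
+- name: Check orchestrator health endpoint
+  ansible.builtin.uri:
+    url: http://localhost:9000/health
+    status_code: 200
+  tags: [never, validate]
+```
+
+With this tagging, `ansible-playbook playbook.yml` skips the task, while `ansible-playbook playbook.yml --tags validate` runs it.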
+
+### Adding New Models
+
+1. **Add model definition to `model-orchestrator/models.yaml`:**
+
+```yaml
+llama-3.1-8b:
+  type: text
+  framework: vllm
+  service_script: models/vllm/server_llama.py
+  port: 8001
+  vram_gb: 17
+  startup_time_seconds: 120
+  endpoint: /v1/chat/completions
+  description: "Llama 3.1 8B Instruct"
+```
+
+2. **Create service script** (`models/vllm/server_llama.py`):
+
+```python
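+import os
+import runpy
+import sys
+
+MODEL = "meta-llama/Llama-3.1-8B-Instruct"
+
+if __name__ == "__main__":
+    # vLLM's OpenAI-compatible server is configured through CLI flags,
+    # so set argv and run its entrypoint module in this process.
+    port = os.getenv("PORT", "8001")
+    sys.argv = ["api_server", "--model", MODEL, "--port", port]
+    runpy.run_module("vllm.entrypoints.openai.api_server", run_name="__main__")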
+```
+
+3. **Download model** (handled by Ansible playbook or manually via HuggingFace CLI)
+
+4. **Restart orchestrator:**
+
+```bash
+bash scripts/stop-all.sh && bash scripts/start-all.sh
+```
+
+## Key Implementation Details
+
+### Model Switching Logic
+
+The orchestrator automatically switches models based on:
+- **Endpoint path** - `/v1/chat/completions` → text models, `/v1/images/generations` → image models
+- **Model name in request** - Matches against model registry
+- **Sequential loading** - Stops current model before starting new one to conserve VRAM
+
+See `orchestrator_subprocess.py:64-100` for process management implementation.
+
+### Model Registry Structure
+
+Each model in `models.yaml` requires:
+- `type` - text, image, or audio
+- `framework` - vllm, openedai-images, audiocraft, comfyui
+- `service_script` - Relative path to Python/shell script
+- `port` - Service port (8000+)
+- `vram_gb` - GPU memory requirement
+- `startup_time_seconds` - Max health check timeout
+- `endpoint` - API endpoint path
+- `description` - Human-readable description
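+
+As an illustration of these fields for a non-text model, an entry for FLUX Schnell served through ComfyUI might look like the following. This is only a sketch: the key, script path, port, and timeout are assumptions pieced together from other parts of this file, not copied from `models.yaml`.
+
+```yaml
+# Illustrative entry only; verify every value against models.yaml
+flux-schnell:
+  type: image
+  framework: comfyui
+  service_script: models/comfyui/start.sh   # assumed from the project structure
+  port: 8188                                # assumed: the port used by the ComfyUI health check
+  vram_gb: 23
+  startup_time_seconds: 120                 # assumed value
+  endpoint: /v1/images/generations
+  description: "FLUX.1 Schnell (FP16)"
+```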
+
+### Environment Variables
+
+Set in `.env` file:
+- `HF_TOKEN` - **Required** - HuggingFace API token for model downloads
+- `GPU_TAILSCALE_IP` - Optional - Tailscale IP for VPN access
+
+Models are cached in:
+- `/workspace/huggingface_cache` - HuggingFace models
+- `/workspace/models` - Other model files
+- `/workspace/ComfyUI/models` - ComfyUI model directory structure
+
+### Integration with LiteLLM
+
+For unified API management through LiteLLM proxy:
+
+**LiteLLM configuration (`litellm-config.yaml` on VPS):**
+```yaml
+model_list:
+  - model_name: qwen-2.5-7b
+    litellm_params:
+      model: hosted_vllm/openai/qwen-2.5-7b  # Use hosted_vllm prefix!
+      api_base: http://100.121.199.88:9000/v1  # Tailscale VPN IP
+      api_key: dummy
+      stream: true
+      timeout: 600
+```
+
+**Critical:** Use the `hosted_vllm/openai/` prefix for vLLM models to enable proper streaming support. The wrong prefix causes empty delta chunks.
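+
+Additional vLLM models would presumably follow the same pattern. A sketch for the `llama-3.1-8b` registry entry, assuming it is routed through the same orchestrator address and needs the same streaming settings:
+
+```yaml
+model_list:
+  # ...existing qwen-2.5-7b entry...
+  - model_name: llama-3.1-8b
+    litellm_params:
+      model: hosted_vllm/openai/llama-3.1-8b   # same prefix as the Qwen entry
+      api_base: http://100.121.199.88:9000/v1  # assumed: same Tailscale IP as above
+      api_key: dummy
+      stream: true
+      timeout: 600
+```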
+
+### ComfyUI Installation
+
+ComfyUI provides advanced image/video/audio generation capabilities:
+
+**Directory structure created:**
+```
+/workspace/ComfyUI/
+├── models/
+│   ├── checkpoints/        # FLUX, SDXL, SD3 models
+│   ├── clip_vision/        # CLIP vision models
+│   ├── video_models/       # CogVideoX, SVD
+│   └── audio_models/       # MusicGen
+└── custom_nodes/           # Extension nodes
+```
+
+**Essential custom nodes installed:**
+- ComfyUI-Manager - Model/node management GUI
+- ComfyUI-VideoHelperSuite - Video operations
+- ComfyUI-AnimateDiff-Evolved - Video generation
+- ComfyUI_IPAdapter_plus - Style transfer
+- ComfyUI-Impact-Pack - Auto face enhancement
+- comfyui-sound-lab - Audio generation (marked `essential: false` in `arty.yml`; cloned only by the `dev` profile)
+
+**VRAM requirements for 24GB GPU:**
+- FLUX Schnell FP16: 23GB (leaves 1GB)
+- SDXL Base: 12GB
+- CogVideoX-5B: 12GB (with optimizations)
+- MusicGen Medium: 8GB
+
+See `COMFYUI_MODELS.md` for detailed model catalog and usage examples.
+
+## Deployment Workflow
+
+### RunPod Deployment (Current Setup)
+
+1. **Clone repository:**
+   ```bash
+   cd /workspace
+   git clone <repo-url> ai
+   cd ai
+   ```
+
+2. **Configure environment:**
+   ```bash
+   cp .env.example .env
+   # Edit .env, set HF_TOKEN
+   ```
+
+3. **Run Ansible provisioning:**
+   ```bash
+   ansible-playbook playbook.yml
+   # Or selective: --tags base,python,comfyui-essential
+   ```
+
+4. **Start services:**
+   ```bash
+   bash scripts/start-all.sh
+   ```
+
+5. **Verify:**
+   ```bash
+   curl http://localhost:9000/health
+   ```
+
+### Tailscale VPN Integration
+
+To connect RunPod GPU to VPS infrastructure:
+
+```bash
+# On RunPod instance
+curl -fsSL https://tailscale.com/install.sh | sh
+tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
+tailscale up --advertise-tags=tag:gpu
+tailscale ip -4  # Get IP for LiteLLM config
+```
+
+Benefits: Secure tunnel, no public exposure, low latency.
+
+## Project Structure
+
+```
+runpod/
+├── model-orchestrator/
+│   ├── orchestrator_subprocess.py  # Main orchestrator (USE THIS)
+│   ├── orchestrator.py             # Docker-based version (legacy)
+│   ├── models.yaml                 # Model registry
+│   └── requirements.txt
+├── models/
+│   ├── vllm/
+│   │   ├── server.py               # vLLM text generation service
+│   │   └── requirements.txt
+│   └── comfyui/
+│       ├── start.sh                # ComfyUI startup script
+│       └── requirements.txt
+├── scripts/
+│   ├── start-all.sh                # Start orchestrator
+│   └── stop-all.sh                 # Stop all services
+├── arty.yml                        # Arty repository manager config
+├── playbook.yml                    # Ansible provisioning playbook
+├── inventory.yml                   # Ansible inventory (localhost)
+├── ansible.cfg                     # Ansible configuration
+├── .env.example                    # Environment variables template
+├── CLAUDE.md                       # This file
+├── COMFYUI_MODELS.md               # ComfyUI models catalog
+├── MODELS_LINKED.md                # Model linkage documentation
+├── comfyui_models.yaml             # ComfyUI model configuration
+└── README.md                       # User documentation
+```
+
+## Troubleshooting
+
+### Model fails to start
+- Check VRAM: `nvidia-smi`
+- Verify model weights downloaded: `ls -lh /workspace/huggingface_cache`
+- Check port conflicts: `lsof -i :<port>`, using the model's `port` from `models.yaml` (9000 is the orchestrator itself)
+- Test model directly: `python3 models/vllm/server.py`
+
+### Streaming returns empty deltas
+- Use correct LiteLLM model prefix: `hosted_vllm/openai/model-name`
+- Set `stream: true` in LiteLLM config
+- Verify orchestrator proxies streaming correctly
+
+### HuggingFace download errors
+- Check token: `echo $HF_TOKEN`
+- Set in .env: `HF_TOKEN=your_token_here`
+- Re-run the download step: `ansible-playbook playbook.yml --tags models` (or the relevant `comfyui-models-*` tag)
+
+### Out of storage space
+- Check disk usage: `df -h /workspace`
+- Use essential tags: `--tags comfyui-essential` (~80GB vs ~137GB)
+- Clear cache: `rm -rf /workspace/huggingface_cache` (deletes every downloaded model; all of them must be re-downloaded afterwards)
+
+### Orchestrator not responding
+- Check process: `ps aux | grep orchestrator`
+- View logs: Check terminal output where orchestrator was started
+- Restart: `bash scripts/stop-all.sh && bash scripts/start-all.sh`
+
+## Performance Notes
+
+- **Model switching time:** 30-120 seconds (depends on model size)
+- **Text generation:** ~20-40 tokens/second (Qwen 2.5 7B on RTX 4090)
+- **Image generation:** 4-5 seconds per image (FLUX Schnell)
+- **Music generation:** 60-90 seconds for 30s audio (MusicGen Medium)
+
+## Important Conventions
+
+- **Always use `orchestrator_subprocess.py`** - Not the Docker version
+- **Sequential loading only** - One model active at a time for 24GB VRAM
+- **Models downloaded by Ansible** - Use playbook tags, not manual downloads
+- **Services run as processes** - Not systemd (RunPod containers don't support it)
+- **Environment managed via .env** - Required: HF_TOKEN
+- **Port 9000 for orchestrator** - Model services use 8000+
--- a/arty.yml
+++ b/arty.yml
@@ -0,0 +1,212 @@
+name: "RunPod AI Model Orchestrator"
+version: "2.0.0"
+description: "Process-based AI model orchestrator for RunPod GPU instances with ComfyUI integration"
+author: "valknar@pivoine.art"
+license: "MIT"
+
+# Git repositories to clone for a fresh RunPod deployment
+references:
+  # ComfyUI base installation
+  - url: https://github.com/comfyanonymous/ComfyUI.git
+    into: /workspace/ComfyUI
+    description: "ComfyUI - Node-based interface for image/video/audio generation"
+
+  # ComfyUI Essential Custom Nodes
+  - url: https://github.com/ltdrdata/ComfyUI-Manager.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+    description: "ComfyUI Manager - Install/manage custom nodes and models"
+    essential: true
+
+  - url: https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
+    description: "Video operations and processing"
+    essential: true
+
+  - url: https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
+    description: "AnimateDiff for video generation"
+    essential: true
+
+  - url: https://github.com/cubiq/ComfyUI_IPAdapter_plus.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
+    description: "IP-Adapter for style transfer"
+    essential: true
+
+  - url: https://github.com/ltdrdata/ComfyUI-Impact-Pack.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
+    description: "Auto face enhancement and detailer"
+    essential: true
+
+  # ComfyUI Optional Custom Nodes
+  - url: https://github.com/kijai/ComfyUI-CogVideoXWrapper.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
+    description: "CogVideoX integration for text-to-video"
+    essential: false
+
+  - url: https://github.com/ltdrdata/ComfyUI-Inspire-Pack.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
+    description: "Additional inspiration tools"
+    essential: false
+
+  - url: https://github.com/Kosinkadink/ComfyUI-Advanced-ControlNet.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet
+    description: "Advanced ControlNet features"
+    essential: false
+
+  - url: https://github.com/MrForExample/ComfyUI-3D-Pack.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-3D-Pack
+    description: "3D asset generation"
+    essential: false
+
+  - url: https://github.com/MixLabPro/comfyui-sound-lab.git
+    into: /workspace/ComfyUI/custom_nodes/comfyui-sound-lab
+    description: "MusicGen and Stable Audio integration"
+    essential: false
+
+# Environment profiles for selective repository management
+envs:
+  # Production: Only essential components
+  prod:
+    - /workspace/ai
+    - /workspace/ComfyUI
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
+    - /workspace/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
+
+  # Development: All repositories including optional nodes
+  dev:
+    - /workspace/ai
+    - /workspace/ComfyUI
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
+    - /workspace/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-3D-Pack
+    - /workspace/ComfyUI/custom_nodes/comfyui-sound-lab
+
+  # Minimal: Orchestrator, ComfyUI base, and ComfyUI-Manager
+  minimal:
+    - /workspace/ai
+    - /workspace/ComfyUI
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+
+# Deployment scripts for RunPod instances
+scripts:
+  # Initial setup
+  setup/full: |
+    cd /workspace/ai
+    cp .env.example .env
+    echo "Edit .env and set HF_TOKEN, then run: ansible-playbook playbook.yml"
+
+  setup/essential: |
+    cd /workspace/ai
+    cp .env.example .env
+    echo "Edit .env and set HF_TOKEN, then run: ansible-playbook playbook.yml --tags comfyui-essential"
+
+  # Model linking (run after models are downloaded)
+  models/link-comfyui: |
+    cd /workspace/ComfyUI/models/diffusers
+    ln -sfn /workspace/huggingface_cache/models--black-forest-labs--FLUX.1-schnell FLUX.1-schnell
+    ln -sfn /workspace/huggingface_cache/models--black-forest-labs--FLUX.1-dev FLUX.1-dev
+    ln -sfn /workspace/huggingface_cache/models--stabilityai--stable-diffusion-xl-base-1.0 stable-diffusion-xl-base-1.0
+    ln -sfn /workspace/huggingface_cache/models--stabilityai--stable-diffusion-xl-refiner-1.0 stable-diffusion-xl-refiner-1.0
+    ln -sfn /workspace/huggingface_cache/models--stabilityai--stable-diffusion-3.5-large stable-diffusion-3.5-large
+    cd /workspace/ComfyUI/models/clip_vision
+    ln -sfn /workspace/huggingface_cache/models--openai--clip-vit-large-patch14 clip-vit-large-patch14
+    ln -sfn /workspace/huggingface_cache/models--laion--CLIP-ViT-bigG-14-laion2B-39B-b160k CLIP-ViT-bigG-14
+    ln -sfn /workspace/huggingface_cache/models--google--siglip-so400m-patch14-384 siglip-so400m-patch14-384
+    cd /workspace/ComfyUI/models/diffusion_models
+    ln -sfn /workspace/huggingface_cache/models--THUDM--CogVideoX-5b CogVideoX-5b
+    ln -sfn /workspace/huggingface_cache/models--stabilityai--stable-video-diffusion-img2vid stable-video-diffusion-img2vid
+    ln -sfn /workspace/huggingface_cache/models--stabilityai--stable-video-diffusion-img2vid-xt stable-video-diffusion-img2vid-xt
+    echo "Models linked to ComfyUI"
+
+  # Service management
+  services/start: bash /workspace/ai/scripts/start-all.sh
+  services/stop: bash /workspace/ai/scripts/stop-all.sh
+  services/restart: bash /workspace/ai/scripts/stop-all.sh && bash /workspace/ai/scripts/start-all.sh
+
+  # Dependency installation
+  deps/comfyui-nodes: |
+    pip3 install -r /workspace/ComfyUI/custom_nodes/ComfyUI-Manager/requirements.txt
+    pip3 install -r /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite/requirements.txt
+    pip3 install 'numpy<2.0.0' --force-reinstall
+    echo "Custom node dependencies installed"
+
+  # Ansible provisioning shortcuts
+  ansible/base: cd /workspace/ai && ansible-playbook playbook.yml --tags base,python,dependencies
+  ansible/vllm: cd /workspace/ai && ansible-playbook playbook.yml --tags models
+  ansible/comfyui: cd /workspace/ai && ansible-playbook playbook.yml --tags comfyui,comfyui-essential
+  ansible/comfyui-all: cd /workspace/ai && ansible-playbook playbook.yml --tags comfyui,comfyui-models-all,comfyui-nodes
+  ansible/full: cd /workspace/ai && ansible-playbook playbook.yml
+
+  # Health checks
+  health/orchestrator: curl http://localhost:9000/health
+  health/comfyui: curl http://localhost:8188
+  health/vllm: curl http://localhost:8000/health
+
+  # System checks
+  check/gpu: nvidia-smi
+  check/disk: df -h /workspace
+  check/models: du -sh /workspace/huggingface_cache
+  check/cache: find /workspace/huggingface_cache -type d -name 'models--*' -maxdepth 1
+
+# Deployment notes
+notes: |
+  RunPod AI Model Orchestrator - Quick Start
+
+  1. Fresh Deployment:
+     - Clone repositories: arty sync --env prod
+     - Configure environment: cd /workspace/ai && cp .env.example .env
+     - Set HF_TOKEN in .env file
+     - Run Ansible: ansible-playbook playbook.yml --tags comfyui-essential
+     - Link models: arty run models/link-comfyui
+     - Install node deps: arty run deps/comfyui-nodes
+     - Start services: arty run services/start
+
+  2. Model Downloads:
+     - Essential (~80GB): ansible-playbook playbook.yml --tags comfyui-essential
+     - All models (~137GB): ansible-playbook playbook.yml --tags comfyui-models-all
+
+  3. Service Management:
+     - Start: arty run services/start
+     - Stop: arty run services/stop
+     - Restart: arty run services/restart
+
+  4. Health Checks:
+     - Orchestrator: arty run health/orchestrator
+     - ComfyUI: arty run health/comfyui
+     - vLLM: arty run health/vllm
+
+  5. Environment Profiles:
+     - Production (essential only): arty sync --env prod
+     - Development (all nodes): arty sync --env dev
+     - Minimal (orchestrator + ComfyUI base + ComfyUI-Manager): arty sync --env minimal
+
+  6. Important Files:
+     - Configuration: /workspace/ai/playbook.yml
+     - Model registry: /workspace/ai/model-orchestrator/models.yaml
+     - Environment: /workspace/ai/.env
+     - Services: /workspace/ai/scripts/*.sh
+
+  7. Ports:
+     - Orchestrator: 9000
+     - ComfyUI: 8188
+     - vLLM: 8000+
+
+  8. Storage:
+     - Models cache: /workspace/huggingface_cache (~401GB)
+     - ComfyUI models: /workspace/ComfyUI/models (symlinks to cache)
+     - Project: /workspace/ai
+
+  For detailed documentation, see:
+  - /workspace/ai/README.md
+  - /workspace/ai/CLAUDE.md
+  - /workspace/ai/COMFYUI_MODELS.md
+  - /workspace/ai/MODELS_LINKED.md
--- a/comfyui_models.yaml
+++ b/comfyui_models.yaml
@@ -0,0 +1,268 @@
+# ============================================================================
+# ComfyUI Model Configuration
+# ============================================================================
+#
+# This configuration file defines all available ComfyUI models for download.
+# Models are organized by category: image, video, audio, and support models.
+#
+# Each model entry contains:
+#   - repo_id: HuggingFace repository identifier
+#   - description: Human-readable description
+#   - size_gb: Approximate size in gigabytes
+#   - essential: Whether this is an essential model (true/false)
+#   - category: Model category (image/video/audio/support)
+#
+# ============================================================================
+
+# Global settings
+settings:
+  cache_dir: /workspace/huggingface_cache
+  parallel_downloads: 1
+  retry_attempts: 3
+  timeout_seconds: 3600
+
+# Model categories
+model_categories:
+  # ==========================================================================
+  # IMAGE GENERATION MODELS
+  # ==========================================================================
+  image_models:
+    - repo_id: black-forest-labs/FLUX.1-schnell
+      description: FLUX.1 Schnell - Fast 4-step inference
+      size_gb: 23
+      essential: true
+      category: image
+      format: fp16
+      vram_gb: 23
+      notes: Industry-leading image generation quality
+
+    - repo_id: black-forest-labs/FLUX.1-dev
+      description: FLUX.1 Dev - Balanced quality/speed
+      size_gb: 23
+      essential: false
+      category: image
+      format: fp16
+      vram_gb: 23
+      notes: Development version with enhanced features
+
+    - repo_id: stabilityai/stable-diffusion-xl-base-1.0
+      description: SDXL Base 1.0 - Industry standard
+      size_gb: 7
+      essential: true
+      category: image
+      format: fp16
+      vram_gb: 12
+      notes: Most widely used Stable Diffusion model
+
+    - repo_id: stabilityai/stable-diffusion-xl-refiner-1.0
+      description: SDXL Refiner 1.0 - Enhances base output
+      size_gb: 6
+      essential: false
+      category: image
+      format: fp16
+      vram_gb: 12
+      notes: Use after SDXL base for improved details
+
+    - repo_id: stabilityai/stable-diffusion-3.5-large
+      description: SD 3.5 Large - Latest Stability AI
+      size_gb: 18
+      essential: false
+      category: image
+      format: fp16
+      vram_gb: 20
+      notes: Newest generation Stable Diffusion
+
+  # ==========================================================================
+  # VIDEO GENERATION MODELS
+  # ==========================================================================
+  video_models:
+    - repo_id: THUDM/CogVideoX-5b
+      description: CogVideoX-5B - Professional text-to-video
+      size_gb: 20
+      essential: true
+      category: video
+      format: fp16
+      vram_gb: 20
+      frames: 49
+      resolution: 720p
+      notes: State-of-the-art text-to-video generation
+
+    - repo_id: stabilityai/stable-video-diffusion-img2vid
+      description: SVD - 14 frame image-to-video
+      size_gb: 8
+      essential: true
+      category: video
+      format: fp16
+      vram_gb: 20
+      frames: 14
+      resolution: 576x1024
+      notes: Convert images to short video clips
+
+    - repo_id: stabilityai/stable-video-diffusion-img2vid-xt
+      description: SVD-XT - 25 frame image-to-video
+      size_gb: 8
+      essential: false
+      category: video
+      format: fp16
+      vram_gb: 20
+      frames: 25
+      resolution: 576x1024
+      notes: Extended frame count version
+
+  # ==========================================================================
+  # AUDIO GENERATION MODELS
+  # ==========================================================================
+  audio_models:
+    - repo_id: facebook/musicgen-small
+      description: MusicGen Small - Fast generation
+      size_gb: 3
+      essential: false
+      category: audio
+      format: fp32
+      vram_gb: 4
+      duration_seconds: 30
+      notes: Fastest music generation, lower quality
+
+    - repo_id: facebook/musicgen-medium
+      description: MusicGen Medium - Balanced quality
+      size_gb: 11
+      essential: true
+      category: audio
+      format: fp32
+      vram_gb: 8
+      duration_seconds: 30
+      notes: Best balance of speed and quality
+
+    - repo_id: facebook/musicgen-large
+      description: MusicGen Large - Highest quality
+      size_gb: 22
+      essential: false
+      category: audio
+      format: fp32
+      vram_gb: 16
+      duration_seconds: 30
+      notes: Best quality, slower generation
+
+  # ==========================================================================
+  # SUPPORT MODELS (CLIP, IP-Adapter, etc.)
+  # ==========================================================================
+  support_models:
+    - repo_id: openai/clip-vit-large-patch14
+      description: CLIP ViT-L/14 - For SD 1.5 IP-Adapter
+      size_gb: 2
+      essential: true
+      category: support
+      format: fp32
+      vram_gb: 2
+      notes: Text-image understanding model
+
+    - repo_id: laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
+      description: CLIP G - For SDXL IP-Adapter
+      size_gb: 7
+      essential: true
+      category: support
+      format: fp32
+      vram_gb: 4
+      notes: Larger CLIP model for SDXL
+
+    - repo_id: google/siglip-so400m-patch14-384
+      description: SigLIP - For FLUX models
+      size_gb: 2
+      essential: true
+      category: support
+      format: fp32
+      vram_gb: 2
+      notes: Advanced image-text alignment
+
+# ============================================================================
+# STORAGE & VRAM SUMMARIES
+# ============================================================================
+
+storage_requirements:
+  essential_only:
+    image: 30      # FLUX Schnell + SDXL Base
+    video: 28      # CogVideoX + SVD
+    audio: 11      # MusicGen Medium
+    support: 11    # All 3 CLIP models
+    total: 80      # Total essential storage
+
+  all_models:
+    image: 54      # All image models
+    video: 36      # All video models
+    audio: 36      # All audio models
+    support: 11    # All support models
+    total: 137     # Total with optional models
+
+vram_requirements:
+  # For 24GB GPU (RTX 4090)
+  simultaneous_loadable:
+    - name: Image Focus - FLUX FP16
+      models: [FLUX.1 Schnell]
+      vram_used: 23
+      remaining: 1
+
+    - name: Image Focus - FLUX FP8 + SDXL
+      models: [FLUX.1 Schnell FP8, SDXL Base]
+      vram_used: 24
+      remaining: 0
+
+    - name: Video Generation
+      models: [CogVideoX-5B optimized, SDXL]
+      vram_used: 24
+      remaining: 0
+
+    - name: Multi-Modal
+      models: [SDXL, MusicGen Medium]
+      vram_used: 20
+      remaining: 4
+
+# ============================================================================
+# INSTALLATION PROFILES
+# ============================================================================
+
+installation_profiles:
+  minimal:
+    description: Minimal setup for testing
+    categories: [support_models]
+    storage_gb: 11
+    estimated_time: 5-10 minutes
+
+  essential:
+    description: Essential models only (~80GB)
+    categories: [image_models, video_models, audio_models, support_models]
+    essential_only: true
+    storage_gb: 80
+    estimated_time: 1-2 hours
+
+  image_focused:
+    description: All image generation models
+    categories: [image_models, support_models]
+    storage_gb: 65
+    estimated_time: 45-90 minutes
+
+  video_focused:
+    description: All video generation models
+    categories: [video_models, image_models, support_models]
+    essential_only: true
+    storage_gb: 69
+    estimated_time: 1-2 hours
+
+  complete:
+    description: All models (including optional)
+    categories: [image_models, video_models, audio_models, support_models]
+    storage_gb: 137
+    estimated_time: 2-4 hours
+
+# ============================================================================
+# METADATA
+# ============================================================================
+
+metadata:
+  version: 1.0.0
+  last_updated: 2025-11-21
+  compatible_with:
+    - ComfyUI >= 0.1.0
+    - Python >= 3.10
+    - HuggingFace Hub >= 0.20.0
+  maintainer: Valknar
+  repository: https://github.com/yourusername/runpod