diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..90f98d4
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,411 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+This is a lightweight, process-based AI model orchestrator designed for RunPod GPU instances (specifically RTX 4090 with 24GB VRAM). It manages sequential loading of multiple large AI models on a single GPU, providing OpenAI-compatible API endpoints for text, image, and audio generation.
+
+**Key Design Philosophy:**
+- **Sequential model loading** - Only one model active at a time to fit within GPU memory constraints
+- **Process-based architecture** - Uses Python subprocesses instead of Docker-in-Docker for RunPod compatibility
+- **Automatic model switching** - Orchestrator detects request types and switches models on demand
+- **OpenAI-compatible APIs** - Works seamlessly with LiteLLM proxy and other AI tools
+
+## Architecture
+
+### Core Components
+
+1. **Orchestrator** (`model-orchestrator/orchestrator_subprocess.py`)
+   - FastAPI proxy server listening on port 9000
+   - Manages model lifecycle via Python subprocesses
+   - Routes requests to appropriate model services
+   - Handles sequential model loading/unloading
+
+2. **Model Registry** (`model-orchestrator/models.yaml`)
+   - YAML configuration defining available models
+   - Specifies: type, framework, service script, port, VRAM requirements, startup time
+   - Easy to extend with new models
+
+3. **Model Services** (`models/*/`)
+   - Individual Python servers running specific AI models
+   - vLLM for text generation (Qwen 2.5 7B, Llama 3.1 8B)
+   - ComfyUI for image/video/audio generation (FLUX, SDXL, CogVideoX, MusicGen)
+
+4. **Ansible Provisioning** (`playbook.yml`)
+   - Complete infrastructure-as-code setup
+   - Installs dependencies, downloads models, configures services
+   - Supports selective installation via tags
+
+### Why Process-Based Instead of Docker?
+
+The subprocess implementation (`orchestrator_subprocess.py`) is preferred over the Docker version (`orchestrator.py`) because:
+- RunPod instances run in containers - Docker-in-Docker adds complexity
+- Faster model startup (direct Python process spawning)
+- Simpler debugging (single process tree)
+- Reduced overhead (no container management layer)
+
+**Note:** Always use `orchestrator_subprocess.py` for RunPod deployments.
+
+## Common Commands
+
+### Repository Management with Arty
+
+This project uses Arty for repository and deployment management. See `arty.yml` for full configuration.
+
+```bash
+# Clone all repositories (fresh deployment)
+arty sync --env prod     # Production: Essential nodes only
+arty sync --env dev      # Development: All nodes including optional
+arty sync --env minimal  # Minimal: Just orchestrator + ComfyUI base
+
+# Run deployment scripts
+arty run setup/full           # Show setup instructions
+arty run models/link-comfyui  # Link downloaded models to ComfyUI
+arty run deps/comfyui-nodes   # Install custom node dependencies
+arty run services/start       # Start orchestrator
+arty run services/stop        # Stop all services
+
+# Health checks
+arty run health/orchestrator  # Check orchestrator
+arty run health/comfyui       # Check ComfyUI
+arty run check/gpu            # nvidia-smi
+arty run check/models         # Show cache size
+```
+
+### Initial Setup
+
+```bash
+# 1. Clone repositories with Arty (fresh RunPod instance)
+arty sync --env prod
+
+# 2. Configure environment
+cd /workspace/ai
+cp .env.example .env
+# Edit .env and set HF_TOKEN=your_huggingface_token
+
+# 3. Full deployment with Ansible
+ansible-playbook playbook.yml
+
+# 4. Essential ComfyUI setup (faster, ~80GB instead of ~137GB)
+ansible-playbook playbook.yml --tags comfyui-essential
+
+# 5. Link models to ComfyUI
+arty run models/link-comfyui
+
+# 6. Install custom node dependencies
+arty run deps/comfyui-nodes
+
+# 7. Selective installation (base system + Python + vLLM models only)
+ansible-playbook playbook.yml --tags base,python,dependencies
+```
+
+### Service Management
+
+```bash
+# Start orchestrator (runs in foreground)
+bash scripts/start-all.sh
+# Or directly:
+python3 model-orchestrator/orchestrator_subprocess.py
+
+# Stop all services
+bash scripts/stop-all.sh
+
+# Stop orchestrator only
+pkill -f orchestrator_subprocess.py
+
+# Stop specific model service
+pkill -f "models/vllm/server.py"
+```
+
+### Testing
+
+```bash
+# Health check
+curl http://localhost:9000/health
+
+# List available models
+curl http://localhost:9000/v1/models
+
+# Test text generation (streaming)
+curl -s -N -X POST http://localhost:9000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "qwen-2.5-7b",
+    "messages": [{"role": "user", "content": "Count to 5"}],
+    "max_tokens": 50,
+    "stream": true
+  }'
+
+# Test image generation
+curl -X POST http://localhost:9000/v1/images/generations \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "flux-schnell",
+    "prompt": "A serene mountain landscape at sunset",
+    "size": "1024x1024"
+  }'
+```
+
+### Ansible Tags Reference
+
+**System Setup:**
+- `base` - Base system packages
+- `python` - Python environment setup
+- `dependencies` - Install Python packages
+
+**Model Installation:**
+- `models` - Download vLLM/Flux/MusicGen models (legacy)
+- `comfyui` - Install ComfyUI base
+- `comfyui-essential` - Quick setup (ComfyUI + essential models only, ~80GB)
+- `comfyui-models-image` - Image generation models (FLUX, SDXL, SD3.5)
+- `comfyui-models-video` - Video generation models (CogVideoX, SVD)
+- `comfyui-models-audio` - Audio generation models (MusicGen variants)
+- `comfyui-models-support` - CLIP, IP-Adapter, ControlNet models
+- `comfyui-models-all` - All ComfyUI models (~137GB)
+- `comfyui-nodes` - Install essential custom nodes
+
+**Infrastructure:**
+- `tailscale` - Install Tailscale VPN client
+- `systemd` - Configure systemd services (tagged `never` - not for RunPod)
+- `validate` - Health checks (tagged `never` - run explicitly)
+
+### Adding New Models
+
+1. **Add model definition to `model-orchestrator/models.yaml`:**
+
+```yaml
+llama-3.1-8b:
+  type: text
+  framework: vllm
+  service_script: models/vllm/server_llama.py
+  port: 8001
+  vram_gb: 17
+  startup_time_seconds: 120
+  endpoint: /v1/chat/completions
+  description: "Llama 3.1 8B Instruct"
+```
+
+2. **Create service script** (`models/vllm/server_llama.py`):
+
+```python
+import os
+import sys
+
+# Serve Llama 3.1 8B through vLLM's OpenAI-compatible server.
+# Launch the module CLI rather than importing internal server functions,
+# which are not a stable public API.
+model = "meta-llama/Llama-3.1-8B-Instruct"
+port = os.getenv("PORT", "8001")
+
+os.execvp(sys.executable, [
+    sys.executable, "-m", "vllm.entrypoints.openai.api_server",
+    "--model", model,
+    "--port", port,
+])
+```
+
+3. **Download the model** (handled by the Ansible playbook, or manually via the HuggingFace CLI, as sketched below)
+
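+   For a manual download, a minimal sketch using the `huggingface-cli` tool from
+   `huggingface_hub`, pointed at this repo's shared cache directory:
+
+   ```bash
+   huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
+     --cache-dir /workspace/huggingface_cache \
+     --token "$HF_TOKEN"
+   ```
+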
+4. **Restart orchestrator:**
+
+```bash
+bash scripts/stop-all.sh && bash scripts/start-all.sh
+```
+
+## Key Implementation Details
+
+### Model Switching Logic
+
+The orchestrator automatically switches models based on:
+- **Endpoint path** - `/v1/chat/completions` → text models, `/v1/images/generations` → image models
+- **Model name in request** - Matched against the model registry
+- **Sequential loading** - Stops the current model before starting a new one to conserve VRAM
+
+See `orchestrator_subprocess.py:64-100` for the process management implementation.
+
+### Model Registry Structure
+
+Each model in `models.yaml` requires:
+- `type` - text, image, or audio
+- `framework` - vllm, openedai-images, audiocraft, comfyui
+- `service_script` - Relative path to Python/shell script
+- `port` - Service port (8000+)
+- `vram_gb` - GPU memory requirement
+- `startup_time_seconds` - Max health check timeout
+- `endpoint` - API endpoint path
+- `description` - Human-readable description
+
+### Environment Variables
+
+Set in `.env` file:
+- `HF_TOKEN` - **Required** - HuggingFace API token for model downloads
+- `GPU_TAILSCALE_IP` - Optional - Tailscale IP for VPN access
+
+Models are cached in:
+- `/workspace/huggingface_cache` - HuggingFace models
+- `/workspace/models` - Other model files
+- `/workspace/ComfyUI/models` - ComfyUI model directory structure
+
+### Integration with LiteLLM
+
+For unified API management through LiteLLM proxy:
+
+**LiteLLM configuration (`litellm-config.yaml` on VPS):**
+```yaml
+model_list:
+  - model_name: qwen-2.5-7b
+    litellm_params:
+      model: hosted_vllm/openai/qwen-2.5-7b  # Use hosted_vllm prefix!
+      api_base: http://100.121.199.88:9000/v1  # Tailscale VPN IP
+      api_key: dummy
+      stream: true
+      timeout: 600
+```
+
+**Critical:** Use the `hosted_vllm/openai/` prefix for vLLM models to enable proper streaming support. The wrong prefix causes empty delta chunks.
+
+### ComfyUI Installation
+
+ComfyUI provides advanced image/video/audio generation capabilities:
+
+**Directory structure created:**
+```
+/workspace/ComfyUI/
+├── models/
+│   ├── checkpoints/    # FLUX, SDXL, SD3 models
+│   ├── clip_vision/    # CLIP vision models
+│   ├── video_models/   # CogVideoX, SVD
+│   └── audio_models/   # MusicGen
+└── custom_nodes/       # Extension nodes
+```
+
+**Essential custom nodes installed:**
+- ComfyUI-Manager - Model/node management GUI
+- ComfyUI-VideoHelperSuite - Video operations
+- ComfyUI-AnimateDiff-Evolved - Video generation
+- ComfyUI_IPAdapter_plus - Style transfer
+- ComfyUI-Impact-Pack - Auto face enhancement
+- comfyui-sound-lab - Audio generation
+
+**VRAM requirements for 24GB GPU:**
+- FLUX Schnell FP16: 23GB (leaves 1GB)
+- SDXL Base: 12GB
+- CogVideoX-5B: 12GB (with optimizations)
+- MusicGen Medium: 8GB
+
+See `COMFYUI_MODELS.md` for detailed model catalog and usage examples.
+
+## Deployment Workflow
+
+### RunPod Deployment (Current Setup)
+
+1. **Clone repository:**
+   ```bash
+   cd /workspace
+   git clone <repository-url> ai
+   cd ai
+   ```
+
+2. **Configure environment:**
+   ```bash
+   cp .env.example .env
+   # Edit .env, set HF_TOKEN
+   ```
+
+3. **Run Ansible provisioning:**
+   ```bash
+   ansible-playbook playbook.yml
+   # Or selective: --tags base,python,comfyui-essential
+   ```
+
+4. **Start services:**
+   ```bash
+   bash scripts/start-all.sh
+   ```
+
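+   Startup is not instant - model loading/switching takes 30-120 seconds depending
+   on model size (see Performance Notes). To confirm weights are loading, GPU
+   memory can be watched from a second terminal:
+
+   ```bash
+   watch -n 1 nvidia-smi
+   ```
+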
+5. **Verify:**
+   ```bash
+   curl http://localhost:9000/health
+   ```
+
+### Tailscale VPN Integration
+
+To connect RunPod GPU to VPS infrastructure:
+
+```bash
+# On RunPod instance
+curl -fsSL https://tailscale.com/install.sh | sh
+tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
+tailscale up --advertise-tags=tag:gpu
+tailscale ip -4  # Get IP for LiteLLM config
+```
+
+Benefits: Secure tunnel, no public exposure, low latency.
+
+## Project Structure
+
+```
+runpod/
+├── model-orchestrator/
+│   ├── orchestrator_subprocess.py  # Main orchestrator (USE THIS)
+│   ├── orchestrator.py             # Docker-based version (legacy)
+│   ├── models.yaml                 # Model registry
+│   └── requirements.txt
+├── models/
+│   ├── vllm/
+│   │   ├── server.py               # vLLM text generation service
+│   │   └── requirements.txt
+│   └── comfyui/
+│       ├── start.sh                # ComfyUI startup script
+│       └── requirements.txt
+├── scripts/
+│   ├── start-all.sh                # Start orchestrator
+│   └── stop-all.sh                 # Stop all services
+├── arty.yml                        # Arty repository manager config
+├── playbook.yml                    # Ansible provisioning playbook
+├── inventory.yml                   # Ansible inventory (localhost)
+├── ansible.cfg                     # Ansible configuration
+├── .env.example                    # Environment variables template
+├── CLAUDE.md                       # This file
+├── COMFYUI_MODELS.md               # ComfyUI models catalog
+├── MODELS_LINKED.md                # Model linkage documentation
+├── comfyui_models.yaml             # ComfyUI model configuration
+└── README.md                       # User documentation
+```
+
+## Troubleshooting
+
+### Model fails to start
+- Check VRAM: `nvidia-smi`
+- Verify model weights downloaded: `ls -lh /workspace/huggingface_cache`
+- Check port conflicts: `lsof -i :9000`
+- Test model directly: `python3 models/vllm/server.py`
+
+### Streaming returns empty deltas
+- Use correct LiteLLM model prefix: `hosted_vllm/openai/model-name`
+- Set `stream: true` in LiteLLM config
+- Verify orchestrator proxies streaming correctly
+
+### HuggingFace download errors
+- Check token: `echo $HF_TOKEN`
+- Set in .env: `HF_TOKEN=your_token_here`
+- Re-run Ansible: `ansible-playbook playbook.yml --tags dependencies`
+
+### Out of storage space
+- Check disk usage: `df -h /workspace`
+- Use essential tags: `--tags comfyui-essential` (~80GB vs ~137GB)
+- Clear cache: `rm -rf /workspace/huggingface_cache`
+
+### Orchestrator not responding
+- Check process: `ps aux | grep orchestrator`
+- View logs: check the terminal output where the orchestrator was started
+- Restart: `bash scripts/stop-all.sh && bash scripts/start-all.sh`
+
+## Performance Notes
+
+- **Model switching time:** 30-120 seconds (depends on model size)
+- **Text generation:** ~20-40 tokens/second (Qwen 2.5 7B on RTX 4090)
+- **Image generation:** 4-5 seconds per image (FLUX Schnell)
+- **Music generation:** 60-90 seconds for 30s audio (MusicGen Medium)
+
+## Important Conventions
+
+- **Always use `orchestrator_subprocess.py`** - Not the Docker version
+- **Sequential loading only** - One model active at a time for 24GB VRAM
+- **Models downloaded by Ansible** - Use playbook tags, not manual downloads
+- **Services run as processes** - Not systemd (RunPod containers don't support it)
+- **Environment managed via .env** - Required: HF_TOKEN
+- **Port 9000 for orchestrator** - Model services use 8000+
diff --git a/arty.yml b/arty.yml
new file mode 100644
index 0000000..c9a0560
--- /dev/null
+++ b/arty.yml
@@ -0,0 +1,212 @@
+name: "RunPod AI Model Orchestrator"
+version: "2.0.0"
+description: "Process-based AI model orchestrator for RunPod GPU instances with ComfyUI integration"
+author: "valknar@pivoine.art"
+license: "MIT"
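+
+# How this file is consumed (usage documented in CLAUDE.md):
+#   arty sync --env <prod|dev|minimal>  # clone the repositories listed under `envs`
+#   arty run <script-name>              # run an entry from `scripts`, e.g. arty run services/start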
+
+# Git repositories to clone for a fresh RunPod deployment
+references:
+  # ComfyUI base installation
+  - url: https://github.com/comfyanonymous/ComfyUI.git
+    into: /workspace/ComfyUI
+    description: "ComfyUI - Node-based interface for image/video/audio generation"
+
+  # ComfyUI Essential Custom Nodes
+  - url: https://github.com/ltdrdata/ComfyUI-Manager.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+    description: "ComfyUI Manager - Install/manage custom nodes and models"
+    essential: true
+
+  - url: https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
+    description: "Video operations and processing"
+    essential: true
+
+  - url: https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
+    description: "AnimateDiff for video generation"
+    essential: true
+
+  - url: https://github.com/cubiq/ComfyUI_IPAdapter_plus.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
+    description: "IP-Adapter for style transfer"
+    essential: true
+
+  - url: https://github.com/ltdrdata/ComfyUI-Impact-Pack.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
+    description: "Auto face enhancement and detailer"
+    essential: true
+
+  # ComfyUI Optional Custom Nodes
+  - url: https://github.com/kijai/ComfyUI-CogVideoXWrapper.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
+    description: "CogVideoX integration for text-to-video"
+    essential: false
+
+  - url: https://github.com/ltdrdata/ComfyUI-Inspire-Pack.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
+    description: "Additional inspiration tools"
+    essential: false
+
+  - url: https://github.com/Kosinkadink/ComfyUI-Advanced-ControlNet.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet
+    description: "Advanced ControlNet features"
+    essential: false
+
+  - url: https://github.com/MrForExample/ComfyUI-3D-Pack.git
+    into: /workspace/ComfyUI/custom_nodes/ComfyUI-3D-Pack
+    description: "3D asset generation"
+    essential: false
+
+  - url: https://github.com/MixLabPro/comfyui-sound-lab.git
+    into: /workspace/ComfyUI/custom_nodes/comfyui-sound-lab
+    description: "MusicGen and Stable Audio integration"
+    essential: false
+
+# Environment profiles for selective repository management
+envs:
+  # Production: Only essential components
+  prod:
+    - /workspace/ai
+    - /workspace/ComfyUI
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
+    - /workspace/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
+
+  # Development: All repositories including optional nodes
+  dev:
+    - /workspace/ai
+    - /workspace/ComfyUI
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
+    - /workspace/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-3D-Pack
+    - /workspace/ComfyUI/custom_nodes/comfyui-sound-lab
+
+  # Minimal: Only orchestrator and ComfyUI base
+  minimal:
+    - /workspace/ai
+    - /workspace/ComfyUI
+    - /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
+
+# Deployment scripts for RunPod instances
+scripts:
+  # Initial setup
+  setup/full: |
+    cd /workspace/ai
+    cp .env.example .env
+    echo "Edit .env and set HF_TOKEN, then run: ansible-playbook playbook.yml"
+
+  setup/essential: |
+    cd /workspace/ai
+    cp .env.example .env
+    echo "Edit .env and set HF_TOKEN, then run: ansible-playbook playbook.yml --tags comfyui-essential"
+
+  # Model linking (run after models are downloaded)
+  models/link-comfyui: |
+    cd /workspace/ComfyUI/models/diffusers
+    ln -sf /workspace/huggingface_cache/models--black-forest-labs--FLUX.1-schnell FLUX.1-schnell
+    ln -sf /workspace/huggingface_cache/models--black-forest-labs--FLUX.1-dev FLUX.1-dev
+    ln -sf /workspace/huggingface_cache/models--stabilityai--stable-diffusion-xl-base-1.0 stable-diffusion-xl-base-1.0
+    ln -sf /workspace/huggingface_cache/models--stabilityai--stable-diffusion-xl-refiner-1.0 stable-diffusion-xl-refiner-1.0
+    ln -sf /workspace/huggingface_cache/models--stabilityai--stable-diffusion-3.5-large stable-diffusion-3.5-large
+    cd /workspace/ComfyUI/models/clip_vision
+    ln -sf /workspace/huggingface_cache/models--openai--clip-vit-large-patch14 clip-vit-large-patch14
+    ln -sf /workspace/huggingface_cache/models--laion--CLIP-ViT-bigG-14-laion2B-39B-b160k CLIP-ViT-bigG-14
+    ln -sf /workspace/huggingface_cache/models--google--siglip-so400m-patch14-384 siglip-so400m-patch14-384
+    cd /workspace/ComfyUI/models/diffusion_models
+    ln -sf /workspace/huggingface_cache/models--THUDM--CogVideoX-5b CogVideoX-5b
+    ln -sf /workspace/huggingface_cache/models--stabilityai--stable-video-diffusion-img2vid stable-video-diffusion-img2vid
+    ln -sf /workspace/huggingface_cache/models--stabilityai--stable-video-diffusion-img2vid-xt stable-video-diffusion-img2vid-xt
+    echo "Models linked to ComfyUI"
+
+  # Service management
+  services/start: bash /workspace/ai/scripts/start-all.sh
+  services/stop: bash /workspace/ai/scripts/stop-all.sh
+  services/restart: bash /workspace/ai/scripts/stop-all.sh && bash /workspace/ai/scripts/start-all.sh
+
+  # Dependency installation
+  deps/comfyui-nodes: |
+    pip3 install -r /workspace/ComfyUI/custom_nodes/ComfyUI-Manager/requirements.txt
+    pip3 install -r /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite/requirements.txt
+    pip3 install 'numpy<2.0.0' --force-reinstall
+    echo "Custom node dependencies installed"
+
+  # Ansible provisioning shortcuts
+  ansible/base: cd /workspace/ai && ansible-playbook playbook.yml --tags base,python,dependencies
+  ansible/vllm: cd /workspace/ai && ansible-playbook playbook.yml --tags models
+  ansible/comfyui: cd /workspace/ai && ansible-playbook playbook.yml --tags comfyui,comfyui-essential
+  ansible/comfyui-all: cd /workspace/ai && ansible-playbook playbook.yml --tags comfyui,comfyui-models-all,comfyui-nodes
+  ansible/full: cd /workspace/ai && ansible-playbook playbook.yml
+
+  # Health checks
+  health/orchestrator: curl http://localhost:9000/health
+  health/comfyui: curl http://localhost:8188
+  health/vllm: curl http://localhost:8000/health
+
+  # System checks
+  check/gpu: nvidia-smi
+  check/disk: df -h /workspace
+  check/models: du -sh /workspace/huggingface_cache
+  check/cache: find /workspace/huggingface_cache -maxdepth 1 -type d -name 'models--*'
+
+# Deployment notes
+notes: |
+  RunPod AI Model Orchestrator - Quick Start
+
+  1. Fresh Deployment:
+     - Clone repositories: arty sync --env prod
+     - Configure environment: cd /workspace/ai && cp .env.example .env
+     - Set HF_TOKEN in .env file
+     - Run Ansible: ansible-playbook playbook.yml --tags comfyui-essential
+     - Link models: arty run models/link-comfyui
+     - Install node deps: arty run deps/comfyui-nodes
+     - Start services: arty run services/start
+
+  2. Model Downloads:
+     - Essential (~80GB): ansible-playbook playbook.yml --tags comfyui-essential
+     - All models (~137GB): ansible-playbook playbook.yml --tags comfyui-models-all
+
+  3. Service Management:
+     - Start: arty run services/start
+     - Stop: arty run services/stop
+     - Restart: arty run services/restart
+
+  4. Health Checks:
+     - Orchestrator: arty run health/orchestrator
+     - ComfyUI: arty run health/comfyui
+     - vLLM: arty run health/vllm
+
+  5. Environment Profiles:
+     - Production (essential only): arty sync --env prod
+     - Development (all nodes): arty sync --env dev
+     - Minimal (orchestrator + ComfyUI only): arty sync --env minimal
+
+  6. Important Files:
+     - Configuration: /workspace/ai/playbook.yml
+     - Model registry: /workspace/ai/model-orchestrator/models.yaml
+     - Environment: /workspace/ai/.env
+     - Services: /workspace/ai/scripts/*.sh
+
+  7. Ports:
+     - Orchestrator: 9000
+     - ComfyUI: 8188
+     - vLLM: 8000+
+
+  8. Storage:
+     - Models cache: /workspace/huggingface_cache (~401GB)
+     - ComfyUI models: /workspace/ComfyUI/models (symlinks to cache)
+     - Project: /workspace/ai
+
+  For detailed documentation, see:
+  - /workspace/ai/README.md
+  - /workspace/ai/CLAUDE.md
+  - /workspace/ai/COMFYUI_MODELS.md
+  - /workspace/ai/MODELS_LINKED.md
diff --git a/comfyui_models.yaml b/comfyui_models.yaml
new file mode 100644
index 0000000..dd8eb7a
--- /dev/null
+++ b/comfyui_models.yaml
@@ -0,0 +1,268 @@
+# ============================================================================
+# ComfyUI Model Configuration
+# ============================================================================
+#
+# This configuration file defines all available ComfyUI models for download.
+# Models are organized by category: image, video, audio, and support models.
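+#
+# A representative entry (illustrative values only; the fields are described
+# below):
+#
+#   image_models:
+#     - repo_id: org-name/model-name
+#       description: Short human-readable description
+#       size_gb: 7
+#       essential: true
+#       category: image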
+#
+# Each model entry contains:
+#   - repo_id: HuggingFace repository identifier
+#   - description: Human-readable description
+#   - size_gb: Approximate size in gigabytes
+#   - essential: Whether this is an essential model (true/false)
+#   - category: Model category (image/video/audio/support)
+#
+# ============================================================================
+
+# Global settings
+settings:
+  cache_dir: /workspace/huggingface_cache
+  parallel_downloads: 1
+  retry_attempts: 3
+  timeout_seconds: 3600
+
+# Model categories
+model_categories:
+  # ==========================================================================
+  # IMAGE GENERATION MODELS
+  # ==========================================================================
+  image_models:
+    - repo_id: black-forest-labs/FLUX.1-schnell
+      description: FLUX.1 Schnell - Fast 4-step inference
+      size_gb: 23
+      essential: true
+      category: image
+      format: fp16
+      vram_gb: 23
+      notes: Industry-leading image generation quality
+
+    - repo_id: black-forest-labs/FLUX.1-dev
+      description: FLUX.1 Dev - Balanced quality/speed
+      size_gb: 23
+      essential: false
+      category: image
+      format: fp16
+      vram_gb: 23
+      notes: Development version with enhanced features
+
+    - repo_id: stabilityai/stable-diffusion-xl-base-1.0
+      description: SDXL Base 1.0 - Industry standard
+      size_gb: 7
+      essential: true
+      category: image
+      format: fp16
+      vram_gb: 12
+      notes: Most widely used Stable Diffusion model
+
+    - repo_id: stabilityai/stable-diffusion-xl-refiner-1.0
+      description: SDXL Refiner 1.0 - Enhances base output
+      size_gb: 6
+      essential: false
+      category: image
+      format: fp16
+      vram_gb: 12
+      notes: Use after SDXL base for improved details
+
+    - repo_id: stabilityai/stable-diffusion-3.5-large
+      description: SD 3.5 Large - Latest Stability AI
+      size_gb: 18
+      essential: false
+      category: image
+      format: fp16
+      vram_gb: 20
+      notes: Newest generation Stable Diffusion
+
+  # ==========================================================================
+  # VIDEO GENERATION MODELS
+  # ==========================================================================
+  video_models:
+    - repo_id: THUDM/CogVideoX-5b
+      description: CogVideoX-5B - Professional text-to-video
+      size_gb: 20
+      essential: true
+      category: video
+      format: fp16
+      vram_gb: 20
+      frames: 49
+      resolution: 720p
+      notes: State-of-the-art text-to-video generation
+
+    - repo_id: stabilityai/stable-video-diffusion-img2vid
+      description: SVD - 14 frame image-to-video
+      size_gb: 8
+      essential: true
+      category: video
+      format: fp16
+      vram_gb: 20
+      frames: 14
+      resolution: 576x1024
+      notes: Convert images to short video clips
+
+    - repo_id: stabilityai/stable-video-diffusion-img2vid-xt
+      description: SVD-XT - 25 frame image-to-video
+      size_gb: 8
+      essential: false
+      category: video
+      format: fp16
+      vram_gb: 20
+      frames: 25
+      resolution: 576x1024
+      notes: Extended frame count version
+
+  # ==========================================================================
+  # AUDIO GENERATION MODELS
+  # ==========================================================================
+  audio_models:
+    - repo_id: facebook/musicgen-small
+      description: MusicGen Small - Fast generation
+      size_gb: 3
+      essential: false
+      category: audio
+      format: fp32
+      vram_gb: 4
+      duration_seconds: 30
+      notes: Fastest music generation, lower quality
+
+    - repo_id: facebook/musicgen-medium
+      description: MusicGen Medium - Balanced quality
+      size_gb: 11
+      essential: true
+      category: audio
+      format: fp32
+      vram_gb: 8
+      duration_seconds: 30
+      notes: Best balance of speed and quality
+
+    - repo_id: facebook/musicgen-large
+      description: MusicGen Large - Highest quality
+      size_gb: 22
+      essential: false
+      category: audio
+      format: fp32
+      vram_gb: 16
+      duration_seconds: 30
+      notes: Best quality, slower generation
+
+  # ==========================================================================
+  # SUPPORT MODELS (CLIP, IP-Adapter, etc.)
+  # ==========================================================================
+  support_models:
+    - repo_id: openai/clip-vit-large-patch14
+      description: CLIP ViT-L/14 - For SD 1.5 IP-Adapter
+      size_gb: 2
+      essential: true
+      category: support
+      format: fp32
+      vram_gb: 2
+      notes: Text-image understanding model
+
+    - repo_id: laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
+      description: CLIP G - For SDXL IP-Adapter
+      size_gb: 7
+      essential: true
+      category: support
+      format: fp32
+      vram_gb: 4
+      notes: Larger CLIP model for SDXL
+
+    - repo_id: google/siglip-so400m-patch14-384
+      description: SigLIP - For FLUX models
+      size_gb: 2
+      essential: true
+      category: support
+      format: fp32
+      vram_gb: 2
+      notes: Advanced image-text alignment
+
+# ============================================================================
+# STORAGE & VRAM SUMMARIES
+# ============================================================================
+
+storage_requirements:
+  essential_only:
+    image: 30    # FLUX Schnell + SDXL Base
+    video: 28    # CogVideoX + SVD
+    audio: 11    # MusicGen Medium
+    support: 11  # All 3 CLIP models
+    total: 80    # Total essential storage
+
+  all_models:
+    image: 54    # Image models excluding FLUX.1-dev (add 23 if included)
+    video: 36    # All video models
+    audio: 36    # All audio models
+    support: 11  # All support models
+    total: 137   # Total with optional models
+
+vram_requirements:
+  # For 24GB GPU (RTX 4090)
+  simultaneous_loadable:
+    - name: Image Focus - FLUX FP16
+      models: [FLUX.1 Schnell]
+      vram_used: 23
+      remaining: 1
+
+    - name: Image Focus - FLUX FP8 + SDXL
+      models: [FLUX.1 Schnell FP8, SDXL Base]
+      vram_used: 24
+      remaining: 0
+
+    - name: Video Generation
+      models: [CogVideoX-5B optimized, SDXL]
+      vram_used: 24
+      remaining: 0
+
+    - name: Multi-Modal
+      models: [SDXL, MusicGen Medium]
+      vram_used: 20
+      remaining: 4
+
+# ============================================================================
+# INSTALLATION PROFILES
+# ============================================================================
+
+installation_profiles:
+  minimal:
+    description: Minimal setup for testing
+    categories: [support_models]
+    storage_gb: 11
+    estimated_time: 5-10 minutes
+
+  essential:
+    description: Essential models only (~80GB)
+    categories: [image_models, video_models, audio_models, support_models]
+    essential_only: true
+    storage_gb: 80
+    estimated_time: 1-2 hours
+
+  image_focused:
+    description: All image generation models
+    categories: [image_models, support_models]
+    storage_gb: 65
+    estimated_time: 45-90 minutes
+
+  video_focused:
+    description: All video generation models
+    categories: [video_models, image_models, support_models]
+    essential_only: true
+    storage_gb: 69
+    estimated_time: 1-2 hours
+
+  complete:
+    description: All models (including optional)
+    categories: [image_models, video_models, audio_models, support_models]
+    storage_gb: 137
+    estimated_time: 2-4 hours
+
+# ============================================================================
+# METADATA
+# ============================================================================
+
+metadata:
+  version: 1.0.0
+  last_updated: 2025-11-21
+  compatible_with:
+    - ComfyUI >= 0.1.0
+    - Python >= 3.10
+    - HuggingFace Hub >= 0.20.0
+  maintainer: Valknar
+  repository: https://github.com/yourusername/runpod
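+
+# ----------------------------------------------------------------------------
+# Example consumer (hypothetical helper, not part of this repo): listing the
+# essential models defined above with PyYAML.
+#
+#   import yaml
+#
+#   with open("comfyui_models.yaml") as f:
+#       cfg = yaml.safe_load(f)
+#
+#   # Walk every category and print the models flagged essential
+#   for category, models in cfg["model_categories"].items():
+#       for m in models:
+#           if m.get("essential"):
+#               print(f"{m['repo_id']} ({m['size_gb']} GB)")
+# ----------------------------------------------------------------------------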