RunPod Template Setup Guide

This guide explains how to deploy the AI Orchestrator (ComfyUI + vLLM) on RunPod using a custom Docker template and network volume.

Architecture Overview

The deployment uses a two-tier strategy:

  1. Docker Image (software layer) - Contains system packages, Supervisor, Tailscale
  2. Network Volume (data layer) - Contains models, ComfyUI installation, venvs, configuration

This approach allows fast pod deployment (~2-3 minutes) while keeping all large files (models, ~80-200GB) on a persistent network volume.
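
The split implies a network-volume layout roughly like the following (a sketch; exact contents depend on which playbook tags you run, and the paths match those used later in this guide):

/workspace/
├── ai/                   # This repository: playbook, scripts, supervisord config
├── ComfyUI/              # ComfyUI installation and custom_nodes
├── models/               # Downloaded model weights
├── huggingface_cache/    # HF_HOME download cache
├── logs/                 # Supervisor and service logs
└── supervisord.conf      # Config referenced by the supervisorctl commands below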

Prerequisites

  • RunPod account with credits
  • Gitea instance with the container registry enabled (for hosting the template image)
  • HuggingFace account with API token (for model downloads)
  • Tailscale account with auth key (optional, for VPN access)

Step 1: Build and Push Docker Image

Option A: Automated Build via Gitea Workflow (Recommended)

The repository includes a Gitea workflow that automatically builds and pushes the Docker image to your Gitea container registry whenever you push to the main branch or create a version tag.

  1. Configure Gitea Secret:

    • Go to your Gitea repository → Settings → Secrets
    • Add REGISTRY_TOKEN = your Gitea access token with registry permissions
    • (The workflow automatically uses your Gitea username via gitea.actor)
  2. Trigger Build:

    # Push to main branch
    git push origin main
    
    # Or create a version tag
    git tag v1.0.0
    git push origin v1.0.0
    
  3. Monitor Build:

    • Go to Actions tab in Gitea
    • Wait for build to complete (~5-10 minutes)
    • Note the Docker image name: dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
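
Once the build succeeds, you can optionally verify that the image pulls cleanly from the registry (requires your Gitea credentials):

docker login dev.pivoine.art
docker pull dev.pivoine.art/valknar/runpod-ai-orchestrator:latest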

Option B: Manual Build

If you prefer to build manually:

# From the repository root
cd /path/to/runpod

# Build the image
docker build -t dev.pivoine.art/valknar/runpod-ai-orchestrator:latest .

# Login to your Gitea registry
docker login dev.pivoine.art

# Push to Gitea registry
docker push dev.pivoine.art/valknar/runpod-ai-orchestrator:latest

Step 2: Create Network Volume

Network volumes persist your models and data across pod restarts and rebuilds.

  1. Go to RunPod Dashboard → Storage → Network Volumes

  2. Click "New Network Volume"

  3. Configure:

    • Name: ai-orchestrator-models
    • Size: 200GB (adjust based on your needs)
      • Essential models only: ~80GB
      • All models: ~137-200GB
    • Datacenter: Choose one close to you (the volume is tied to its datacenter; pods must deploy in the same datacenter to attach it)
  4. Click "Create Volume"

  5. Note the Volume ID (e.g., vol-abc123def456) for pod deployment

Storage Requirements

Configuration   Size      Models Included
Essential       ~80GB     FLUX Schnell, 1-2 SDXL checkpoints, MusicGen Medium
Complete        ~137GB    All image/video/audio models from playbook
Full + vLLM     ~200GB    Complete + Qwen 2.5 7B + Llama 3.1 8B

Step 3: Create RunPod Template

  1. Go to RunPod Dashboard → Templates

  2. Click "New Template"

  3. Configure Template Settings:

    Container Configuration:

    • Template Name: AI Orchestrator (ComfyUI + vLLM)
    • Template Type: Docker
    • Container Image: dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
    • Container Disk: 50GB (for system and temp files)
    • Docker Command: Leave empty (uses default /start.sh)

    Volume Configuration:

    • Volume Mount Path: /workspace
    • Attach to Network Volume: Select your volume ID from Step 2

    Port Configuration:

    • Expose HTTP Ports: 8188, 9000, 9001
      • 8188 - ComfyUI web interface
      • 9000 - Model orchestrator API
      • 9001 - Supervisor web UI
    • Expose TCP Ports: 22 (SSH access)

    Environment Variables:

    HF_TOKEN=your_huggingface_token_here
    TAILSCALE_AUTHKEY=tskey-auth-your_tailscale_authkey_here
    SUPERVISOR_BACKEND_HOST=localhost
    SUPERVISOR_BACKEND_PORT=9001
    

    Advanced Settings:

    • Start Jupyter: No
    • Start SSH: Yes (handled by base image)
  4. Click "Save Template"

Step 4: First Deployment (Initial Setup)

The first time you deploy, you need to set up the network volume with models and configuration.

4.1 Deploy Pod

  1. Go to RunPod Dashboard → Pods
  2. Click "Deploy" or "GPU Pods"
  3. Select your custom template: AI Orchestrator (ComfyUI + vLLM)
  4. Configure GPU:
    • GPU Type: RTX 4090 (24GB VRAM) or higher
    • Network Volume: Select your volume from Step 2
    • On-Demand vs Spot: Choose based on budget
  5. Click "Deploy"

4.2 SSH into Pod

# Get pod SSH command from RunPod dashboard
ssh root@<pod-ip> -p <port> -i ~/.ssh/id_ed25519

# Or use RunPod web terminal

4.3 Initial Setup on Network Volume

# 1. Clone the repository to /workspace/ai (use your own Gitea repo URL)
cd /workspace
git clone https://dev.pivoine.art/valknar/runpod.git ai
cd ai

# 2. Create .env file with your credentials
cp .env.example .env
nano .env

# Edit and add:
# HF_TOKEN=your_huggingface_token
# TAILSCALE_AUTHKEY=tskey-auth-your_key
# GPU_TAILSCALE_IP=<will be set automatically>

# 3. Download essential models (this takes 30-60 minutes)
ansible-playbook playbook.yml --tags comfyui-essential

# OR download all models (1-2 hours)
ansible-playbook playbook.yml --tags comfyui-models-all

# 4. Link models to ComfyUI
bash scripts/link-comfyui-models.sh

# OR if arty is available
arty run models/link-comfyui

# 5. Install ComfyUI custom nodes dependencies
cd /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
pip install -r requirements.txt
cd /workspace/ai

# 6. Restart the container to apply all changes
exit
# Go to RunPod dashboard → Stop pod → Start pod

4.4 Verify Services

After restart, SSH back in and check:

# Check supervisor status
supervisorctl -c /workspace/supervisord.conf status

# Expected output:
# comfyui                          RUNNING   pid 123, uptime 0:01:00
# (orchestrator is disabled by default - enable for vLLM)

# Test ComfyUI
curl -I http://localhost:8188

# Test Supervisor web UI
curl -I http://localhost:9001

Step 5: Subsequent Deployments

After initial setup, deploying new pods is quick (2-3 minutes):

  1. Deploy pod with same template + network volume
  2. Wait for startup (~1-2 minutes for services to start)
  3. Access services:
    • ComfyUI: http://<pod-ip>:8188
    • Supervisor: http://<pod-ip>:9001

All models, configuration, and data persist on the network volume!
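
As a quick smoke test from your local machine, you can probe the exposed ports (a minimal sketch; substitute the pod IP and mapped ports from the RunPod dashboard):

#!/bin/bash
# Probe ComfyUI (8188) and the Supervisor UI (9001); add 9000 if the orchestrator is enabled.
# Note: curl -f also treats auth errors (e.g., 401) as failures.
POD_IP=<pod-ip>
for port in 8188 9001; do
  if curl -sf -o /dev/null "http://${POD_IP}:${port}"; then
    echo "port ${port}: OK"
  else
    echo "port ${port}: not responding yet"
  fi
done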

Step 6: Access Services

Via Direct IP (HTTP)

Get pod IP and ports from RunPod dashboard:

ComfyUI:           http://<pod-ip>:8188
Supervisor UI:     http://<pod-ip>:9001
Orchestrator API:  http://<pod-ip>:9000
SSH:               ssh root@<pod-ip> -p <port>

Via Tailscale (VPN)

If you configured TAILSCALE_AUTHKEY, the pod automatically joins your Tailscale network:

  1. Get Tailscale IP:

    ssh root@<pod-ip> -p <port>
    tailscale ip -4
    # Example output: 100.114.60.40
    
  2. Access via Tailscale:

    ComfyUI:      http://<tailscale-ip>:8188
    Supervisor:   http://<tailscale-ip>:9001
    Orchestrator: http://<tailscale-ip>:9000
    SSH:          ssh root@<tailscale-ip>
    
  3. Update LiteLLM config on your VPS with the Tailscale IP

Service Management

Start/Stop Services

# Start all services
supervisorctl -c /workspace/supervisord.conf start all

# Stop all services
supervisorctl -c /workspace/supervisord.conf stop all

# Restart specific service
supervisorctl -c /workspace/supervisord.conf restart comfyui

# View status
supervisorctl -c /workspace/supervisord.conf status

Enable vLLM Models (Text Generation)

By default, only ComfyUI runs (to save VRAM). To enable vLLM:

  1. Stop ComfyUI (frees up VRAM):

    supervisorctl -c /workspace/supervisord.conf stop comfyui
    
  2. Start orchestrator (manages vLLM models):

    supervisorctl -c /workspace/supervisord.conf start orchestrator
    
  3. Test text generation:

    curl -X POST http://localhost:9000/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model":"qwen-2.5-7b","messages":[{"role":"user","content":"Hello"}]}'
    

Switch Back to ComfyUI

# Stop orchestrator (stops all vLLM models)
supervisorctl -c /workspace/supervisord.conf stop orchestrator

# Start ComfyUI
supervisorctl -c /workspace/supervisord.conf start comfyui
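
If you toggle between the two modes often, a small wrapper (a hypothetical helper, not part of the repository) keeps both sequences in one place:

#!/bin/bash
# switch-mode.sh - toggle between ComfyUI and the vLLM orchestrator (hypothetical helper)
# Usage: ./switch-mode.sh comfyui | ./switch-mode.sh vllm
CONF=/workspace/supervisord.conf
case "$1" in
  comfyui)
    supervisorctl -c "$CONF" stop orchestrator
    supervisorctl -c "$CONF" start comfyui
    ;;
  vllm)
    supervisorctl -c "$CONF" stop comfyui
    supervisorctl -c "$CONF" start orchestrator
    ;;
  *)
    echo "usage: $0 {comfyui|vllm}" >&2
    exit 1
    ;;
esac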

Updating the Template

When you make changes to code or configuration:

Update Docker Image

# 1. Make changes to Dockerfile or start.sh
# 2. Push to repository
git add .
git commit -m "Update template configuration"
git push origin main

# 3. Gitea workflow auto-builds new image

# 4. Terminate old pod and deploy new one with updated image

Update Network Volume Data

# SSH into running pod
ssh root@<pod-ip> -p <port>

# Update repository
cd /workspace/ai
git pull

# Re-run Ansible if needed
ansible-playbook playbook.yml --tags <specific-tag>

# Restart services
supervisorctl -c /workspace/supervisord.conf restart all

Troubleshooting

Pod fails to start

Check logs:

# Via SSH
cat /workspace/logs/supervisord.log
cat /workspace/logs/comfyui.err.log

# Via RunPod web terminal
tail -f /workspace/logs/*.log

Common issues:

  • Missing .env file → Create /workspace/ai/.env with required vars
  • Supervisor config not found → Ensure /workspace/supervisord.conf exists (the repository's copy lives at /workspace/ai/supervisord.conf)
  • Port conflicts → Check if services are already running
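
These checks can be scripted in a few lines (paths as used throughout this guide):

test -f /workspace/ai/.env || echo "missing /workspace/ai/.env"
test -f /workspace/supervisord.conf || echo "missing /workspace/supervisord.conf"
ss -tlnp | grep -E ':(8188|9000|9001)'   # shows which service ports are already bound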

Tailscale not connecting

Check Tailscale status:

tailscale status
tailscale ip -4

Common issues:

  • Missing or invalid TAILSCALE_AUTHKEY in .env
  • Auth key expired → Generate new key in Tailscale admin
  • Firewall blocking → RunPod should allow Tailscale by default
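
If the key expired, re-authenticate from inside the pod (assumes the tailscaled daemon is already running):

tailscale up --authkey tskey-auth-<new-key>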

Services not starting

Check Supervisor:

supervisorctl -c /workspace/supervisord.conf status
supervisorctl -c /workspace/supervisord.conf tail -f comfyui

Common issues:

  • venv broken → Re-run scripts/bootstrap-venvs.sh
  • Models not downloaded → Run Ansible playbook again
  • Python version mismatch → Rebuild venvs

Out of VRAM

Check GPU memory:

nvidia-smi
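
For a compact view of just the memory figures:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv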

RTX 4090 (24GB) capacity:

  • ComfyUI (FLUX Schnell): ~23GB (cannot run alongside vLLM on a 24GB GPU)
  • vLLM (Qwen 2.5 7B): ~14GB
  • vLLM (Llama 3.1 8B): ~17GB

Solution: Only run one service at a time (see Service Management section)

Network volume full

Check disk usage:

df -h /workspace
du -sh /workspace/*

Clean up:

# Remove old HuggingFace cache
rm -rf /workspace/huggingface_cache

# Re-download essential models only
cd /workspace/ai
ansible-playbook playbook.yml --tags comfyui-essential

Cost Optimization

Spot vs On-Demand

  • Spot instances: ~70% cheaper, can be interrupted
  • On-Demand: More expensive, guaranteed availability

Recommendation: Use spot for development, on-demand for production

Network Volume Pricing

  • First 1TB: $0.07/GB/month
  • Beyond 1TB: $0.05/GB/month

200GB volume cost: 200GB × $0.07/GB/month = $14/month

Pod Auto-Stop

Configure auto-stop in RunPod pod settings to save costs when idle:

  • Stop after 15 minutes idle
  • Stop after 1 hour idle
  • Manual stop only

Advanced Configuration

Custom Environment Variables

Add to template or pod environment variables:

# Model cache locations
HF_HOME=/workspace/huggingface_cache
TRANSFORMERS_CACHE=/workspace/huggingface_cache

# ComfyUI settings
COMFYUI_PORT=8188
COMFYUI_LISTEN=0.0.0.0

# Orchestrator settings
ORCHESTRATOR_PORT=9000

# GPU settings
CUDA_VISIBLE_DEVICES=0
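
To sanity-check these variables outside RunPod, you can run the same image locally (a sketch; assumes a local NVIDIA GPU, the NVIDIA container runtime, and a scratch directory standing in for the network volume):

docker run --rm --gpus all \
  -e HF_HOME=/workspace/huggingface_cache \
  -e COMFYUI_PORT=8188 \
  -e COMFYUI_LISTEN=0.0.0.0 \
  -v /path/to/local/workspace:/workspace \
  -p 8188:8188 \
  dev.pivoine.art/valknar/runpod-ai-orchestrator:latest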

Multiple Network Volumes

You can attach multiple network volumes for organization:

  1. Models volume - /workspace/models (read-only, shared)
  2. Data volume - /workspace/data (read-write, per-project)

Custom Startup Script

Override the default /start.sh behavior by creating /workspace/custom-start.sh (the example below re-sources the default startup and then extends it):

#!/bin/bash
# Custom startup commands

# Source default startup
source /start.sh

# Add your custom commands here
echo "Running custom initialization..."

Support

For issues or questions:

  • Check troubleshooting section above
  • Review /workspace/logs/ files
  • Check RunPod community forums
  • Open issue in project repository