RunPod Template Creation Guide

This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.

Why Create a Template?

Without Template (Manual Setup Every Time):

  • Install Docker & Docker Compose (10-15 min)
  • Install Tailscale (5 min)
  • Pull Docker images (10-20 min)
  • Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
  • Configure everything (5-10 min)
  • Total: 60-90 minutes per Spot instance restart

With Template (Ready to Go):

  • Everything pre-installed
  • Models cached in /workspace
  • Just start orchestrator
  • Total: 2-3 minutes

Template Contents

System Software

  • Docker 24.x + Docker Compose v2
  • Tailscale latest
  • NVIDIA Docker runtime
  • Python 3.11
  • Git, curl, wget, htop, nvtop

Docker Images (Pre-built)

  • ai_orchestrator - Model orchestration service
  • ai_vllm-qwen_1 - Text generation (vLLM + Qwen 2.5 7B)
  • ai_musicgen_1 - Music generation (AudioCraft)
  • ghcr.io/matatonic/openedai-images-flux:latest - Image generation

Model Cache (/workspace - Persistent)

  • Qwen 2.5 7B Instruct (~14GB)
  • Flux.1 Schnell (~12GB)
  • MusicGen Medium (~11GB)
  • Total: ~37GB cached

Project Files (/workspace/ai)

  • All orchestrator code
  • Docker Compose configurations
  • Model service configurations
  • Documentation

Step-by-Step Template Creation

Prerequisites

  1. RunPod account
  2. Active RTX 4090 pod (or similar GPU)
  3. SSH access to the pod
  4. This repository cloned locally

Step 1: Deploy Fresh Pod

# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template

# Note the SSH connection details (host, port, password)

Step 2: Prepare the Instance

Run the automated preparation script:

# On your local machine, copy everything to RunPod
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/

# SSH to the pod
ssh -p <PORT> root@<HOST>

# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh

What the script does:

  1. Installs Docker & Docker Compose
  2. Installs Tailscale
  3. Builds all Docker images
  4. Pre-downloads all models
  5. Validates everything works
  6. Cleans up temporary files

Estimated time: 45-60 minutes
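
For reference, here is a minimal sketch of what such a preparation script might contain. The real scripts/prepare-template.sh in this repository is authoritative; the commands below only mirror the six steps above using paths and profile names from this guide.

#!/usr/bin/env bash
# Sketch of the preparation flow; scripts/prepare-template.sh is the
# authoritative version
set -euo pipefail

curl -fsSL https://get.docker.com | sh               # 1. Docker + Compose
curl -fsSL https://tailscale.com/install.sh | sh     # 2. Tailscale

cd /workspace/ai
docker compose -f compose.yaml build                 # 3. Build all images

# 4. Pre-download models by starting each service once
for profile in text image audio; do
    docker compose --profile "$profile" up -d
    sleep 300   # crude wait; the real script should poll for readiness
    docker compose --profile "$profile" stop
done

# 5. Validate the orchestrator comes up, then shut everything down
docker compose up -d orchestrator
curl -sf http://localhost:9000/health
docker compose down

docker builder prune -f                              # 6. Clean temporary build cache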

Step 3: Manual Verification

After the script completes, verify everything:

# Check Docker is installed
docker --version
docker compose version

# Check Tailscale
tailscale version

# Check all images are built
docker images | grep ai_

# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/

# Test orchestrator starts
cd /workspace/ai
docker compose -f compose.yaml up -d orchestrator
docker logs ai_orchestrator

# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health

# Stop orchestrator
docker compose -f compose.yaml down

Step 4: Clean Up Before Saving

IMPORTANT: Remove secrets and temporary data before creating the template!

# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history

# Clear logs
rm -f /var/log/*.log
docker builder prune -af           # Clear Docker build cache
docker system prune -f --volumes   # Remove stopped containers and unused volumes; without -a, images are kept

# Clear Tailscale state (will re-authenticate on first use)
tailscale logout

# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION

Step 5: Save Template in RunPod Dashboard

  1. Go to RunPod Dashboard → "My Pods"

  2. Select your prepared pod

  3. Click "⋮" menu → "Save as Template"

  4. Template Configuration:

    • Name: multi-modal-ai-v1.0
    • Description:
      Multi-Modal AI Stack with Orchestrator
      - Text: vLLM + Qwen 2.5 7B
      - Image: Flux.1 Schnell
      - Music: MusicGen Medium
      - Models pre-cached (~37GB)
      - Ready to deploy in 2-3 minutes
      
    • Category: AI/ML
    • Docker Image: (auto-detected)
    • Container Disk: 50GB
    • Expose Ports: 9000, 8001, 8002, 8003
    • Environment Variables (optional):
      HF_TOKEN=<leave empty, user will add>
      TAILSCALE_AUTHKEY=<leave empty, user will add>
      
  5. Click "Save Template"

  6. Wait for template creation (5-10 minutes)

  7. Test the template by deploying a new pod with it


Using Your Template

Deploy New Pod from Template

  1. RunPod Dashboard → "Deploy"

  2. Select "Community Templates" or "My Templates"

  3. Choose: multi-modal-ai-v1.0

  4. Configure:

    • GPU: RTX 4090 (or compatible)
    • Network Volume: Attach your existing volume with /workspace mount
    • Environment:
      • HF_TOKEN: Your Hugging Face token
      • (Tailscale will be configured via SSH)
  5. Deploy Pod

First-Time Setup (On New Pod)

# SSH to the new pod
ssh -p <PORT> root@<HOST>

# Navigate to project
cd /workspace/ai

# Create .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF

# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>

# Start orchestrator (models already cached, starts in seconds!)
docker compose -f compose.yaml up -d orchestrator

# Verify
curl http://localhost:9000/health

# Check logs
docker logs -f ai_orchestrator

Total setup time: 2-3 minutes! 🎉

Updating SSH Config (If Spot Instance Restarts)

Since Spot instances can restart with new IPs/ports:

# On your local machine
# Update ~/.ssh/config with new connection details

Host gpu-pivoine
    HostName <NEW_IP>
    Port <NEW_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
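
With the entry updated, reconnect using the alias instead of retyping the new details:

# Connect using the alias from ~/.ssh/config
ssh gpu-pivoine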

Template Maintenance

Updating the Template

When you add new models or make improvements:

  1. Deploy a pod from your existing template
  2. Make your changes
  3. Test everything
  4. Clean up (remove secrets)
  5. Save as new template version: multi-modal-ai-v1.1
  6. Update your documentation

Version History

Keep track of template versions:

v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator

v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading

Troubleshooting Template Creation

Models Not Downloading

# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen

# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
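
After each service finishes downloading, confirm the caches were actually populated (directory names as used elsewhere in this guide):

# Verify cache sizes roughly match the expected total (~37GB combined)
du -sh /workspace/huggingface_cache /workspace/flux/models /workspace/musicgen/models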

Docker Images Not Building

# Build images one at a time
docker compose -f compose.yaml build orchestrator
docker compose -f compose.yaml build vllm-qwen
docker compose -f compose.yaml build musicgen

# Check build logs for errors
docker compose -f compose.yaml build --no-cache --progress=plain orchestrator

Tailscale Won't Install

# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh

# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &

# Test
tailscale version

Template Too Large

RunPod templates have size limits. If your template is too large:

Option 1: Use network volume for models

  • Move models to network volume: /workspace/models/
  • Mount volume when deploying from template
  • Models persist across pod restarts
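
A rough sketch of Option 1, assuming some cache ended up on the container disk (e.g. under /root/.cache) rather than on the volume; the exact mount and variable names depend on your compose.yaml:

# Move a stray cache from the container disk onto the persistent volume
mkdir -p /workspace/models
mv /root/.cache/huggingface /workspace/models/huggingface_cache
# Then point the service at the new path in compose.yaml
# (e.g. via HF_HOME or a bind mount)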

Option 2: Reduce cached models

  • Only cache most-used model (Qwen 2.5 7B)
  • Download others on first use
  • Accept slightly longer first-time startup

Option 3: Use Docker layer optimization

# In the Dockerfile, order instructions from least to most frequently changed:
# base image and system packages first, application code last, so rebuilds
# reuse cached layers instead of re-running earlier steps

Cost Analysis

Template Storage Cost

  • RunPod charges for template storage: ~$0.10/GB/month
  • This template: 50GB ≈ $5/month
  • Worth it! Saves 60-90 minutes per Spot restart

Time Savings

  • Spot instance restarts: 2-5 times per week (highly variable)
  • Time saved per restart: 60-90 minutes
  • Total saved per month: 8-20 hours
  • Value: Priceless for rapid deployment

Advanced: Automated Template Updates

Create a CI/CD pipeline to automatically update templates:

# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team

Template Checklist

Before saving your template, verify:

  • All Docker images built and working
  • All models downloaded and cached
  • Tailscale installed (but logged out)
  • Docker Compose files present
  • .env file removed (secrets cleared)
  • Logs cleared
  • SSH keys removed
  • Bash history cleared
  • Template version documented
  • Test deployment successful
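
A quick sketch of how this checklist could be automated before saving; the paths and image prefix are the ones used throughout this guide:

#!/usr/bin/env bash
# Pre-save template checks; exits non-zero if something is missing
fail=0
docker images | grep -q '^ai_'      || { echo "MISSING: ai_* Docker images"; fail=1; }
[ -d /workspace/huggingface_cache ] || { echo "MISSING: huggingface cache"; fail=1; }
[ -f /workspace/ai/compose.yaml ]   || { echo "MISSING: compose.yaml"; fail=1; }
[ ! -f /workspace/ai/.env ]         || { echo "WARNING: .env still present"; fail=1; }
[ ! -f /root/.bash_history ]        || { echo "WARNING: bash history present"; fail=1; }
[ -f /workspace/TEMPLATE_VERSION ]  || { echo "MISSING: TEMPLATE_VERSION"; fail=1; }
command -v tailscale >/dev/null     || { echo "MISSING: tailscale"; fail=1; }
[ "$fail" -eq 0 ] && echo "Checklist passed - ready to save template"
exit "$fail"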

Support

If you have issues creating the template:

  1. Check /workspace/ai/scripts/prepare-template.sh logs
  2. Review Docker build logs: docker compose build --progress=plain
  3. Check model download logs: docker logs <container>
  4. Verify disk space: df -h
  5. Check network volume is mounted: mount | grep workspace

For RunPod-specific issues, consult the official RunPod documentation or contact RunPod support.


Next Steps

After creating your template:

  1. Test deployment from template
  2. Document in GPU_DEPLOYMENT_LOG.md
  3. Share template ID with team (if applicable)
  4. Set up monitoring (Netdata, etc.)
  5. Configure auto-stop for cost optimization
  6. Add more models as needed

Your multi-modal AI infrastructure is now portable and reproducible! 🚀