RunPod Template Creation Guide

This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.

Why Create a Template?

Without Template (Manual Setup Every Time):

  • Install Docker & Docker Compose (10-15 min)
  • Install Tailscale (5 min)
  • Pull Docker images (10-20 min)
  • Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
  • Configure everything (5-10 min)
  • Total: 60-90 minutes per Spot instance restart

With Template (Ready to Go):

  • Everything pre-installed
  • Models cached in /workspace
  • Just start orchestrator
  • Total: 2-3 minutes

Template Contents

System Software

  • Docker 24.x + Docker Compose v2
  • Tailscale latest
  • NVIDIA Docker runtime
  • Python 3.11
  • Git, curl, wget, htop, nvtop

Docker Images (Pre-built)

  • ai_orchestrator - Model orchestration service
  • ai_vllm-qwen_1 - Text generation (vLLM + Qwen 2.5 7B)
  • ai_musicgen_1 - Music generation (AudioCraft)
  • ghcr.io/matatonic/openedai-images-flux:latest - Image generation

Model Cache (/workspace - Persistent)

  • Qwen 2.5 7B Instruct (~14GB)
  • Flux.1 Schnell (~12GB)
  • MusicGen Medium (~11GB)
  • Total: ~37GB cached

Project Files (/workspace/ai)

  • All orchestrator code
  • Docker Compose configurations
  • Model service configurations
  • Documentation

Step-by-Step Template Creation

Prerequisites

  1. RunPod account
  2. Active RTX 4090 pod (or similar GPU)
  3. SSH access to the pod
  4. This repository cloned locally

Step 1: Deploy Fresh Pod

# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template

# Note the SSH connection details (host, port, password)

Step 2: Prepare the Instance

Run the automated preparation script:

# On your local machine, copy everything to RunPod
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/

# SSH to the pod
ssh -p <PORT> root@<HOST>

# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh

What the script does:

  1. Installs Docker & Docker Compose
  2. Installs Tailscale
  3. Builds all Docker images
  4. Pre-downloads all models
  5. Validates everything works
  6. Cleans up temporary files

Estimated time: 45-60 minutes
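
For reference, here is a minimal sketch of what such a preparation script might contain. The real scripts/prepare-template.sh in this repository is authoritative; the commands below only mirror the six steps above using paths and profile names from this guide.

#!/usr/bin/env bash
# Sketch of the preparation flow; scripts/prepare-template.sh is the
# authoritative version
set -euo pipefail

curl -fsSL https://get.docker.com | sh               # 1. Docker + Compose
curl -fsSL https://tailscale.com/install.sh | sh     # 2. Tailscale

cd /workspace/ai
docker compose -f compose.yaml build                 # 3. Build all images

# 4. Pre-download models by starting each service once
for profile in text image audio; do
    docker compose --profile "$profile" up -d
    sleep 300   # crude wait; the real script should poll for readiness
    docker compose --profile "$profile" stop
done

# 5. Validate the orchestrator comes up, then shut everything down
docker compose up -d orchestrator
curl -sf http://localhost:9000/health
docker compose down

docker builder prune -f                              # 6. Clean temporary build cache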

Step 3: Manual Verification

After the script completes, verify everything:

# Check Docker is installed
docker --version
docker compose version

# Check Tailscale
tailscale version

# Check all images are built
docker images | grep ai_

# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/

# Test orchestrator starts
cd /workspace/ai
docker compose -f compose.yaml up -d orchestrator
docker logs ai_orchestrator

# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health

# Stop orchestrator
docker compose -f compose.yaml down

Step 4: Clean Up Before Saving

IMPORTANT: Remove secrets and temporary data before creating the template!

# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history

# Clear logs
rm -f /var/log/*.log
docker builder prune -af           # Clear Docker build cache
docker system prune -f --volumes   # Remove stopped containers and unused volumes; without -a, images are kept

# Clear Tailscale state (will re-authenticate on first use)
tailscale logout

# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION

Step 5: Save Template in RunPod Dashboard

  1. Go to RunPod Dashboard → "My Pods"

  2. Select your prepared pod

  3. Click "⋮" menu → "Save as Template"

  4. Template Configuration:

    • Name: multi-modal-ai-v1.0
    • Description:
      Multi-Modal AI Stack with Orchestrator
      - Text: vLLM + Qwen 2.5 7B
      - Image: Flux.1 Schnell
      - Music: MusicGen Medium
      - Models pre-cached (~37GB)
      - Ready to deploy in 2-3 minutes
      
    • Category: AI/ML
    • Docker Image: (auto-detected)
    • Container Disk: 50GB
    • Expose Ports: 9000, 8001, 8002, 8003
    • Environment Variables (optional):
      HF_TOKEN=<leave empty, user will add>
      TAILSCALE_AUTHKEY=<leave empty, user will add>
      
  5. Click "Save Template"

  6. Wait for template creation (5-10 minutes)

  7. Test the template by deploying a new pod with it


Using Your Template

Deploy New Pod from Template

  1. RunPod Dashboard → "Deploy"

  2. Select "Community Templates" or "My Templates"

  3. Choose: multi-modal-ai-v1.0

  4. Configure:

    • GPU: RTX 4090 (or compatible)
    • Network Volume: Attach your existing volume with /workspace mount
    • Environment:
      • HF_TOKEN: Your Hugging Face token
      • (Tailscale will be configured via SSH)
  5. Deploy Pod

First-Time Setup (On New Pod)

# SSH to the new pod
ssh -p <PORT> root@<HOST>

# Navigate to project
cd /workspace/ai

# Create .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF

# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>

# Start orchestrator (models already cached, starts in seconds!)
docker compose -f compose.yaml up -d orchestrator

# Verify
curl http://localhost:9000/health

# Check logs
docker logs -f ai_orchestrator

Total setup time: 2-3 minutes! 🎉

Updating SSH Config (If Spot Instance Restarts)

Since Spot instances can restart with new IPs/ports:

# On your local machine
# Update ~/.ssh/config with new connection details

Host gpu-pivoine
    HostName <NEW_IP>
    Port <NEW_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
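
With the entry updated, reconnect using the alias instead of retyping the new details:

# Connect using the alias from ~/.ssh/config
ssh gpu-pivoine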

Template Maintenance

Updating the Template

When you add new models or make improvements:

  1. Deploy a pod from your existing template
  2. Make your changes
  3. Test everything
  4. Clean up (remove secrets)
  5. Save as new template version: multi-modal-ai-v1.1
  6. Update your documentation

Version History

Keep track of template versions:

v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator

v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading

Troubleshooting Template Creation

Models Not Downloading

# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen

# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
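
After each service finishes downloading, confirm the caches were actually populated (directory names as used elsewhere in this guide):

# Verify cache sizes roughly match the expected total (~37GB combined)
du -sh /workspace/huggingface_cache /workspace/flux/models /workspace/musicgen/models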

Docker Images Not Building

# Build images one at a time
docker compose -f compose.yaml build orchestrator
docker compose -f compose.yaml build vllm-qwen
docker compose -f compose.yaml build musicgen

# Check build logs for errors
docker compose -f compose.yaml build --no-cache --progress=plain orchestrator

Tailscale Won't Install

# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh

# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &

# Test
tailscale version

Template Too Large

RunPod templates have size limits. If your template is too large:

Option 1: Use network volume for models

  • Move models to network volume: /workspace/models/
  • Mount volume when deploying from template
  • Models persist across pod restarts
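
A rough sketch of Option 1, assuming some cache ended up on the container disk (e.g. under /root/.cache) rather than on the volume; the exact mount and variable names depend on your compose.yaml:

# Move a stray cache from the container disk onto the persistent volume
mkdir -p /workspace/models
mv /root/.cache/huggingface /workspace/models/huggingface_cache
# Then point the service at the new path in compose.yaml
# (e.g. via HF_HOME or a bind mount)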

Option 2: Reduce cached models

  • Only cache most-used model (Qwen 2.5 7B)
  • Download others on first use
  • Accept slightly longer first-time startup

Option 3: Use Docker layer optimization

# In the Dockerfile, order instructions from least to most frequently changed:
# base image and system packages first, application code last, so rebuilds
# reuse cached layers instead of re-running earlier steps

Cost Analysis

Template Storage Cost

  • RunPod charges for template storage: ~$0.10/GB/month
  • This template: 50GB ≈ $5/month
  • Worth it! Saves 60-90 minutes per Spot restart

Time Savings

  • Spot instance restarts: 2-5 times per week (highly variable)
  • Time saved per restart: 60-90 minutes
  • Total saved per month: 8-20 hours
  • Value: Priceless for rapid deployment

Advanced: Automated Template Updates

Create a CI/CD pipeline to automatically update templates:

# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team

Template Checklist

Before saving your template, verify:

  • All Docker images built and working
  • All models downloaded and cached
  • Tailscale installed (but logged out)
  • Docker Compose files present
  • .env file removed (secrets cleared)
  • Logs cleared
  • SSH keys removed
  • Bash history cleared
  • Template version documented
  • Test deployment successful
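
A quick sketch of how this checklist could be automated before saving; the paths and image prefix are the ones used throughout this guide:

#!/usr/bin/env bash
# Pre-save template checks; exits non-zero if something is missing
fail=0
docker images | grep -q '^ai_'      || { echo "MISSING: ai_* Docker images"; fail=1; }
[ -d /workspace/huggingface_cache ] || { echo "MISSING: huggingface cache"; fail=1; }
[ -f /workspace/ai/compose.yaml ]   || { echo "MISSING: compose.yaml"; fail=1; }
[ ! -f /workspace/ai/.env ]         || { echo "WARNING: .env still present"; fail=1; }
[ ! -f /root/.bash_history ]        || { echo "WARNING: bash history present"; fail=1; }
[ -f /workspace/TEMPLATE_VERSION ]  || { echo "MISSING: TEMPLATE_VERSION"; fail=1; }
command -v tailscale >/dev/null     || { echo "MISSING: tailscale"; fail=1; }
[ "$fail" -eq 0 ] && echo "Checklist passed - ready to save template"
exit "$fail"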

Support

If you have issues creating the template:

  1. Check /workspace/ai/scripts/prepare-template.sh logs
  2. Review Docker build logs: docker compose build --progress=plain
  3. Check model download logs: docker logs <container>
  4. Verify disk space: df -h
  5. Check network volume is mounted: mount | grep workspace

For RunPod-specific issues, consult the official RunPod documentation or contact RunPod support.


Next Steps

After creating your template:

  1. Test deployment from template
  2. Document in GPU_DEPLOYMENT_LOG.md
  3. Share template ID with team (if applicable)
  4. Set up monitoring (Netdata, etc.)
  5. Configure auto-stop for cost optimization
  6. Add more models as needed

Your multi-modal AI infrastructure is now portable and reproducible! 🚀