# RunPod Template Creation Guide
This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.
## Why Create a Template?
**Without Template** (Manual Setup Every Time):
- ❌ Install Docker & Docker Compose (10-15 min)
- ❌ Install Tailscale (5 min)
- ❌ Pull Docker images (10-20 min)
- ❌ Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
- ❌ Configure everything (5-10 min)
- **Total: 60-90 minutes per Spot instance restart**
**With Template** (Ready to Go):
- ✅ Everything pre-installed
- ✅ Models cached in `/workspace`
- ✅ Just start orchestrator
- **Total: 2-3 minutes**
## Template Contents
### System Software
- ✅ Docker 24.x + Docker Compose v2
- ✅ Tailscale latest
- ✅ NVIDIA Docker runtime
- ✅ Python 3.11
- ✅ Git, curl, wget, htop, nvtop
### Docker Images (Pre-built)
- `ai_orchestrator` - Model orchestration service
- `ai_vllm-qwen_1` - Text generation (vLLM + Qwen 2.5 7B)
- `ai_musicgen_1` - Music generation (AudioCraft)
- `ghcr.io/matatonic/openedai-images-flux:latest` - Image generation
### Model Cache (/workspace - Persistent)
- ✅ Qwen 2.5 7B Instruct (~14GB)
- ✅ Flux.1 Schnell (~12GB)
- ✅ MusicGen Medium (~11GB)
- **Total: ~37GB cached**
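Once a pod is running, you can confirm these caches and their sizes with a quick check (same paths used in Step 3 below):
```bash
# Show the on-disk size of each model cache directory
du -sh /workspace/huggingface_cache /workspace/flux/models /workspace/musicgen/models
```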
### Project Files (/workspace/ai)
- ✅ All orchestrator code
- ✅ Docker Compose configurations
- ✅ Model service configurations
- ✅ Documentation
---
## Step-by-Step Template Creation
### Prerequisites
1. RunPod account
2. Active RTX 4090 pod (or similar GPU)
3. SSH access to the pod
4. This repository cloned locally
### Step 1: Deploy Fresh Pod
```bash
# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template
# Note the SSH connection details (host, port, password)
```
### Step 2: Prepare the Instance
Run the automated preparation script:
```bash
# On your local machine, copy everything to RunPod
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/
# SSH to the pod
ssh -p <PORT> root@<HOST>
# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh
```
**What the script does:**
1. Installs Docker & Docker Compose
2. Installs Tailscale
3. Builds all Docker images
4. Pre-downloads all models
5. Validates everything works
6. Cleans up temporary files
**Estimated time: 45-60 minutes**
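For reference, the script's flow boils down to something like this sketch (simplified and illustrative; the actual `scripts/prepare-template.sh` in the repository is authoritative):
```bash
#!/usr/bin/env bash
# Simplified outline of the preparation flow (sketch, not the real script)
set -euo pipefail

# 1-2. Install Docker and Tailscale via their official install scripts
curl -fsSL https://get.docker.com | sh
curl -fsSL https://tailscale.com/install.sh | sh

# 3. Build all images defined in compose.yaml
cd /workspace/ai
docker compose -f compose.yaml build

# 4. Pre-download models by starting each profile once, then stopping it
for profile in text image audio; do
    docker compose --profile "$profile" up -d
    sleep 600   # crude wait for downloads; the real script should poll the logs instead
    docker compose --profile "$profile" down
done

# 5-6. Validate images exist and clean up build cache
docker images | grep -E 'ai_|flux'
docker builder prune -f
```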
### Step 3: Manual Verification
After the script completes, verify everything:
```bash
# Check Docker is installed
docker --version
docker compose version
# Check Tailscale
tailscale version
# Check all images are built
docker images | grep ai_
# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/
# Test orchestrator starts
cd /workspace/ai
docker compose -f compose.yaml up -d orchestrator
docker logs ai_orchestrator
# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health
# Stop orchestrator
docker compose -f compose.yaml down
```
### Step 4: Clean Up Before Saving
**IMPORTANT**: Remove secrets and temporary data before creating template!
```bash
# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history
# Clear logs
rm -f /var/log/*.log
docker system prune -f --volumes # Clean Docker cache but keep images (no -a flag, which would delete the pre-built images)
# Clear Tailscale state (will re-authenticate on first use)
tailscale logout
# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION
```
### Step 5: Save Template in RunPod Dashboard
1. **Go to RunPod Dashboard** → "My Pods"
2. **Select your prepared pod**
3. **Click "⋮" menu** → "Save as Template"
4. **Template Configuration**:
- **Name**: `multi-modal-ai-v1.0`
- **Description**:
```
Multi-Modal AI Stack with Orchestrator
- Text: vLLM + Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Models pre-cached (~37GB)
- Ready to deploy in 2-3 minutes
```
- **Category**: `AI/ML`
- **Docker Image**: (auto-detected)
- **Container Disk**: 50GB
- **Expose Ports**: 9000, 8001, 8002, 8003
- **Environment Variables** (optional):
```
HF_TOKEN=<leave empty, user will add>
TAILSCALE_AUTHKEY=<leave empty, user will add>
```
5. **Click "Save Template"**
6. **Wait for template creation** (5-10 minutes)
7. **Test the template** by deploying a new pod with it
---
## Using Your Template
### Deploy New Pod from Template
1. **RunPod Dashboard** → "Deploy"
2. **Select "Community Templates"** or "My Templates"
3. **Choose**: `multi-modal-ai-v1.0`
4. **Configure**:
- GPU: RTX 4090 (or compatible)
- Network Volume: Attach your existing volume with `/workspace` mount
- Environment:
- `HF_TOKEN`: Your Hugging Face token
- (Tailscale will be configured via SSH)
5. **Deploy Pod**
### First-Time Setup (On New Pod)
```bash
# SSH to the new pod
ssh -p <PORT> root@<HOST>
# Navigate to project
cd /workspace/ai
# Create .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF
# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>
# Start orchestrator (models already cached, starts in seconds!)
docker compose -f compose.yaml up -d orchestrator
# Verify
curl http://localhost:9000/health
# Check logs
docker logs -f ai_orchestrator
```
**Total setup time: 2-3 minutes!** 🎉
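To confirm the cache survived the template round-trip, you can optionally warm up one model service; it should load from `/workspace` without re-downloading anything:
```bash
# Start the text-generation service and watch it load from cache
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1   # expect the model to load with no multi-GB download
```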
### Updating SSH Config (If Spot Instance Restarts)
Since Spot instances can restart with new IPs/ports:
```bash
# On your local machine, update ~/.ssh/config with the new connection details:
Host gpu-pivoine
    HostName <NEW_IP>
    Port <NEW_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
```
---
## Template Maintenance
### Updating the Template
When you add new models or make improvements:
1. Deploy a pod from your existing template
2. Make your changes
3. Test everything
4. Clean up (remove secrets)
5. Save as new template version: `multi-modal-ai-v1.1`
6. Update your documentation
### Version History
Keep track of template versions:
```
v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator
v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading
```
---
## Troubleshooting Template Creation
### Models Not Downloading
```bash
# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen
# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
```
### Docker Images Not Building
```bash
# Build images one at a time
docker compose -f compose.yaml build orchestrator
docker compose -f compose.yaml build vllm-qwen
docker compose -f compose.yaml build musicgen
# Check build logs for errors
docker compose -f compose.yaml build --no-cache --progress=plain orchestrator
```
### Tailscale Won't Install
```bash
# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh
# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
# Test
tailscale version
```
### Template Too Large
RunPod templates have size limits. If your template is too large:
**Option 1**: Use network volume for models
- Move models to network volume: `/workspace/models/`
- Mount volume when deploying from template
- Models persist across pod restarts
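In `compose.yaml`, Option 1 amounts to bind-mounting the volume-backed directories into the model services, roughly like this (a sketch; the container-side paths are assumptions and must match your actual service configs):
```yaml
# Sketch: model services read from the network volume instead of the container disk
services:
  vllm-qwen:
    volumes:
      - /workspace/huggingface_cache:/root/.cache/huggingface  # default HF cache path (assumed)
  flux:
    volumes:
      - /workspace/flux/models:/models    # adjust to the image's expected model path
  musicgen:
    volumes:
      - /workspace/musicgen/models:/models
```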
**Option 2**: Reduce cached models
- Only cache most-used model (Qwen 2.5 7B)
- Download others on first use
- Accept slightly longer first-time startup
**Option 3**: Use Docker layer optimization
```dockerfile
# Illustrative example: order instructions by change frequency,
# stable layers first, so Docker reuses cached layers on rebuilds
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .  # app code changes most often, so it goes last
```
---
## Cost Analysis
### Template Storage Cost
- RunPod charges for template storage: ~$0.10/GB/month
- This template: ~50GB = **~$5/month**
- **Worth it!** Saves 60-90 minutes per Spot restart
### Time Savings
- Spot instance restarts: 2-5 times per week (highly variable)
- Time saved per restart: 60-90 minutes
- **Total saved per month: roughly 8-30 hours** (2 restarts × 60 min ≈ 8 h; 5 × 90 min ≈ 30 h)
- **Value: Priceless for rapid deployment**
---
## Advanced: Automated Template Updates
Create a CI/CD pipeline to automatically update templates:
```bash
# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team
```
---
## Template Checklist
Before saving your template, verify:
- [ ] All Docker images built and working
- [ ] All models downloaded and cached
- [ ] Tailscale installed (but logged out)
- [ ] Docker Compose files present
- [ ] `.env` file removed (secrets cleared)
- [ ] Logs cleared
- [ ] SSH keys removed
- [ ] Bash history cleared
- [ ] Template version documented
- [ ] Test deployment successful
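The checklist can be spot-checked from a shell in one pass (illustrative commands mirroring the items above):
```bash
# Pre-save audit (sketch)
docker images | grep -E 'ai_|flux'                      # images built
ls /workspace/huggingface_cache /workspace/flux/models /workspace/musicgen/models  # models cached
tailscale status || echo "Tailscale logged out (expected)"
test ! -f /workspace/ai/.env && echo ".env removed"
test ! -s /root/.bash_history && echo "bash history cleared"
cat /workspace/TEMPLATE_VERSION                         # version documented
```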
---
## Support
If you have issues creating the template:
1. Check `/workspace/ai/scripts/prepare-template.sh` logs
2. Review Docker build logs: `docker compose build --progress=plain`
3. Check model download logs: `docker logs <container>`
4. Verify disk space: `df -h`
5. Check network volume is mounted: `mount | grep workspace`
For RunPod-specific issues:
- RunPod Docs: https://docs.runpod.io/
- RunPod Discord: https://discord.gg/runpod
---
## Next Steps
After creating your template:
1. ✅ Test deployment from template
2. ✅ Document in `GPU_DEPLOYMENT_LOG.md`
3. ✅ Share template ID with team (if applicable)
4. ✅ Set up monitoring (Netdata, etc.)
5. ✅ Configure auto-stop for cost optimization
6. ✅ Add more models as needed
**Your multi-modal AI infrastructure is now portable and reproducible!** 🚀