refactor: clean up runpod repository structure
Removed facefusion and VPS-related files:
- compose.yaml, postgres/, litellm-config.yaml (VPS services)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)

Removed outdated documentation:
- DOCKER_GPU_SETUP.md, README_GPU_SETUP.md, SETUP_GUIDE.md
- TAILSCALE_SETUP.md, WIREGUARD_SETUP.md (covered in DEPLOYMENT.md)
- GPU_EXPANSION_PLAN.md (historical planning doc)
- gpu-server-compose.yaml, litellm-config-gpu.yaml (old versions)
- deploy-gpu-stack.sh, simple_vllm_server.py (old scripts)

Organized documentation:
- Created docs/ directory
- Moved DEPLOYMENT.md, RUNPOD_TEMPLATE.md, GPU_DEPLOYMENT_LOG.md to docs/
- Updated all documentation links in README.md

Final structure:
- Clean root directory with only GPU-specific files
- Organized documentation in docs/
- Model services in dedicated directories (model-orchestrator/, vllm/, flux/, musicgen/)
- Automation scripts in scripts/

# RunPod Template Creation Guide

This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.

## Why Create a Template?

**Without Template** (Manual Setup Every Time):
- ❌ Install Docker & Docker Compose (10-15 min)
- ❌ Install Tailscale (5 min)
- ❌ Pull Docker images (10-20 min)
- ❌ Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
- ❌ Configure everything (5-10 min)
- **Total: 60-90 minutes per Spot instance restart**

**With Template** (Ready to Go):
- ✅ Everything pre-installed
- ✅ Models cached in `/workspace`
- ✅ Just start the orchestrator
- **Total: 2-3 minutes**

## Template Contents

### System Software
- ✅ Docker 24.x + Docker Compose v2
- ✅ Tailscale (latest)
- ✅ NVIDIA Docker runtime
- ✅ Python 3.11
- ✅ Git, curl, wget, htop, nvtop

### Docker Images (Pre-built)
- ✅ `ai_orchestrator` - Model orchestration service
- ✅ `ai_vllm-qwen_1` - Text generation (vLLM + Qwen 2.5 7B)
- ✅ `ai_musicgen_1` - Music generation (AudioCraft)
- ✅ `ghcr.io/matatonic/openedai-images-flux:latest` - Image generation

### Model Cache (/workspace - Persistent)
- ✅ Qwen 2.5 7B Instruct (~14GB)
- ✅ Flux.1 Schnell (~12GB)
- ✅ MusicGen Medium (~11GB)
- **Total: ~37GB cached**
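
A quick way to confirm a pod's cache matches these numbers (paths as used in the verification step later in this guide):

```bash
# Verify the persistent model cache sizes
du -sh /workspace/huggingface_cache /workspace/flux/models /workspace/musicgen/models
```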

### Project Files (/workspace/ai)
- ✅ All orchestrator code
- ✅ Docker Compose configurations
- ✅ Model service configurations
- ✅ Documentation

---

## Step-by-Step Template Creation

### Prerequisites

1. RunPod account
2. Active RTX 4090 pod (or similar GPU)
3. SSH access to the pod
4. This repository cloned locally

### Step 1: Deploy Fresh Pod

```bash
# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template

# Note the SSH connection details (host, port, password)
```

### Step 2: Prepare the Instance

Run the automated preparation script:

```bash
# On your local machine, copy everything to RunPod
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/

# SSH to the pod
ssh -p <PORT> root@<HOST>

# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh
```

**What the script does** (a minimal sketch follows below):
1. Installs Docker & Docker Compose
2. Installs Tailscale
3. Builds all Docker images
4. Pre-downloads all models
5. Validates everything works
6. Cleans up temporary files

**Estimated time: 45-60 minutes**
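
For orientation, here is a minimal sketch of what such a preparation script might contain. The authoritative version is `scripts/prepare-template.sh` in this repo; the compose profiles and the fixed `sleep` are assumptions:

```bash
#!/usr/bin/env bash
# Sketch of a template-preparation script -- see scripts/prepare-template.sh
# for the real one; profile names and timings below are illustrative
set -euo pipefail

# 1-2. Install Docker (official convenience script) and Tailscale
curl -fsSL https://get.docker.com | sh
curl -fsSL https://tailscale.com/install.sh | sh

# 3. Build all images defined in the GPU compose file
cd /workspace/ai
docker compose -f docker-compose.gpu.yaml build

# 4. Pre-download models by starting each service once
for profile in text image audio; do
  docker compose -f docker-compose.gpu.yaml --profile "$profile" up -d
  sleep 300  # crude wait for downloads; tail the logs instead if preferred
  docker compose -f docker-compose.gpu.yaml --profile "$profile" down
done

# 5. Validate
docker images | grep ai_
ls -lh /workspace/huggingface_cache/

# 6. Clean up temporary files and build cache
docker builder prune -f
rm -rf /tmp/*
```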

### Step 3: Manual Verification

After the script completes, verify everything:

```bash
# Check Docker is installed
docker --version
docker compose version

# Check Tailscale
tailscale version

# Check all images are built
docker images | grep ai_

# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/

# Test orchestrator starts
cd /workspace/ai
docker compose -f docker-compose.gpu.yaml up -d orchestrator
docker logs ai_orchestrator

# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health

# Stop orchestrator
docker compose -f docker-compose.gpu.yaml down
```

### Step 4: Clean Up Before Saving

**IMPORTANT**: Remove secrets and temporary data before creating the template!

```bash
# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history

# Clear logs
rm -f /var/log/*.log
# Clean stopped containers, networks, and dangling volumes.
# Note: do NOT use -a here -- that would also delete the pre-built
# images this template exists to preserve.
docker system prune -f --volumes

# Clear Tailscale state (will re-authenticate on first use)
tailscale logout

# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION
```
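
After pruning, double-check that the pre-built images survived:

```bash
# These should still list the ai_* images built in Step 2
docker images | grep ai_
```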

### Step 5: Save Template in RunPod Dashboard

1. **Go to RunPod Dashboard** → "My Pods"
2. **Select your prepared pod**
3. **Click "⋮" menu** → "Save as Template"
4. **Template Configuration**:
   - **Name**: `multi-modal-ai-v1.0`
   - **Description**:
     ```
     Multi-Modal AI Stack with Orchestrator
     - Text: vLLM + Qwen 2.5 7B
     - Image: Flux.1 Schnell
     - Music: MusicGen Medium
     - Models pre-cached (~37GB)
     - Ready to deploy in 2-3 minutes
     ```
   - **Category**: `AI/ML`
   - **Docker Image**: (auto-detected)
   - **Container Disk**: 50GB
   - **Expose Ports**: 9000, 8001, 8002, 8003
   - **Environment Variables** (optional):
     ```
     HF_TOKEN=<leave empty, user will add>
     TAILSCALE_AUTHKEY=<leave empty, user will add>
     ```
5. **Click "Save Template"**
6. **Wait for template creation** (5-10 minutes)
7. **Test the template** by deploying a new pod with it

---

## Using Your Template

### Deploy New Pod from Template

1. **RunPod Dashboard** → "➕ Deploy"
2. **Select "Community Templates"** or "My Templates"
3. **Choose**: `multi-modal-ai-v1.0`
4. **Configure**:
   - GPU: RTX 4090 (or compatible)
   - Network Volume: Attach your existing volume with `/workspace` mount
   - Environment:
     - `HF_TOKEN`: Your Hugging Face token
     - (Tailscale will be configured via SSH)
5. **Deploy Pod**

### First-Time Setup (On New Pod)

```bash
# SSH to the new pod
ssh -p <PORT> root@<HOST>

# Navigate to project
cd /workspace/ai

# Create .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF

# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>

# Start orchestrator (models already cached, starts in seconds!)
docker compose -f docker-compose.gpu.yaml up -d orchestrator

# Verify
curl http://localhost:9000/health

# Check logs
docker logs -f ai_orchestrator
```

**Total setup time: 2-3 minutes!** 🎉
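
Beyond the health check, a quick end-to-end request confirms a model actually loads. This assumes the orchestrator exposes an OpenAI-compatible chat route on port 9000 and this model name; verify both against the code in `model-orchestrator/`:

```bash
# Hypothetical smoke test -- endpoint path and model name are assumptions
curl -s http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-7b-instruct", "messages": [{"role": "user", "content": "Say hello"}]}'
```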

### Updating SSH Config (If Spot Instance Restarts)

Since Spot instances can restart with new IPs/ports:

```bash
# On your local machine
# Update ~/.ssh/config with new connection details

Host gpu-pivoine
    HostName <NEW_IP>
    Port <NEW_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
```
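
If restarts are frequent, a small local helper saves the manual edit. A sketch assuming GNU sed and that the `gpu-pivoine` block above already exists in `~/.ssh/config`; the script name is hypothetical:

```bash
#!/usr/bin/env bash
# update-gpu-ssh.sh <new_ip> <new_port> -- rewrites the gpu-pivoine block
set -euo pipefail
NEW_IP="$1"; NEW_PORT="$2"
# Edit HostName and Port only inside the gpu-pivoine block (GNU sed)
sed -i "/^Host gpu-pivoine$/,/^Host /{s/HostName .*/HostName ${NEW_IP}/; s/Port .*/Port ${NEW_PORT}/;}" ~/.ssh/config
ssh gpu-pivoine 'echo "connected to $(hostname)"'
```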

---

## Template Maintenance

### Updating the Template

When you add new models or make improvements:

1. Deploy a pod from your existing template
2. Make your changes
3. Test everything
4. Clean up (remove secrets)
5. Save as new template version: `multi-modal-ai-v1.1`
6. Update your documentation

### Version History

Keep track of template versions:

```
v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator

v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading
```

---

## Troubleshooting Template Creation

### Models Not Downloading

```bash
# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen

# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
```
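
If a service keeps dying before its download finishes, the Hugging Face CLI can pre-fill the cache directly. This assumes the services read the cache from `/workspace/huggingface_cache` via `HF_HOME`; check the compose file before relying on it:

```bash
# Pre-download a model into the shared cache without starting the service
pip install -U "huggingface_hub[cli]"
HF_HOME=/workspace/huggingface_cache \
  huggingface-cli download Qwen/Qwen2.5-7B-Instruct
```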

### Docker Images Not Building

```bash
# Build images one at a time
docker compose -f docker-compose.gpu.yaml build orchestrator
docker compose -f docker-compose.gpu.yaml build vllm-qwen
docker compose -f docker-compose.gpu.yaml build musicgen

# Check build logs for errors
docker compose -f docker-compose.gpu.yaml build --no-cache --progress=plain orchestrator
```

### Tailscale Won't Install

```bash
# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh

# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &

# Test
tailscale version
```

### Template Too Large

RunPod templates have size limits. If your template is too large:

**Option 1**: Use network volume for models
- Move models to network volume: `/workspace/models/`
- Mount volume when deploying from template
- Models persist across pod restarts

**Option 2**: Reduce cached models
- Only cache most-used model (Qwen 2.5 7B)
- Download others on first use
- Accept slightly longer first-time startup

**Option 3**: Use Docker layer optimization (see the sketch below)
```dockerfile
# In Dockerfile, order commands by change frequency
# Less frequently changed layers first
```
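
To make the ordering concrete, here is a minimal sketch; the base image and file names are illustrative, not this repo's actual Dockerfile:

```dockerfile
# Hypothetical layer ordering: stable layers first, so frequent code
# changes only invalidate the final layers
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

# 1. OS packages: change rarely
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# 2. Python dependencies: change occasionally
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

# 3. Application code: changes most often, so it goes last
COPY . /app
WORKDIR /app
CMD ["python3", "server.py"]
```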

---

## Cost Analysis

### Template Storage Cost
- RunPod charges for template storage: ~$0.10/GB/month
- This template: ~50GB = **~$5/month**
- **Worth it!** Saves 60-90 minutes per Spot restart

### Time Savings
- Spot instance restarts: 2-5 times per week (highly variable)
- Time saved per restart: 60-90 minutes
- **Total saved per month: roughly 9-32 hours** (arithmetic below)
- **Value: Priceless for rapid deployment**
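
The ranges, worked out (taking ~4.33 weeks per month):

```
restarts/month ≈ (2 to 5 per week) x 4.33 weeks    ≈ 9 to 22
hours saved    ≈ (9 to 22 restarts) x (1 to 1.5 h) ≈ 9 to 32 h/month
storage cost   = 50 GB x $0.10/GB/month            = $5/month
```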

---

## Advanced: Automated Template Updates

Create a CI/CD pipeline to automatically update templates (a sketch of the scriptable steps follows below):

```bash
# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team
```
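
Steps 2-4 can already be scripted over SSH today; pod deployment and the final "Save as Template" still go through the RunPod dashboard or API. A sketch, assuming the `gpu-pivoine` SSH alias from earlier and that `/workspace/ai` is a git checkout (otherwise `scp` as in Step 2):

```bash
#!/usr/bin/env bash
# Scriptable middle of the pipeline; deploy and template-save stay manual
set -euo pipefail

ssh gpu-pivoine <<'EOF'
set -euo pipefail
cd /workspace/ai
git pull
docker compose -f docker-compose.gpu.yaml build
docker compose -f docker-compose.gpu.yaml up -d orchestrator
sleep 10
curl -fsS http://localhost:9000/health
docker compose -f docker-compose.gpu.yaml down
EOF

echo "Pod updated and tested; save the new template version in the dashboard"
```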

---

## Template Checklist

Before saving your template, verify (a scripted check follows below):

- [ ] All Docker images built and working
- [ ] All models downloaded and cached
- [ ] Tailscale installed (but logged out)
- [ ] Docker Compose files present
- [ ] `.env` file removed (secrets cleared)
- [ ] Logs cleared
- [ ] SSH keys removed
- [ ] Bash history cleared
- [ ] Template version documented
- [ ] Test deployment successful
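
The mechanical items can be checked with a short script (a sketch; paths follow this guide, and the last two checklist items still need a human):

```bash
#!/usr/bin/env bash
# Pre-save checks for the mechanical checklist items
fail=0
check() { if eval "$2"; then echo "OK   $1"; else echo "FAIL $1"; fail=1; fi; }

check "Docker images built"         'docker images | grep -q ai_'
check "HF model cache present"      '[ -n "$(ls -A /workspace/huggingface_cache 2>/dev/null)" ]'
check "Compose file present"        '[ -f /workspace/ai/docker-compose.gpu.yaml ]'
check ".env removed"                '[ ! -f /workspace/ai/.env ]'
check "bash history cleared"        '[ ! -s /root/.bash_history ]'
check "known_hosts removed"         '[ ! -f /root/.ssh/known_hosts ]'
check "template version documented" '[ -f /workspace/TEMPLATE_VERSION ]'
exit "$fail"
```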

---

## Support

If you have issues creating the template:

1. Check `/workspace/ai/scripts/prepare-template.sh` logs
2. Review Docker build logs: `docker compose build --progress=plain`
3. Check model download logs: `docker logs <container>`
4. Verify disk space: `df -h`
5. Check network volume is mounted: `mount | grep workspace`

For RunPod-specific issues:
- RunPod Docs: https://docs.runpod.io/
- RunPod Discord: https://discord.gg/runpod

---

## Next Steps

After creating your template:

1. ✅ Test deployment from template
2. ✅ Document in `GPU_DEPLOYMENT_LOG.md`
3. ✅ Share template ID with team (if applicable)
4. ✅ Set up monitoring (Netdata, etc.)
5. ✅ Configure auto-stop for cost optimization
6. ✅ Add more models as needed

**Your multi-modal AI infrastructure is now portable and reproducible!** 🚀