refactor: clean up runpod repository structure
Removed facefusion and VPS-related files:
- compose.yaml, postgres/, litellm-config.yaml (VPS services)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)

Removed outdated documentation:
- DOCKER_GPU_SETUP.md, README_GPU_SETUP.md, SETUP_GUIDE.md
- TAILSCALE_SETUP.md, WIREGUARD_SETUP.md (covered in DEPLOYMENT.md)
- GPU_EXPANSION_PLAN.md (historical planning doc)
- gpu-server-compose.yaml, litellm-config-gpu.yaml (old versions)
- deploy-gpu-stack.sh, simple_vllm_server.py (old scripts)

Organized documentation:
- Created docs/ directory
- Moved DEPLOYMENT.md, RUNPOD_TEMPLATE.md, GPU_DEPLOYMENT_LOG.md to docs/
- Updated all documentation links in README.md

Final structure:
- Clean root directory with only GPU-specific files
- Organized documentation in docs/
- Model services in dedicated directories (model-orchestrator/, vllm/, flux/, musicgen/)
- Automation scripts in scripts/

# RunPod Template Creation Guide

This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.

## Why Create a Template?

**Without Template** (Manual Setup Every Time):
- ❌ Install Docker & Docker Compose (10-15 min)
- ❌ Install Tailscale (5 min)
- ❌ Pull Docker images (10-20 min)
- ❌ Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
- ❌ Configure everything (5-10 min)
- **Total: 60-90 minutes per Spot instance restart**

**With Template** (Ready to Go):
- ✅ Everything pre-installed
- ✅ Models cached in `/workspace`
- ✅ Just start the orchestrator
- **Total: 2-3 minutes**

## Template Contents

### System Software
- ✅ Docker 24.x + Docker Compose v2
- ✅ Tailscale (latest)
- ✅ NVIDIA Docker runtime
- ✅ Python 3.11
- ✅ Git, curl, wget, htop, nvtop

### Docker Images (Pre-built)
- ✅ `ai_orchestrator` - Model orchestration service
- ✅ `ai_vllm-qwen_1` - Text generation (vLLM + Qwen 2.5 7B)
- ✅ `ai_musicgen_1` - Music generation (AudioCraft)
- ✅ `ghcr.io/matatonic/openedai-images-flux:latest` - Image generation

### Model Cache (/workspace - Persistent)
- ✅ Qwen 2.5 7B Instruct (~14GB)
- ✅ Flux.1 Schnell (~12GB)
- ✅ MusicGen Medium (~11GB)
- **Total: ~37GB cached**
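
A quick way to confirm a pod's cache matches these numbers (paths as used in the verification step later in this guide):

```bash
# Verify the persistent model cache sizes
du -sh /workspace/huggingface_cache /workspace/flux/models /workspace/musicgen/models
```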

### Project Files (/workspace/ai)
- ✅ All orchestrator code
- ✅ Docker Compose configurations
- ✅ Model service configurations
- ✅ Documentation

---

## Step-by-Step Template Creation

### Prerequisites

1. RunPod account
2. Active RTX 4090 pod (or similar GPU)
3. SSH access to the pod
4. This repository cloned locally

### Step 1: Deploy Fresh Pod

```bash
# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template

# Note the SSH connection details (host, port, password)
```

### Step 2: Prepare the Instance

Run the automated preparation script:

```bash
# On your local machine, copy everything to RunPod
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/

# SSH to the pod
ssh -p <PORT> root@<HOST>

# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh
```

**What the script does** (a minimal sketch follows below):
1. Installs Docker & Docker Compose
2. Installs Tailscale
3. Builds all Docker images
4. Pre-downloads all models
5. Validates everything works
6. Cleans up temporary files

**Estimated time: 45-60 minutes**
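
For orientation, here is a minimal sketch of what such a preparation script might contain. The authoritative version is `scripts/prepare-template.sh` in this repo; the compose profiles and the fixed `sleep` are assumptions:

```bash
#!/usr/bin/env bash
# Sketch of a template-preparation script -- see scripts/prepare-template.sh
# for the real one; profile names and timings below are illustrative
set -euo pipefail

# 1-2. Install Docker (official convenience script) and Tailscale
curl -fsSL https://get.docker.com | sh
curl -fsSL https://tailscale.com/install.sh | sh

# 3. Build all images defined in the GPU compose file
cd /workspace/ai
docker compose -f docker-compose.gpu.yaml build

# 4. Pre-download models by starting each service once
for profile in text image audio; do
  docker compose -f docker-compose.gpu.yaml --profile "$profile" up -d
  sleep 300  # crude wait for downloads; tail the logs instead if preferred
  docker compose -f docker-compose.gpu.yaml --profile "$profile" down
done

# 5. Validate
docker images | grep ai_
ls -lh /workspace/huggingface_cache/

# 6. Clean up temporary files and build cache
docker builder prune -f
rm -rf /tmp/*
```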

### Step 3: Manual Verification

After the script completes, verify everything:

```bash
# Check Docker is installed
docker --version
docker compose version

# Check Tailscale
tailscale version

# Check all images are built
docker images | grep ai_

# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/

# Test orchestrator starts
cd /workspace/ai
docker compose -f docker-compose.gpu.yaml up -d orchestrator
docker logs ai_orchestrator

# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health

# Stop orchestrator
docker compose -f docker-compose.gpu.yaml down
```

### Step 4: Clean Up Before Saving

**IMPORTANT**: Remove secrets and temporary data before creating the template!

```bash
# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history

# Clear logs
rm -f /var/log/*.log
# Clean stopped containers, networks, and dangling volumes.
# Note: do NOT use -a here -- that would also delete the pre-built
# images this template exists to preserve.
docker system prune -f --volumes

# Clear Tailscale state (will re-authenticate on first use)
tailscale logout

# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION
```
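
After pruning, double-check that the pre-built images survived:

```bash
# These should still list the ai_* images built in Step 2
docker images | grep ai_
```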

### Step 5: Save Template in RunPod Dashboard

1. **Go to RunPod Dashboard** → "My Pods"
2. **Select your prepared pod**
3. **Click "⋮" menu** → "Save as Template"
4. **Template Configuration**:
   - **Name**: `multi-modal-ai-v1.0`
   - **Description**:
     ```
     Multi-Modal AI Stack with Orchestrator
     - Text: vLLM + Qwen 2.5 7B
     - Image: Flux.1 Schnell
     - Music: MusicGen Medium
     - Models pre-cached (~37GB)
     - Ready to deploy in 2-3 minutes
     ```
   - **Category**: `AI/ML`
   - **Docker Image**: (auto-detected)
   - **Container Disk**: 50GB
   - **Expose Ports**: 9000, 8001, 8002, 8003
   - **Environment Variables** (optional):
     ```
     HF_TOKEN=<leave empty, user will add>
     TAILSCALE_AUTHKEY=<leave empty, user will add>
     ```
5. **Click "Save Template"**
6. **Wait for template creation** (5-10 minutes)
7. **Test the template** by deploying a new pod with it

---

## Using Your Template

### Deploy New Pod from Template

1. **RunPod Dashboard** → "➕ Deploy"
2. **Select "Community Templates"** or "My Templates"
3. **Choose**: `multi-modal-ai-v1.0`
4. **Configure**:
   - GPU: RTX 4090 (or compatible)
   - Network Volume: Attach your existing volume with `/workspace` mount
   - Environment:
     - `HF_TOKEN`: Your Hugging Face token
     - (Tailscale will be configured via SSH)
5. **Deploy Pod**

### First-Time Setup (On New Pod)

```bash
# SSH to the new pod
ssh -p <PORT> root@<HOST>

# Navigate to project
cd /workspace/ai

# Create .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF

# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>

# Start orchestrator (models already cached, starts in seconds!)
docker compose -f docker-compose.gpu.yaml up -d orchestrator

# Verify
curl http://localhost:9000/health

# Check logs
docker logs -f ai_orchestrator
```

**Total setup time: 2-3 minutes!** 🎉
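
Beyond the health check, a quick end-to-end request confirms a model actually loads. This assumes the orchestrator exposes an OpenAI-compatible chat route on port 9000 and this model name; verify both against the code in `model-orchestrator/`:

```bash
# Hypothetical smoke test -- endpoint path and model name are assumptions
curl -s http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-7b-instruct", "messages": [{"role": "user", "content": "Say hello"}]}'
```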

### Updating SSH Config (If Spot Instance Restarts)

Since Spot instances can restart with new IPs/ports:

```bash
# On your local machine
# Update ~/.ssh/config with new connection details

Host gpu-pivoine
    HostName <NEW_IP>
    Port <NEW_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
```
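
If restarts are frequent, a small local helper saves the manual edit. A sketch assuming GNU sed and that the `gpu-pivoine` block above already exists in `~/.ssh/config`; the script name is hypothetical:

```bash
#!/usr/bin/env bash
# update-gpu-ssh.sh <new_ip> <new_port> -- rewrites the gpu-pivoine block
set -euo pipefail
NEW_IP="$1"; NEW_PORT="$2"
# Edit HostName and Port only inside the gpu-pivoine block (GNU sed)
sed -i "/^Host gpu-pivoine$/,/^Host /{s/HostName .*/HostName ${NEW_IP}/; s/Port .*/Port ${NEW_PORT}/;}" ~/.ssh/config
ssh gpu-pivoine 'echo "connected to $(hostname)"'
```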

---

## Template Maintenance

### Updating the Template

When you add new models or make improvements:

1. Deploy a pod from your existing template
2. Make your changes
3. Test everything
4. Clean up (remove secrets)
5. Save as new template version: `multi-modal-ai-v1.1`
6. Update your documentation

### Version History

Keep track of template versions:

```
v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator

v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading
```

---

## Troubleshooting Template Creation

### Models Not Downloading

```bash
# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen

# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
```
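
If a service keeps dying before its download finishes, the Hugging Face CLI can pre-fill the cache directly. This assumes the services read the cache from `/workspace/huggingface_cache` via `HF_HOME`; check the compose file before relying on it:

```bash
# Pre-download a model into the shared cache without starting the service
pip install -U "huggingface_hub[cli]"
HF_HOME=/workspace/huggingface_cache \
  huggingface-cli download Qwen/Qwen2.5-7B-Instruct
```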

### Docker Images Not Building

```bash
# Build images one at a time
docker compose -f docker-compose.gpu.yaml build orchestrator
docker compose -f docker-compose.gpu.yaml build vllm-qwen
docker compose -f docker-compose.gpu.yaml build musicgen

# Check build logs for errors
docker compose -f docker-compose.gpu.yaml build --no-cache --progress=plain orchestrator
```

### Tailscale Won't Install

```bash
# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh

# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &

# Test
tailscale version
```

### Template Too Large

RunPod templates have size limits. If your template is too large:

**Option 1**: Use network volume for models
- Move models to network volume: `/workspace/models/`
- Mount volume when deploying from template
- Models persist across pod restarts

**Option 2**: Reduce cached models
- Only cache most-used model (Qwen 2.5 7B)
- Download others on first use
- Accept slightly longer first-time startup

**Option 3**: Use Docker layer optimization (see the sketch below)
```dockerfile
# In Dockerfile, order commands by change frequency
# Less frequently changed layers first
```
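
To make the ordering concrete, here is a minimal sketch; the base image and file names are illustrative, not this repo's actual Dockerfile:

```dockerfile
# Hypothetical layer ordering: stable layers first, so frequent code
# changes only invalidate the final layers
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

# 1. OS packages: change rarely
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# 2. Python dependencies: change occasionally
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

# 3. Application code: changes most often, so it goes last
COPY . /app
WORKDIR /app
CMD ["python3", "server.py"]
```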

---

## Cost Analysis

### Template Storage Cost
- RunPod charges for template storage: ~$0.10/GB/month
- This template: ~50GB = **~$5/month**
- **Worth it!** Saves 60-90 minutes per Spot restart

### Time Savings
- Spot instance restarts: 2-5 times per week (highly variable)
- Time saved per restart: 60-90 minutes
- **Total saved per month: roughly 9-32 hours** (arithmetic below)
- **Value: Priceless for rapid deployment**
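
The ranges, worked out (taking ~4.33 weeks per month):

```
restarts/month ≈ (2 to 5 per week) x 4.33 weeks    ≈ 9 to 22
hours saved    ≈ (9 to 22 restarts) x (1 to 1.5 h) ≈ 9 to 32 h/month
storage cost   = 50 GB x $0.10/GB/month            = $5/month
```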

---

## Advanced: Automated Template Updates

Create a CI/CD pipeline to automatically update templates (a sketch of the scriptable steps follows below):

```bash
# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team
```
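
Steps 2-4 can already be scripted over SSH today; pod deployment and the final "Save as Template" still go through the RunPod dashboard or API. A sketch, assuming the `gpu-pivoine` SSH alias from earlier and that `/workspace/ai` is a git checkout (otherwise `scp` as in Step 2):

```bash
#!/usr/bin/env bash
# Scriptable middle of the pipeline; deploy and template-save stay manual
set -euo pipefail

ssh gpu-pivoine <<'EOF'
set -euo pipefail
cd /workspace/ai
git pull
docker compose -f docker-compose.gpu.yaml build
docker compose -f docker-compose.gpu.yaml up -d orchestrator
sleep 10
curl -fsS http://localhost:9000/health
docker compose -f docker-compose.gpu.yaml down
EOF

echo "Pod updated and tested; save the new template version in the dashboard"
```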

---

## Template Checklist

Before saving your template, verify (a scripted check follows below):

- [ ] All Docker images built and working
- [ ] All models downloaded and cached
- [ ] Tailscale installed (but logged out)
- [ ] Docker Compose files present
- [ ] `.env` file removed (secrets cleared)
- [ ] Logs cleared
- [ ] SSH keys removed
- [ ] Bash history cleared
- [ ] Template version documented
- [ ] Test deployment successful
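
The mechanical items can be checked with a short script (a sketch; paths follow this guide, and the last two checklist items still need a human):

```bash
#!/usr/bin/env bash
# Pre-save checks for the mechanical checklist items
fail=0
check() { if eval "$2"; then echo "OK   $1"; else echo "FAIL $1"; fail=1; fi; }

check "Docker images built"         'docker images | grep -q ai_'
check "HF model cache present"      '[ -n "$(ls -A /workspace/huggingface_cache 2>/dev/null)" ]'
check "Compose file present"        '[ -f /workspace/ai/docker-compose.gpu.yaml ]'
check ".env removed"                '[ ! -f /workspace/ai/.env ]'
check "bash history cleared"        '[ ! -s /root/.bash_history ]'
check "known_hosts removed"         '[ ! -f /root/.ssh/known_hosts ]'
check "template version documented" '[ -f /workspace/TEMPLATE_VERSION ]'
exit "$fail"
```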

---

## Support

If you have issues creating the template:

1. Check `/workspace/ai/scripts/prepare-template.sh` logs
2. Review Docker build logs: `docker compose build --progress=plain`
3. Check model download logs: `docker logs <container>`
4. Verify disk space: `df -h`
5. Check network volume is mounted: `mount | grep workspace`

For RunPod-specific issues:
- RunPod Docs: https://docs.runpod.io/
- RunPod Discord: https://discord.gg/runpod

---

## Next Steps

After creating your template:

1. ✅ Test deployment from template
2. ✅ Document in `GPU_DEPLOYMENT_LOG.md`
3. ✅ Share template ID with team (if applicable)
4. ✅ Set up monitoring (Netdata, etc.)
5. ✅ Configure auto-stop for cost optimization
6. ✅ Add more models as needed

**Your multi-modal AI infrastructure is now portable and reproducible!** 🚀