docs(ai): add comprehensive GPU setup documentation and configs

- Add setup guides (SETUP_GUIDE, TAILSCALE_SETUP, DOCKER_GPU_SETUP, etc.)
- Add deployment configurations (litellm-config-gpu.yaml, gpu-server-compose.yaml)
- Add GPU_DEPLOYMENT_LOG.md with current infrastructure details
- Add GPU_EXPANSION_PLAN.md with complete provider comparison
- Add deploy-gpu-stack.sh automation script

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 12:57:06 +01:00
parent c0b1308ffe
commit 8de88d96ac
10 changed files with 4089 additions and 0 deletions
--- a/ai/TAILSCALE_SETUP.md
+++ b/ai/TAILSCALE_SETUP.md
@@ -0,0 +1,417 @@
+# Tailscale VPN Setup - Better Alternative to WireGuard
+
+## Why Tailscale?
+
+RunPod doesn't expose UDP ports, which blocks WireGuard (it only runs over UDP). Tailscale gets around this:
+- ✅ Falls back to encrypted relays (DERP) over HTTPS (TCP 443) when UDP is blocked - no open UDP port needed
+- ✅ Zero configuration - automatic setup
+- ✅ Free for personal use
+- ✅ Built on WireGuard (same security)
+- ✅ Automatic NAT traversal
+- ✅ Peer-to-peer when possible (low latency)
+
+---
+
+## Step 1: Create Tailscale Account
+
+1. Go to: https://tailscale.com/
+2. Click **"Get Started"**
+3. Sign up with **GitHub** or **Google** (easiest)
+4. You'll be redirected to the Tailscale admin console
+
+**No credit card required!** Free tier is perfect for our use case.
+
+---
+
+## Step 2: Install Tailscale on VPS
+
+**SSH into your VPS:**
+
+```bash
+ssh root@vps
+```
+
+**Install Tailscale:**
+
+```bash
+# Download and run install script
+curl -fsSL https://tailscale.com/install.sh | sh
+
+# Start Tailscale
+tailscale up
+
+# You'll see a URL like:
+# https://login.tailscale.com/a/xxxxxxxxxx
+```
+
+**Authenticate:**
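+1. Copy the URL and open it in a browser
+2. Click **"Connect"** to authorize the device
+3. Rename it to `pivoine-vps` from the machine's menu on the admin console's Machines page (it defaults to the server's hostname)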
+
+**Check status:**
+```bash
+tailscale status
+```
+
+You should see your VPS listed with an IP like `100.x.x.x`
+
+**Save your VPS Tailscale IP:**
+```bash
+tailscale ip -4
+# Example output: 100.101.102.103
+```
+
+**Write this down - you'll need it!**
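+
+To sanity-check the IP you wrote down, the sketch below tests whether it falls inside `100.64.0.0/10`, the CGNAT range Tailscale assigns addresses from. `is_tailscale_ip` is a hypothetical helper name, and the sketch assumes bash (it uses `[[ =~ ]]` and `BASH_REMATCH`):
+
+```bash
+# Hypothetical helper (not part of Tailscale): succeeds if $1 is an IPv4
+# address inside 100.64.0.0/10, the range Tailscale assigns from.
+is_tailscale_ip() {
+  local ip=$1 o
+  # Must be four dot-separated groups of 1-3 digits
+  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
+  # Every octet must be 0-255 (10# forces base 10, so "08" is not read as octal)
+  for o in "${BASH_REMATCH[@]:1}"; do
+    (( 10#$o <= 255 )) || return 1
+  done
+  # 100.64.0.0/10 means: first octet 100, second octet 64-127
+  (( 10#${BASH_REMATCH[1]} == 100 && 10#${BASH_REMATCH[2]} >= 64 && 10#${BASH_REMATCH[2]} <= 127 ))
+}
+
+# Example: is_tailscale_ip "$(tailscale ip -4)" && echo "looks like a Tailscale IP"
+```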
+
+---
+
+## Step 3: Install Tailscale on GPU Server
+
+**SSH into your RunPod GPU server:**
+
+```bash
+ssh root@abc123def456-12345678.runpod.io -p 12345
+```
+
+**Install Tailscale:**
+
+```bash
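+# Download and run install script
+curl -fsSL https://tailscale.com/install.sh | sh
+
+# RunPod pods are containers, usually without systemd, so the tailscaled
+# daemon may not be running. If `tailscale up` can't reach it, start it
+# manually in userspace mode (no /dev/net/tun needed; outgoing connections
+# to other tailnet devices then have to go through its --socks5-server proxy):
+# tailscaled --tun=userspace-networking --state=/var/lib/tailscale/tailscaled.state &
+
+# Start Tailscale
+tailscale up
+# Optional: append --advertise-tags=tag:gpu, but only after defining tag:gpu
+# under "tagOwners" in your tailnet policy file; otherwise the tag is rejected
+
+# You'll see another URL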
+```
+
+**Authenticate:**
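+1. Copy the URL and open it in a browser
+2. Click **"Connect"**
+3. Rename it to `gpu-runpod` from the machine's menu on the admin console's Machines page (it defaults to the pod's hostname)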
+
+**Check status:**
+```bash
+tailscale status
+```
+
+You should now see BOTH devices:
+- `pivoine-vps` - 100.x.x.x
+- `gpu-runpod` - 100.x.x.x
+
+**Save your GPU server Tailscale IP:**
+```bash
+tailscale ip -4
+# Example output: 100.104.105.106
+```
+
+---
+
+## Step 4: Test Connectivity
+
+**From VPS, ping GPU server:**
+
+```bash
+# SSH into VPS
+ssh root@vps
+
+# Ping GPU server (use its Tailscale IP)
+ping 100.104.105.106 -c 4
+```
+
+Expected output:
+```
+PING 100.104.105.106 (100.104.105.106) 56(84) bytes of data.
+64 bytes from 100.104.105.106: icmp_seq=1 ttl=64 time=15.3 ms
+64 bytes from 100.104.105.106: icmp_seq=2 ttl=64 time=14.8 ms
+...
+```
+
+**From GPU server, ping VPS:**
+
+```bash
+# SSH into GPU server
+ssh root@abc123def456-12345678.runpod.io -p 12345
+
+# Ping VPS (use its Tailscale IP)
+ping 100.101.102.103 -c 4
+```
+
+**Both should work!** ✅
+
+---
+
+## Step 5: Update Configuration Files
+
+Now update the IP addresses in your configs to use Tailscale IPs.
+
+### On GPU Server (.env file)
+
+**Edit your .env file:**
+
+```bash
+# On GPU server
+cd /workspace/gpu-stack
+
+nano .env
+```
+
+**Update these lines:**
+```bash
+# VPN Network (use your actual Tailscale IPs)
+VPS_IP=100.101.102.103      # Your VPS Tailscale IP
+GPU_IP=100.104.105.106      # Your GPU Tailscale IP
+
+# PostgreSQL (on VPS)
+DB_HOST=100.101.102.103     # Your VPS Tailscale IP
+DB_PORT=5432
+```
+
+Save and exit (Ctrl+X, Y, Enter)
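+
+If you'd rather script this than edit by hand, here is a minimal sketch of a helper that sets one `KEY=VALUE` pair in a `.env` file. `set_env_var` is a hypothetical name; it assumes GNU `sed -i` (standard on Linux) and values without `|` or `&` characters, which holds for IP addresses. Replacing a line also drops its trailing `# ...` comment:
+
+```bash
+# Hypothetical helper: set KEY=VALUE in a .env file, replacing an existing
+# "KEY=..." line or appending a new one if the key is missing.
+set_env_var() {
+  local file=$1 key=$2 value=$3
+  if grep -q "^${key}=" "$file"; then
+    # GNU sed in-place edit; "|" delimiter avoids clashing with "/" in values
+    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
+  else
+    printf '%s=%s\n' "$key" "$value" >> "$file"
+  fi
+}
+
+# Example (GPU server, use your own Tailscale IPs):
+# cd /workspace/gpu-stack
+# set_env_var .env VPS_IP 100.101.102.103
+# set_env_var .env GPU_IP 100.104.105.106
+# set_env_var .env DB_HOST 100.101.102.103
+```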
+
+### On VPS (LiteLLM config)
+
+**Edit your LiteLLM config:**
+
+```bash
+# On VPS
+ssh root@vps
+cd ~/Projects/docker-compose/ai
+
+nano litellm-config-gpu.yaml
+```
+
+**Update the GPU server IP:**
+
+```yaml
+# Find this section and update IP:
+  - model_name: llama-3.1-8b
+    litellm_params:
+      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
+      api_base: http://100.104.105.106:8000/v1  # Use GPU Tailscale IP
+      api_key: dummy
+```
+
+Save and exit.
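+
+A sketch for making the same change non-interactively, assuming the config still points at the old WireGuard GPU address `10.8.0.2` listed in the Network Reference section. `replace_host_ip` is a hypothetical helper; it relies on GNU `sed -i.bak`, which keeps the original file as `litellm-config-gpu.yaml.bak`:
+
+```bash
+# Hypothetical helper: rewrite "http://OLD_IP:" to "http://NEW_IP:" in a file,
+# keeping a .bak copy of the original.
+replace_host_ip() {
+  local file=$1 old=$2 new=$3
+  # Turn each "." into "[.]" so the regex matches literal dots only
+  local old_re=${old//./[.]}
+  # The trailing ":" keeps 10.8.0.2 from also matching 10.8.0.20
+  sed -i.bak "s|http://${old_re}:|http://${new}:|g" "$file"
+}
+
+# Example (VPS, use your own GPU Tailscale IP):
+# cd ~/Projects/docker-compose/ai
+# replace_host_ip litellm-config-gpu.yaml 10.8.0.2 100.104.105.106
+```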
+
+---
+
+## Step 6: Verify PostgreSQL Access
+
+**From GPU server, test database connection:**
+
+```bash
+# Install PostgreSQL client
+apt install -y postgresql-client
+
+# Test connection (use your VPS Tailscale IP)
+psql -h 100.101.102.103 -U valknar -d openwebui -c "SELECT 1;"
+```
+
+**If this fails, allow the Tailscale network in PostgreSQL's `pg_hba.conf` on the VPS:**
+
+```bash
+# On VPS
+ssh root@vps
+
+# Check whether pg_hba.conf already has a rule for the Tailscale network
+docker exec core_postgres grep 100 /var/lib/postgresql/data/pg_hba.conf
+
+# If not present, add it (run this once; every run appends another copy):
+docker exec -it core_postgres bash
+
+# Inside container:
+echo "host    all             all             100.0.0.0/8             scram-sha-256" >> /var/lib/postgresql/data/pg_hba.conf
+
+# Leave the container, then restart postgres to apply the change
+exit
+docker restart core_postgres
+```
+
+Try connecting again - should work now!
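+
+Because the `echo ... >>` above appends on every run, repeating the step leaves duplicate rules. A minimal sketch of an idempotent alternative; `append_once` is a hypothetical helper, and since it must run where the file lives, define it inside the container shell. (The `100.0.0.0/8` rule works; Tailscale itself only uses `100.64.0.0/10`, so that narrower range is also an option.)
+
+```bash
+# Hypothetical helper: append LINE to FILE only if that exact line is missing.
+append_once() {
+  local file=$1 line=$2
+  # -x: whole-line match, -F: fixed string, -s: quiet if the file is missing
+  grep -qsxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
+}
+
+# Example, inside `docker exec -it core_postgres bash`:
+# append_once /var/lib/postgresql/data/pg_hba.conf \
+#   "host    all             all             100.0.0.0/8             scram-sha-256"
+# Then exit and run: docker restart core_postgres
+```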
+
+---
+
+## Tailscale Management
+
+### View Connected Devices
+
+**Web dashboard:**
+https://login.tailscale.com/admin/machines
+
+You'll see all your devices with their Tailscale IPs.
+
+**Command line:**
+```bash
+tailscale status
+```
+
+### Disconnect/Reconnect
+
+```bash
+# Stop Tailscale
+tailscale down
+
+# Start Tailscale
+tailscale up
+```
+
+### Remove Device
+
+From web dashboard:
+1. Click on device
+2. Click "..." menu
+3. Select "Disable" or "Delete"
+
+---
+
+## Advantages Over WireGuard
+
+- ✅ **Works anywhere** - No UDP ports needed
+- ✅ **Auto-reconnect** - Survives network changes
+- ✅ **Multiple devices** - Easy to add laptop, phone, etc.
+- ✅ **NAT traversal** - Direct peer-to-peer when possible
+- ✅ **Access Control** - Manage from web dashboard
+- ✅ **Monitoring** - See connection status in real-time
+
+---
+
+## Security Notes
+
+🔒 **Tailscale is secure:**
+- End-to-end encrypted (WireGuard)
+- Zero-trust architecture
+- Tailscale's coordination and relay servers only handle encrypted packets; they can't read your traffic
+- Only authenticated devices can connect
+
+🔒 **Access control:**
+- Only devices you authorize can join
+- Revoke access anytime from dashboard
+- Set ACLs for fine-grained control
+
+---
+
+## Network Reference (Updated)
+
+**Old (WireGuard):**
+- VPS: `10.8.0.1`
+- GPU: `10.8.0.2`
+
+**New (Tailscale):**
+- VPS: `100.101.102.103` (example - use your actual IP)
+- GPU: `100.104.105.106` (example - use your actual IP)
+
+**All services now accessible via Tailscale:**
+
+**From VPS to GPU:**
+- vLLM: `http://100.104.105.106:8000`
+- ComfyUI: `http://100.104.105.106:8188`
+- JupyterLab: `http://100.104.105.106:8888`
+- Netdata: `http://100.104.105.106:19999`
+
+**From GPU to VPS:**
+- PostgreSQL: `100.101.102.103:5432`
+- Redis: `100.101.102.103:6379`
+- LiteLLM: `http://100.101.102.103:4000`
+
+---
+
+## Troubleshooting
+
+### Can't ping between devices
+
+**Check Tailscale status:**
+```bash
+tailscale status
+```
+
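+Neither device should be marked `offline`. A peer you've exchanged traffic with recently shows `active; direct ...` or `active; relay ...`; `relay` means traffic goes through a Tailscale DERP relay, which is expected when UDP is blocked (as on RunPod).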
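+
+To spot which peers are down without reading the whole table, a rough sketch that filters `tailscale status` output. It parses the human-readable format (IP, hostname, user, OS, status), which isn't a stable interface; `tailscale status --json` is the machine-readable alternative. `offline_peers` is a hypothetical helper:
+
+```bash
+# Hypothetical helper: read `tailscale status` text on stdin and print the
+# hostname (2nd column) of every line that mentions "offline".
+offline_peers() {
+  # Skip "# ..." lines such as health-check warnings
+  awk '$1 !~ /^#/ && /offline/ { print $2 }'
+}
+
+# Example: tailscale status | offline_peers
+```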
+
+**Check connectivity:**
+```bash
+tailscale ping 100.104.105.106
+```
+
+**Restart Tailscale:**
+```bash
+tailscale down && tailscale up
+```
+
+### PostgreSQL connection refused
+
+**Check if postgres is listening on all interfaces:**
+```bash
+# On VPS
+docker exec core_postgres grep listen_addresses /var/lib/postgresql/data/postgresql.conf
+```
+
+Should show: `listen_addresses = '*'`
+
+**Check pg_hba.conf allows Tailscale network:**
+```bash
+docker exec core_postgres grep 100 /var/lib/postgresql/data/pg_hba.conf
+```
+
+Should have line:
+```
+host    all             all             100.0.0.0/8             scram-sha-256
+```
+
+### Device not showing in network
+
+**Re-authenticate:**
+```bash
+tailscale logout
+tailscale up
+# Click the new URL to re-authenticate
+```
+
+---
+
+## Verification Checklist
+
+Before proceeding:
+- [ ] Tailscale account created
+- [ ] Tailscale installed on VPS
+- [ ] Tailscale installed on GPU server
+- [ ] Both devices visible in `tailscale status`
+- [ ] VPS can ping GPU server (via Tailscale IP)
+- [ ] GPU server can ping VPS (via Tailscale IP)
+- [ ] PostgreSQL accessible from GPU server
+- [ ] .env file updated with Tailscale IPs
+- [ ] LiteLLM config updated with GPU Tailscale IP
+
+---
+
+## Next Steps
+
+✅ **Network configured!** Proceed to Docker & GPU setup:
+
+```bash
+cat /home/valknar/Projects/docker-compose/ai/DOCKER_GPU_SETUP.md
+```
+
+**Your Tailscale IPs (save these!):**
+- VPS: `__________________` (from `tailscale ip -4` on VPS)
+- GPU: `__________________` (from `tailscale ip -4` on GPU server)
+
+---
+
+## Bonus: Add Your Local Machine
+
+Want to access the GPU server from your laptop?
+
+```bash
+# On your local machine (Linux; on macOS or Windows, install the Tailscale app instead)
+curl -fsSL https://tailscale.com/install.sh | sh
+tailscale up
+
+# Now you can SSH directly via Tailscale:
+ssh root@100.104.105.106
+
+# Or access ComfyUI in browser:
+# http://100.104.105.106:8188
+```
+
+No more port forwarding needed! 🎉