docs(ai): add comprehensive GPU setup documentation and configs

- Add setup guides (SETUP_GUIDE, TAILSCALE_SETUP, DOCKER_GPU_SETUP, etc.)
- Add deployment configurations (litellm-config-gpu.yaml, gpu-server-compose.yaml)
- Add GPU_DEPLOYMENT_LOG.md with current infrastructure details
- Add GPU_EXPANSION_PLAN.md with complete provider comparison
- Add deploy-gpu-stack.sh automation script

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

ai/TAILSCALE_SETUP.md
@@ -0,0 +1,417 @@

# Tailscale VPN Setup - Better Alternative to WireGuard

## Why Tailscale?

RunPod doesn't expose UDP ports, which blocks a plain WireGuard tunnel. Tailscale works around this:
- ✅ Works over HTTPS (TCP) - no UDP needed
- ✅ Near-zero configuration - keys and peers are managed automatically
- ✅ Free for personal use
- ✅ Built on WireGuard (same encryption)
- ✅ Automatic NAT traversal
- ✅ Peer-to-peer when possible (low latency)


---

## Step 1: Create Tailscale Account

1. Go to: https://tailscale.com/
2. Click **"Get Started"**
3. Sign up with **GitHub** or **Google** (easiest)
4. You'll be redirected to the Tailscale admin console

**No credit card required!** Free tier is perfect for our use case.


---

## Step 2: Install Tailscale on VPS

**SSH into your VPS:**

```bash
ssh root@vps
```

**Install Tailscale:**

```bash
# Download and run install script
curl -fsSL https://tailscale.com/install.sh | sh

# Start Tailscale
tailscale up

# You'll see a URL like:
# https://login.tailscale.com/a/xxxxxxxxxx
```

**Authenticate:**
1. Copy the URL and open it in a browser
2. Click **"Connect"** to authorize the device
3. Rename the machine to `pivoine-vps` in the admin console (the name defaults to the server's hostname)

**Check status:**
```bash
tailscale status
```

You should see your VPS listed with an IP like `100.x.x.x`.

**Save your VPS Tailscale IP:**
```bash
tailscale ip -4
# Example output: 100.101.102.103
```

**Write this down - you'll need it!**

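
Since the saved address is pasted into several configs later, a quick sanity check can catch typos early. The sketch below assumes Tailscale's IPv4 addresses come from the `100.64.0.0/10` range (100.64.0.0 through 100.127.255.255); the function name `is_tailscale_ip` is made up for this guide and is not part of the Tailscale CLI.

```bash
# Hypothetical helper: succeed only if $1 is an IPv4 address in 100.64.0.0/10.
is_tailscale_ip() {
  local ip="$1" old_ifs="$IFS"
  # Reject anything that is not digits and dots, or that has empty octets.
  case "$ip" in
    ""|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  # Split on dots into the positional parameters $1..$4.
  IFS=.
  set -- $ip
  IFS="$old_ifs"
  [ "$#" -eq 4 ] || return 1
  [ "$3" -le 255 ] && [ "$4" -le 255 ] || return 1
  # 100.64.0.0/10 spans 100.64.0.0 through 100.127.255.255.
  [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
}
```

For example, `is_tailscale_ip "$(tailscale ip -4)" && echo "looks like a Tailscale IP"` on the VPS should print the message once Tailscale is up.
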

---

## Step 3: Install Tailscale on GPU Server

**SSH into your RunPod GPU server:**

```bash
ssh root@abc123def456-12345678.runpod.io -p 12345
```

**Install Tailscale:**

```bash
# Download and run install script
curl -fsSL https://tailscale.com/install.sh | sh

# Start Tailscale
# Note: tag:gpu must already be defined under tagOwners in your tailnet policy,
# otherwise this command is rejected
tailscale up --advertise-tags=tag:gpu

# You'll see another URL
```

**Authenticate:**
1. Copy the URL and open it in a browser
2. Click **"Connect"**
3. Rename the machine to `gpu-runpod` in the admin console (the name defaults to the server's hostname)

**Check status:**
```bash
tailscale status
```

You should now see BOTH devices:
- `pivoine-vps` - 100.x.x.x
- `gpu-runpod` - 100.x.x.x

**Save your GPU server Tailscale IP:**
```bash
tailscale ip -4
# Example output: 100.104.105.106
```


---

## Step 4: Test Connectivity

**From VPS, ping GPU server:**

```bash
# SSH into VPS
ssh root@vps

# Ping GPU server (use its Tailscale IP)
ping -c 4 100.104.105.106
```

Expected output:
```
PING 100.104.105.106 (100.104.105.106) 56(84) bytes of data.
64 bytes from 100.104.105.106: icmp_seq=1 ttl=64 time=15.3 ms
64 bytes from 100.104.105.106: icmp_seq=2 ttl=64 time=14.8 ms
...
```

**From GPU server, ping VPS:**

```bash
# SSH into GPU server
ssh root@abc123def456-12345678.runpod.io -p 12345

# Ping VPS (use its Tailscale IP)
ping -c 4 100.101.102.103
```

**Both should work!** ✅


---

## Step 5: Update Configuration Files

Now update the IP addresses in your configs to use Tailscale IPs.

### On GPU Server (.env file)

**Edit your .env file:**

```bash
# On GPU server
cd /workspace/gpu-stack

nano .env
```

**Update these lines:**
```bash
# VPN Network (use your actual Tailscale IPs)
# Your VPS Tailscale IP
VPS_IP=100.101.102.103
# Your GPU Tailscale IP
GPU_IP=100.104.105.106

# PostgreSQL (on VPS)
# Your VPS Tailscale IP
DB_HOST=100.101.102.103
DB_PORT=5432
```

Comments sit on their own lines because some tools that read `.env` files treat a trailing `# ...` as part of the value.

Save and exit (Ctrl+X, Y, Enter)

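
If you would rather script the edit than open `nano`, the update can be sketched as a small helper. This is only an illustration: `set_env_var` is a hypothetical function, not part of the GPU stack, and it assumes simple `KEY=value` lines. It replaces the whole matching line (dropping any trailing comment) and appends the key if it is missing.

```bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing or appending.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    # A line that starts with KEY= is rewritten; note that we saw it.
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    # Append the key when no existing line matched.
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

On the GPU server this could be used as `set_env_var /workspace/gpu-stack/.env VPS_IP 100.101.102.103`, repeated for `GPU_IP` and `DB_HOST` with your actual Tailscale IPs.
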

### On VPS (LiteLLM config)

**Edit your LiteLLM config:**

```bash
# On VPS
ssh root@vps
cd ~/Projects/docker-compose/ai

nano litellm-config-gpu.yaml
```

**Update the GPU server IP:**

```yaml
# Find this section and update IP:
  - model_name: llama-3.1-8b
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
      api_base: http://100.104.105.106:8000/v1  # Use GPU Tailscale IP
      api_key: dummy
```

Save and exit.

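
After editing, it may be worth confirming that nothing still points at the old WireGuard addresses (`10.8.0.1` and `10.8.0.2`, listed under Network Reference). The helper below is a hypothetical sketch, not part of the repo. It does a plain substring search, so review each hit by hand.

```bash
# Hypothetical helper: list files under a directory that still mention
# the old WireGuard subnet 10.8.0.x.
find_stale_ips() {
  local dir="${1:-.}"
  # -r recurse, -l print matching file names only, -F fixed-string match.
  grep -rlF '10.8.0.' "$dir"
}
```

For example, `find_stale_ips ~/Projects/docker-compose/ai` on the VPS prints every file that still needs updating and prints nothing when the migration is complete.
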

---

## Step 6: Verify PostgreSQL Access

**From GPU server, test database connection:**

```bash
# Install PostgreSQL client
apt update && apt install -y postgresql-client

# Test connection (use your VPS Tailscale IP)
psql -h 100.101.102.103 -U valknar -d openwebui -c "SELECT 1;"
```

**If this fails, allow Tailscale network on VPS PostgreSQL:**

```bash
# On VPS
ssh root@vps

# Check if postgres allows the Tailscale network
docker exec core_postgres cat /var/lib/postgresql/data/pg_hba.conf | grep '100\.64'

# If not present, add it:
docker exec -it core_postgres bash

# Inside container (Tailscale assigns addresses from 100.64.0.0/10):
echo "host all all 100.64.0.0/10 scram-sha-256" >> /var/lib/postgresql/data/pg_hba.conf

# Leave the container and restart postgres
exit
docker restart core_postgres
```

Try connecting again - should work now!

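
Running the `echo ... >>` step more than once appends duplicate rules to `pg_hba.conf`. One way to make the step repeatable is an append-if-missing helper, sketched below. `ensure_line` is a hypothetical name, and the rule shown in the usage note assumes Tailscale's `100.64.0.0/10` address range.

```bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
ensure_line() {
  local file="$1" line="$2"
  # -q quiet, -x whole-line match, -F fixed string (no regex).
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Inside the container, `ensure_line /var/lib/postgresql/data/pg_hba.conf "host all all 100.64.0.0/10 scram-sha-256"` can then be run any number of times without growing the file.
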

---

## Tailscale Management

### View Connected Devices

**Web dashboard:**
https://login.tailscale.com/admin/machines

You'll see all your devices with their Tailscale IPs.

**Command line:**
```bash
tailscale status
```

### Disconnect/Reconnect

```bash
# Stop Tailscale
tailscale down

# Start Tailscale
tailscale up
```

### Remove Device

From web dashboard:
1. Click on device
2. Click "..." menu
3. Select "Disable" or "Delete"


---

## Advantages Over WireGuard

- ✅ **Works anywhere** - No UDP ports needed
- ✅ **Auto-reconnect** - Survives network changes
- ✅ **Multiple devices** - Easy to add laptop, phone, etc.
- ✅ **NAT traversal** - Direct peer-to-peer when possible
- ✅ **Access Control** - Manage from web dashboard
- ✅ **Monitoring** - See connection status in real-time


---

## Security Notes

🔒 **Tailscale is secure:**
- End-to-end encrypted (WireGuard)
- Zero-trust architecture
- Tailscale's coordination and relay servers cannot decrypt your traffic
- Only authenticated devices can connect

🔒 **Access control:**
- Only devices you authorize can join
- Revoke access anytime from dashboard
- Set ACLs for fine-grained control


---

## Network Reference (Updated)

**Old (WireGuard):**
- VPS: `10.8.0.1`
- GPU: `10.8.0.2`

**New (Tailscale):**
- VPS: `100.101.102.103` (example - use your actual IP)
- GPU: `100.104.105.106` (example - use your actual IP)

**All services now accessible via Tailscale:**

**From VPS to GPU:**
- vLLM: `http://100.104.105.106:8000`
- ComfyUI: `http://100.104.105.106:8188`
- JupyterLab: `http://100.104.105.106:8888`
- Netdata: `http://100.104.105.106:19999`

**From GPU to VPS:**
- PostgreSQL: `100.101.102.103:5432`
- Redis: `100.101.102.103:6379`
- LiteLLM: `http://100.101.102.103:4000`

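
A ping only proves the tunnel is up. It does not show that each service is listening. The sketch below probes the GPU-side ports from the list above using bash's built-in `/dev/tcp` redirection. It assumes `bash` and coreutils `timeout` are installed, and the function names `port_open` and `check_gpu_services` are made up for this guide.

```bash
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Probe the GPU-side service ports (vLLM, ComfyUI, JupyterLab, Netdata).
check_gpu_services() {
  local host="$1" port
  for port in 8000 8188 8888 19999; do
    if port_open "$host" "$port"; then
      echo "open   $host:$port"
    else
      echo "closed $host:$port"
    fi
  done
}
```

From the VPS, `check_gpu_services 100.104.105.106` (with your actual GPU Tailscale IP) prints one line per port. A `closed` line while ping works usually means the service is not running yet, or is bound to localhost only.
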

---

## Troubleshooting

### Can't ping between devices

**Check Tailscale status:**
```bash
tailscale status
```

Both devices should show "active" or "online".

**Check connectivity:**
```bash
tailscale ping 100.104.105.106
```

**Restart Tailscale:**
```bash
tailscale down && tailscale up
```

### PostgreSQL connection refused

**Check if postgres is listening on all interfaces:**
```bash
# On VPS
docker exec core_postgres cat /var/lib/postgresql/data/postgresql.conf | grep listen_addresses
```

Should show: `listen_addresses = '*'`

**Check pg_hba.conf allows Tailscale network:**
```bash
docker exec core_postgres cat /var/lib/postgresql/data/pg_hba.conf | grep '100\.64'
```

Should have this line (Tailscale assigns addresses from `100.64.0.0/10`):
```
host all all 100.64.0.0/10 scram-sha-256
```

### Device not showing in network

**Re-authenticate:**
```bash
tailscale logout
tailscale up
# Open the new URL to re-authenticate
```


---

## Verification Checklist

Before proceeding:
- [ ] Tailscale account created
- [ ] Tailscale installed on VPS
- [ ] Tailscale installed on GPU server
- [ ] Both devices visible in `tailscale status`
- [ ] VPS can ping GPU server (via Tailscale IP)
- [ ] GPU server can ping VPS (via Tailscale IP)
- [ ] PostgreSQL accessible from GPU server
- [ ] .env file updated with Tailscale IPs
- [ ] LiteLLM config updated with GPU Tailscale IP

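
Some of these items can be checked from a script instead of by hand. The wrapper below is a hypothetical sketch: `run_check` runs any command quietly and prints a ticked or unticked checklist line depending on its exit status.

```bash
# Hypothetical helper: run a command silently and print a checklist line.
# Usage: run_check "description" command [args...]
run_check() {
  local desc="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "- [x] $desc"
  else
    echo "- [ ] $desc"
    return 1
  fi
}
```

For example, on the VPS: `run_check "VPS can ping GPU server" ping -c 1 100.104.105.106`. Items such as account creation still need a manual tick.
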

---

## Next Steps

✅ **Network configured!** Proceed to Docker & GPU setup:

```bash
cat /home/valknar/Projects/docker-compose/ai/DOCKER_GPU_SETUP.md
```

**Your Tailscale IPs (save these!):**
- VPS: `__________________` (from `tailscale ip -4` on VPS)
- GPU: `__________________` (from `tailscale ip -4` on GPU server)


---

## Bonus: Add Your Local Machine

Want to access the GPU server from your laptop?

```bash
# On your local machine (Linux)
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up

# Now you can SSH directly via Tailscale:
ssh root@100.104.105.106

# Or access ComfyUI in browser:
# http://100.104.105.106:8188
```

No more port forwarding needed! 🎉