
Sebastian Krüger 8de88d96ac docs(ai): add comprehensive GPU setup documentation and configs

- Add setup guides (SETUP_GUIDE, TAILSCALE_SETUP, DOCKER_GPU_SETUP, etc.)
- Add deployment configurations (litellm-config-gpu.yaml, gpu-server-compose.yaml)
- Add GPU_DEPLOYMENT_LOG.md with current infrastructure details
- Add GPU_EXPANSION_PLAN.md with complete provider comparison
- Add deploy-gpu-stack.sh automation script

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-21 12:57:06 +01:00

8.0 KiB


Tailscale VPN Setup - Better Alternative to WireGuard

Why Tailscale?

RunPod doesn't expose UDP ports, which blocks a plain WireGuard tunnel. Tailscale gets around this:

✅ Works over HTTPS (TCP) - traffic is relayed when UDP is unavailable
✅ Near-zero configuration - keys and peers are managed automatically
✅ Free for personal use
✅ Built on WireGuard (same security)
✅ Automatic NAT traversal
✅ Peer-to-peer when possible (low latency)

Step 1: Create Tailscale Account

Go to: https://tailscale.com/
Click "Get Started"
Sign up with GitHub or Google (easiest)
You'll be redirected to the Tailscale admin console

No credit card required! The free tier is enough for this two-machine setup.

Step 2: Install Tailscale on VPS

SSH into your VPS:

ssh root@vps

Install Tailscale:

# Download and run install script
curl -fsSL https://tailscale.com/install.sh | sh

# Start Tailscale
tailscale up

# You'll see a URL like:
# https://login.tailscale.com/a/xxxxxxxxxx

Authenticate:

Copy the URL and open it in a browser
Click "Connect" to authorize the device
Name it pivoine-vps (machines can be renamed in the admin console)

Check status:

tailscale status

You should see your VPS listed with an IP like 100.x.x.x

Save your VPS Tailscale IP:

tailscale ip -4
# Example output: 100.101.102.103

Write this down - you'll need it in Step 5!
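Because this address ends up in several config files, it can be worth sanity-checking it before copying it around. The sketch below is one possible way to do that. is_tailscale_ip is a hypothetical helper name (not a Tailscale command), and it only checks that the value lies inside 100.64.0.0/10, the range Tailscale assigns IPv4 addresses from.

```shell
# Hypothetical helper (not part of Tailscale): succeeds if $1 is an IPv4
# address inside 100.64.0.0/10, i.e. 100.64.0.0 through 100.127.255.255.
is_tailscale_ip() {
    # One octet in the range 0-255
    octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
    printf '%s\n' "$1" |
        grep -Eq "^100\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\.$octet\.$octet\$"
}

# Example with the sample address used in this guide
if is_tailscale_ip 100.101.102.103; then
    echo "100.101.102.103 looks like a Tailscale address"
fi

# On a machine with Tailscale running, the same check could be applied as:
#   ip=$(tailscale ip -4)
#   is_tailscale_ip "$ip" && echo "VPS_IP=$ip"
```

An old WireGuard address such as 10.8.0.1 fails the check, which makes leftover values easy to spot.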

Step 3: Install Tailscale on GPU Server

SSH into your RunPod GPU server:

ssh root@abc123def456-12345678.runpod.io -p 12345

Install Tailscale:

# Download and run install script
curl -fsSL https://tailscale.com/install.sh | sh

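# Start Tailscale
# Note: --advertise-tags only works if tag:gpu is declared under tagOwners
# in your tailnet policy file; drop the flag if you have not defined the tag.
tailscale up --advertise-tags=tag:gpu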

# You'll see another URL

Authenticate:

Copy the URL and open it in a browser
Click "Connect"
Name it gpu-runpod (machines can be renamed in the admin console)

Check status:

tailscale status

You should now see BOTH devices:

pivoine-vps - 100.x.x.x
gpu-runpod - 100.x.x.x

Save your GPU server Tailscale IP:

tailscale ip -4
# Example output: 100.104.105.106

Step 4: Test Connectivity

From VPS, ping GPU server:

# SSH into VPS
ssh root@vps

# Ping GPU server (use its Tailscale IP)
ping -c 4 100.104.105.106

Expected output:

PING 100.104.105.106 (100.104.105.106) 56(84) bytes of data.
64 bytes from 100.104.105.106: icmp_seq=1 ttl=64 time=15.3 ms
64 bytes from 100.104.105.106: icmp_seq=2 ttl=64 time=14.8 ms
...

From GPU server, ping VPS:

# SSH into GPU server
ssh root@abc123def456-12345678.runpod.io -p 12345

# Ping VPS (use its Tailscale IP)
ping -c 4 100.101.102.103

Both should work! ✅

Step 5: Update Configuration Files

Now update the IP addresses in your configs to use Tailscale IPs.

On GPU Server (.env file)

Edit your .env file:

# On GPU server
cd /workspace/gpu-stack

nano .env

Update these lines:

# VPN Network (use your actual Tailscale IPs)
VPS_IP=100.101.102.103      # Your VPS Tailscale IP
GPU_IP=100.104.105.106      # Your GPU Tailscale IP

# PostgreSQL (on VPS)
DB_HOST=100.101.102.103     # Your VPS Tailscale IP
DB_PORT=5432

Save and exit (Ctrl+X, Y, Enter)
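If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch, not part of the original stack: set_env_var is a hypothetical helper that replaces an existing KEY= line or appends one. It assumes plain KEY=VALUE lines and values without "|" or "&" characters (true for IP addresses). Any inline comment on a replaced line is dropped.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# KEY= line or appending a new one if the key is missing.
set_env_var() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # Write to a temp file and move it into place (avoids sed -i differences)
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example, using the sample addresses from this guide:
#   cd /workspace/gpu-stack
#   set_env_var .env VPS_IP  100.101.102.103
#   set_env_var .env GPU_IP  100.104.105.106
#   set_env_var .env DB_HOST 100.101.102.103
```

Running it twice with the same arguments leaves a single line per key, so it is safe to re-run after the Tailscale IPs change.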

On VPS (LiteLLM config)

Edit your LiteLLM config:

# On VPS
ssh root@vps
cd ~/Projects/docker-compose/ai

nano litellm-config-gpu.yaml

Update the GPU server IP:

# Find this section and update IP:
  - model_name: llama-3.1-8b
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
      api_base: http://100.104.105.106:8000/v1  # Use GPU Tailscale IP
      api_key: dummy

Save and exit.
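Before restarting LiteLLM, it may help to confirm from the VPS that the api_base actually answers. The sketch below assumes vLLM is running its OpenAI-compatible server on port 8000 without an API key, in which case GET /v1/models should return a model list. Both function names are hypothetical.

```shell
# Build the "list models" URL for a vLLM host (port defaults to 8000).
vllm_models_url() {
    echo "http://$1:${2:-8000}/v1/models"
}

# Succeeds if the endpoint answers with a 2xx status within 5 seconds.
vllm_reachable() {
    curl -fsS --max-time 5 "$(vllm_models_url "$@")" >/dev/null
}

# Example (run on the VPS, with the sample GPU address from this guide):
#   vllm_reachable 100.104.105.106 && echo "vLLM reachable" || echo "vLLM not reachable"
```

A failure here points at the network or at vLLM itself rather than at the LiteLLM config.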

Step 6: Verify PostgreSQL Access

From GPU server, test database connection:

# Install PostgreSQL client
apt install -y postgresql-client

# Test connection (use your VPS Tailscale IP)
psql -h 100.101.102.103 -U valknar -d openwebui -c "SELECT 1;"

If this fails, allow the Tailscale network in pg_hba.conf of the PostgreSQL container on the VPS:

# On VPS
ssh root@vps

# Check if postgres allows Tailscale network
docker exec core_postgres cat /var/lib/postgresql/data/pg_hba.conf | grep 100

# If not present, add it:
docker exec -it core_postgres bash

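# Inside container (100.64.0.0/10 is the range Tailscale assigns addresses from;
# 100.0.0.0/8 would also admit public internet addresses):
echo "host    all             all             100.64.0.0/10           scram-sha-256" >> /var/lib/postgresql/data/pg_hba.conf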

# Restart postgres
exit
docker restart core_postgres

Try connecting again - should work now!
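Appending with echo adds a duplicate rule every time the step is repeated. A small guard avoids that. This is only a sketch: ensure_line is a hypothetical helper, to be run wherever pg_hba.conf is directly readable (for example inside the container). The rule shown uses 100.64.0.0/10, the range Tailscale assigns addresses from.

```shell
# Hypothetical helper: append an exact line to a file only if that line
# is not already present.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (inside the core_postgres container):
#   HBA_RULE='host    all             all             100.64.0.0/10           scram-sha-256'
#   ensure_line /var/lib/postgresql/data/pg_hba.conf "$HBA_RULE"
```

PostgreSQL still has to be restarted or reloaded afterwards for the new rule to take effect.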

Tailscale Management

View Connected Devices

Web dashboard: https://login.tailscale.com/admin/machines

You'll see all your devices with their Tailscale IPs.

Command line:

tailscale status

Disconnect/Reconnect

# Stop Tailscale
tailscale down

# Start Tailscale
tailscale up

Remove Device

From web dashboard:

Click on device
Click "..." menu
Select "Disable" or "Delete"

Advantages Over WireGuard

✅ Works anywhere - no UDP ports needed
✅ Auto-reconnect - survives network changes
✅ Multiple devices - easy to add a laptop, phone, etc.
✅ NAT traversal - direct peer-to-peer when possible
✅ Access control - managed from the web dashboard
✅ Monitoring - see connection status in real time

Security Notes

🔒 Tailscale is secure:

End-to-end encrypted (WireGuard)
Zero-trust architecture
Tailscale's servers only ever relay encrypted packets and cannot read your traffic
Only authenticated devices can connect

🔒 Access control:

Only devices you authorize can join
Revoke access anytime from dashboard
Set ACLs for fine-grained control

Network Reference (Updated)

Old (WireGuard):

VPS: 10.8.0.1
GPU: 10.8.0.2

New (Tailscale):

VPS: 100.101.102.103 (example - use your actual IP)
GPU: 100.104.105.106 (example - use your actual IP)

All services now accessible via Tailscale:

From VPS to GPU:

vLLM: http://100.104.105.106:8000
ComfyUI: http://100.104.105.106:8188
JupyterLab: http://100.104.105.106:8888
Netdata: http://100.104.105.106:19999

From GPU to VPS:

PostgreSQL: 100.101.102.103:5432
Redis: 100.101.102.103:6379
LiteLLM: http://100.101.102.103:4000
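The endpoint list above can be turned into a quick reachability check. The sketch below is an assumption-laden example rather than part of the original stack: port_open and check_services are hypothetical helpers, and they rely on bash (for /dev/tcp) and the coreutils timeout command being installed.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port opens
# within 3 seconds. Uses bash's /dev/tcp, so bash must be available.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" </dev/null 2>/dev/null
}

# Reads "name host port" triples on stdin, prints one OK/FAIL line each.
check_services() {
    while read -r name host port; do
        if port_open "$host" "$port"; then
            echo "OK   $name ($host:$port)"
        else
            echo "FAIL $name ($host:$port)"
        fi
    done
}

# Example (run on the VPS, with the sample GPU address from this guide):
#   check_services <<'EOF'
#   vLLM       100.104.105.106 8000
#   ComfyUI    100.104.105.106 8188
#   JupyterLab 100.104.105.106 8888
#   Netdata    100.104.105.106 19999
#   EOF
```

The same function works from the GPU server with the PostgreSQL, Redis, and LiteLLM entries. It only tests that the port accepts connections, not that the service behind it is healthy.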

Troubleshooting

Can't ping between devices

Check Tailscale status:

tailscale status

Both devices should be listed, and neither should be marked "offline".

Check connectivity:

tailscale ping 100.104.105.106

Restart Tailscale:

tailscale down && tailscale up
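When pings work but latency is high, traffic is probably going through a relay instead of a direct connection. The helpers below are a rough sketch for telling the two apart. They only parse text and assume the output formats described in the comments, which may differ between Tailscale versions. Both function names are hypothetical.

```shell
# Reads `tailscale netcheck` output on stdin; succeeds if it contains a
# line like "* UDP: true" (assumed format).
udp_available() {
    grep -Eq 'UDP:[[:space:]]*true'
}

# Reads `tailscale ping` output on stdin and classifies the last pong line:
# "relayed" if it mentions "via DERP", "direct" for any other pong,
# "no-reply" if there was no pong at all (assumed format).
path_kind() {
    last=$(grep 'pong from' | tail -n 1)
    case "$last" in
        '')           echo "no-reply" ;;
        *'via DERP'*) echo "relayed" ;;
        *)            echo "direct" ;;
    esac
}

# Example:
#   tailscale netcheck 2>&1 | udp_available || echo "UDP blocked: expect relayed traffic"
#   tailscale ping 100.104.105.106 | path_kind
```

On a host without working UDP, "relayed" is the expected result and not a fault in itself.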

PostgreSQL connection refused

Check if postgres is listening on all interfaces:

# On VPS
docker exec core_postgres cat /var/lib/postgresql/data/postgresql.conf | grep listen_addresses

Should show: listen_addresses = '*'

Check pg_hba.conf allows Tailscale network:

docker exec core_postgres cat /var/lib/postgresql/data/pg_hba.conf | grep 100

It should contain a line for the Tailscale address range (100.64.0.0/10):

host    all             all             100.64.0.0/10           scram-sha-256

Device not showing in network

Re-authenticate:

tailscale logout
tailscale up
# Click the new URL to re-authenticate

Verification Checklist

Before proceeding:

Tailscale account created
Tailscale installed on VPS
Tailscale installed on GPU server
Both devices visible in tailscale status
VPS can ping GPU server (via Tailscale IP)
GPU server can ping VPS (via Tailscale IP)
PostgreSQL accessible from GPU server
.env file updated with Tailscale IPs
LiteLLM config updated with GPU Tailscale IP
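Several of these items can be checked from the command line. The following is a minimal sketch of a check runner. run_check is a hypothetical helper, and the commented examples assume that ping and pg_isready (from postgresql-client) are installed and that `tailscale status` exits non-zero when Tailscale is stopped.

```shell
# Hypothetical helper: run a command quietly and print PASS or FAIL
# followed by a description. Returns 1 on failure.
run_check() {
    desc=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $desc"
    else
        echo "FAIL: $desc"
        return 1
    fi
}

# Example (run on the GPU server, with the sample VPS address from this guide):
#   run_check "Tailscale is running"   tailscale status
#   run_check "VPS answers ping"       ping -c 1 -W 3 100.101.102.103
#   run_check "PostgreSQL reachable"   pg_isready -h 100.101.102.103 -p 5432
```

The account and config-file items still need a manual look.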

Next Steps

✅ Network configured! Proceed to Docker & GPU setup:

cat /home/valknar/Projects/docker-compose/ai/DOCKER_GPU_SETUP.md

Your Tailscale IPs (save these!):

VPS: __________________ (from tailscale ip -4 on VPS)
GPU: __________________ (from tailscale ip -4 on GPU server)

Bonus: Add Your Local Machine

Want to access the GPU server from your laptop?

# On your local machine (Linux; on macOS or Windows use the installer from https://tailscale.com/download)
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# Now you can SSH directly via Tailscale:
ssh root@100.104.105.106

# Or access ComfyUI in browser:
# http://100.104.105.106:8188

No more port forwarding needed! 🎉
