# GPU Server Setup Guide - Week 1

## Day 1-2: RunPod Account & GPU Server
### Step 1: Create RunPod Account

1. Go to RunPod: https://www.runpod.io/
2. Sign up with email or GitHub
3. Add a billing method:
   - Credit card required
   - No charges until you deploy a pod
   - Recommended: add $50 initial credit
4. Verify email and complete account setup
### Step 2: Deploy Your First GPU Pod

#### 2.1 Navigate to Pods

- Click "Deploy" in the top menu
- Select "GPU Pods"

#### 2.2 Choose GPU Type

Recommended: RTX 4090

- 24GB VRAM
- ~$0.50/hour
- Handles LLMs up to ~14B parameters
- Great for SDXL/FLUX

Filter options:

- GPU Type: RTX 4090
- GPU Count: 1
- Sort by: Price (lowest first)
- Region: Europe (lower latency to Germany)
#### 2.3 Select Template

Choose the "RunPod PyTorch" template:

- Includes CUDA, PyTorch, Python
- Pre-configured for GPU workloads
- Docker pre-installed

Alternative: "Ubuntu 22.04 with CUDA 12.1" gives you more control but more setup; see the GPU-in-Docker check below.
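If you take the bare Ubuntu route (and Docker is available in your template), it's worth confirming early that containers can reach the GPU. A minimal sketch, assuming the NVIDIA Container Toolkit is already present; the exact CUDA image tag is an assumption, substitute any base image you trust:

```bash
# Assumes the NVIDIA Container Toolkit is installed.
# The image tag is an assumption - any CUDA base image works for this check.
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
# Success looks like the same GPU table you get on the host.
```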
#### 2.4 Configure Pod

Container settings:

- Container Disk: 50GB (temporary, auto-included)
- Expose ports:
  - 22 (SSH)
  - 8000 (vLLM)
  - 8188 (ComfyUI)
  - 8888 (JupyterLab)

Volume settings:

- Click "+ Network Volume"
- Name: `gpu-models-storage`
- Size: 500GB
- Region: same as the pod
- Cost: ~$50/month

Environment variables:

- Add later (not needed for initial setup)
#### 2.5 Deploy Pod

- Review the configuration
- Click "Deploy On-Demand" (not Spot, for reliability)
- Wait 2-3 minutes for deployment

Expected cost:

- GPU: $0.50/hour = $360/month (24/7)
- Storage: $50/month
- Total: ~$410/month
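To sanity-check that arithmetic, or to re-run it for a different GPU rate, a throwaway `awk` snippet works (the figures below are just the assumed rates from this guide):

```bash
# Assumed rates from above: $0.50/hr GPU, $50/mo storage - adjust as needed
awk -v rate=0.50 -v storage=50 'BEGIN {
  gpu = rate * 24 * 30   # hours in a 30-day month
  printf "GPU: $%.0f/mo  Storage: $%.0f/mo  Total: $%.0f/mo\n", gpu, storage, gpu + storage
}'
```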
### Step 3: Access Your GPU Server

#### 3.1 Get Connection Info

Once deployed, you'll see:

- Pod ID: e.g., `abc123def456`
- SSH command: `ssh root@<pod-id>.runpod.io -p 12345`
- Public IP: may not be directly accessible (use SSH)
#### 3.2 SSH Access

RunPod automatically generates SSH keys for you:

```bash
# Copy the SSH command from the RunPod dashboard
ssh root@abc123def456.runpod.io -p 12345

# First time: accept the host fingerprint
# You should now be on the GPU server!
```

Verify the GPU:

```bash
nvidia-smi
```
Expected output:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx             Driver Version: 535.xx     CUDA Version: 12.1 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   45C    P0    50W / 450W |      0MiB / 24564MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
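If you just want the essentials rather than the full table, `nvidia-smi` supports a compact CSV query (these field names are standard, but `nvidia-smi --help-query-gpu` lists exactly what your driver version accepts):

```bash
# One-line summary: GPU name, total VRAM, driver version
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
```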
### Step 4: Initial Server Configuration

#### 4.1 Update System

```bash
# Update package lists
apt update

# Upgrade existing packages
apt upgrade -y

# Install essential tools
apt install -y \
    vim \
    htop \
    tmux \
    curl \
    wget \
    git \
    net-tools \
    iptables-persistent
```
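Optionally, a quick loop to confirm everything landed (plain POSIX shell, nothing assumed beyond the package list above):

```bash
# Report any of the essential tools that failed to install
for tool in vim htop tmux curl wget git netstat; do
  command -v "$tool" >/dev/null 2>&1 || echo "MISSING: $tool"
done
# netstat comes from the net-tools package
```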
#### 4.2 Set Timezone

```bash
timedatectl set-timezone Europe/Berlin
date  # Verify
```
#### 4.3 Create Working Directory

```bash
# Create workspace subdirectories
mkdir -p /workspace/{models,configs,data,scripts}

# Check the network volume mount
ls -la /workspace
# Should show your 500GB volume
```
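`ls` only shows that the directory exists; to confirm `/workspace` is actually backed by the 500GB network volume rather than the 50GB container disk:

```bash
df -h /workspace
# Size should read roughly 500G; if it shows ~50G, the network volume
# is not attached - see Troubleshooting below.
```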
#### 4.4 Configure SSH (Optional but Recommended)

Generate your own SSH key on your local machine:

```bash
# On your local machine (not the GPU server)
ssh-keygen -t ed25519 -C "gpu-server-pivoine" -f ~/.ssh/gpu_pivoine

# Copy the public key to the GPU server (note: -p goes before the host)
ssh-copy-id -i ~/.ssh/gpu_pivoine.pub -p 12345 root@abc123def456.runpod.io
```

Add to your local `~/.ssh/config`:

```
Host gpu-pivoine
    HostName abc123def456.runpod.io
    Port 12345
    User root
    IdentityFile ~/.ssh/gpu_pivoine
```

Now you can connect with `ssh gpu-pivoine`.
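Once the key-based login works, you can optionally disable password logins on the server. A hedged sketch, assuming stock OpenSSH on Ubuntu; keep a second session open until you've confirmed you can still get in:

```bash
# On the GPU server - only after 'ssh gpu-pivoine' works with the key!
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# The service name varies by image; try both
service ssh restart 2>/dev/null || systemctl restart sshd
```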
### Step 5: Verify GPU Access

Run this test:

```bash
# Test CUDA from PyTorch
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU count:', torch.cuda.device_count())"
```

Expected output:

```
CUDA available: True
GPU count: 1
```
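For a slightly deeper look (device name and VRAM as PyTorch sees them), a short follow-up using standard `torch.cuda` calls:

```bash
python3 - <<'EOF'
import torch

# Print model and total VRAM for every GPU PyTorch can see
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
EOF
```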
## Troubleshooting

Problem: Can't connect via SSH

- Check that the pod is running (not stopped)
- Verify the port number in the SSH command
- Try the web terminal in the RunPod dashboard

Problem: GPU not detected

- Run `nvidia-smi`
- Check that RunPod selected the correct GPU type
- Restart the pod if needed

Problem: Network volume not mounted

- Check RunPod dashboard → Volume tab
- Verify the volume is attached to the pod
- Try `df -h` to see mounts

A combined one-shot check for all three problems follows below.
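This sketch assumes the ports exposed in Step 2.4 (`ss` ships with Ubuntu's iproute2):

```bash
# GPU visible?
nvidia-smi -L || echo "GPU not visible - check pod GPU type, restart if needed"
# Network volume mounted?
df -h /workspace || echo "/workspace missing - check the Volume tab"
# Anything listening on the exposed ports yet? (empty is normal before Day 3+)
ss -tlnp | grep -E ':(22|8000|8188|8888)\b'
```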
## Next Steps

Once SSH access works and the GPU is verified: ✅ Proceed to Day 3-4: Network Configuration (Tailscale VPN)
## Save Important Info

Create a file to track your setup:

```bash
# On the GPU server
cat > /workspace/SERVER_INFO.md << 'EOF'
# GPU Server Information

## Connection
- SSH: ssh root@abc123def456.runpod.io -p 12345
- Pod ID: abc123def456
- Region: [YOUR_REGION]

## Hardware
- GPU: RTX 4090 24GB
- CPU: [Check with: lscpu]
- RAM: [Check with: free -h]
- Storage: 500GB network volume at /workspace

## Costs
- GPU: $0.50/hour
- Storage: $50/month
- Total: ~$410/month (24/7)

## Deployed: [DATE]
EOF
```
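You can also append the live hardware values instead of filling the placeholders by hand (a sketch; the patterns match standard `lscpu` and `free -h` output):

```bash
# Append actual CPU/RAM facts to the file created above
{
  echo "- CPU (actual): $(lscpu | sed -n 's/^Model name:[[:space:]]*//p')"
  echo "- RAM (actual): $(free -h | awk '/^Mem:/ {print $2}')"
} >> /workspace/SERVER_INFO.md
```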
## Checkpoint ✓

Before moving to Day 3, verify:

- RunPod account created and billing added
- RTX 4090 pod deployed successfully
- 500GB network volume attached
- SSH access working
- `nvidia-smi` shows the GPU
- `torch.cuda.is_available()` returns True
- Timezone set to Europe/Berlin
- Essential tools installed

Ready for Tailscale setup? Let's go!