# GPU Server Setup Guide - Week 1

## Day 1-2: RunPod Account & GPU Server

### Step 1: Create RunPod Account

1. **Go to RunPod**: https://www.runpod.io/
2. **Sign up** with email or GitHub
3. **Add billing method**:
   - Credit card required
   - No charges until you deploy a pod
   - Recommended: Add $50 initial credit
4. **Verify email** and complete account setup

### Step 2: Deploy Your First GPU Pod

#### 2.1 Navigate to Pods

1. Click **"Deploy"** in top menu
2. Select **"GPU Pods"**

#### 2.2 Choose GPU Type

**Recommended: RTX 4090**
- 24GB VRAM
- ~$0.50/hour
- Handles LLMs up to ~14B params (with quantization; fp16 weights at 14B exceed 24GB)
- Great for SDXL/FLUX

**Filter options:**
- GPU Type: RTX 4090
- GPU Count: 1
- Sort by: Price (lowest first)
- Region: Europe (lower latency to Germany)

#### 2.3 Select Template

Choose: **"RunPod PyTorch"** template
- Includes: CUDA, PyTorch, Python
- Pre-configured for GPU workloads
- Docker pre-installed

**Alternative**: "Ubuntu 22.04 with CUDA 12.1" (more control)

#### 2.4 Configure Pod

**Container Settings:**
- **Container Disk**: 50GB (temporary, auto-included)
- **Expose Ports**:
  - Add: 22 (SSH)
  - Add: 8000 (vLLM)
  - Add: 8188 (ComfyUI)
  - Add: 8888 (JupyterLab)
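
Once the services from later weeks are running, you can confirm from inside the pod that something is actually listening on each exposed port; a quick check with `ss` (part of iproute2, preinstalled on most Ubuntu images). A port that shows nothing simply means that service isn't running yet:

```bash
# List listening TCP sockets and filter for the ports exposed above
ss -tlnp | grep -E ':(22|8000|8188|8888)\b'
```
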
**Volume Settings:**
- Click **"+ Network Volume"**
- **Name**: `gpu-models-storage`
- **Size**: 500GB
- **Region**: Same as pod
- **Cost**: ~$50/month

**Environment Variables:**
- Add later (not needed for initial setup)

#### 2.5 Deploy Pod

1. Review configuration
2. Click **"Deploy On-Demand"** (not Spot, for reliability)
3. Wait 2-3 minutes for deployment

**Expected cost:**
- GPU: $0.50/hour = $360/month (24/7)
- Storage: $50/month
- **Total: ~$410/month**
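
The monthly total above assumes 24/7 on-demand usage. A quick sketch you can rerun with your own numbers (the rates here are this guide's approximate figures, not live RunPod pricing):

```bash
python3 - <<'EOF'
# Approximate figures from this guide, not live RunPod pricing
gpu_rate = 0.50            # $/hour for an RTX 4090 on-demand pod
storage_monthly = 50.0     # $/month for the 500GB network volume
hours = 24 * 30            # 24/7 over a 30-day month

gpu_monthly = gpu_rate * hours
print(f"GPU:     ${gpu_monthly:.0f}/month")
print(f"Storage: ${storage_monthly:.0f}/month")
print(f"Total:   ${gpu_monthly + storage_monthly:.0f}/month")
EOF
```
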
### Step 3: Access Your GPU Server

#### 3.1 Get Connection Info

Once deployed, you'll see:
- **Pod ID**: e.g., `abc123def456`
- **SSH Command**: `ssh root@<pod-id>.runpod.io -p 12345`
- **Public IP**: May not be directly accessible (use SSH)

#### 3.2 SSH Access

RunPod automatically generates SSH keys for you:

```bash
# Copy the SSH command from RunPod dashboard
ssh root@abc123def456.runpod.io -p 12345

# First time: Accept the host key fingerprint
# You should now be on the GPU server!
```

**Verify GPU:**
```bash
nvidia-smi
```

Expected output:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx       Driver Version: 535.xx       CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   45C    P0    50W / 450W |      0MiB / 24564MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
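
If you just want the essentials rather than the full table, `nvidia-smi` has a CSV query mode:

```bash
# Compact summary: GPU name, total VRAM, driver version
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```
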
### Step 4: Initial Server Configuration

#### 4.1 Update System

```bash
# Update package lists
apt update

# Upgrade existing packages
apt upgrade -y

# Install essential tools
apt install -y \
    vim \
    htop \
    tmux \
    curl \
    wget \
    git \
    net-tools \
    iptables-persistent
```
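
`tmux` from the list above is worth adopting immediately: long downloads and installs keep running even if your SSH connection drops. A typical pattern:

```bash
# Start a named session for setup work
tmux new -s setup

# ...run long commands inside the session...
# Detach with Ctrl-b then d; the session keeps running.

# Reattach later (e.g., after an SSH disconnect)
tmux attach -t setup
```
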
#### 4.2 Set Timezone

```bash
timedatectl set-timezone Europe/Berlin
# If timedatectl fails (pods are containers and may lack systemd), set it directly:
# ln -sf /usr/share/zoneinfo/Europe/Berlin /etc/localtime
date  # Verify
```

#### 4.3 Create Working Directory

```bash
# Create workspace subdirectories
mkdir -p /workspace/{models,configs,data,scripts}

# Check network volume mount
ls -la /workspace
# Should show your 500GB volume
```
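
Everything outside /workspace lives on the 50GB container disk, which the guide's pod config marks as temporary, so it pays to point model caches at the network volume now. A minimal sketch for Hugging Face downloads (`HF_HOME` is the standard cache variable; the path is just this guide's layout):

```bash
# Keep Hugging Face model downloads on the persistent network volume
mkdir -p /workspace/models/hf
echo 'export HF_HOME=/workspace/models/hf' >> ~/.bashrc
export HF_HOME=/workspace/models/hf   # apply to the current shell too
```
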
#### 4.4 Configure SSH (Optional but Recommended)

**Generate your own SSH key on your local machine:**

```bash
# On your local machine (not the GPU server)
ssh-keygen -t ed25519 -C "gpu-server-pivoine" -f ~/.ssh/gpu_pivoine

# Copy the public key to the GPU server (note: -p goes before the host)
ssh-copy-id -i ~/.ssh/gpu_pivoine.pub -p 12345 root@abc123def456.runpod.io
```

**Add to your local ~/.ssh/config:**

```
Host gpu-pivoine
    HostName abc123def456.runpod.io
    Port 12345
    User root
    IdentityFile ~/.ssh/gpu_pivoine
```

Now you can connect with: `ssh gpu-pivoine`
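
A quick end-to-end check that the alias, key, and port all line up:

```bash
# Should print the GPU table without prompting for a password
ssh gpu-pivoine nvidia-smi
```
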
### Step 5: Verify GPU Access

Run this test:

```bash
# Test CUDA
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU count:', torch.cuda.device_count())"
```

Expected output:
```
CUDA available: True
GPU count: 1
```
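
For a slightly stronger check than the availability flag, run a small computation on the device; a minimal sketch:

```bash
python3 - <<'EOF'
import torch

# Confirm which GPU PyTorch sees, then exercise it with a small matmul
print("Device:", torch.cuda.get_device_name(0))
x = torch.randn(1024, 1024, device="cuda")
y = x @ x
torch.cuda.synchronize()  # wait for the kernel to finish before reporting
print("Matmul OK:", tuple(y.shape))
EOF
```
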
### Troubleshooting

**Problem: Can't connect via SSH**
- Check pod is running (not stopped)
- Verify port number in SSH command
- Try web terminal in RunPod dashboard

**Problem: GPU not detected**
- Run `nvidia-smi`
- Check RunPod selected correct GPU type
- Restart pod if needed

**Problem: Network volume not mounted**
- Check RunPod dashboard → Volume tab
- Verify volume is attached to pod
- Try: `df -h` to see mounts
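
For the volume case, `df -h` output can be noisy; `findmnt` answers the specific question directly:

```bash
# Succeeds and prints the mount if /workspace is a real mount point;
# fails if the volume is missing and /workspace is just a directory
findmnt /workspace || echo "/workspace is not a separate mount - check the volume"
```
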
### Next Steps

Once SSH access works and GPU is verified:

✅ Proceed to **Day 3-4: Network Configuration (Tailscale VPN)**

### Save Important Info

Create a file to track your setup:

```bash
# On GPU server
cat > /workspace/SERVER_INFO.md << 'EOF'
# GPU Server Information

## Connection
- SSH: ssh root@abc123def456.runpod.io -p 12345
- Pod ID: abc123def456
- Region: [YOUR_REGION]

## Hardware
- GPU: RTX 4090 24GB
- CPU: [Check with: lscpu]
- RAM: [Check with: free -h]
- Storage: 500GB network volume at /workspace

## Costs
- GPU: $0.50/hour
- Storage: $50/month
- Total: ~$410/month (24/7)

## Deployed: [DATE]
EOF
```
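
The CPU and RAM placeholders above can also be filled in from live values; a small sketch that appends them to the file (formatting is illustrative):

```bash
# Append actual hardware details to the info file
{
  echo ""
  echo "## Captured $(date -I)"
  lscpu | grep 'Model name'
  free -h | head -2
  df -h /workspace | tail -1
} >> /workspace/SERVER_INFO.md
```
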
---

## Checkpoint ✓

Before moving to Day 3, verify:
- [ ] RunPod account created and billing added
- [ ] RTX 4090 pod deployed successfully
- [ ] 500GB network volume attached
- [ ] SSH access working
- [ ] `nvidia-smi` shows GPU
- [ ] `torch.cuda.is_available()` returns True
- [ ] Timezone set to Europe/Berlin
- [ ] Essential tools installed

**Ready for Tailscale setup? Let's go!**