# GPU Server Setup Guide - Week 1

## Day 1-2: RunPod Account & GPU Server

### Step 1: Create RunPod Account

1. **Go to RunPod**: https://www.runpod.io/
2. **Sign up** with email or GitHub
3. **Add billing method**:
   - Credit card required
   - No charges until you deploy a pod
   - Recommended: add $50 initial credit
4. **Verify email** and complete account setup

### Step 2: Deploy Your First GPU Pod

#### 2.1 Navigate to Pods

1. Click **"Deploy"** in the top menu
2. Select **"GPU Pods"**

#### 2.2 Choose GPU Type

**Recommended: RTX 4090**

- 24GB VRAM
- ~$0.50/hour
- Handles LLMs up to ~14B parameters (quantized at that size)
- Great for SDXL/FLUX

**Filter options:**

- GPU Type: RTX 4090
- GPU Count: 1
- Sort by: Price (lowest first)
- Region: Europe (lower latency to Germany)

#### 2.3 Select Template

Choose the **"RunPod PyTorch"** template:

- Includes CUDA, PyTorch, and Python
- Pre-configured for GPU workloads
- Docker pre-installed

**Alternative**: "Ubuntu 22.04 with CUDA 12.1" (more control, more manual setup)

#### 2.4 Configure Pod

**Container Settings:**

- **Container Disk**: 50GB (temporary, auto-included)
- **Expose Ports**:
  - 22 (SSH)
  - 8000 (vLLM)
  - 8188 (ComfyUI)
  - 8888 (JupyterLab)

**Volume Settings:**

- Click **"+ Network Volume"**
- **Name**: `gpu-models-storage`
- **Size**: 500GB
- **Region**: same as the pod
- **Cost**: ~$50/month

**Environment Variables:**

- Add later (not needed for the initial setup)

#### 2.5 Deploy Pod

1. Review the configuration
2. Click **"Deploy On-Demand"** (not Spot, for reliability)
3. Wait 2-3 minutes for deployment

**Expected cost:**

- GPU: $0.50/hour = $360/month (24/7)
- Storage: $50/month
- **Total: ~$410/month**

### Step 3: Access Your GPU Server

#### 3.1 Get Connection Info

Once deployed, you'll see:

- **Pod ID**: e.g., `abc123def456`
- **SSH Command**: `ssh root@<pod-id>.runpod.io -p 12345`
- **Public IP**: may not be directly accessible (use SSH)

#### 3.2 SSH Access

RunPod automatically sets up SSH access for you:

```bash
# Copy the SSH command from the RunPod dashboard
ssh root@abc123def456.runpod.io -p 12345

# First time: accept the host key fingerprint
# You should now be on the GPU server!
```

**Verify GPU:**

```bash
nvidia-smi
```

Expected output:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx    Driver Version: 535.xx    CUDA Version: 12.1           |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   45C    P0    50W / 450W |      0MiB / 24564MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```

### Step 4: Initial Server Configuration

#### 4.1 Update System

```bash
# Update package lists
apt update

# Upgrade existing packages
apt upgrade -y

# Install essential tools
apt install -y \
    vim \
    htop \
    tmux \
    curl \
    wget \
    git \
    net-tools \
    iptables-persistent
```

#### 4.2 Set Timezone

```bash
timedatectl set-timezone Europe/Berlin
date  # Verify
```

#### 4.3 Create Working Directory

```bash
# Create workspace subdirectories
mkdir -p /workspace/{models,configs,data,scripts}

# Check the network volume mount
ls -la /workspace
# Should show your 500GB volume
```

#### 4.4 Configure SSH (Optional but Recommended)

**Generate your own SSH key on your local machine:**

```bash
# On your local machine (not the GPU server)
ssh-keygen -t ed25519 -C "gpu-server-pivoine" -f ~/.ssh/gpu_pivoine

# Copy the public key to the GPU server
ssh-copy-id -i ~/.ssh/gpu_pivoine.pub -p 12345 root@abc123def456.runpod.io
```

**Add to your local `~/.ssh/config`:**

```bash
Host gpu-pivoine
    HostName abc123def456.runpod.io
    Port 12345
    User root
    IdentityFile ~/.ssh/gpu_pivoine
```

Now you can connect with: `ssh gpu-pivoine`

### Step 5: Verify GPU Access

Run this test:

```bash
# Test CUDA from Python
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU count:', torch.cuda.device_count())"
```

Expected output:

```
CUDA available: True
GPU count: 1
```

### Troubleshooting

**Problem: Can't connect via SSH**

- Check the pod is running (not stopped)
- Verify the port number in the SSH command
- Try the web terminal in the RunPod dashboard

**Problem: GPU not detected**

- Run `nvidia-smi`
- Check that RunPod assigned the correct GPU type
- Restart the pod if needed

**Problem: Network volume not mounted**

- Check RunPod dashboard → Volume tab
- Verify the volume is attached to the pod
- Try `df -h` to see mounts

### Next Steps

Once SSH access works and the GPU is verified: ✅ Proceed to **Day 3-4: Network Configuration (Tailscale VPN)**

### Save Important Info

Create a file to track your setup:

```bash
# On the GPU server
cat > /workspace/SERVER_INFO.md << 'EOF'
# GPU Server Information

## Connection
- SSH: ssh root@abc123def456.runpod.io -p 12345
- Pod ID: abc123def456
- Region: [YOUR_REGION]

## Hardware
- GPU: RTX 4090 24GB
- CPU: [Check with: lscpu]
- RAM: [Check with: free -h]
- Storage: 500GB network volume at /workspace

## Costs
- GPU: $0.50/hour
- Storage: $50/month
- Total: ~$410/month (24/7)

## Deployed: [DATE]
EOF
```

---

## Checkpoint ✓

Before moving to Day 3, verify:

- [ ] RunPod account created and billing added
- [ ] RTX 4090 pod deployed successfully
- [ ] 500GB network volume attached
- [ ] SSH access working
- [ ] `nvidia-smi` shows the GPU
- [ ] `torch.cuda.is_available()` returns True
- [ ] Timezone set to Europe/Berlin
- [ ] Essential tools installed

**Ready for Tailscale setup? Let's go!**
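As a sanity check on the monthly figures recorded above, here is a minimal Python sketch of the cost arithmetic. The rates ($0.50/hour GPU, $50/month storage) are the estimates quoted in this guide, not live RunPod prices; check the dashboard before budgeting.

```python
# Rough monthly cost estimate for this setup.
# Rates are assumptions from this guide -- verify against current RunPod pricing.

HOURS_PER_MONTH = 24 * 30  # 720, approximating a 30-day month

def monthly_cost(gpu_rate_per_hour: float, storage_per_month: float,
                 utilization: float = 1.0) -> float:
    """Estimated monthly cost in USD.

    utilization=1.0 models a pod running 24/7; lower it if you stop
    the pod when idle (On-Demand pods only bill while running).
    """
    return gpu_rate_per_hour * HOURS_PER_MONTH * utilization + storage_per_month

# RTX 4090 at ~$0.50/h plus the 500GB network volume at ~$50/month:
print(monthly_cost(0.50, 50.0))        # 410.0 -- matches the ~$410/month figure
print(monthly_cost(0.50, 50.0, 0.25))  # 140.0 -- roughly 6 hours/day of use
```

Note that the network volume bills continuously even while the pod is stopped, which is why `storage_per_month` is not scaled by utilization.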
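The checkpoint list above can be partially automated. Below is a minimal sketch of a hypothetical `checkpoint.sh` helper for the pod; it assumes the standard tools from the steps above (`nvidia-smi`, `python3`, `timedatectl`) but degrades each missing check to a `WARN` instead of aborting, since exact tool availability varies by template.

```shell
#!/bin/sh
# checkpoint.sh -- hypothetical helper (not part of RunPod) to spot-check
# the Day 1-2 setup. Prints OK/WARN per item; never aborts on a failure.

check() {
    desc="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $desc"
    else
        echo "WARN $desc"
    fi
}

check "nvidia-smi reports a GPU"     nvidia-smi
check "PyTorch sees CUDA"            python3 -c "import torch, sys; sys.exit(0 if torch.cuda.is_available() else 1)"
check "timezone is Europe/Berlin"    sh -c 'timedatectl show -p Timezone --value | grep -qx Europe/Berlin'
check "/workspace volume is mounted" sh -c 'df -h | grep -q /workspace'
check "git installed"                command -v git
check "tmux installed"               command -v tmux
```

Run it after Step 5 and chase down any `WARN` lines before moving on to the Tailscale setup.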