Add simple_vllm_server.py: custom AsyncLLMEngine FastAPI server
- Bypasses multiprocessing issues on RunPod
- OpenAI-compatible API (/v1/models, /v1/completions, /v1/chat/completions)
- Uses Qwen 2.5 7B Instruct model

Add comprehensive setup guides:
- SETUP_GUIDE.md: RunPod account and GPU server setup
- TAILSCALE_SETUP.md: VPN configuration (replaces WireGuard)
- DOCKER_GPU_SETUP.md: Docker + NVIDIA Container Toolkit
- README_GPU_SETUP.md: main documentation hub

Add deployment configurations:
- litellm-config-gpu.yaml: LiteLLM config with GPU endpoints
- gpu-server-compose.yaml: Docker Compose for GPU services
- deploy-gpu-stack.sh: automated deployment script

Add GPU_DEPLOYMENT_LOG.md: current deployment documentation
- Network: Tailscale IP 100.100.108.13
- Infrastructure: RunPod RTX 4090, 50 GB disk
- Known issues and troubleshooting guide

Add GPU_EXPANSION_PLAN.md: 70-page comprehensive expansion plan

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>