Sebastian Krüger
c0b1308ffe
feat(ai): add GPU server deployment with vLLM and Tailscale
- Add simple_vllm_server.py: Custom AsyncLLMEngine FastAPI server (minimal sketch after this list)
  - Bypasses multiprocessing issues on RunPod
  - OpenAI-compatible API (/v1/models, /v1/completions, /v1/chat/completions); example request below
  - Uses Qwen 2.5 7B Instruct model
- Add comprehensive setup guides:
  - SETUP_GUIDE.md: RunPod account and GPU server setup
  - TAILSCALE_SETUP.md: VPN configuration (replaces WireGuard)
  - DOCKER_GPU_SETUP.md: Docker + NVIDIA Container Toolkit
  - README_GPU_SETUP.md: Main documentation hub
- Add deployment configurations:
  - litellm-config-gpu.yaml: LiteLLM config with GPU endpoints
  - gpu-server-compose.yaml: Docker Compose for GPU services
  - deploy-gpu-stack.sh: Automated deployment script
- Add GPU_DEPLOYMENT_LOG.md: Current deployment documentation
  - Network: Tailscale IP 100.100.108.13
  - Infrastructure: RunPod RTX 4090, 50 GB disk
  - Known issues and troubleshooting guide
- Add GPU_EXPANSION_PLAN.md: 70-page comprehensive expansion plan
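
For orientation, a minimal sketch of the pattern simple_vllm_server.py follows: construct AsyncLLMEngine in-process (sidestepping the multiprocessing frontend that fails on RunPod) and expose it through FastAPI. The route names and model ID come from this commit; the port, engine flags, and request schema are illustrative assumptions, not the repo's actual code, and the sketch is written against vLLM's classic AsyncLLMEngine API.

```python
# Sketch only: in-process AsyncLLMEngine behind FastAPI.
# Route names and model match the commit; everything else is assumed.
import uuid

from fastapi import FastAPI
from pydantic import BaseModel
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

MODEL = "Qwen/Qwen2.5-7B-Instruct"

# Building the engine in the same process avoids vLLM's multiprocessing
# frontend, which is presumably the RunPod workaround noted above.
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=MODEL))

app = FastAPI()


class CompletionRequest(BaseModel):
    model: str = MODEL
    prompt: str
    max_tokens: int = 256
    temperature: float = 0.7


@app.get("/v1/models")
async def list_models():
    # OpenAI-compatible model listing.
    return {"object": "list", "data": [{"id": MODEL, "object": "model"}]}


@app.post("/v1/completions")
async def completions(req: CompletionRequest):
    params = SamplingParams(max_tokens=req.max_tokens, temperature=req.temperature)
    request_id = str(uuid.uuid4())
    # generate() yields incremental RequestOutput objects; keep only the
    # final one for a non-streaming response.
    final = None
    async for output in engine.generate(req.prompt, params, request_id):
        final = output
    return {
        "id": request_id,
        "object": "text_completion",
        "model": req.model,
        "choices": [{"index": 0, "text": final.outputs[0].text}],
    }


if __name__ == "__main__":
    import uvicorn

    # Port 8000 is an assumption, not documented in this commit.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```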
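
Because the surface is OpenAI-compatible, any OpenAI client can reach it over the tailnet. A hedged example against the Tailscale IP documented above; the port and the dummy API key are assumptions (the sketch above does not authenticate):

```python
from openai import OpenAI

# Tailscale IP from GPU_DEPLOYMENT_LOG.md; port 8000 is assumed.
client = OpenAI(base_url="http://100.100.108.13:8000/v1", api_key="unused")
resp = client.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt="Say hello from the GPU server.",
    max_tokens=32,
)
print(resp.choices[0].text)
```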
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 12:56:57 +01:00