c0b1308ffe
feat(ai): add GPU server deployment with vLLM and Tailscale

- Add simple_vllm_server.py: custom AsyncLLMEngine FastAPI server
  - Bypasses multiprocessing issues on RunPod
  - OpenAI-compatible API (/v1/models, /v1/completions, /v1/chat/completions)
  - Uses Qwen 2.5 7B Instruct model
- Add comprehensive setup guides:
  - SETUP_GUIDE.md: RunPod account and GPU server setup
  - TAILSCALE_SETUP.md: VPN configuration (replaces WireGuard)
  - DOCKER_GPU_SETUP.md: Docker + NVIDIA Container Toolkit
  - README_GPU_SETUP.md: main documentation hub
- Add deployment configurations:
  - litellm-config-gpu.yaml: LiteLLM config with GPU endpoints
  - gpu-server-compose.yaml: Docker Compose for GPU services
  - deploy-gpu-stack.sh: automated deployment script
- Add GPU_DEPLOYMENT_LOG.md: current deployment documentation
  - Network: Tailscale IP 100.100.108.13
  - Infrastructure: RunPod RTX 4090, 50 GB disk
  - Known issues and troubleshooting guide
- Add GPU_EXPANSION_PLAN.md: comprehensive 70-page expansion plan
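
For reference, a minimal sketch of what litellm-config-gpu.yaml could look like when routing LiteLLM to the vLLM server over Tailscale. The Tailscale IP (100.100.108.13) comes from the deployment log above; the port (8000), the model alias, and the Hugging Face model ID are assumptions, not confirmed contents of the actual file:

```yaml
# Hypothetical sketch of litellm-config-gpu.yaml (not the committed file).
# Assumes the vLLM server listens on port 8000 behind the Tailscale IP.
model_list:
  - model_name: qwen-2.5-7b-instruct          # alias LiteLLM exposes (assumed)
    litellm_params:
      model: openai/Qwen/Qwen2.5-7B-Instruct  # "openai/" = OpenAI-compatible passthrough
      api_base: http://100.100.108.13:8000/v1 # vLLM endpoint over Tailscale
      api_key: none                           # vLLM requires no key by default
```

The `openai/` prefix tells LiteLLM to treat the backend as a generic OpenAI-compatible endpoint, which matches the /v1/models, /v1/completions, and /v1/chat/completions routes the custom server exposes.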
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 12:56:57 +01:00