- Add BGE embedding model config (port 8002) to litellm-config.yaml
- Add GPU_VLLM_EMBED_URL env var to compose and .env
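A sketch of what the added litellm-config.yaml entry could look like (the model alias and exact BGE variant are assumptions; the commit only pins the port and env var):

```yaml
model_list:
  - model_name: bge-embeddings                  # alias exposed by LiteLLM (assumed)
    litellm_params:
      model: openai/bge-m3                      # must match the model name served on port 8002 (assumed variant)
      api_base: os.environ/GPU_VLLM_EMBED_URL   # e.g. http://<tailscale-ip>:8002
      api_key: none
    model_info:
      mode: embedding                           # lets LiteLLM health-check this as an embeddings endpoint
```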
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove crawl4ai service from ai/compose.yaml (will use local MCP instead)
- Remove crawl4ai backup volume from core/compose.yaml
- Add core/backrest/config.json (infrastructure as code)
- Change backrest from volume to bind-mounted config
- Update CLAUDE.md and README.md documentation
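A minimal sketch of the backrest bind-mount change (the container path is an assumption):

```yaml
services:
  backrest:
    volumes:
      # Tracked config file instead of an opaque named volume
      - ./backrest/config.json:/config/config.json
```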
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Supervisor's XML-RPC API v3.0 (Supervisor 4.3.0) only supports the 2-parameter
readLog(offset, length) call, not a 3-parameter call with a filename.
The SUPERVISOR_LOGFILE environment variable is not used by the API.
Testing showed:
- Working: server.supervisor.readLog(-4096, 0)
- Failing: server.supervisor.readLog(-4096, 4096, '/path/to/log')
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replaced nginx:alpine proxy with dev.pivoine.art/valknar/supervisor-ui:latest
- Modern Next.js UI with real-time SSE updates, batch operations, and charts
- Changed service port from 80 (nginx) to 3000 (Next.js)
- Removed supervisor-nginx.conf (no longer needed)
- Kept same URL (supervisor.ai.pivoine.art) and Authelia SSO protection
- Added health check for /api/health endpoint
- Service connects to RunPod Supervisor via Tailscale (SUPERVISOR_HOST/PORT)
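A sketch of the resulting supervisor-ui service (image and env var names from the commit; the healthcheck command is an assumption about what the image provides):

```yaml
services:
  supervisor-ui:
    image: dev.pivoine.art/valknar/supervisor-ui:latest
    environment:
      SUPERVISOR_HOST: ${SUPERVISOR_HOST}   # RunPod Supervisor reached over Tailscale
      SUPERVISOR_PORT: ${SUPERVISOR_PORT}
    healthcheck:
      # Assumes wget is available in the image; the Next.js UI listens on 3000
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```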
- Replace orchestrator routing with direct vLLM server connections
- Qwen 2.5 7B on port 8000 (GPU_VLLM_QWEN_URL)
- Llama 3.1 8B on port 8001 (GPU_VLLM_LLAMA_URL)
- Simplify architecture by removing orchestrator proxy layer
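The resulting LiteLLM entries would look roughly like this (aliases per litellm-config.yaml; the URLs behind the env vars are examples):

```yaml
model_list:
  - model_name: qwen-2.5-7b
    litellm_params:
      model: openai/qwen-2.5-7b
      api_base: os.environ/GPU_VLLM_QWEN_URL    # e.g. http://<tailscale-ip>:8000
      api_key: none
  - model_name: llama-3.1-8b
    litellm_params:
      model: openai/llama-3.1-8b
      api_base: os.environ/GPU_VLLM_LLAMA_URL   # e.g. http://<tailscale-ip>:8001
      api_key: none
```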
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace GPU_TAILSCALE_IP interpolation with GPU_VLLM_API_URL
- LiteLLM requires full URL in api_base with os.environ/ syntax
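Concretely (a sketch of the change this commit describes; the URL value is an example):

```yaml
# Before: IP interpolated into the URL string (what the commit moves away from)
#   api_base: http://${GPU_TAILSCALE_IP}:8000
# After: one env var holds the full base URL, referenced with LiteLLM's os.environ/ syntax
api_base: os.environ/GPU_VLLM_API_URL   # e.g. GPU_VLLM_API_URL=http://<tailscale-ip>:8000
```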
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace COMFYUI_BACKEND_HOST and SUPERVISOR_BACKEND_HOST with GPU_TAILSCALE_IP
- Update LiteLLM config to use os.environ/GPU_TAILSCALE_IP for vLLM models
- Add GPU_TAILSCALE_IP env var to LiteLLM service
- Configure qwen-2.5-7b and llama-3.1-8b to route through orchestrator
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove hardcoded default values from compose.yaml
- Backend IPs now managed via environment variables only
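For illustration, using Compose's default-value syntax (variable name taken from earlier commits; values are examples):

```yaml
environment:
  # Before: COMFYUI_BACKEND_HOST: ${COMFYUI_BACKEND_HOST:-100.121.199.88}
  # After: no fallback baked into compose.yaml; the value must come from .env / the environment
  COMFYUI_BACKEND_HOST: ${COMFYUI_BACKEND_HOST}
```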
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add an nginx reverse proxy service for the Supervisor web UI at supervisor.ai.pivoine.art with Authelia authentication. It proxies to the RunPod GPU instance via Tailscale (100.121.199.88:9001).
Changes:
- Create supervisor-nginx.conf for nginx proxy configuration
- Add supervisor service to docker-compose with Traefik labels
- Add supervisor.ai.pivoine.art to Authelia protected domains
- Remove deprecated Flux-related files
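A sketch of the compose service this adds, assuming Traefik's Docker provider and the net-authelia middleware referenced elsewhere in this log (label names are illustrative):

```yaml
services:
  supervisor:
    image: nginx:alpine
    volumes:
      - ./supervisor-nginx.conf:/etc/nginx/conf.d/default.conf:ro
    labels:
      - traefik.enable=true
      - traefik.http.routers.supervisor.rule=Host(`supervisor.ai.pivoine.art`)
      - traefik.http.routers.supervisor.middlewares=net-authelia@docker
      - traefik.http.services.supervisor.loadbalancer.server.port=80
```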
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace hardcoded IP in comfyui-nginx.conf with env vars
- Add COMFYUI_BACKEND_HOST and COMFYUI_BACKEND_PORT to compose.yaml
- Use envsubst to substitute variables at container startup
- Defaults: 100.121.199.88:8188 (current RunPod Tailscale IP)
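One common way to apply envsubst at container startup is nginx's template mechanism; a sketch under that assumption (the actual compose.yaml may wire it differently):

```yaml
services:
  comfyui:
    image: nginx:alpine
    environment:
      COMFYUI_BACKEND_HOST: ${COMFYUI_BACKEND_HOST:-100.121.199.88}
      COMFYUI_BACKEND_PORT: ${COMFYUI_BACKEND_PORT:-8188}
    volumes:
      # The official nginx image's entrypoint runs envsubst on /etc/nginx/templates/*.template
      # and writes the result into /etc/nginx/conf.d/ before nginx starts
      - ./comfyui-nginx.conf:/etc/nginx/templates/default.conf.template:ro
```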
- Add ComfyUI service to AI stack using nginx:alpine as reverse proxy
- Proxy to RunPod ComfyUI via Tailscale (100.121.199.88:8188)
- Configure Traefik routing for comfy.ai.pivoine.art
- Enable Authelia SSO middleware (net-authelia)
- Support WebSocket connections for real-time updates
- Set appropriate timeouts for image generation (300s)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Set drop_params: false in litellm_settings
- Set modify_params: false in litellm_settings
- Set drop_params: false in default_litellm_params
- Commented out LITELLM_DROP_PARAMS env var
- Removed --drop_params command flag
These settings were stripping critical streaming parameters, causing vLLM streaming
responses to collapse into empty deltas. The resulting settings are sketched below.
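```yaml
litellm_settings:
  drop_params: false      # keep provider-specific params (including streaming options) intact
  modify_params: false

default_litellm_params:
  drop_params: false
```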
- Reverted direct orchestrator connection to WebUI
- Added stream: true parameter to qwen-2.5-7b model config
- Keep LiteLLM as single proxy for all models
- Configure WebUI with both LiteLLM and direct orchestrator API base URLs
- This bypasses LiteLLM's streaming issues for the qwen-2.5-7b model
- WebUI will now show models from both endpoints
- Allows testing if LiteLLM is the bottleneck for streaming
Related to streaming fix in RunPod models/vllm/server.py
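Open WebUI accepts multiple OpenAI-compatible endpoints as semicolon-separated env vars; a sketch of what the dual configuration might look like (URLs and key variables are placeholders):

```yaml
services:
  open-webui:
    environment:
      # First entry: LiteLLM proxy; second: direct orchestrator over Tailscale
      OPENAI_API_BASE_URLS: "http://litellm:4000/v1;http://100.121.199.88:9000/v1"
      OPENAI_API_KEYS: "${LITELLM_MASTER_KEY};none"
```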
- Changed back from hosted_vllm/qwen-2.5-7b to openai/qwen-2.5-7b
- Removed /v1 suffix from api_base (LiteLLM adds it automatically)
- Added supports_system_messages: false for vLLM compatibility
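A sketch of the resulting model entry (the address is illustrative, and the exact placement of supports_system_messages should be checked against the real config):

```yaml
  - model_name: qwen-2.5-7b
    litellm_params:
      model: openai/qwen-2.5-7b              # was hosted_vllm/qwen-2.5-7b
      api_base: http://<tailscale-ip>:9000   # no /v1 suffix; LiteLLM appends it
      supports_system_messages: false        # vLLM compatibility, per the commit
```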
- Update api_base URLs from 100.100.108.13 to 100.121.199.88 (RunPod Tailscale IP)
- All self-hosted models (qwen-2.5-7b, flux-schnell, musicgen-medium) now route through Tailscale VPN
- Tested and verified connectivity between VPS and RunPod GPU orchestrator
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed the outdated AI infrastructure README that referenced GPU services.
VPS AI services (Open WebUI, Crawl4AI, facefusion) are documented in compose.yaml comments.
GPU infrastructure docs now live in the dedicated runpod repository.
Multi-modal AI stack (text/image/music generation) has been moved to:
Repository: ssh://git@dev.pivoine.art:2222/valknar/runpod.git
Updated ai/README.md to document:
- VPS AI services (Open WebUI, Crawl4AI, AI PostgreSQL)
- Reference to new runpod repository for GPU infrastructure
- Clear separation between VPS and GPU deployments
- Integration architecture via Tailscale VPN
Implemented a cost-optimized AI infrastructure running on a single RTX 4090 GPU with
automatic model switching based on request type. This enables text, image, and
music generation on the same hardware with sequential loading.
## New Components
**Model Orchestrator** (ai/model-orchestrator/):
- FastAPI service managing model lifecycle
- Automatic model detection and switching based on request type
- OpenAI-compatible API proxy for all models
- Simple YAML configuration for adding new models
- Docker SDK integration for service management
- Endpoints: /v1/chat/completions, /v1/images/generations, /v1/audio/generations
**Text Generation** (ai/vllm/):
- Reorganized existing vLLM server into proper structure
- Qwen 2.5 7B Instruct (14GB VRAM, ~50 tok/sec)
- Docker containerized with CUDA 12.4 support
**Image Generation** (ai/flux/):
- Flux.1 Schnell for fast, high-quality images
- 14GB VRAM, 4-5 sec per image
- OpenAI DALL-E compatible API
- Pre-built image: ghcr.io/matatonic/openedai-images-flux
**Music Generation** (ai/musicgen/):
- Meta's MusicGen Medium (facebook/musicgen-medium)
- Text-to-music generation (11GB VRAM)
- 60-90 seconds for 30s audio clips
- Custom FastAPI wrapper with AudioCraft
## Architecture
```
VPS (LiteLLM) → Tailscale VPN → GPU Orchestrator (Port 9000)
                                             ↓
                         ┌───────────────────┼───────────────────┐
                         ↓                   ↓                   ↓
                    vLLM (8001)          Flux (8002)      MusicGen (8003)
             [Only ONE active at a time - sequential loading]
```
## Configuration Files
- docker-compose.gpu.yaml: Main orchestration file for RunPod deployment
- model-orchestrator/models.yaml: Model registry (easy to add new models)
- .env.example: Environment variable template
- README.md: Comprehensive deployment and usage guide
## Updated Files
- litellm-config.yaml: Updated to route through orchestrator (port 9000)
- GPU_DEPLOYMENT_LOG.md: Documented multi-modal architecture
## Features
✅ Automatic model switching (30-120s latency)
✅ Cost-optimized single GPU deployment (~$0.50/hr vs ~$0.75/hr multi-GPU)
✅ Easy model addition via YAML configuration
✅ OpenAI-compatible APIs for all model types
✅ Centralized routing through LiteLLM proxy
✅ GPU memory safety (only one model loaded at a time)
## Usage
Deploy to RunPod:
```bash
scp -r ai/* gpu-pivoine:/workspace/ai/
ssh gpu-pivoine "cd /workspace/ai && docker compose -f docker-compose.gpu.yaml up -d orchestrator"
```
Test models:
```bash
# Text
curl http://100.100.108.13:9000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen-2.5-7b","messages":[...]}'
# Image
curl http://100.100.108.13:9000/v1/images/generations \
  -H 'Content-Type: application/json' \
  -d '{"model":"flux-schnell","prompt":"..."}'
# Music
curl http://100.100.108.13:9000/v1/audio/generations \
  -H 'Content-Type: application/json' \
  -d '{"model":"musicgen-medium","prompt":"..."}'
```
All models available via Open WebUI at https://ai.pivoine.art
## Adding New Models
1. Add entry to models.yaml
2. Define Docker service in docker-compose.gpu.yaml
3. Restart orchestrator
That's it! The orchestrator automatically detects and manages the new model.
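For illustration only, a hypothetical models.yaml entry; the real schema is whatever model-orchestrator/models.yaml defines, so every field name below is an assumption:

```yaml
models:
  whisper-large:                        # hypothetical new model
    service: whisper                    # docker compose service the orchestrator starts
    port: 8004                          # where the orchestrator proxies requests
    endpoint: /v1/audio/transcriptions  # OpenAI-compatible route to expose
    vram_gb: 10                         # used to keep only one model resident
    startup_timeout: 120                # seconds to wait for the service to become healthy
```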
## Performance
| Model | VRAM | Startup | Speed |
|-------|------|---------|-------|
| Qwen 2.5 7B | 14GB | 120s | ~50 tok/sec |
| Flux.1 Schnell | 14GB | 60s | 4-5s/image |
| MusicGen Medium | 11GB | 45s | 60-90s for 30s audio |
Model switching overhead: 30-120 seconds
## License Notes
- vLLM: Apache 2.0
- Flux.1: Apache 2.0
- AudioCraft: MIT (code), CC-BY-NC (pre-trained weights - non-commercial)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit finalizes the GPU infrastructure deployment on RunPod:
- Added qwen-2.5-7b model to LiteLLM configuration
- Self-hosted on RunPod RTX 4090 GPU server
- Connected via Tailscale VPN (100.100.108.13:8000)
- OpenAI-compatible API endpoint
- Rate limits: 1000 RPM, 100k TPM
- Marked GPU deployment as COMPLETE in deployment log
- vLLM 0.6.4.post1 with custom AsyncLLMEngine server
- Qwen/Qwen2.5-7B-Instruct model (14.25 GB)
- 85% GPU memory utilization, 4096 context length
- Successfully integrated with Open WebUI at ai.pivoine.art
Infrastructure:
- Provider: RunPod Spot Instance (~$0.50/hr)
- GPU: NVIDIA RTX 4090 24GB
- Disk: 50GB local SSD + 922GB network volume
- VPN: Tailscale (replaces WireGuard due to RunPod UDP restrictions)
Model now visible and accessible in Open WebUI for end users.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed supports_prompt_caching parameter that was causing 400 errors.
Prompt caching is automatically enabled by Anthropic when the client
sends cache_control blocks in messages - no config needed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>