docker-compose

Author	SHA1	Message	Date
Sebastian Krüger	abcebd1d9b	docs: migrate multi-modal AI orchestration to dedicated runpod repository Multi-modal AI stack (text/image/music generation) has been moved to: Repository: ssh://git@dev.pivoine.art:2222/valknar/runpod.git Updated ai/README.md to document: - VPS AI services (Open WebUI, Crawl4AI, AI PostgreSQL) - Reference to new runpod repository for GPU infrastructure - Clear separation between VPS and GPU deployments - Integration architecture via Tailscale VPN	2025-11-21 14:36:36 +01:00
Sebastian Krüger	3ed3e68271	feat(ai): add multi-modal orchestration system for text, image, and music generation Implemented a cost-optimized AI infrastructure running on single RTX 4090 GPU with automatic model switching based on request type. This enables text, image, and music generation on the same hardware with sequential loading. ## New Components Model Orchestrator (ai/model-orchestrator/): - FastAPI service managing model lifecycle - Automatic model detection and switching based on request type - OpenAI-compatible API proxy for all models - Simple YAML configuration for adding new models - Docker SDK integration for service management - Endpoints: /v1/chat/completions, /v1/images/generations, /v1/audio/generations Text Generation (ai/vllm/): - Reorganized existing vLLM server into proper structure - Qwen 2.5 7B Instruct (14GB VRAM, ~50 tok/sec) - Docker containerized with CUDA 12.4 support Image Generation (ai/flux/): - Flux.1 Schnell for fast, high-quality images - 14GB VRAM, 4-5 sec per image - OpenAI DALL-E compatible API - Pre-built image: ghcr.io/matatonic/openedai-images-flux Music Generation (ai/musicgen/): - Meta's MusicGen Medium (facebook/musicgen-medium) - Text-to-music generation (11GB VRAM) - 60-90 seconds for 30s audio clips - Custom FastAPI wrapper with AudioCraft ## Architecture ``` VPS (LiteLLM) → Tailscale VPN → GPU Orchestrator (Port 9000) ↓ ┌───────────────┼───────────────┐ vLLM (8001) Flux (8002) MusicGen (8003) [Only ONE active at a time - sequential loading] ``` ## Configuration Files - docker-compose.gpu.yaml: Main orchestration file for RunPod deployment - model-orchestrator/models.yaml: Model registry (easy to add new models) - .env.example: Environment variable template - README.md: Comprehensive deployment and usage guide ## Updated Files - litellm-config.yaml: Updated to route through orchestrator (port 9000) - GPU_DEPLOYMENT_LOG.md: Documented multi-modal architecture ## Features ✅ Automatic model switching (30-120s latency) ✅ Cost-optimized single GPU deployment (~$0.50/hr vs ~$0.75/hr multi-GPU) ✅ Easy model addition via YAML configuration ✅ OpenAI-compatible APIs for all model types ✅ Centralized routing through LiteLLM proxy ✅ GPU memory safety (only one model loaded at time) ## Usage Deploy to RunPod: ```bash scp -r ai/* gpu-pivoine:/workspace/ai/ ssh gpu-pivoine "cd /workspace/ai && docker compose -f docker-compose.gpu.yaml up -d orchestrator" ``` Test models: ```bash # Text curl http://100.100.108.13:9000/v1/chat/completions -d '{"model":"qwen-2.5-7b","messages":[...]}' # Image curl http://100.100.108.13:9000/v1/images/generations -d '{"model":"flux-schnell","prompt":"..."}' # Music curl http://100.100.108.13:9000/v1/audio/generations -d '{"model":"musicgen-medium","prompt":"..."}' ``` All models available via Open WebUI at https://ai.pivoine.art ## Adding New Models 1. Add entry to models.yaml 2. Define Docker service in docker-compose.gpu.yaml 3. Restart orchestrator That's it! The orchestrator automatically detects and manages the new model. ## Performance \| Model \| VRAM \| Startup \| Speed \| \|-------\|------\|---------\|-------\| \| Qwen 2.5 7B \| 14GB \| 120s \| ~50 tok/sec \| \| Flux.1 Schnell \| 14GB \| 60s \| 4-5s/image \| \| MusicGen Medium \| 11GB \| 45s \| 60-90s for 30s audio \| Model switching overhead: 30-120 seconds ## License Notes - vLLM: Apache 2.0 - Flux.1: Apache 2.0 - AudioCraft: MIT (code), CC-BY-NC (pre-trained weights - non-commercial) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 14:12:13 +01:00
Sebastian Krüger	bb3dabcba7	feat(ai): complete GPU deployment with self-hosted Qwen 2.5 7B model This commit finalizes the GPU infrastructure deployment on RunPod: - Added qwen-2.5-7b model to LiteLLM configuration - Self-hosted on RunPod RTX 4090 GPU server - Connected via Tailscale VPN (100.100.108.13:8000) - OpenAI-compatible API endpoint - Rate limits: 1000 RPM, 100k TPM - Marked GPU deployment as COMPLETE in deployment log - vLLM 0.6.4.post1 with custom AsyncLLMEngine server - Qwen/Qwen2.5-7B-Instruct model (14.25 GB) - 85% GPU memory utilization, 4096 context length - Successfully integrated with Open WebUI at ai.pivoine.art Infrastructure: - Provider: RunPod Spot Instance (~$0.50/hr) - GPU: NVIDIA RTX 4090 24GB - Disk: 50GB local SSD + 922TB network volume - VPN: Tailscale (replaces WireGuard due to RunPod UDP restrictions) Model now visible and accessible in Open WebUI for end users. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 13:18:17 +01:00
Sebastian Krüger	8de88d96ac	docs(ai): add comprehensive GPU setup documentation and configs - Add setup guides (SETUP_GUIDE, TAILSCALE_SETUP, DOCKER_GPU_SETUP, etc.) - Add deployment configurations (litellm-config-gpu.yaml, gpu-server-compose.yaml) - Add GPU_DEPLOYMENT_LOG.md with current infrastructure details - Add GPU_EXPANSION_PLAN.md with complete provider comparison - Add deploy-gpu-stack.sh automation script 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 12:57:06 +01:00
Sebastian Krüger	c0b1308ffe	feat(ai): add GPU server deployment with vLLM and Tailscale - Add simple_vllm_server.py: Custom AsyncLLMEngine FastAPI server - Bypasses multiprocessing issues on RunPod - OpenAI-compatible API (/v1/models, /v1/completions, /v1/chat/completions) - Uses Qwen 2.5 7B Instruct model - Add comprehensive setup guides: - SETUP_GUIDE.md: RunPod account and GPU server setup - TAILSCALE_SETUP.md: VPN configuration (replaces WireGuard) - DOCKER_GPU_SETUP.md: Docker + NVIDIA Container Toolkit - README_GPU_SETUP.md: Main documentation hub - Add deployment configurations: - litellm-config-gpu.yaml: LiteLLM config with GPU endpoints - gpu-server-compose.yaml: Docker Compose for GPU services - deploy-gpu-stack.sh: Automated deployment script - Add GPU_DEPLOYMENT_LOG.md: Current deployment documentation - Network: Tailscale IP 100.100.108.13 - Infrastructure: RunPod RTX 4090, 50GB disk - Known issues and troubleshooting guide - Add GPU_EXPANSION_PLAN.md: 70-page comprehensive expansion plan 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 12:56:57 +01:00
Sebastian Krüger	8622f9dfa0	fix: remove drop_params from individual model configs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 18:53:44 +01:00
Sebastian Krüger	0146d1f043	fix: remove invalid supports_prompt_caching parameter Removed supports_prompt_caching parameter that was causing 400 errors. Prompt caching is automatically enabled by Anthropic when the client sends cache_control blocks in messages - no config needed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 16:09:17 +01:00
Sebastian Krüger	d26310afb7	feat: enable prompt caching for all Claude models Added supports_prompt_caching: true to all Claude models: - claude-sonnet-4 - claude-sonnet-4.5 - claude-3-5-sonnet - claude-3-opus - claude-3-haiku This enables Anthropic's prompt caching feature across all models, significantly reducing latency and costs for repeated requests with the same system prompts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 16:07:29 +01:00
Sebastian Krüger	2014a82efb	feat: enable Redis caching for LiteLLM Configure LiteLLM to use existing Redis from core stack for caching: - Enabled cache with Redis backend - Set TTL to 1 hour for cached responses - Uses core_redis container on default port This will improve performance by caching API responses. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 16:05:14 +01:00
Sebastian Krüger	5cec1415ad	fix: disable LiteLLM cache to avoid Redis requirement Disabled cache setting that requires Redis configuration. Prompt caching at the Anthropic API level is still enabled via supports_prompt_caching setting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 16:04:39 +01:00
Sebastian Krüger	8a18ae753d	perf: optimize LiteLLM for better performance Reduce database logging overhead and enable prompt caching: - Disabled verbose logging (set_verbose: false) - Disabled spend tracking logs to reduce DB writes - Disabled tag tracking and daily spend logs - Removed success/failure callbacks - Enabled prompt caching for claude-sonnet-4.5 - Set log level to ERROR only - Removed --detailed_debug flag from command This should significantly improve response times by eliminating unnecessary database writes for every request. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 16:03:19 +01:00
Sebastian Krüger	ffbcecc09d	feat: replace Basic Auth with Authelia Replace HTTP Basic Auth with Authelia ForwardAuth for consistent authentication across infrastructure: - Asciinema Admin (admin.asciinema.dev.pivoine.art): Removed Basic Auth, added Authelia protection - FaceFusion (facefusion.ai.pivoine.art): Removed Basic Auth, added Authelia protection Updated Authelia access control to include both services with one_factor policy. All services now use Authelia for authentication, eliminating the need to manage separate Basic Auth credentials. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 21:54:27 +01:00
Sebastian Krüger	51267cc674	feat: add Mailpit SMTP relay and migrate all services - Add Mailpit service to NET stack with web UI at mailpit.pivoine.art - Configure Mailpit to relay all emails through IONOS SMTP - Migrate all 11+ services to use Mailpit instead of direct IONOS SMTP: * SEXY: Directus API * UTIL: Joplin, Mattermost, Vaultwarden, Tandoor, Linkwarden * DEV: Gitea, n8n, Asciinema * AI: Open WebUI * NET: Netdata (via msmtp) - Centralize SMTP credentials in mailpit-relay.yaml - Simplify service configs (no auth/TLS for internal SMTP) - Enable email monitoring via Mailpit web UI with Basic Auth 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 18:34:38 +01:00
Sebastian Krüger	709dcd8882	fix: use correct NO_DOCS and NO_REDOC environment variables - Replace DISABLE_SWAGGER_UI with NO_DOCS and NO_REDOC - Following official LiteLLM documentation for disabling API docs - Disables both Swagger UI and Redoc documentation interfaces 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 02:17:40 +01:00
Sebastian Krüger	b66e28d874	fix: use DISABLE_SWAGGER_UI environment variable instead of invalid flag - Remove invalid --disable_swagger command flag - Add DISABLE_SWAGGER_UI=true environment variable - Fixes LiteLLM startup error 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 02:15:31 +01:00
Sebastian Krüger	f1ff42f452	feat: disable Swagger UI in LiteLLM proxy - Add --disable_swagger flag to LiteLLM command - Improves security by hiding API documentation interface 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 02:14:43 +01:00
Sebastian Krüger	2934caa9ed	fix: disable Watchtower for Facefusion custom local image Watchtower was trying to pull updates from Docker Hub for facefusion-patched:3.5.0-cpu which only exists locally, causing spam errors. Disabled Watchtower monitoring for this container since it's a custom-built image with NSFW filter patches. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 08:30:51 +01:00
Sebastian Krüger	f71b150263	feat: add tty flag for Gradio to start properly	2025-11-13 06:18:58 +01:00
Sebastian Krüger	0b43299ffd	fix: update content_analyser hash check in core.py for patched version	2025-11-13 06:16:14 +01:00
Sebastian Krüger	95099a443e	feat: build custom Facefusion image with NSFW filter patch baked in	2025-11-13 06:05:42 +01:00
Sebastian Krüger	8f406f62c1	fix: add command with -u flag to start Facefusion	2025-11-13 06:01:09 +01:00
Sebastian Krüger	c2d25dde59	fix: remove entrypoint override to use default Facefusion startup	2025-11-13 05:59:05 +01:00
Sebastian Krüger	3c56f05286	fix: add Gradio environment variables and remove conflicting command	2025-11-13 05:52:13 +01:00
Sebastian Krüger	65865b7bb8	fix: add listen and port flags to start Gradio server properly	2025-11-13 05:51:24 +01:00
Sebastian Krüger	539f689269	fix: use run.py to start Gradio server	2025-11-13 05:50:37 +01:00
Sebastian Krüger	025118a25e	fix: use simple run command without extra flags	2025-11-13 05:47:32 +01:00
Sebastian Krüger	72fd26f8ea	fix: use headless-run command to start Gradio server	2025-11-13 05:46:20 +01:00
Sebastian Krüger	77f945dd3f	fix: add execution flags to facefusion.py run command	2025-11-13 05:43:51 +01:00
Sebastian Krüger	7f667c371f	fix: correct patch for Facefusion 3.5.0 content_analyser.py - Fixed line number and function names to match actual source - Added validation to ensure patch was applied - Updated patch file with correct context 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 04:28:50 +00:00
Sebastian Krüger	cd9c38e524	docs: add patch file for disabling NSFW filter This patch file documents the exact change made to content_analyser.py for disabling the NSFW content filter in Facefusion 3.5.0. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 04:24:36 +00:00
Sebastian Krüger	59f2e8b0fc	refactor: use source code patch instead of deleting NSFW models Cleaner solution based on Reddit community feedback: - Patch content_analyser.py to return False (always safe) - Remove unused config file - Remove config volume mount from compose - Much simpler and more reliable than file deletion approach Credit: https://www.reddit.com/r/StableDiffusion/comments/1m2w5af/ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 04:23:38 +00:00
Sebastian Krüger	398ebd342c	fix: add verbose logging to NSFW model deletion - Added echo statements to track script execution - Added -v flag to rm to show deleted files - Confirmed deletion is working correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 04:07:33 +00:00
Sebastian Krüger	dd9a9a44cb	fix: allow Facefusion to start by deleting NSFW models after download Previous approach caused infinite download loop. Now waits for models to download, then deletes NSFW models once, allowing Gradio to start. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 04:02:30 +00:00
Sebastian Krüger	5768fe65ff	feat: disable NSFW filter in Facefusion - Add entrypoint script to continuously delete NSFW model files - Add Facefusion config file (for future use) - NSFW content filtering now disabled 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 03:52:21 +00:00
Sebastian Krüger	c30d2d7407	chore: facefusion	2025-11-12 16:42:41 +01:00
Sebastian Krüger	8445256b0f	chore: facefusion	2025-11-12 16:38:57 +01:00
Sebastian Krüger	9f9119358a	fix: add Python unbuffered flag to see Gradio startup logs	2025-11-12 11:01:23 +01:00
Sebastian Krüger	b7f03a313f	fix: use port 7865 for both Gradio and Traefik	2025-11-12 10:56:30 +01:00
Sebastian Krüger	08cce3479f	fix: add command back with python3 and default port 7860	2025-11-12 10:51:35 +01:00
Sebastian Krüger	22eaaa9b30	fix: remove custom command and use default Gradio port 7860 for Facefusion	2025-11-12 10:50:11 +01:00
Sebastian Krüger	8ac025a14c	fix: add command to start Facefusion web UI	2025-11-12 09:42:31 +01:00
Sebastian Krüger	8b77f92028	feat: integrate Facefusion into AI stack Added Facefusion face swapping service to the AI stack: Configuration: - URL: https://facefusion.ai.pivoine.art - Image: facefusion/facefusion:3.5.0-cpu - Port: 7865 - Container: ai_facefusion - Volume: ai_facefusion_data - HTTP Basic Auth protection - CPU execution mode (GPU when available) Changes: - Added facefusion service to ai/compose.yaml - Added AI_FACEFUSION_* env vars to arty.yml - Created ai_facefusion_data volume - Removed old standalone facefusion stack - Removed ai/README-export.md and ai/webui-export.py Facefusion will run on CPU until GPU server is available. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-12 09:36:52 +01:00
Sebastian Krüger	3ddc76e213	fix: add additional_drop_params at global litellm_settings level	2025-11-11 12:36:49 +01:00
Sebastian Krüger	cabac4b767	fix: use additional_drop_params to explicitly drop prompt_cache_key According to litellm docs, drop_params only drops OpenAI parameters. Since prompt_cache_key is an Anthropic-specific parameter, we need to use additional_drop_params to explicitly drop it. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-11 12:33:10 +01:00
Sebastian Krüger	da0dc2363a	fix: disable prompt caching and responses API in litellm - Add LITELLM_DROP_PARAMS environment variable - Disable cache in litellm_settings - Attempt to disable responses API endpoint - Remove invalid supports_prompt_caching parameter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-11 12:27:06 +01:00
Sebastian Krüger	813823995c	fix: disable prompt caching for claude-sonnet-4.5 Explicitly set drop_params and supports_prompt_caching=false for claude-sonnet-4.5 model to prevent prompt_cache_key parameter from being sent to Anthropic API. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-11 12:22:27 +01:00
Sebastian Krüger	f36e0fa9eb	fix: enhance litellm parameter dropping for codex compatibility Add router_settings and default_litellm_params to ensure unsupported parameters like prompt_cache_key are properly dropped when using codex with the litellm proxy. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-11 12:14:00 +01:00
Sebastian Krüger	ce6c60d8e0	fix: disable responses ID security for Codex CLI compatibility Added disable_responses_id_security setting to allow Codex CLI to access the /responses endpoint without 401 errors. This removes the encryption requirement on response IDs while maintaining API key authentication. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 19:00:55 +01:00
Sebastian Krüger	db69b30d06	feat: add PostgreSQL initialization script for AI stack Created database initialization script following the core stack pattern. The script automatically creates required databases on first initialization: - openwebui: Open WebUI application database - litellm: LiteLLM proxy database for API key management and tracking Changes: - Created ai/postgres/init/01-init-databases.sh - Mounted init directory in ai_postgres service - Added automatic privilege grants to AI_DB_USER Note: Init script only runs on first database creation when volume is empty. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 18:36:50 +01:00
Sebastian Krüger	5a6b007cf3	feat: connect LiteLLM to AI PostgreSQL database LiteLLM now uses the ai_postgres database instance with a dedicated 'litellm' database for API key management, usage tracking, and rate limiting. Changes: - Set DATABASE_URL to postgresql://ai:password@ai_postgres:5432/litellm - Added depends_on ai_postgres to ensure DB starts first 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 18:34:10 +01:00


67 Commits