This commit finalizes the GPU infrastructure deployment on RunPod:
- Added qwen-2.5-7b model to LiteLLM configuration
- Self-hosted on RunPod RTX 4090 GPU server
- Connected via Tailscale VPN (100.100.108.13:8000)
- OpenAI-compatible API endpoint
- Rate limits: 1000 RPM, 100k TPM
- Marked GPU deployment as COMPLETE in deployment log
- vLLM 0.6.4.post1 with custom AsyncLLMEngine server
- Qwen/Qwen2.5-7B-Instruct model (14.25 GB)
- 85% GPU memory utilization, 4096 context length
- Successfully integrated with Open WebUI at ai.pivoine.art
Infrastructure:
- Provider: RunPod Spot Instance (~$0.50/hr)
- GPU: NVIDIA RTX 4090 24GB
- Disk: 50GB local SSD + 922GB network volume
- VPN: Tailscale (replaces WireGuard due to RunPod UDP restrictions)
The model is now visible and accessible in Open WebUI for end users; the proxy entry looks roughly like the sketch below.
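
A minimal sketch of the LiteLLM model_list entry for this deployment (key names follow LiteLLM's config schema; the openai/ model prefix and the dummy api_key are assumptions):

```yaml
model_list:
  - model_name: qwen-2.5-7b
    litellm_params:
      model: openai/Qwen/Qwen2.5-7B-Instruct    # vLLM exposes an OpenAI-compatible API
      api_base: http://100.100.108.13:8000/v1   # RunPod GPU server over Tailscale
      api_key: dummy                            # assumed: the vLLM endpoint requires no real key
      rpm: 1000
      tpm: 100000
```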
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed the supports_prompt_caching parameter that was causing 400 errors.
Prompt caching is enabled automatically by Anthropic when the client
sends cache_control blocks in its messages - no proxy config is needed.
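
For illustration, a hedged sketch of a request body that opts into caching (the system prompt text is a placeholder):

```yaml
messages:
  - role: system
    content:
      - type: text
        text: "Long, reusable system prompt..."   # placeholder
        cache_control:
          type: ephemeral    # Anthropic caches this block automatically
  - role: user
    content: "Hello"
```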
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added supports_prompt_caching: true to all Claude models:
- claude-sonnet-4
- claude-sonnet-4.5
- claude-3-5-sonnet
- claude-3-opus
- claude-3-haiku
This enables Anthropic's prompt caching feature across all models,
significantly reducing latency and costs for repeated requests
with the same system prompts.
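
Each entry follows the same pattern; one model as a sketch (the flag sits under model_info in LiteLLM's config schema):

```yaml
- model_name: claude-sonnet-4.5
  litellm_params:
    model: anthropic/claude-sonnet-4-5-20250929
    api_key: os.environ/ANTHROPIC_API_KEY
  model_info:
    supports_prompt_caching: true
```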
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Configure LiteLLM to use the existing Redis from the core stack for caching:
- Enabled cache with Redis backend
- Set TTL to 1 hour for cached responses
- Uses core_redis container on default port
This improves performance by serving repeated identical requests from Redis instead of re-calling the upstream API.
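
A sketch of the cache block, assuming LiteLLM's documented cache_params schema and the core_redis hostname from the core stack:

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: core_redis   # existing Redis container from the core stack
    port: 6379         # default Redis port
    ttl: 3600          # cached responses expire after 1 hour
```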
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Disabled the cache setting, which requires a working Redis configuration.
Prompt caching at the Anthropic API level remains enabled via the
supports_prompt_caching setting.
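
The change amounts to a one-line toggle (sketch):

```yaml
litellm_settings:
  cache: false   # proxy-level Redis cache off; Anthropic-side prompt caching is unaffected
```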
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Reduce database logging overhead and enable prompt caching:
- Disabled verbose logging (set_verbose: false)
- Disabled spend tracking logs to reduce DB writes
- Disabled tag tracking and daily spend logs
- Removed success/failure callbacks
- Enabled prompt caching for claude-sonnet-4.5
- Set log level to ERROR only
- Removed --detailed_debug flag from command
This should significantly improve response times by eliminating
unnecessary database writes for every request.
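
A partial sketch of the settings involved; exact flag names vary by LiteLLM version, and the ERROR log level is typically set via the LITELLM_LOG environment variable rather than in this file:

```yaml
litellm_settings:
  set_verbose: false
  # success_callback / failure_callback lists removed entirely
general_settings:
  disable_spend_logs: true   # skip the per-request spend row written to the DB
```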
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
According to the LiteLLM docs, drop_params only drops OpenAI parameters.
Since prompt_cache_key is an Anthropic-specific parameter, we need
to use additional_drop_params to explicitly drop it.
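
A sketch of the resulting model entry (model ID as configured elsewhere in this stack):

```yaml
- model_name: claude-sonnet-4.5
  litellm_params:
    model: anthropic/claude-sonnet-4-5-20250929
    additional_drop_params: ["prompt_cache_key"]   # stripped before the request reaches Anthropic
```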
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Explicitly set drop_params and supports_prompt_caching=false for
claude-sonnet-4.5 model to prevent prompt_cache_key parameter from
being sent to Anthropic API.
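
Roughly, the per-model overrides look like this (drop_params lives under litellm_params, supports_prompt_caching under model_info):

```yaml
- model_name: claude-sonnet-4.5
  litellm_params:
    model: anthropic/claude-sonnet-4-5-20250929
    drop_params: true
  model_info:
    supports_prompt_caching: false
```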
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add router_settings and default_litellm_params to ensure unsupported
parameters like prompt_cache_key are properly dropped when using the
Codex CLI with the LiteLLM proxy.
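
A sketch, under the assumption that default_litellm_params nests under router_settings in the deployed LiteLLM version:

```yaml
router_settings:
  default_litellm_params:
    drop_params: true   # drop unsupported params on every routed request
```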
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added disable_responses_id_security setting to allow Codex CLI to access
the /responses endpoint without 401 errors. This removes the encryption
requirement on response IDs while maintaining API key authentication.
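
Sketch only; the placement under general_settings is an assumption:

```yaml
general_settings:
  disable_responses_id_security: true   # assumed placement; lets Codex CLI hit /responses without 401s
```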
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changed API key reference from ${ANTHROPIC_API_KEY} to
os.environ/ANTHROPIC_API_KEY to match LiteLLM's documented syntax.
The os.environ/ prefix tells LiteLLM to use os.getenv() to retrieve
the environment variable at runtime, which is the correct way to
reference environment variables in LiteLLM config files.
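
The corrected entry, per LiteLLM's documented syntax:

```yaml
model_list:
  - model_name: claude-sonnet-4.5
    litellm_params:
      model: anthropic/claude-sonnet-4-5-20250929
      api_key: os.environ/ANTHROPIC_API_KEY   # resolved with os.getenv() at runtime
```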
Reference: https://docs.litellm.ai/docs/proxy/deploy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added LiteLLM as an OpenAI-compatible proxy for Anthropic's API to
enable Claude models in Open WebUI.
**New Service: litellm**
- Image: ghcr.io/berriai/litellm:main-latest
- Internal proxy on port 4000
- Converts Anthropic API to OpenAI-compatible format
- Health check with 30s intervals
- Not exposed via Traefik (internal only)
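
A hedged sketch of the compose service (mount path and health-check endpoint are assumptions):

```yaml
litellm:
  image: ghcr.io/berriai/litellm:main-latest
  command: ["--config", "/app/config.yaml", "--port", "4000"]
  volumes:
    - ./litellm-config.yaml:/app/config.yaml:ro   # assumed mount path
  environment:
    - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:4000/health/liveliness"]  # assumed endpoint
    interval: 30s
  # no Traefik labels: reachable only on the internal compose network
```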
**LiteLLM Configuration (litellm-config.yaml)**
- Claude Sonnet 4 (claude-sonnet-4-20250514)
- Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
- Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
- Claude 3 Opus (claude-3-opus-20240229)
- Claude 3 Haiku (claude-3-haiku-20240307)
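
Each model maps an alias to an Anthropic model ID; one entry as a sketch, with the rest following the same shape (shown with the os.environ/ key syntax that another commit in this log settles on):

```yaml
model_list:
  - model_name: claude-sonnet-4
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
  # ...repeat for claude-sonnet-4.5, claude-3-5-sonnet, claude-3-opus, claude-3-haiku
```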
**Open WebUI Configuration Updates**
- Changed OPENAI_API_BASE_URLS to point to the LiteLLM proxy
- URL: http://litellm:4000/v1
- Added litellm as dependency for webui service
- Dummy API key for proxy authentication
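
A sketch of the webui service changes (variable names per Open WebUI's docs; the dummy key value is a placeholder):

```yaml
webui:
  environment:
    - OPENAI_API_BASE_URLS=http://litellm:4000/v1
    - OPENAI_API_KEYS=sk-dummy   # placeholder; the proxy accepts any key here
  depends_on:
    - litellm
```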
**Why LiteLLM?**
Anthropic's API uses a different endpoint structure and different
authentication headers than OpenAI's. LiteLLM acts as a translation
layer, allowing Open WebUI to use Claude models through its
OpenAI-compatible interface.
**Available Models in Open WebUI**
- claude-sonnet-4 (latest Claude Sonnet 4)
- claude-sonnet-4.5 (Claude Sonnet 4.5)
- claude-3-5-sonnet
- claude-3-opus
- claude-3-haiku
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>