Commit Graph

117 Commits

SHA1 Message Date
f4dd7c7d9d fix: litellm compose 2025-11-28 08:14:13 +01:00
608b5ba793 fix: nginx audio mime types 2025-11-27 16:45:14 +01:00
2e45252793 fix: nginx proxy timeouts 2025-11-27 15:24:38 +01:00
20ba9952a1 feat: upscale service 2025-11-27 12:13:57 +01:00
69869ec3fb fix: remove vllm embedding 2025-11-27 01:11:43 +01:00
cc270c8539 fix: vllm model ids 2025-11-27 00:49:53 +01:00
8bdcde4b90 fix: supervisor env 2025-11-26 22:58:16 +01:00
5d232c7d9b feat: audiocraft 2025-11-26 22:54:10 +01:00
cef233b678 chore: remove qwen 2025-11-26 21:03:43 +01:00
b63ddbffbd fix(ai): correct bge embedding model name to hosted_vllm/openai prefix
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:44:33 +01:00
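A minimal sketch of the corrected litellm-config.yaml entry after this fix, assuming the standard LiteLLM model_list layout. Only the hosted_vllm/openai prefix, the BGE model, and the GPU_VLLM_EMBED_URL variable come from the commits; the alias, the exact served model ID, and the EMPTY api_key (mirroring the convention used for the chat models later in this log) are assumptions.

```yaml
# litellm-config.yaml (sketch): corrected BGE embedding entry.
# Alias and served model ID are illustrative; prefix and env var are from the commits.
model_list:
  - model_name: bge-large-en-v1.5
    litellm_params:
      model: hosted_vllm/openai/BAAI/bge-large-en-v1.5
      api_base: os.environ/GPU_VLLM_EMBED_URL
      api_key: EMPTY   # assumption: same no-auth convention as the vLLM chat models
```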
d57a1241d2 feat(ai): add bge-large-en-v1.5 embedding model to litellm
- Add BGE embedding model config (port 8002) to litellm-config.yaml
- Add GPU_VLLM_EMBED_URL env var to compose and .env

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:40:36 +01:00
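The compose-side wiring for this commit might look like the sketch below; the port 8002 is named in the commit and the Tailscale IP is the one that appears elsewhere in this history, but the concrete URL value is illustrative.

```yaml
# ai/compose.yaml (sketch): expose the embedding server URL to LiteLLM.
services:
  litellm:
    environment:
      GPU_VLLM_EMBED_URL: ${GPU_VLLM_EMBED_URL}

# .env (sketch) — value is illustrative, built from the Tailscale IP and
# vLLM embedding port 8002 mentioned in this history:
# GPU_VLLM_EMBED_URL=http://100.121.199.88:8002
```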
ef0309838c refactor(ai): remove crawl4ai service, add backrest config to repo
- Remove crawl4ai service from ai/compose.yaml (will use local MCP instead)
- Remove crawl4ai backup volume from core/compose.yaml
- Add core/backrest/config.json (infrastructure as code)
- Change backrest from volume to bind-mounted config
- Update CLAUDE.md and README.md documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:20:22 +01:00
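A sketch of the volume-to-bind-mount change for backrest described above. Only the repo path core/backrest/config.json comes from the commit; the container-side config path is an assumption.

```yaml
# core/compose.yaml (sketch): backrest config tracked in the repo instead of
# living in a named volume. Container-side path is an assumption.
services:
  backrest:
    volumes:
      # before: a named volume such as backrest-config:/config
      - ./backrest/config.json:/config/config.json   # bind-mounted, infrastructure as code
```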
071a74a996 revert(ai): remove SUPERVISOR_LOGFILE env var from supervisor-ui
Supervisor XML-RPC API v3.0 (Supervisor 4.3.0) only supports 2-parameter
readLog(offset, length) calls, not 3-parameter calls with a filename.
The SUPERVISOR_LOGFILE environment variable is not used by the API.

Testing showed:
- Working: server.supervisor.readLog(-4096, 0)
- Failing: server.supervisor.readLog(-4096, 4096, '/path/to/log')

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 23:01:10 +01:00
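The behaviour described in this revert can be reproduced with a short XML-RPC probe. A minimal sketch, assuming Supervisor's default /RPC2 endpoint on port 9001; the host is the Tailscale IP used elsewhere in this log and stands in as a placeholder.

```python
# Minimal probe of Supervisor's XML-RPC readLog signature (sketch).
# Host is a placeholder; Supervisor exposes its API at /RPC2 by default.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://100.121.199.88:9001/RPC2")

# Works: the 2-parameter form readLog(offset, length); -4096/0 tails the main log.
print(server.supervisor.readLog(-4096, 0))

# Fails: there is no 3-parameter form that accepts a logfile path.
# server.supervisor.readLog(-4096, 4096, "/workspace/logs/supervisord.log")
```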
74b3748b23 feat(ai): add SUPERVISOR_LOGFILE env var to supervisor-ui for RunPod logs
Configure supervisor-ui to use the correct logfile path (/workspace/logs/supervisord.log)
for the RunPod Supervisor instance. Fixes the logs page error on https://supervisor.ai.pivoine.art/logs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 22:49:33 +01:00
87216ab26a fix: remove healthcheck from supervisor-ui service 2025-11-23 20:38:37 +01:00
9e2b19e7f6 feat: replace nginx supervisor proxy with modern supervisor-ui
- Replaced nginx:alpine proxy with dev.pivoine.art/valknar/supervisor-ui:latest
- Modern Next.js UI with real-time SSE updates, batch operations, and charts
- Changed service port from 80 (nginx) to 3000 (Next.js)
- Removed supervisor-nginx.conf (no longer needed)
- Kept same URL (supervisor.ai.pivoine.art) and Authelia SSO protection
- Added health check for /api/health endpoint
- Service connects to RunPod Supervisor via Tailscale (SUPERVISOR_HOST/PORT)
2025-11-23 20:18:29 +01:00
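A compose sketch of the supervisor-ui service as described in this commit. The image, port 3000, hostname, middleware name, and SUPERVISOR_HOST/PORT variables are from the commit messages; the Traefik label and router names follow the usual Traefik v2 pattern and are assumptions, and the /api/health healthcheck added here was removed again by 87216ab26a.

```yaml
# ai/compose.yaml (sketch): supervisor-ui replacing the nginx proxy.
services:
  supervisor-ui:
    image: dev.pivoine.art/valknar/supervisor-ui:latest
    environment:
      SUPERVISOR_HOST: ${GPU_TAILSCALE_IP}   # RunPod Supervisor reached over Tailscale
      SUPERVISOR_PORT: "9001"
    labels:
      - traefik.enable=true
      - traefik.http.routers.supervisor.rule=Host(`supervisor.ai.pivoine.art`)
      - traefik.http.routers.supervisor.middlewares=net-authelia@docker
      - traefik.http.services.supervisor.loadbalancer.server.port=3000
    # The /api/health healthcheck added in this commit was dropped in 87216ab26a.
```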
a80c6b931b fix: update compose.yaml to use new GPU_VLLM URLs 2025-11-23 16:22:54 +01:00
64c02228d8 fix: use EMPTY api_key for vLLM servers 2025-11-23 16:17:27 +01:00
55d9bef18a fix: remove api_key from vLLM config to fix authentication error
vLLM servers don't validate API keys, so LiteLLM shouldn't pass them

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 16:16:37 +01:00
7fc945e179 fix: update LiteLLM config for direct vLLM server access
- Replace orchestrator routing with direct vLLM server connections
- Qwen 2.5 7B on port 8000 (GPU_VLLM_QWEN_URL)
- Llama 3.1 8B on port 8001 (GPU_VLLM_LLAMA_URL)
- Simplify architecture by removing orchestrator proxy layer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 16:10:20 +01:00
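A sketch of the simplified litellm-config.yaml model_list after this change, folding in the api_key handling from the two commits above. The env var names, ports, and EMPTY key are from the commits; the provider prefix changed several times later in this history, and the served model IDs shown are assumptions.

```yaml
# litellm-config.yaml (sketch): direct vLLM server access, no orchestrator layer.
model_list:
  - model_name: qwen-2.5-7b
    litellm_params:
      model: hosted_vllm/Qwen/Qwen2.5-7B-Instruct      # served ID is an assumption
      api_base: os.environ/GPU_VLLM_QWEN_URL            # vLLM on port 8000
      api_key: EMPTY                                    # vLLM does not validate keys
  - model_name: llama-3.1-8b
    litellm_params:
      model: hosted_vllm/meta-llama/Llama-3.1-8B-Instruct
      api_base: os.environ/GPU_VLLM_LLAMA_URL           # vLLM on port 8001
      api_key: EMPTY
```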
94ab4ae6dd feat: enable system message support for qwen-2.5-7b 2025-11-23 14:36:34 +01:00
779e76974d fix: use complete URL env var for vLLM API base
- Replace GPU_TAILSCALE_IP interpolation with GPU_VLLM_API_URL
- LiteLLM requires full URL in api_base with os.environ/ syntax

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 13:17:37 +01:00
f3f32c163f feat: consolidate GPU IP with single GPU_TAILSCALE_IP variable
- Replace COMFYUI_BACKEND_HOST and SUPERVISOR_BACKEND_HOST with GPU_TAILSCALE_IP
- Update LiteLLM config to use os.environ/GPU_TAILSCALE_IP for vLLM models
- Add GPU_TAILSCALE_IP env var to LiteLLM service
- Configure qwen-2.5-7b and llama-3.1-8b to route through orchestrator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 13:05:33 +01:00
e00e959543 Update backend IPs for ComfyUI and Supervisor proxies
- Remove hardcoded default values from compose.yaml
- Backend IPs now managed via environment variables only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 02:11:19 +01:00
0fd2eacad1 feat: add Supervisor proxy with Authelia SSO
Add nginx reverse proxy service for Supervisor web UI at supervisor.ai.pivoine.art with Authelia authentication. Proxies to RunPod GPU instance via Tailscale (100.121.199.88:9001).

Changes:
- Create supervisor-nginx.conf for nginx proxy configuration
- Add supervisor service to docker-compose with Traefik labels
- Add supervisor.ai.pivoine.art to Authelia protected domains
- Remove deprecated Flux-related files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 13:19:02 +01:00
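One part of this change that is easy to sketch is the Authelia side: adding the new hostname to the protected domains. A minimal sketch of an access_control rule; only the hostname comes from the commit, and the policy level is an assumption.

```yaml
# Authelia configuration (sketch): protect the new Supervisor proxy hostname.
access_control:
  rules:
    - domain: supervisor.ai.pivoine.art
      policy: two_factor   # policy level is an assumption
```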
bf402adb25 Add Llama 3.1 8B model to LiteLLM configuration 2025-11-21 21:30:18 +01:00
ae1c349b55 feat: make ComfyUI backend IP/port configurable via environment variables
- Replace hardcoded IP in comfyui-nginx.conf with env vars
- Add COMFYUI_BACKEND_HOST and COMFYUI_BACKEND_PORT to compose.yaml
- Use envsubst to substitute variables at container startup
- Defaults: 100.121.199.88:8188 (current RunPod Tailscale IP)
2025-11-21 21:24:51 +01:00
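One common way to get the envsubst-at-startup behaviour described above is the nginx image's templates directory, which substitutes environment variables when the container boots. A sketch assuming that mechanism; the commit only says envsubst is used, not how it is wired.

```yaml
# ai/compose.yaml (sketch): ComfyUI backend made configurable via env vars.
# Assumes the nginx image's /etc/nginx/templates envsubst mechanism.
services:
  comfyui:
    image: nginx:alpine
    environment:
      COMFYUI_BACKEND_HOST: ${COMFYUI_BACKEND_HOST:-100.121.199.88}   # defaults from the commit
      COMFYUI_BACKEND_PORT: ${COMFYUI_BACKEND_PORT:-8188}
    volumes:
      - ./comfyui-nginx.conf:/etc/nginx/templates/default.conf.template:ro
```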
66d8c82e47 Remove Flux and MusicGen models from LiteLLM config
ComfyUI now handles Flux image generation directly.
MusicGen is not being used and has been removed.
2025-11-21 21:11:29 +01:00
904f7d3c2e feat(ai): add ComfyUI proxy service with Authelia SSO
- Add ComfyUI service to AI stack using nginx:alpine as reverse proxy
- Proxy to RunPod ComfyUI via Tailscale (100.121.199.88:8188)
- Configure Traefik routing for comfy.ai.pivoine.art
- Enable Authelia SSO middleware (net-authelia)
- Support WebSocket connections for real-time updates
- Set appropriate timeouts for image generation (300s)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 20:56:20 +01:00
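The WebSocket and timeout points translate into a few standard nginx directives. A sketch of what the relevant part of comfyui-nginx.conf might contain; only the backend address, the 300 s timeout, and the WebSocket requirement come from the commit.

```nginx
# comfyui-nginx.conf (sketch): proxy ComfyUI with WebSocket support and
# generous timeouts for long-running image generation.
location / {
    proxy_pass http://100.121.199.88:8188;

    # WebSocket upgrade for real-time updates
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;

    # allow long image-generation requests
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
}
```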
9a964cff3c feat: add Flux image generation function for Open WebUI
- Add flux_image_gen.py manifold function for Flux.1 Schnell
- Auto-mount functions via Docker volume (./functions:/app/backend/data/functions:ro)
- Add comprehensive setup guide in FLUX_SETUP.md
- Update CLAUDE.md with Flux integration documentation
- Infrastructure as code approach - no manual import needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 20:20:33 +01:00
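The auto-mount line named in this commit, shown in the context of the Open WebUI service; the service name is an assumption.

```yaml
# ai/compose.yaml (sketch): mount repo-tracked functions read-only into Open WebUI.
services:
  open-webui:
    volumes:
      - ./functions:/app/backend/data/functions:ro   # flux_image_gen.py, no manual import
```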
0999e5d29f feat: re-enable Redis caching in LiteLLM now that streaming is fixed 2025-11-21 19:40:57 +01:00
ec903c16c2 fix: use hosted_vllm/openai/ prefix for vLLM model via orchestrator 2025-11-21 19:18:33 +01:00
155016da97 debug: enable DEBUG logging for LiteLLM to troubleshoot streaming 2025-11-21 19:10:00 +01:00
c81f312e9e fix: use correct vLLM model ID from /v1/models endpoint 2025-11-21 19:06:56 +01:00
fe0cf487ee fix: use correct vLLM model name with hosted_vllm prefix 2025-11-21 19:02:44 +01:00
81d4058c5d revert: back to openai prefix for vLLM OpenAI-compatible endpoint 2025-11-21 18:57:10 +01:00
4a575bc0da fix: use hosted_vllm prefix instead of openai for vLLM streaming compatibility 2025-11-21 18:54:40 +01:00
01a345979b fix: disable drop_params to preserve streaming metadata in LiteLLM
- Set drop_params: false in litellm_settings
- Set modify_params: false in litellm_settings
- Set drop_params: false in default_litellm_params
- Commented out LITELLM_DROP_PARAMS env var
- Removed --drop_params command flag

These settings were stripping critical streaming parameters, causing
vLLM streaming responses to collapse into empty deltas.
2025-11-21 18:46:33 +01:00
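The settings named in this commit, as they would appear in litellm-config.yaml; the section and flag names are taken directly from the commit body, while the surrounding file layout is assumed.

```yaml
# litellm-config.yaml (sketch): stop LiteLLM from stripping streaming params.
litellm_settings:
  drop_params: false
  modify_params: false

default_litellm_params:
  drop_params: false
```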
c58b5d36ba revert: remove direct WebUI connection, focus on fixing LiteLLM streaming
- Reverted direct orchestrator connection to WebUI
- Added stream: true parameter to qwen-2.5-7b model config
- Keep LiteLLM as single proxy for all models
2025-11-21 18:42:46 +01:00
62fcf832da feat: add direct RunPod orchestrator connection to WebUI for streaming bypass
- Configure WebUI with both LiteLLM and direct orchestrator API base URLs
- This bypasses LiteLLM's streaming issues for the qwen-2.5-7b model
- WebUI will now show models from both endpoints
- Allows testing if LiteLLM is the bottleneck for streaming

Related to streaming fix in RunPod models/vllm/server.py
2025-11-21 18:38:31 +01:00
dfde1df72f fix: add /v1 suffix to vLLM api_base for proper endpoint routing 2025-11-21 18:00:53 +01:00
42a68bc0b5 fix: revert to openai prefix, remove /v1 suffix from api_base
- Changed back from hosted_vllm/qwen-2.5-7b to openai/qwen-2.5-7b
- Removed /v1 suffix from api_base (LiteLLM adds it automatically)
- Added supports_system_messages: false for vLLM compatibility
2025-11-21 17:55:10 +01:00
699c8537b0 fix: use LiteLLM vLLM pass-through for qwen model
- Changed model from openai/qwen-2.5-7b to hosted_vllm/qwen-2.5-7b
- Implements proper vLLM integration per LiteLLM docs
- Fixes streaming response forwarding issue
2025-11-21 17:52:34 +01:00
ed4d537499 Enable verbose logging in LiteLLM for streaming debug 2025-11-21 17:43:34 +01:00
103bbbad51 debug: enable INFO logging in LiteLLM for troubleshooting
Enable detailed logging to debug qwen model requests from WebUI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 17:13:38 +01:00
92a7436716 fix(ai): add 600s timeout for qwen model requests via Tailscale 2025-11-21 17:06:01 +01:00
6aea9d018e feat(ai): disable Ollama API in WebUI, use LiteLLM only 2025-11-21 16:57:20 +01:00
e2e0927291 feat: update LiteLLM to use RunPod GPU via Tailscale
- Update api_base URLs from 100.100.108.13 to 100.121.199.88 (RunPod Tailscale IP)
- All self-hosted models (qwen-2.5-7b, flux-schnell, musicgen-medium) now route through Tailscale VPN
- Tested and verified connectivity between VPS and RunPod GPU orchestrator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:42:27 +01:00
a5ed2be933 docs: remove outdated ai/README.md
Removed outdated AI infrastructure README that referenced GPU services.
VPS AI services (Open WebUI, Crawl4AI, facefusion) are documented in compose.yaml comments.
GPU infrastructure docs are now in dedicated runpod repository.
2025-11-21 14:42:23 +01:00
d5e37dbd3f cleanup: remove GPU/RunPod files from docker-compose repository
Removed GPU orchestration files migrated to dedicated runpod repository:
- Model orchestrator, vLLM, Flux, MusicGen services
- GPU Docker Compose files and configs
- GPU deployment scripts and documentation

Kept VPS AI services and facefusion:
- compose.yaml (VPS AI + facefusion)
- litellm-config.yaml (VPS LiteLLM)
- postgres/ (VPS PostgreSQL init)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)
- README.md (updated with runpod reference)

GPU infrastructure now maintained at: ssh://git@dev.pivoine.art:2222/valknar/runpod.git
2025-11-21 14:41:10 +01:00