- Add BGE embedding model config (port 8002) to litellm-config.yaml
- Add GPU_VLLM_EMBED_URL env var to compose and .env
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove crawl4ai service from ai/compose.yaml (will use local MCP instead)
- Remove crawl4ai backup volume from core/compose.yaml
- Add core/backrest/config.json (infrastructure as code)
- Change backrest from volume to bind-mounted config
- Update CLAUDE.md and README.md documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Supervisor XML-RPC API v3.0 (Supervisor 4.3.0) only supports 2-parameter
readLog(offset, length) calls, not 3-parameter calls with filename.
The SUPERVISOR_LOGFILE environment variable is not used by the API.
Testing showed:
- Working: server.supervisor.readLog(-4096, 0)
- Failing: server.supervisor.readLog(-4096, 4096, '/path/to/log')
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replaced nginx:alpine proxy with dev.pivoine.art/valknar/supervisor-ui:latest
- Modern Next.js UI with real-time SSE updates, batch operations, and charts
- Changed service port from 80 (nginx) to 3000 (Next.js)
- Removed supervisor-nginx.conf (no longer needed)
- Kept same URL (supervisor.ai.pivoine.art) and Authelia SSO protection
- Added health check for /api/health endpoint
- Service connects to RunPod Supervisor via Tailscale (SUPERVISOR_HOST/PORT)
- Replace GPU_TAILSCALE_IP interpolation with GPU_VLLM_API_URL
- LiteLLM requires full URL in api_base with os.environ/ syntax
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace COMFYUI_BACKEND_HOST and SUPERVISOR_BACKEND_HOST with GPU_TAILSCALE_IP
- Update LiteLLM config to use os.environ/GPU_TAILSCALE_IP for vLLM models
- Add GPU_TAILSCALE_IP env var to LiteLLM service
- Configure qwen-2.5-7b and llama-3.1-8b to route through orchestrator
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove hardcoded default values from compose.yaml
- Backend IPs now managed via environment variables only
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add nginx reverse proxy service for Supervisor web UI at supervisor.ai.pivoine.art with Authelia authentication. Proxies to RunPod GPU instance via Tailscale (100.121.199.88:9001).
Changes:
- Create supervisor-nginx.conf for nginx proxy configuration
- Add supervisor service to docker-compose with Traefik labels
- Add supervisor.ai.pivoine.art to Authelia protected domains
- Remove deprecated Flux-related files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace hardcoded IP in comfyui-nginx.conf with env vars
- Add COMFYUI_BACKEND_HOST and COMFYUI_BACKEND_PORT to compose.yaml
- Use envsubst to substitute variables at container startup
- Defaults: 100.121.199.88:8188 (current RunPod Tailscale IP)
- Add ComfyUI service to AI stack using nginx:alpine as reverse proxy
- Proxy to RunPod ComfyUI via Tailscale (100.121.199.88:8188)
- Configure Traefik routing for comfy.ai.pivoine.art
- Enable Authelia SSO middleware (net-authelia)
- Support WebSocket connections for real-time updates
- Set appropriate timeouts for image generation (300s)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Set drop_params: false in litellm_settings
- Set modify_params: false in litellm_settings
- Set drop_params: false in default_litellm_params
- Commented out LITELLM_DROP_PARAMS env var
- Removed --drop_params command flag
These settings were stripping critical streaming parameters causing
vLLM streaming responses to collapse into empty deltas
- Reverted direct orchestrator connection to WebUI
- Added stream: true parameter to qwen-2.5-7b model config
- Keep LiteLLM as single proxy for all models
- Configure WebUI with both LiteLLM and direct orchestrator API base URLs
- This bypasses LiteLLM's streaming issues for the qwen-2.5-7b model
- WebUI will now show models from both endpoints
- Allows testing if LiteLLM is the bottleneck for streaming
Related to streaming fix in RunPod models/vllm/server.py
Reduce database logging overhead and enable prompt caching:
- Disabled verbose logging (set_verbose: false)
- Disabled spend tracking logs to reduce DB writes
- Disabled tag tracking and daily spend logs
- Removed success/failure callbacks
- Enabled prompt caching for claude-sonnet-4.5
- Set log level to ERROR only
- Removed --detailed_debug flag from command
This should significantly improve response times by eliminating
unnecessary database writes for every request.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replace HTTP Basic Auth with Authelia ForwardAuth for consistent
authentication across infrastructure:
- Asciinema Admin (admin.asciinema.dev.pivoine.art): Removed Basic Auth,
added Authelia protection
- FaceFusion (facefusion.ai.pivoine.art): Removed Basic Auth, added
Authelia protection
Updated Authelia access control to include both services with one_factor
policy.
All services now use Authelia for authentication, eliminating the need
to manage separate Basic Auth credentials.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add Mailpit service to NET stack with web UI at mailpit.pivoine.art
- Configure Mailpit to relay all emails through IONOS SMTP
- Migrate all 11+ services to use Mailpit instead of direct IONOS SMTP:
* SEXY: Directus API
* UTIL: Joplin, Mattermost, Vaultwarden, Tandoor, Linkwarden
* DEV: Gitea, n8n, Asciinema
* AI: Open WebUI
* NET: Netdata (via msmtp)
- Centralize SMTP credentials in mailpit-relay.yaml
- Simplify service configs (no auth/TLS for internal SMTP)
- Enable email monitoring via Mailpit web UI with Basic Auth
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace DISABLE_SWAGGER_UI with NO_DOCS and NO_REDOC
- Following official LiteLLM documentation for disabling API docs
- Disables both Swagger UI and Redoc documentation interfaces
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add --disable_swagger flag to LiteLLM command
- Improves security by hiding API documentation interface
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Watchtower was trying to pull updates from Docker Hub for facefusion-patched:3.5.0-cpu
which only exists locally, causing spam errors. Disabled Watchtower monitoring for this
container since it's a custom-built image with NSFW filter patches.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added Facefusion face swapping service to the AI stack:
**Configuration:**
- URL: https://facefusion.ai.pivoine.art
- Image: facefusion/facefusion:3.5.0-cpu
- Port: 7865
- Container: ai_facefusion
- Volume: ai_facefusion_data
- HTTP Basic Auth protection
- CPU execution mode (GPU when available)
**Changes:**
- Added facefusion service to ai/compose.yaml
- Added AI_FACEFUSION_* env vars to arty.yml
- Created ai_facefusion_data volume
- Removed old standalone facefusion stack
- Removed ai/README-export.md and ai/webui-export.py
Facefusion will run on CPU until GPU server is available.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Created database initialization script following the core stack pattern.
The script automatically creates required databases on first initialization:
- openwebui: Open WebUI application database
- litellm: LiteLLM proxy database for API key management and tracking
Changes:
- Created ai/postgres/init/01-init-databases.sh
- Mounted init directory in ai_postgres service
- Added automatic privilege grants to AI_DB_USER
Note: Init script only runs on first database creation when volume is empty.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
LiteLLM now uses the ai_postgres database instance with a dedicated
'litellm' database for API key management, usage tracking, and rate limiting.
Changes:
- Set DATABASE_URL to postgresql://ai:password@ai_postgres:5432/litellm
- Added depends_on ai_postgres to ensure DB starts first
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed authentication middleware to simplify access. LiteLLM now relies
solely on Bearer token authentication via LITELLM_MASTER_KEY.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Re-enabled LITELLM_MASTER_KEY for proper API key authentication.
LiteLLM supports master key without database for simple auth scenarios.
- LiteLLM validates Bearer token against master key
- Open WebUI uses same key for internal communication
- External access requires both HTTP Basic Auth + API key
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Removed LITELLM_MASTER_KEY as it requires a database for virtual key
management. Security is already provided by HTTP Basic Auth on the
public Traefik endpoint. Internal Open WebUI communication doesn't
need additional API key auth.
Security layers:
- Public access: HTTP Basic Auth via Traefik
- Internal LiteLLM: Network isolation (no auth needed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>