Compare commits

..

79 Commits

Author SHA1 Message Date
ea210c73dd fix(core): mount backrest config directory instead of file
Backrest requires atomic file operations for config updates,
which fail with single-file bind mounts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 17:25:50 +01:00
c4fd23855b feat(media): add Immich photo/video management service
- Add immich_server, immich_ml, and immich_postgres services
- Use dedicated PostgreSQL with vector extensions (vectorchord + pgvectors)
- Connect to core Redis for job queues
- Configure Traefik routing for immich.media.pivoine.art
- Add backup volumes and plan for Backrest (daily at 12:00)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 17:23:25 +01:00
3cc9db6632 fix(core): remove filestash backup volume reference
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 17:08:28 +01:00
4851ca10f0 fix(media): remove filestash service
- Remove filestash service and volume from media/compose.yaml
- Remove filestash-backup plan from backrest config
- Remove MEDIA_FILESTASH_* environment variables from arty.yml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 17:01:07 +01:00
c55f41408a fix(ai): litellm config 2025-11-30 23:03:32 +01:00
7bca766247 fix(ai): litellm config 2025-11-30 22:32:49 +01:00
120bf7c385 feat(ai): bge over litellm 2025-11-30 20:12:07 +01:00
35e0f232f9 revert: remove GPU_TAILSCALE_HOST from arty.yml
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:58:44 +01:00
cd9256f09c fix: use Tailscale IP for GPU_TAILSCALE_HOST (MagicDNS doesn't work from Docker)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:53:14 +01:00
c9e3a5cc4f fix: add resolver for runtime DNS resolution in nginx
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:48:25 +01:00
b2b444fb98 fix: add Tailscale DNS to GPU proxy containers
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:47:06 +01:00
dcc29d20f0 fix: traefik labels 2025-11-28 09:39:39 +01:00
19ad30e8c4 revert: tailscale docker sidecar 2025-11-28 09:31:21 +01:00
99e39ee6e6 fix: tailscale sidecar dns 2025-11-28 09:12:52 +01:00
ed83e64727 fix: tailscale sidecar dns 2025-11-28 09:09:10 +01:00
18e6741596 fix: tailscale sidecar dns 2025-11-28 09:07:12 +01:00
c83b77ebdb feat: tailscale sidecar 2025-11-28 08:59:42 +01:00
6568dd10b5 feat: tailscale sidecar 2025-11-28 08:42:40 +01:00
a6e4540e84 feat: tailscale sidecar 2025-11-28 08:41:30 +01:00
dbdf33e78e feat: tailscale sidecar 2025-11-28 08:40:30 +01:00
74f618bcbb feat: tailscale sidecar 2025-11-28 08:36:50 +01:00
0c7fe219f7 feat: tailscale sidecar 2025-11-28 08:32:26 +01:00
6d0a15a969 fix: ai compose tailscale dns 2025-11-28 08:21:23 +01:00
f4dd7c7d9d fix: litellm compose 2025-11-28 08:14:13 +01:00
608b5ba793 fix: nginx audio mime types 2025-11-27 16:45:14 +01:00
2e45252793 fix: nginx proxy timeouts 2025-11-27 15:24:38 +01:00
20ba9952a1 feat: upscale service 2025-11-27 12:13:57 +01:00
69869ec3fb fix: remove vllm embedding 2025-11-27 01:11:43 +01:00
cc270c8539 fix: vllm model ids 2025-11-27 00:49:53 +01:00
8bdcde4b90 fix: supervisor env 2025-11-26 22:58:16 +01:00
2ab43e8fd3 fix: authelia for audiocraft 2025-11-26 22:56:30 +01:00
5d232c7d9b feat: audiocraft 2025-11-26 22:54:10 +01:00
cef233b678 chore: remove qwen 2025-11-26 21:03:43 +01:00
b63ddbffbd fix(ai): correct bge embedding model name to hosted_vllm/openai prefix
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:44:33 +01:00
d57a1241d2 feat(ai): add bge-large-en-v1.5 embedding model to litellm
- Add BGE embedding model config (port 8002) to litellm-config.yaml
- Add GPU_VLLM_EMBED_URL env var to compose and .env

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:40:36 +01:00
ef0309838c refactor(ai): remove crawl4ai service, add backrest config to repo
- Remove crawl4ai service from ai/compose.yaml (will use local MCP instead)
- Remove crawl4ai backup volume from core/compose.yaml
- Add core/backrest/config.json (infrastructure as code)
- Change backrest from volume to bind-mounted config
- Update CLAUDE.md and README.md documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:20:22 +01:00
071a74a996 revert(ai): remove SUPERVISOR_LOGFILE env var from supervisor-ui
Supervisor XML-RPC API v3.0 (Supervisor 4.3.0) only supports 2-parameter
readLog(offset, length) calls, not 3-parameter calls with filename.
The SUPERVISOR_LOGFILE environment variable is not used by the API.

Testing showed:
- Working: server.supervisor.readLog(-4096, 0)
- Failing: server.supervisor.readLog(-4096, 4096, '/path/to/log')

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 23:01:10 +01:00
74b3748b23 feat(ai): add SUPERVISOR_LOGFILE env var to supervisor-ui for RunPod logs
Configure supervisor-ui to use correct logfile path (/workspace/logs/supervisord.log)
for RunPod Supervisor instance. Fixes logs page error on https://supervisor.ai.pivoine.art/logs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 22:49:33 +01:00
87216ab26a fix: remove healthcheck from supervisor-ui service 2025-11-23 20:38:37 +01:00
9e2b19e7f6 feat: replace nginx supervisor proxy with modern supervisor-ui
- Replaced nginx:alpine proxy with dev.pivoine.art/valknar/supervisor-ui:latest
- Modern Next.js UI with real-time SSE updates, batch operations, and charts
- Changed service port from 80 (nginx) to 3000 (Next.js)
- Removed supervisor-nginx.conf (no longer needed)
- Kept same URL (supervisor.ai.pivoine.art) and Authelia SSO protection
- Added health check for /api/health endpoint
- Service connects to RunPod Supervisor via Tailscale (SUPERVISOR_HOST/PORT)
2025-11-23 20:18:29 +01:00
a80c6b931b fix: update compose.yaml to use new GPU_VLLM URLs 2025-11-23 16:22:54 +01:00
64c02228d8 fix: use EMPTY api_key for vLLM servers 2025-11-23 16:17:27 +01:00
55d9bef18a fix: remove api_key from vLLM config to fix authentication error
vLLM servers don't validate API keys, so LiteLLM shouldn't pass them

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 16:16:37 +01:00
7fc945e179 fix: update LiteLLM config for direct vLLM server access
- Replace orchestrator routing with direct vLLM server connections
- Qwen 2.5 7B on port 8000 (GPU_VLLM_QWEN_URL)
- Llama 3.1 8B on port 8001 (GPU_VLLM_LLAMA_URL)
- Simplify architecture by removing orchestrator proxy layer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 16:10:20 +01:00
94ab4ae6dd feat: enable system message support for qwen-2.5-7b 2025-11-23 14:36:34 +01:00
779e76974d fix: use complete URL env var for vLLM API base
- Replace GPU_TAILSCALE_IP interpolation with GPU_VLLM_API_URL
- LiteLLM requires full URL in api_base with os.environ/ syntax

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 13:17:37 +01:00
f3f32c163f feat: consolidate GPU IP with single GPU_TAILSCALE_IP variable
- Replace COMFYUI_BACKEND_HOST and SUPERVISOR_BACKEND_HOST with GPU_TAILSCALE_IP
- Update LiteLLM config to use os.environ/GPU_TAILSCALE_IP for vLLM models
- Add GPU_TAILSCALE_IP env var to LiteLLM service
- Configure qwen-2.5-7b and llama-3.1-8b to route through orchestrator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 13:05:33 +01:00
e00e959543 Update backend IPs for ComfyUI and Supervisor proxies
- Remove hardcoded default values from compose.yaml
- Backend IPs now managed via environment variables only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 02:11:19 +01:00
0fd2eacad1 feat: add Supervisor proxy with Authelia SSO
Add nginx reverse proxy service for Supervisor web UI at supervisor.ai.pivoine.art with Authelia authentication. Proxies to RunPod GPU instance via Tailscale (100.121.199.88:9001).

Changes:
- Create supervisor-nginx.conf for nginx proxy configuration
- Add supervisor service to docker-compose with Traefik labels
- Add supervisor.ai.pivoine.art to Authelia protected domains
- Remove deprecated Flux-related files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 13:19:02 +01:00
bf402adb25 Add Llama 3.1 8B model to LiteLLM configuration 2025-11-21 21:30:18 +01:00
ae1c349b55 feat: make ComfyUI backend IP/port configurable via environment variables
- Replace hardcoded IP in comfyui-nginx.conf with env vars
- Add COMFYUI_BACKEND_HOST and COMFYUI_BACKEND_PORT to compose.yaml
- Use envsubst to substitute variables at container startup
- Defaults: 100.121.199.88:8188 (current RunPod Tailscale IP)
2025-11-21 21:24:51 +01:00
66d8c82e47 Remove Flux and MusicGen models from LiteLLM config
ComfyUI now handles Flux image generation directly.
MusicGen is not being used and has been removed.
2025-11-21 21:11:29 +01:00
ea81634ef3 feat: add ComfyUI to Authelia protected domains
- Add comfy.ai.pivoine.art to one_factor authentication policy
- Enables SSO protection for ComfyUI image generation service
2025-11-21 21:05:24 +01:00
25bd020b93 docs: document ComfyUI setup and integration
- Add ComfyUI service to AI stack service list
- Document ComfyUI proxy architecture and configuration
- Include deployment instructions via Ansible
- Explain network topology and access flow
- Add proxy configuration details (nginx, Tailscale, Authelia)
- Document RunPod setup process and model integration
2025-11-21 21:03:35 +01:00
904f7d3c2e feat(ai): add ComfyUI proxy service with Authelia SSO
- Add ComfyUI service to AI stack using nginx:alpine as reverse proxy
- Proxy to RunPod ComfyUI via Tailscale (100.121.199.88:8188)
- Configure Traefik routing for comfy.ai.pivoine.art
- Enable Authelia SSO middleware (net-authelia)
- Support WebSocket connections for real-time updates
- Set appropriate timeouts for image generation (300s)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 20:56:20 +01:00
9a964cff3c feat: add Flux image generation function for Open WebUI
- Add flux_image_gen.py manifold function for Flux.1 Schnell
- Auto-mount functions via Docker volume (./functions:/app/backend/data/functions:ro)
- Add comprehensive setup guide in FLUX_SETUP.md
- Update CLAUDE.md with Flux integration documentation
- Infrastructure as code approach - no manual import needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 20:20:33 +01:00
0999e5d29f feat: re-enable Redis caching in LiteLLM now that streaming is fixed 2025-11-21 19:40:57 +01:00
ec903c16c2 fix: use hosted_vllm/openai/ prefix for vLLM model via orchestrator 2025-11-21 19:18:33 +01:00
155016da97 debug: enable DEBUG logging for LiteLLM to troubleshoot streaming 2025-11-21 19:10:00 +01:00
c81f312e9e fix: use correct vLLM model ID from /v1/models endpoint 2025-11-21 19:06:56 +01:00
fe0cf487ee fix: use correct vLLM model name with hosted_vllm prefix 2025-11-21 19:02:44 +01:00
81d4058c5d revert: back to openai prefix for vLLM OpenAI-compatible endpoint 2025-11-21 18:57:10 +01:00
4a575bc0da fix: use hosted_vllm prefix instead of openai for vLLM streaming compatibility 2025-11-21 18:54:40 +01:00
01a345979b fix: disable drop_params to preserve streaming metadata in LiteLLM
- Set drop_params: false in litellm_settings
- Set modify_params: false in litellm_settings
- Set drop_params: false in default_litellm_params
- Commented out LITELLM_DROP_PARAMS env var
- Removed --drop_params command flag

These settings were stripping critical streaming parameters, causing
vLLM streaming responses to collapse into empty deltas.
2025-11-21 18:46:33 +01:00
c58b5d36ba revert: remove direct WebUI connection, focus on fixing LiteLLM streaming
- Reverted direct orchestrator connection to WebUI
- Added stream: true parameter to qwen-2.5-7b model config
- Keep LiteLLM as single proxy for all models
2025-11-21 18:42:46 +01:00
62fcf832da feat: add direct RunPod orchestrator connection to WebUI for streaming bypass
- Configure WebUI with both LiteLLM and direct orchestrator API base URLs
- This bypasses LiteLLM's streaming issues for the qwen-2.5-7b model
- WebUI will now show models from both endpoints
- Allows testing if LiteLLM is the bottleneck for streaming

Related to streaming fix in RunPod models/vllm/server.py
2025-11-21 18:38:31 +01:00
dfde1df72f fix: add /v1 suffix to vLLM api_base for proper endpoint routing 2025-11-21 18:00:53 +01:00
42a68bc0b5 fix: revert to openai prefix, remove /v1 suffix from api_base
- Changed back from hosted_vllm/qwen-2.5-7b to openai/qwen-2.5-7b
- Removed /v1 suffix from api_base (LiteLLM adds it automatically)
- Added supports_system_messages: false for vLLM compatibility
2025-11-21 17:55:10 +01:00
699c8537b0 fix: use LiteLLM vLLM pass-through for qwen model
- Changed model from openai/qwen-2.5-7b to hosted_vllm/qwen-2.5-7b
- Implements proper vLLM integration per LiteLLM docs
- Fixes streaming response forwarding issue
2025-11-21 17:52:34 +01:00
ed4d537499 Enable verbose logging in LiteLLM for streaming debug 2025-11-21 17:43:34 +01:00
103bbbad51 debug: enable INFO logging in LiteLLM for troubleshooting
Enable detailed logging to debug qwen model requests from WebUI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 17:13:38 +01:00
92a7436716 fix(ai): add 600s timeout for qwen model requests via Tailscale 2025-11-21 17:06:01 +01:00
6aea9d018e feat(ai): disable Ollama API in WebUI, use LiteLLM only 2025-11-21 16:57:20 +01:00
e2e0927291 feat: update LiteLLM to use RunPod GPU via Tailscale
- Update api_base URLs from 100.100.108.13 to 100.121.199.88 (RunPod Tailscale IP)
- All self-hosted models (qwen-2.5-7b, flux-schnell, musicgen-medium) now route through Tailscale VPN
- Tested and verified connectivity between VPS and RunPod GPU orchestrator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:42:27 +01:00
a5ed2be933 docs: remove outdated ai/README.md
Removed outdated AI infrastructure README that referenced GPU services.
VPS AI services (Open WebUI, Crawl4AI, facefusion) are documented in compose.yaml comments.
GPU infrastructure docs are now in dedicated runpod repository.
2025-11-21 14:42:23 +01:00
d5e37dbd3f cleanup: remove GPU/RunPod files from docker-compose repository
Removed GPU orchestration files migrated to dedicated runpod repository:
- Model orchestrator, vLLM, Flux, MusicGen services
- GPU Docker Compose files and configs
- GPU deployment scripts and documentation

Kept VPS AI services and facefusion:
- compose.yaml (VPS AI + facefusion)
- litellm-config.yaml (VPS LiteLLM)
- postgres/ (VPS PostgreSQL init)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)
- README.md (updated with runpod reference)

GPU infrastructure now maintained at: ssh://git@dev.pivoine.art:2222/valknar/runpod.git
2025-11-21 14:41:10 +01:00
abcebd1d9b docs: migrate multi-modal AI orchestration to dedicated runpod repository
Multi-modal AI stack (text/image/music generation) has been moved to:
Repository: ssh://git@dev.pivoine.art:2222/valknar/runpod.git

Updated ai/README.md to document:
- VPS AI services (Open WebUI, Crawl4AI, AI PostgreSQL)
- Reference to new runpod repository for GPU infrastructure
- Clear separation between VPS and GPU deployments
- Integration architecture via Tailscale VPN
2025-11-21 14:36:36 +01:00
3ed3e68271 feat(ai): add multi-modal orchestration system for text, image, and music generation
Implemented a cost-optimized AI infrastructure running on single RTX 4090 GPU with
automatic model switching based on request type. This enables text, image, and
music generation on the same hardware with sequential loading.

## New Components

**Model Orchestrator** (ai/model-orchestrator/):
- FastAPI service managing model lifecycle
- Automatic model detection and switching based on request type
- OpenAI-compatible API proxy for all models
- Simple YAML configuration for adding new models
- Docker SDK integration for service management
- Endpoints: /v1/chat/completions, /v1/images/generations, /v1/audio/generations

**Text Generation** (ai/vllm/):
- Reorganized existing vLLM server into proper structure
- Qwen 2.5 7B Instruct (14GB VRAM, ~50 tok/sec)
- Docker containerized with CUDA 12.4 support

**Image Generation** (ai/flux/):
- Flux.1 Schnell for fast, high-quality images
- 14GB VRAM, 4-5 sec per image
- OpenAI DALL-E compatible API
- Pre-built image: ghcr.io/matatonic/openedai-images-flux

**Music Generation** (ai/musicgen/):
- Meta's MusicGen Medium (facebook/musicgen-medium)
- Text-to-music generation (11GB VRAM)
- 60-90 seconds for 30s audio clips
- Custom FastAPI wrapper with AudioCraft

## Architecture

```
VPS (LiteLLM) → Tailscale VPN → GPU Orchestrator (Port 9000)
                                       ↓
                       ┌───────────────┼───────────────┐
                  vLLM (8001)    Flux (8002)    MusicGen (8003)
                   [Only ONE active at a time - sequential loading]
```

## Configuration Files

- docker-compose.gpu.yaml: Main orchestration file for RunPod deployment
- model-orchestrator/models.yaml: Model registry (easy to add new models)
- .env.example: Environment variable template
- README.md: Comprehensive deployment and usage guide

## Updated Files

- litellm-config.yaml: Updated to route through orchestrator (port 9000)
- GPU_DEPLOYMENT_LOG.md: Documented multi-modal architecture

## Features

- Automatic model switching (30-120s latency)
- Cost-optimized single GPU deployment (~$0.50/hr vs ~$0.75/hr multi-GPU)
- Easy model addition via YAML configuration
- OpenAI-compatible APIs for all model types
- Centralized routing through LiteLLM proxy
- GPU memory safety (only one model loaded at a time)

## Usage

Deploy to RunPod:
```bash
scp -r ai/* gpu-pivoine:/workspace/ai/
ssh gpu-pivoine "cd /workspace/ai && docker compose -f docker-compose.gpu.yaml up -d orchestrator"
```

Test models:
```bash
# Text
curl http://100.100.108.13:9000/v1/chat/completions -d '{"model":"qwen-2.5-7b","messages":[...]}'

# Image
curl http://100.100.108.13:9000/v1/images/generations -d '{"model":"flux-schnell","prompt":"..."}'

# Music
curl http://100.100.108.13:9000/v1/audio/generations -d '{"model":"musicgen-medium","prompt":"..."}'
```

All models available via Open WebUI at https://ai.pivoine.art

## Adding New Models

1. Add entry to models.yaml
2. Define Docker service in docker-compose.gpu.yaml
3. Restart orchestrator

That's it! The orchestrator automatically detects and manages the new model.
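For illustration, a new entry might look like the sketch below; the key names and values are assumptions, not the actual `models.yaml` schema:

```bash
# Hypothetical models.yaml entry — field names are assumptions, check the real schema first
cat >> model-orchestrator/models.yaml << 'EOF'
mistral-7b:
  type: text                # request type the orchestrator routes to this model
  service: vllm-mistral     # compose service defined in docker-compose.gpu.yaml
  port: 8004                # OpenAI-compatible endpoint exposed by that service
  vram_gb: 15               # approximate VRAM need, so only one model stays loaded
EOF
```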

## Performance

| Model | VRAM | Startup | Speed |
|-------|------|---------|-------|
| Qwen 2.5 7B | 14GB | 120s | ~50 tok/sec |
| Flux.1 Schnell | 14GB | 60s | 4-5s/image |
| MusicGen Medium | 11GB | 45s | 60-90s for 30s audio |

Model switching overhead: 30-120 seconds

## License Notes

- vLLM: Apache 2.0
- Flux.1: Apache 2.0
- AudioCraft: MIT (code), CC-BY-NC (pre-trained weights - non-commercial)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 14:12:13 +01:00
bb3dabcba7 feat(ai): complete GPU deployment with self-hosted Qwen 2.5 7B model
This commit finalizes the GPU infrastructure deployment on RunPod:

- Added qwen-2.5-7b model to LiteLLM configuration
  - Self-hosted on RunPod RTX 4090 GPU server
  - Connected via Tailscale VPN (100.100.108.13:8000)
  - OpenAI-compatible API endpoint
  - Rate limits: 1000 RPM, 100k TPM

- Marked GPU deployment as COMPLETE in deployment log
  - vLLM 0.6.4.post1 with custom AsyncLLMEngine server
  - Qwen/Qwen2.5-7B-Instruct model (14.25 GB)
  - 85% GPU memory utilization, 4096 context length
  - Successfully integrated with Open WebUI at ai.pivoine.art

Infrastructure:
- Provider: RunPod Spot Instance (~$0.50/hr)
- GPU: NVIDIA RTX 4090 24GB
- Disk: 50GB local SSD + 922TB network volume
- VPN: Tailscale (replaces WireGuard due to RunPod UDP restrictions)

Model now visible and accessible in Open WebUI for end users.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 13:18:17 +01:00
22 changed files with 1012 additions and 4640 deletions

View File

@@ -25,7 +25,7 @@ Root `compose.yaml` uses Docker Compose's `include` directive to orchestrate mul
 - **kit**: Unified toolkit with Vert file converter and miniPaint image editor (path-routed)
 - **jelly**: Jellyfin media server with hardware transcoding
 - **drop**: PairDrop peer-to-peer file sharing
-- **ai**: AI infrastructure with Open WebUI, Crawl4AI, and pgvector (PostgreSQL)
+- **ai**: AI infrastructure with Open WebUI, ComfyUI proxy, Crawl4AI, and pgvector (PostgreSQL)
 - **asciinema**: Terminal recording and sharing platform (PostgreSQL)
 - **restic**: Backrest backup system with restic backend
 - **netdata**: Real-time infrastructure monitoring
@@ -451,11 +451,13 @@ AI infrastructure with Open WebUI, Crawl4AI, and dedicated PostgreSQL with pgvec
 - User signup enabled
 - Data persisted in `ai_webui_data` volume
-- **crawl4ai**: Crawl4AI web scraping service (internal API, no public access)
-- Optimized web scraper for LLM content preparation
-- Internal API on port 11235 (not exposed via Traefik)
-- Designed for integration with Open WebUI and n8n workflows
-- Data persisted in `ai_crawl4ai_data` volume
+- **comfyui**: ComfyUI reverse proxy exposed at `comfy.ai.pivoine.art:80`
+- Nginx-based proxy to ComfyUI running on RunPod GPU server
+- Node-based UI for Flux.1 Schnell image generation workflows
+- Proxies to RunPod via Tailscale VPN (100.121.199.88:8188)
+- Protected by Authelia SSO authentication
+- WebSocket support for real-time updates
+- Stateless architecture (no data persistence on VPS)
 **Configuration**:
 - **Claude Integration**: Uses Anthropic API with OpenAI-compatible endpoint
@@ -476,11 +478,71 @@ AI infrastructure with Open WebUI, Crawl4AI, and dedicated PostgreSQL with pgvec
 4. Use web search feature for current information
 5. Integrate with n8n workflows for automation
+**Flux Image Generation** (`functions/flux_image_gen.py`):
+Open WebUI function for generating images via Flux.1 Schnell on RunPod GPU:
+- Manifold function adds "Flux.1 Schnell (4-5s)" model to Open WebUI
+- Routes requests through LiteLLM → Orchestrator → RunPod Flux
+- Generates 1024x1024 images in 4-5 seconds
+- Returns images as base64-encoded markdown
+- Configuration via Valves (API base, timeout, default size)
+- **Automatically loaded via Docker volume mount** (`./functions:/app/backend/data/functions:ro`)
+**Deployment**:
+- Function file tracked in `ai/functions/` directory
+- Automatically available after `pnpm arty up -d ai_webui`
+- No manual import required - infrastructure as code
+See `ai/FLUX_SETUP.md` for detailed setup instructions and troubleshooting.
+**ComfyUI Image Generation**:
+ComfyUI provides a professional node-based interface for creating Flux image generation workflows:
+**Architecture**:
+```
+User → Traefik (VPS) → Authelia SSO → ComfyUI Proxy (nginx) → Tailscale → ComfyUI (RunPod:8188) → Flux Model (GPU)
+```
+**Access**:
+1. Navigate to https://comfy.ai.pivoine.art
+2. Authenticate via Authelia SSO
+3. Create node-based workflows in ComfyUI interface
+4. Use Flux.1 Schnell model from HuggingFace cache at `/workspace/ComfyUI/models/huggingface_cache`
+**RunPod Setup** (via Ansible):
+ComfyUI is installed on RunPod using the Ansible playbook at `/home/valknar/Projects/runpod/playbook.yml`:
+- Clone ComfyUI from https://github.com/comfyanonymous/ComfyUI
+- Install dependencies from `models/comfyui/requirements.txt`
+- Create model directory structure (checkpoints, unet, vae, loras, clip, controlnet)
+- Symlink Flux model from HuggingFace cache
+- Start service via `models/comfyui/start.sh` on port 8188
+**To deploy ComfyUI on RunPod**:
+```bash
+# Run Ansible playbook with comfyui tag
+ssh -p 16186 root@213.173.110.150
+cd /workspace/ai
+ansible-playbook playbook.yml --tags comfyui --skip-tags always
+# Start ComfyUI service
+bash models/comfyui/start.sh &
+```
+**Proxy Configuration**:
+The VPS runs an nginx proxy (`ai/comfyui-nginx.conf`) that:
+- Listens on port 80 inside container
+- Forwards to RunPod via Tailscale (100.121.199.88:8188)
+- Supports WebSocket upgrades for real-time updates
+- Handles large file uploads (100M limit)
+- Uses extended timeouts for long-running generations (300s)
+**Note**: ComfyUI runs directly on RunPod GPU server, not in a container. All data is stored on RunPod's `/workspace` volume.
 **Integration Points**:
 - **n8n**: Workflow automation with AI tasks (scraping, RAG ingestion, webhooks)
 - **Mattermost**: Can send AI-generated notifications via webhooks
 - **Crawl4AI**: Internal API for advanced web scraping
 - **Claude API**: Primary LLM provider via Anthropic
+- **Flux via RunPod**: Image generation through orchestrator (GPU server) or ComfyUI
 **Future Enhancements**:
 - GPU server integration (IONOS A10 planned)
@@ -659,7 +721,7 @@ Backrest backup system with restic backend:
 - Retention: 7 daily, 4 weekly, 3 monthly
 16. **ai-backup** (3 AM daily)
-- Paths: `/volumes/ai_postgres_data`, `/volumes/ai_webui_data`, `/volumes/ai_crawl4ai_data`
+- Paths: `/volumes/ai_postgres_data`, `/volumes/ai_webui_data`
 - Retention: 7 daily, 4 weekly, 6 monthly, 2 yearly
 17. **asciinema-backup** (11 AM daily)
@@ -670,8 +732,7 @@ Backrest backup system with restic backend:
 All Docker volumes are mounted read-only to `/volumes/` with prefixed names (e.g., `backup_core_postgres_data`) to avoid naming conflicts with other compose stacks.
 **Configuration Management**:
-- `config.json` template in repository defines all backup plans
-- On first run, copy config into volume: `docker cp restic/config.json restic_app:/config/config.json`
+- `core/backrest/config.json` in repository defines all backup plans (bind-mounted to container)
 - Config version must be `4` for Backrest 1.10.1 compatibility
 - Backrest manages auth automatically (username: `valknar`, password set via web UI on first access)
@@ -709,7 +770,7 @@ Each service uses named volumes prefixed with project name:
 - `vault_data`: Vaultwarden password vault (SQLite database)
 - `joplin_data`: Joplin note-taking data
 - `jelly_config`: Jellyfin media server configuration
-- `ai_postgres_data`, `ai_webui_data`, `ai_crawl4ai_data`: AI stack databases and application data
+- `ai_postgres_data`, `ai_webui_data`: AI stack databases and application data
 - `netdata_config`: Netdata monitoring configuration
 - `restic_data`, `restic_config`, `restic_cache`, `restic_tmp`: Backrest backup system
 - `proxy_letsencrypt_data`: SSL certificates

View File

@@ -406,11 +406,10 @@ THE FALCON (falcon_network)
 │ ├─ vaultwarden [vault.pivoine.art] → Password Manager
 │ └─ tandoor [tandoor.pivoine.art] → Recipe Manager
-├─ 🤖 AI STACK (5 services)
+├─ 🤖 AI STACK (4 services)
 │ ├─ ai_postgres [Internal] → pgvector Database
 │ ├─ webui [ai.pivoine.art] → Open WebUI (Claude)
 │ ├─ litellm [llm.ai.pivoine.art] → API Proxy
-│ ├─ crawl4ai [Internal:11235] → Web Scraper
 │ └─ facefusion [facefusion.ai.pivoine.art] → Face AI
 ├─ 🛡️ NET STACK (4 services)
@@ -435,7 +434,7 @@ THE FALCON (falcon_network)
 ├─ Core: postgres_data, redis_data, backrest_*
 ├─ Sexy: directus_uploads, directus_bundle
 ├─ Util: pairdrop_*, joplin_data, linkwarden_*, mattermost_*, vaultwarden_data, tandoor_*
-├─ AI: ai_postgres_data, ai_webui_data, ai_crawl4ai_data, facefusion_*
+├─ AI: ai_postgres_data, ai_webui_data, facefusion_*
 ├─ Net: letsencrypt_data, netdata_*
 ├─ Media: jelly_config, jelly_cache, filestash_data
 └─ Dev: gitea_*, coolify_data, n8n_data, asciinema_data

View File

@@ -1,430 +0,0 @@
# Docker & NVIDIA Container Toolkit Setup
## Day 5: Docker Configuration on GPU Server
This guide sets up Docker with GPU support on your RunPod server.
---
## Step 1: Install Docker
### Quick Install (Recommended)
```bash
# SSH into GPU server
ssh gpu-pivoine
# Download and run Docker install script
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
# Verify installation
docker --version
docker compose version
```
Expected output:
```
Docker version 24.0.7, build afdd53b
Docker Compose version v2.23.0
```
### Manual Install (Alternative)
```bash
# Add Docker's official GPG key
apt-get update
apt-get install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
# Add repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Start Docker
systemctl enable docker
systemctl start docker
```
---
## Step 2: Install NVIDIA Container Toolkit
This enables Docker containers to use the GPU.
```bash
# Add NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install toolkit
apt-get update
apt-get install -y nvidia-container-toolkit
# Configure Docker to use NVIDIA runtime
nvidia-ctk runtime configure --runtime=docker
# Restart Docker
systemctl restart docker
```
---
## Step 3: Test GPU Access in Docker
### Test 1: Basic CUDA Container
```bash
docker run --rm --runtime=nvidia --gpus all \
nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
Expected output: Same as `nvidia-smi` output showing your RTX 4090.
### Test 2: PyTorch Container
```bash
docker run --rm --runtime=nvidia --gpus all \
pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime \
python -c "import torch; print('CUDA:', torch.cuda.is_available(), 'Device:', torch.cuda.get_device_name(0))"
```
Expected output:
```
CUDA: True Device: NVIDIA GeForce RTX 4090
```
### Test 3: Multi-GPU Query (if you have multiple GPUs)
```bash
docker run --rm --runtime=nvidia --gpus all \
nvidia/cuda:12.1.0-base-ubuntu22.04 \
bash -c "echo 'GPU Count:' && nvidia-smi --list-gpus"
```
---
## Step 4: Configure Docker Compose with GPU Support
Docker Compose needs to know about NVIDIA runtime.
### Create daemon.json
```bash
cat > /etc/docker/daemon.json << 'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
# Restart Docker
systemctl restart docker
```
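A quick sanity check that the daemon picked up the new configuration (it should report `nvidia` as the default runtime):
```bash
# Confirm the NVIDIA runtime is registered and set as default
docker info --format '{{.DefaultRuntime}}'
docker info | grep -i runtimes
```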
---
## Step 5: Create GPU Project Structure
```bash
cd /workspace
# Create directory structure
mkdir -p gpu-stack/{vllm,comfyui,training,jupyter,monitoring}
cd gpu-stack
# Create .env file
cat > .env << 'EOF'
# GPU Stack Environment Variables
# Timezone
TIMEZONE=Europe/Berlin
# VPN Network
VPS_IP=10.8.0.1
GPU_IP=10.8.0.2
# Model Storage
MODELS_PATH=/workspace/models
# Hugging Face (optional, for private models)
HF_TOKEN=
# PostgreSQL (on VPS)
DB_HOST=10.8.0.1
DB_PORT=5432
DB_USER=valknar
DB_PASSWORD=ragnarok98
DB_NAME=openwebui
# Weights & Biases (optional, for training logging)
WANDB_API_KEY=
EOF
chmod 600 .env
```
---
## Step 6: Test Full Stack (Quick Smoke Test)
Let's deploy a minimal vLLM container to verify everything works:
```bash
cd /workspace/gpu-stack
# Create test compose file
cat > test-compose.yaml << 'EOF'
services:
  test-vllm:
    image: vllm/vllm-openai:latest
    container_name: test_vllm
    runtime: nvidia
    environment:
      NVIDIA_VISIBLE_DEVICES: all
    command:
      - --model
      - facebook/opt-125m # Tiny model for testing
      - --host
      - 0.0.0.0
      - --port
      - 8000
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
EOF
# Start test
docker compose -f test-compose.yaml up -d
# Wait 30 seconds for model download
sleep 30
# Check logs
docker compose -f test-compose.yaml logs
# Test inference
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "facebook/opt-125m",
"prompt": "Hello, my name is",
"max_tokens": 10
}'
```
Expected output (JSON response with generated text).
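The body follows the OpenAI completions schema, so a quick way to pull out just the generated text (assuming `jq` is installed):
```bash
# Same request as above, keeping only the completion text from the JSON response
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "Hello, my name is", "max_tokens": 10}' \
  | jq -r '.choices[0].text'
```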
**Clean up test:**
```bash
docker compose -f test-compose.yaml down
```
---
## Step 7: Install Additional Tools
```bash
# Python tools
apt install -y python3-pip python3-venv
# Monitoring tools
apt install -y htop nvtop iotop
# Network tools
apt install -y iperf3 tcpdump
# Development tools
apt install -y build-essential
# Git LFS (for large model files)
apt install -y git-lfs
git lfs install
```
---
## Step 8: Configure Automatic Updates (Optional)
```bash
# Install unattended-upgrades
apt install -y unattended-upgrades
# Configure
dpkg-reconfigure -plow unattended-upgrades
# Enable automatic security updates
cat > /etc/apt/apt.conf.d/50unattended-upgrades << 'EOF'
Unattended-Upgrade::Allowed-Origins {
"${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
EOF
```
---
## Troubleshooting
### Docker can't access GPU
**Problem:** `docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]`
**Solution:**
```bash
# Verify NVIDIA runtime is configured
docker info | grep -i runtime
# Should show nvidia in runtimes list
# If not, reinstall nvidia-container-toolkit
# Check daemon.json
cat /etc/docker/daemon.json
# Restart Docker
systemctl restart docker
```
### Permission denied on docker commands
**Solution:**
```bash
# Add your user to docker group (if not root)
usermod -aG docker $USER
# Or always use sudo
sudo docker ...
```
### Out of disk space
**Check usage:**
```bash
df -h
du -sh /var/lib/docker
docker system df
```
**Clean up:**
```bash
# Remove unused images
docker image prune -a
# Remove unused volumes
docker volume prune
# Full cleanup
docker system prune -a --volumes
```
---
## Verification Checklist
Before deploying the full stack:
- [ ] Docker installed and running
- [ ] `docker --version` shows 24.x or newer
- [ ] `docker compose version` works
- [ ] NVIDIA Container Toolkit installed
- [ ] `docker run --gpus all nvidia/cuda:12.1.0-base nvidia-smi` works
- [ ] PyTorch container can see GPU
- [ ] Test vLLM deployment successful
- [ ] /workspace directory structure created
- [ ] .env file configured with VPN IPs
- [ ] Additional tools installed (nvtop, htop, etc.)
---
## Performance Monitoring Commands
**GPU Monitoring:**
```bash
# Real-time GPU stats
watch -n 1 nvidia-smi
# Or with nvtop (prettier)
nvtop
# GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```
**Docker Stats:**
```bash
# Container resource usage
docker stats
# Specific container
docker stats vllm --no-stream
```
**System Resources:**
```bash
# Overall system
htop
# I/O stats
iotop
# Network
iftop
```
---
## Next: Deploy Production Stack
Now you're ready to deploy the full GPU stack with vLLM, ComfyUI, and training tools.
**Proceed to:** Deploying the production docker-compose.yaml
**Save your progress:**
```bash
cat >> /workspace/SERVER_INFO.md << 'EOF'
## Docker Configuration
- Docker Version: [docker --version]
- NVIDIA Runtime: Enabled
- GPU Access in Containers: ✓
- Test vLLM Deployment: Successful
- Directory: /workspace/gpu-stack
## Tools Installed
- nvtop: GPU monitoring
- htop: System monitoring
- Docker Compose: v2.x
- Git LFS: Large file support
EOF
```

View File

@@ -1,173 +0,0 @@
# GPU Server Deployment Log
## Current Deployment (2025-11-21)
### Infrastructure
- **Provider**: RunPod (Spot Instance)
- **GPU**: NVIDIA RTX 4090 24GB
- **Disk**: 50GB local SSD (expanded from 20GB)
- **Network Volume**: 922TB at `/workspace`
- **Region**: Europe
- **Cost**: ~$0.50/hour (~$360/month if running 24/7)
### Network Configuration
- **VPN**: Tailscale (replaces WireGuard due to RunPod UDP restrictions)
- **GPU Server Tailscale IP**: 100.100.108.13
- **VPS Tailscale IP**: (get with `tailscale ip -4` on VPS)
### SSH Access
```
Host gpu-pivoine
HostName 213.173.102.232
Port 29695
User root
IdentityFile ~/.ssh/id_ed25519
```
**Note**: RunPod Spot instances can be terminated and restarted with new ports/IPs. Update SSH config accordingly.
### Software Stack
- **Python**: 3.11.10
- **vLLM**: 0.6.4.post1 (installed with pip)
- **PyTorch**: 2.5.1 with CUDA 12.4
- **Tailscale**: Installed via official script
### vLLM Deployment
**Custom Server**: `ai/simple_vllm_server.py`
- Uses `AsyncLLMEngine` directly to bypass multiprocessing issues
- OpenAI-compatible API endpoints:
- `GET /v1/models` - List available models
- `POST /v1/completions` - Text completion
- `POST /v1/chat/completions` - Chat completion
- Default model: Qwen/Qwen2.5-7B-Instruct
- Cache directory: `/workspace/huggingface_cache`
**Deployment Command**:
```bash
# Copy server script to GPU server
scp ai/simple_vllm_server.py gpu-pivoine:/workspace/
# Start server
ssh gpu-pivoine "cd /workspace && nohup python3 simple_vllm_server.py > vllm.log 2>&1 &"
# Check status
ssh gpu-pivoine "curl http://localhost:8000/v1/models"
```
**Server Configuration** (environment variables):
- `VLLM_HOST`: 0.0.0.0 (default)
- `VLLM_PORT`: 8000 (default)
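For example, to run the server on another port (a sketch; only `VLLM_HOST`/`VLLM_PORT` are documented above, the rest mirrors the deployment command):
```bash
# Start the custom server on port 8010 instead of the default 8000
ssh gpu-pivoine "cd /workspace && VLLM_PORT=8010 nohup python3 simple_vllm_server.py > vllm.log 2>&1 &"
# Verify it answers on the new port
ssh gpu-pivoine "curl http://localhost:8010/v1/models"
```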
### Model Configuration
- **Model**: Qwen/Qwen2.5-7B-Instruct (no auth required)
- **Context Length**: 4096 tokens
- **GPU Memory**: 85% utilization
- **Tensor Parallel**: 1 (single GPU)
### Known Issues & Solutions
#### Issue 1: vLLM Multiprocessing Errors
**Problem**: Default vLLM v1 engine fails with ZMQ/CUDA multiprocessing errors on RunPod.
**Solution**: Custom `AsyncLLMEngine` FastAPI server bypasses multiprocessing layer entirely.
#### Issue 2: Disk Space (Solved)
**Problem**: Original 20GB disk filled up with Hugging Face cache.
**Solution**: Expanded to 50GB and use `/workspace` for model cache.
#### Issue 3: Gated Models
**Problem**: Llama models require Hugging Face authentication.
**Solution**: Use Qwen 2.5 7B Instruct (no auth required) or set `HF_TOKEN` environment variable.
#### Issue 4: Spot Instance Volatility
**Problem**: RunPod Spot instances can be terminated anytime.
**Solution**: Accept as trade-off for cost savings. Document SSH details for quick reconnection.
### Monitoring
**Check vLLM logs**:
```bash
ssh gpu-pivoine "tail -f /workspace/vllm.log"
```
**Check GPU usage**:
```bash
ssh gpu-pivoine "nvidia-smi"
```
**Check Tailscale status**:
```bash
ssh gpu-pivoine "tailscale status"
```
**Test API locally (on GPU server)**:
```bash
ssh gpu-pivoine "curl http://localhost:8000/v1/models"
```
**Test API via Tailscale (from VPS)**:
```bash
curl http://100.100.108.13:8000/v1/models
```
### LiteLLM Integration
Update VPS LiteLLM config at `ai/litellm-config-gpu.yaml`:
```yaml
# Replace old WireGuard IP (10.8.0.2) with Tailscale IP
- model_name: qwen-2.5-7b
  litellm_params:
    model: openai/qwen-2.5-7b
    api_base: http://100.100.108.13:8000/v1  # Tailscale IP
    api_key: dummy
    rpm: 1000
    tpm: 100000
```
Restart LiteLLM:
```bash
arty restart litellm
```
### Troubleshooting
**Server not responding**:
1. Check if process is running: `pgrep -f simple_vllm_server`
2. Check logs: `tail -100 /workspace/vllm.log`
3. Check GPU availability: `nvidia-smi`
4. Restart server: `pkill -f simple_vllm_server && python3 /workspace/simple_vllm_server.py &`
**Tailscale not connected**:
1. Check status: `tailscale status`
2. Check daemon: `ps aux | grep tailscaled`
3. Restart: `tailscale down && tailscale up`
**Model download failing**:
1. Check disk space: `df -h`
2. Check cache directory: `ls -lah /workspace/huggingface_cache`
3. Clear cache if needed: `rm -rf /workspace/huggingface_cache/*`
### Next Steps
1. ✅ Deploy vLLM with Qwen 2.5 7B
2. ⏳ Test API endpoints locally and via Tailscale
3. ⏳ Update VPS LiteLLM configuration
4. ⏳ Test end-to-end: Open WebUI → LiteLLM → vLLM
5. ⏹️ Monitor performance and costs
6. ⏹️ Consider adding more models (Mistral, DeepSeek Coder)
7. ⏹️ Set up auto-stop for idle periods to save costs
### Cost Optimization Ideas
1. **Auto-stop**: Configure RunPod to auto-stop after 30 minutes idle
2. **Spot Instances**: Already using Spot for 50% cost reduction
3. **Scheduled Operation**: Run only during business hours (8 hours/day = $120/month)
4. **Smaller Models**: Use Mistral 7B or quantized models for lighter workloads
5. **Pay-as-you-go**: Manually start/stop pod as needed
### Performance Benchmarks
*To be measured after deployment*
Expected (based on RTX 4090):
- Qwen 2.5 7B: 50-80 tokens/second
- Context processing: ~2-3 seconds for 1000 tokens
- First token latency: ~200-300ms

File diff suppressed because it is too large.

View File

@@ -1,444 +0,0 @@
# GPU-Enhanced AI Stack - Implementation Guide
Welcome to your GPU expansion setup! This directory contains everything you need to deploy a production-ready GPU server for LLM hosting, image generation, and model training.
## 📚 Documentation Files
### Planning & Architecture
- **`GPU_EXPANSION_PLAN.md`** - Complete 70-page plan with provider comparison, architecture, and roadmap
- **`README_GPU_SETUP.md`** - This file
### Step-by-Step Setup Guides
1. **`SETUP_GUIDE.md`** - Day 1-2: RunPod account & GPU server deployment
2. **`WIREGUARD_SETUP.md`** - Day 3-4: VPN connection between VPS and GPU server
3. **`DOCKER_GPU_SETUP.md`** - Day 5: Docker + NVIDIA Container Toolkit configuration
### Configuration Files
- **`gpu-server-compose.yaml`** - Production Docker Compose for GPU server
- **`litellm-config-gpu.yaml`** - Updated LiteLLM config with self-hosted models
- **`deploy-gpu-stack.sh`** - Automated deployment script
---
## 🚀 Quick Start (Week 1 Checklist)
### Day 1-2: RunPod & GPU Server ✓
- [ ] Create RunPod account at https://www.runpod.io/
- [ ] Add billing method ($50 initial credit recommended)
- [ ] Deploy RTX 4090 pod with PyTorch template
- [ ] Configure 500GB network volume
- [ ] Verify SSH access
- [ ] Test GPU with `nvidia-smi`
- [ ] **Guide:** `SETUP_GUIDE.md`
### Day 3-4: Network Configuration ✓
- [ ] Install Tailscale on VPS
- [ ] Install Tailscale on GPU server
- [ ] Authenticate both devices
- [ ] Test VPN connectivity
- [ ] Configure firewall rules
- [ ] Verify VPS can reach GPU server
- [ ] **Guide:** `TAILSCALE_SETUP.md`
### Day 5: Docker & GPU Setup ✓
- [ ] Install Docker on GPU server
- [ ] Install NVIDIA Container Toolkit
- [ ] Test GPU access in containers
- [ ] Create /workspace/gpu-stack directory
- [ ] Copy configuration files
- [ ] **Guide:** `DOCKER_GPU_SETUP.md`
### Day 6-7: Deploy Services ✓
- [ ] Copy `gpu-server-compose.yaml` to GPU server
- [ ] Edit `.env` with your settings
- [ ] Run `./deploy-gpu-stack.sh`
- [ ] Wait for vLLM to load model (~5 minutes)
- [ ] Test vLLM: `curl http://localhost:8000/v1/models`
- [ ] Access ComfyUI: `http://[tailscale-ip]:8188`
- [ ] **Script:** `deploy-gpu-stack.sh`
---
## 📦 Services Included
### vLLM (http://[tailscale-ip]:8000)
**Purpose:** High-performance LLM inference
**Default Model:** Llama 3.1 8B Instruct
**Performance:** 50-80 tokens/second on RTX 4090
**Use for:** General chat, Q&A, code generation, summarization
**Switch models:**
Edit `gpu-server-compose.yaml`, change `--model` parameter, restart:
```bash
docker compose restart vllm
```
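For example, switching from the default Llama 3.1 8B to Qwen 2.5 7B could look like this (a sketch — the exact model string in your `gpu-server-compose.yaml` may differ, and the container needs to be recreated for compose changes to take effect):
```bash
# Swap the Hugging Face model ID in the compose file, then recreate the vLLM container
sed -i 's|meta-llama/Meta-Llama-3.1-8B-Instruct|Qwen/Qwen2.5-7B-Instruct|' gpu-server-compose.yaml
docker compose -f gpu-server-compose.yaml up -d --force-recreate vllm
```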
### ComfyUI (http://[tailscale-ip]:8188)
**Purpose:** Advanced Stable Diffusion interface
**Features:** FLUX, SDXL, ControlNet, LoRA
**Use for:** Image generation, img2img, inpainting
**Download models:**
Access web UI → ComfyUI Manager → Install Models
### JupyterLab (http://[tailscale-ip]:8888)
**Purpose:** Interactive development environment
**Token:** `pivoine-ai-2025` (change in `.env`)
**Use for:** Research, experimentation, custom training scripts
### Axolotl (Training - on-demand)
**Purpose:** LLM fine-tuning framework
**Start:** `docker compose --profile training up -d axolotl`
**Use for:** LoRA training, full fine-tuning, RLHF
### Netdata (http://[tailscale-ip]:19999)
**Purpose:** System & GPU monitoring
**Features:** Real-time metrics, GPU utilization, memory usage
**Use for:** Performance monitoring, troubleshooting
---
## 🔧 Configuration
### Environment Variables (.env)
```bash
# VPN Network (Tailscale)
VPS_IP=100.x.x.x # Your VPS Tailscale IP (get with: tailscale ip -4)
GPU_IP=100.x.x.x # GPU server Tailscale IP (get with: tailscale ip -4)
# Model Storage
MODELS_PATH=/workspace/models
# Hugging Face Token (for gated models like Llama)
HF_TOKEN=hf_xxxxxxxxxxxxx
# Weights & Biases (for training logging)
WANDB_API_KEY=
# JupyterLab Access
JUPYTER_TOKEN=pivoine-ai-2025
# PostgreSQL (on VPS)
DB_HOST=100.x.x.x # Your VPS Tailscale IP
DB_PORT=5432
DB_USER=valknar
DB_PASSWORD=ragnarok98
DB_NAME=openwebui
```
### Updating LiteLLM on VPS
After GPU server is running, update your VPS LiteLLM config:
```bash
# On VPS
cd ~/Projects/docker-compose/ai
# Backup current config
cp litellm-config.yaml litellm-config.yaml.backup
# Copy new config with GPU models
cp litellm-config-gpu.yaml litellm-config.yaml
# Restart LiteLLM
arty restart litellm
```
Now Open WebUI will have access to both Claude (API) and Llama (self-hosted)!
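To confirm the proxy picked up the new models, list them via LiteLLM's OpenAI-compatible API (a sketch — `llm.ai.pivoine.art` is this stack's LiteLLM hostname; the key variable name is an assumption):
```bash
# llama-3.1-8b and the Claude models should both appear in the list
curl -s https://llm.ai.pivoine.art/v1/models \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" | jq -r '.data[].id'
```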
---
## 💰 Cost Management
### Current Costs (24/7 Operation)
- **GPU Server:** RTX 4090 @ $0.50/hour = $360/month
- **Storage:** 500GB network volume = $50/month
- **Total:** **$410/month**
### Cost-Saving Options
**1. Pay-as-you-go (8 hours/day)**
- GPU: $0.50 × 8 × 30 = $120/month
- Storage: $50/month
- **Total: $170/month**
**2. Auto-stop idle pods**
RunPod can auto-stop after X minutes idle:
- Dashboard → Pod Settings → Auto-stop after 30 minutes
**3. Use smaller models**
- Mistral 7B instead of Llama 8B: Faster, cheaper GPU
- Quantized models: 4-bit = 1/4 the VRAM
**4. Batch image generation**
- Generate multiple images at once
- Use scheduled jobs (cron) during off-peak hours
### Cost Tracking
**Check GPU usage:**
```bash
# On RunPod dashboard
Billing → Usage History
# See hourly costs, total spent
```
**Check API vs GPU savings:**
```bash
# On VPS, check LiteLLM logs
docker logs ai_litellm | grep "model="
# Count requests to llama-3.1-8b vs claude-*
```
**Expected savings:**
- 80% of requests → self-hosted = $0 cost
- 20% of requests → Claude = API cost
- Break-even if currently spending >$500/month on APIs
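That break-even point follows from the figures above: the GPU stack is a fixed ~$410/month and absorbs roughly 80% of traffic, so it pays off once 80% of the previous API bill exceeds $410:
```bash
# $410 fixed cost ÷ 0.8 offloaded share ≈ $512/month of prior API spend to break even
python3 -c 'print(410 / 0.8)'
```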
---
## 🔍 Monitoring & Troubleshooting
### Check Service Status
```bash
# On GPU server
cd /workspace/gpu-stack
# View all services
docker compose ps
# Check specific service logs
docker compose logs -f vllm
docker compose logs -f comfyui
docker compose logs -f jupyter
# Check GPU usage
nvidia-smi
# or prettier:
nvtop
```
### Common Issues
**vLLM not loading model:**
```bash
# Check logs
docker compose logs vllm
# Common causes:
# - Model download in progress (wait 5-10 minutes)
# - Out of VRAM (try smaller model)
# - Missing HF_TOKEN (for gated models like Llama)
```
**ComfyUI slow/crashing:**
```bash
# Check GPU memory
nvidia-smi
# If VRAM full:
# - Close vLLM temporarily
# - Use smaller models
# - Reduce batch size in ComfyUI
```
**Can't access from VPS:**
```bash
# Test VPN
ping [tailscale-ip]
# If fails:
# - Check Tailscale status: tailscale status
# - Restart Tailscale: tailscale down && tailscale up
# - Check firewall: ufw status
```
**Docker can't see GPU:**
```bash
# Test GPU access
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.1.0-base nvidia-smi
# If fails:
# - Check NVIDIA driver: nvidia-smi
# - Check nvidia-docker: nvidia-ctk --version
# - Restart Docker: systemctl restart docker
```
---
## 📊 Performance Benchmarks
### Expected Performance (RTX 4090)
**LLM Inference (vLLM):**
- Llama 3.1 8B: 50-80 tokens/second
- Qwen 2.5 14B: 30-50 tokens/second
- Batch size 32: ~1500 tokens/second
**Image Generation (ComfyUI):**
- SDXL (1024×1024): ~4-6 seconds
- FLUX (1024×1024): ~8-12 seconds
- SD 1.5 (512×512): ~1-2 seconds
**Training (Axolotl):**
- LoRA fine-tuning (8B model): ~3-5 hours for 3 epochs
- Full fine-tuning: Not recommended on 24GB VRAM
---
## 🔐 Security Best Practices
### Network Security
✅ All services behind Tailscale VPN (end-to-end encrypted)
✅ No public exposure (except RunPod's SSH)
✅ Firewall configured (no additional ports needed)
### Access Control
✅ JupyterLab password-protected
✅ ComfyUI accessible via VPN only
✅ vLLM internal API (no auth needed)
### SSH Security
```bash
# On GPU server, harden SSH
nano /etc/ssh/sshd_config
# Set:
PermitRootLogin prohibit-password
PasswordAuthentication no
PubkeyAuthentication yes
systemctl restart sshd
```
### Regular Updates
```bash
# Weekly updates
apt update && apt upgrade -y
# Update Docker images
docker compose pull
docker compose up -d
```
---
## 📈 Scaling Up
### When to Add More GPUs
**Current limitations (1× RTX 4090):**
- Can run ONE of these at a time:
- 8B LLM at full speed
- 14B LLM at moderate speed
- SDXL image generation
- Training job
**Add 2nd GPU if:**
- You want LLM + image gen simultaneously
- Training + inference at same time
- Multiple users with high demand
**Multi-GPU options:**
- 2× RTX 4090: Run vLLM + ComfyUI separately ($720/month)
- 1× A100 40GB: Larger models (70B with quantization) ($1,080/month)
- Mix: RTX 4090 (inference) + A100 (training) (~$1,300/month)
### Deploying Larger Models
**70B models (need 2× A100 or 4× RTX 4090):**
```yaml
# In gpu-server-compose.yaml
vllm:
  command:
    - --model
    - meta-llama/Meta-Llama-3.1-70B-Instruct
    - --tensor-parallel-size
    - "2" # Split across 2 GPUs
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 2 # Use 2 GPUs
            capabilities: [gpu]
```
---
## 🎯 Next Steps (Week 2+)
### Week 2: LLM Production Deployment
- [ ] Test Llama 3.1 8B performance
- [ ] Download additional models (Qwen, Mistral)
- [ ] Configure model routing in LiteLLM
- [ ] Set up usage monitoring
- [ ] Benchmark tokens/second for each model
### Week 3: Image Generation
- [ ] Download FLUX and SDXL models
- [ ] Install ComfyUI Manager
- [ ] Download ControlNet models
- [ ] Create sample workflows
- [ ] Test API integration with Open WebUI
### Week 4: Training Infrastructure
- [ ] Prepare a sample dataset
- [ ] Test LoRA fine-tuning with Axolotl
- [ ] Set up Weights & Biases logging
- [ ] Create training documentation
- [ ] Benchmark training speed
---
## 🆘 Getting Help
### Resources
- **RunPod Docs:** https://docs.runpod.io/
- **vLLM Docs:** https://docs.vllm.ai/
- **ComfyUI Wiki:** https://github.com/comfyanonymous/ComfyUI/wiki
- **Axolotl Docs:** https://github.com/OpenAccess-AI-Collective/axolotl
### Community
- **RunPod Discord:** https://discord.gg/runpod
- **vLLM Discord:** https://discord.gg/vllm
- **r/LocalLLaMA:** https://reddit.com/r/LocalLLaMA
### Support
If you encounter issues:
1. Check logs: `docker compose logs -f [service]`
2. Check GPU: `nvidia-smi`
3. Check VPN: `wg show`
4. Restart service: `docker compose restart [service]`
5. Full restart: `docker compose down && docker compose up -d`
---
## ✅ Success Criteria
You're ready to proceed when:
- [ ] GPU server responds to `ping [tailscale-ip]` from VPS
- [ ] vLLM returns models: `curl http://[tailscale-ip]:8000/v1/models`
- [ ] ComfyUI web interface loads: `http://[tailscale-ip]:8188`
- [ ] JupyterLab accessible with token
- [ ] Netdata shows GPU metrics
- [ ] Open WebUI shows both Claude and Llama models
**Total setup time:** 4-6 hours (if following guides sequentially)
---
## 🎉 You're All Set!
Your GPU-enhanced AI stack is ready. You now have:
- ✅ Self-hosted LLM inference (saves $$$)
- ✅ Advanced image generation (FLUX, SDXL)
- ✅ Model training capabilities (LoRA, fine-tuning)
- ✅ Secure VPN connection
- ✅ Full monitoring and logging
Enjoy building with your new AI infrastructure! 🚀

View File

@@ -1,261 +0,0 @@
# GPU Server Setup Guide - Week 1
## Day 1-2: RunPod Account & GPU Server
### Step 1: Create RunPod Account
1. **Go to RunPod**: https://www.runpod.io/
2. **Sign up** with email or GitHub
3. **Add billing method**:
- Credit card required
- No charges until you deploy a pod
- Recommended: Add $50 initial credit
4. **Verify email** and complete account setup
### Step 2: Deploy Your First GPU Pod
#### 2.1 Navigate to Pods
1. Click **"Deploy"** in top menu
2. Select **"GPU Pods"**
#### 2.2 Choose GPU Type
**Recommended: RTX 4090**
- 24GB VRAM
- ~$0.50/hour
- Perfect for LLMs up to 14B params
- Great for SDXL/FLUX
**Filter options:**
- GPU Type: RTX 4090
- GPU Count: 1
- Sort by: Price (lowest first)
- Region: Europe (lower latency to Germany)
#### 2.3 Select Template
Choose: **"RunPod PyTorch"** template
- Includes: CUDA, PyTorch, Python
- Pre-configured for GPU workloads
- Docker pre-installed
**Alternative**: "Ubuntu 22.04 with CUDA 12.1" (more control)
#### 2.4 Configure Pod
**Container Settings:**
- **Container Disk**: 50GB (temporary, auto-included)
- **Expose Ports**:
- Add: 22 (SSH)
- Add: 8000 (vLLM)
- Add: 8188 (ComfyUI)
- Add: 8888 (JupyterLab)
**Volume Settings:**
- Click **"+ Network Volume"**
- **Name**: `gpu-models-storage`
- **Size**: 500GB
- **Region**: Same as pod
- **Cost**: ~$50/month
**Environment Variables:**
- Add later (not needed for initial setup)
#### 2.5 Deploy Pod
1. Review configuration
2. Click **"Deploy On-Demand"** (not Spot for reliability)
3. Wait 2-3 minutes for deployment
**Expected cost:**
- GPU: $0.50/hour = $360/month (24/7)
- Storage: $50/month
- **Total: $410/month**
### Step 3: Access Your GPU Server
#### 3.1 Get Connection Info
Once deployed, you'll see:
- **Pod ID**: e.g., `abc123def456`
- **SSH Command**: `ssh root@<pod-id>.runpod.io -p 12345`
- **Public IP**: May not be directly accessible (use SSH)
#### 3.2 SSH Access
RunPod automatically generates SSH keys for you:
```bash
# Copy the SSH command from RunPod dashboard
ssh root@abc123def456.runpod.io -p 12345
# First time: Accept fingerprint
# You should now be in the GPU server!
```
**Verify GPU:**
```bash
nvidia-smi
```
Expected output:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx Driver Version: 535.xx CUDA Version: 12.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 30% 45C P0 50W / 450W | 0MiB / 24564MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
```
### Step 4: Initial Server Configuration
#### 4.1 Update System
```bash
# Update package lists
apt update
# Upgrade existing packages
apt upgrade -y
# Install essential tools
apt install -y \
vim \
htop \
tmux \
curl \
wget \
git \
net-tools \
iptables-persistent
```
#### 4.2 Set Timezone
```bash
timedatectl set-timezone Europe/Berlin
date # Verify
```
#### 4.3 Create Working Directory
```bash
# Create workspace
mkdir -p /workspace/{models,configs,data,scripts}
# Check network volume mount
ls -la /workspace
# Should show your 500GB volume
```
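`ls` only confirms the directory exists; to confirm the 500GB network volume is what's actually mounted there (and not the 50GB container disk), check free space:
```bash
# Show the filesystem backing /workspace and its size
df -h /workspace
# Expect a volume of roughly 500G mounted at /workspace
```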
#### 4.4 Configure SSH (Optional but Recommended)
**Generate your own SSH key on your local machine:**
```bash
# On your local machine (not GPU server)
ssh-keygen -t ed25519 -C "gpu-server-pivoine" -f ~/.ssh/gpu_pivoine
# Copy public key to GPU server
ssh-copy-id -i ~/.ssh/gpu_pivoine.pub root@abc123def456.runpod.io -p 12345
```
**Add to your local ~/.ssh/config:**
```bash
Host gpu-pivoine
HostName abc123def456.runpod.io
Port 12345
User root
IdentityFile ~/.ssh/gpu_pivoine
```
Now you can connect with: `ssh gpu-pivoine`
### Step 5: Verify GPU Access
Run this test:
```bash
# Test CUDA
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU count:', torch.cuda.device_count())"
```
Expected output:
```
CUDA available: True
GPU count: 1
```
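For more detail than a boolean, the same torch API also reports the device name and total VRAM (output will vary with your pod):
```bash
python3 -c "import torch; p = torch.cuda.get_device_properties(0); print(p.name, round(p.total_memory / 1024**3, 1), 'GiB')"
```
On an RTX 4090 this should report roughly 24 GiB.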
### Troubleshooting
**Problem: Can't connect via SSH**
- Check pod is running (not stopped)
- Verify port number in SSH command
- Try web terminal in RunPod dashboard
**Problem: GPU not detected**
- Run `nvidia-smi`
- Check RunPod selected correct GPU type
- Restart pod if needed
**Problem: Network volume not mounted**
- Check RunPod dashboard → Volume tab
- Verify volume is attached to pod
- Try: `df -h` to see mounts
### Next Steps
Once SSH access works and GPU is verified:
✅ Proceed to **Day 3-4: Network Configuration (Tailscale VPN)**
### Save Important Info
Create a file to track your setup:
```bash
# On GPU server
cat > /workspace/SERVER_INFO.md << 'EOF'
# GPU Server Information
## Connection
- SSH: ssh root@abc123def456.runpod.io -p 12345
- Pod ID: abc123def456
- Region: [YOUR_REGION]
## Hardware
- GPU: RTX 4090 24GB
- CPU: [Check with: lscpu]
- RAM: [Check with: free -h]
- Storage: 500GB network volume at /workspace
## Costs
- GPU: $0.50/hour
- Storage: $50/month
- Total: ~$410/month (24/7)
## Deployed: [DATE]
EOF
```
---
## Checkpoint ✓
Before moving to Day 3, verify:
- [ ] RunPod account created and billing added
- [ ] RTX 4090 pod deployed successfully
- [ ] 500GB network volume attached
- [ ] SSH access working
- [ ] `nvidia-smi` shows GPU
- [ ] `torch.cuda.is_available()` returns True
- [ ] Timezone set to Europe/Berlin
- [ ] Essential tools installed
**Ready for Tailscale setup? Let's go!**

View File

@@ -1,417 +0,0 @@
# Tailscale VPN Setup - Better Alternative to WireGuard
## Why Tailscale?
RunPod doesn't expose inbound UDP ports, which blocks a plain WireGuard tunnel. Tailscale works around this:
- ✅ Works even when UDP is blocked - falls back to HTTPS (TCP) relays
- ✅ Zero configuration - automatic setup
- ✅ Free for personal use
- ✅ Built on WireGuard (same security)
- ✅ Automatic NAT traversal
- ✅ Peer-to-peer when possible (low latency)
---
## Step 1: Create Tailscale Account
1. Go to: https://tailscale.com/
2. Click **"Get Started"**
3. Sign up with **GitHub** or **Google** (easiest)
4. You'll be redirected to the Tailscale admin console
**No credit card required!** Free tier is perfect for our use case.
---
## Step 2: Install Tailscale on VPS
**SSH into your VPS:**
```bash
ssh root@vps
```
**Install Tailscale:**
```bash
# Download and run install script
curl -fsSL https://tailscale.com/install.sh | sh
# Start Tailscale
tailscale up
# You'll see a URL like:
# https://login.tailscale.com/a/xxxxxxxxxx
```
**Authenticate:**
1. Copy the URL and open in browser
2. Click **"Connect"** to authorize the device
3. Name it: `pivoine-vps`
**Check status:**
```bash
tailscale status
```
You should see your VPS listed with an IP like `100.x.x.x`
**Save your VPS Tailscale IP:**
```bash
tailscale ip -4
# Example output: 100.101.102.103
```
**Write this down - you'll need it!**
---
## Step 3: Install Tailscale on GPU Server
**SSH into your RunPod GPU server:**
```bash
ssh root@abc123def456-12345678.runpod.io -p 12345
```
**Install Tailscale:**
```bash
# Download and run install script
curl -fsSL https://tailscale.com/install.sh | sh
# Start Tailscale
tailscale up --advertise-tags=tag:gpu
# You'll see another URL
```
**Authenticate:**
1. Copy the URL and open in browser
2. Click **"Connect"**
3. Name it: `gpu-runpod`
**Check status:**
```bash
tailscale status
```
You should now see BOTH devices:
- `pivoine-vps` - 100.x.x.x
- `gpu-runpod` - 100.x.x.x
**Save your GPU server Tailscale IP:**
```bash
tailscale ip -4
# Example output: 100.104.105.106
```
---
## Step 4: Test Connectivity
**From VPS, ping GPU server:**
```bash
# SSH into VPS
ssh root@vps
# Ping GPU server (use its Tailscale IP)
ping 100.104.105.106 -c 4
```
Expected output:
```
PING 100.104.105.106 (100.104.105.106) 56(84) bytes of data.
64 bytes from 100.104.105.106: icmp_seq=1 ttl=64 time=15.3 ms
64 bytes from 100.104.105.106: icmp_seq=2 ttl=64 time=14.8 ms
...
```
**From GPU server, ping VPS:**
```bash
# SSH into GPU server
ssh root@abc123def456-12345678.runpod.io -p 12345
# Ping VPS (use its Tailscale IP)
ping 100.101.102.103 -c 4
```
**Both should work!**
---
## Step 5: Update Configuration Files
Now update the IP addresses in your configs to use Tailscale IPs.
### On GPU Server (.env file)
**Edit your .env file:**
```bash
# On GPU server
cd /workspace/gpu-stack
nano .env
```
**Update these lines:**
```bash
# VPN Network (use your actual Tailscale IPs)
VPS_IP=100.101.102.103 # Your VPS Tailscale IP
GPU_IP=100.104.105.106 # Your GPU Tailscale IP
# PostgreSQL (on VPS)
DB_HOST=100.101.102.103 # Your VPS Tailscale IP
DB_PORT=5432
```
Save and exit (Ctrl+X, Y, Enter)
### On VPS (LiteLLM config)
**Edit your LiteLLM config:**
```bash
# On VPS
ssh root@vps
cd ~/Projects/docker-compose/ai
nano litellm-config-gpu.yaml
```
**Update the GPU server IP:**
```yaml
# Find this section and update IP:
- model_name: llama-3.1-8b
litellm_params:
model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
api_base: http://100.104.105.106:8000/v1 # Use GPU Tailscale IP
api_key: dummy
```
Save and exit.
---
## Step 6: Verify PostgreSQL Access
**From GPU server, test database connection:**
```bash
# Install PostgreSQL client
apt install -y postgresql-client
# Test connection (use your VPS Tailscale IP)
psql -h 100.101.102.103 -U valknar -d openwebui -c "SELECT 1;"
```
**If this fails, allow Tailscale network on VPS PostgreSQL:**
```bash
# On VPS
ssh root@vps
# Check if postgres allows Tailscale network
docker exec core_postgres cat /var/lib/postgresql/data/pg_hba.conf | grep 100
# If not present, add it:
docker exec -it core_postgres bash
# Inside container:
echo "host all all 100.0.0.0/8 scram-sha-256" >> /var/lib/postgresql/data/pg_hba.conf
# Restart postgres
exit
docker restart core_postgres
```
Try connecting again - should work now!
---
## Tailscale Management
### View Connected Devices
**Web dashboard:**
https://login.tailscale.com/admin/machines
You'll see all your devices with their Tailscale IPs.
**Command line:**
```bash
tailscale status
```
### Disconnect/Reconnect
```bash
# Stop Tailscale
tailscale down
# Start Tailscale
tailscale up
```
### Remove Device
From web dashboard:
1. Click on device
2. Click "..." menu
3. Select "Disable" or "Delete"
---
## Advantages Over WireGuard
- ✅ **Works anywhere** - No UDP ports needed
- ✅ **Auto-reconnect** - Survives network changes
- ✅ **Multiple devices** - Easy to add laptop, phone, etc.
- ✅ **NAT traversal** - Direct peer-to-peer when possible
- ✅ **Access control** - Manage from web dashboard
- ✅ **Monitoring** - See connection status in real-time
---
## Security Notes
🔒 **Tailscale is secure:**
- End-to-end encrypted (WireGuard)
- Zero-trust architecture
- No Tailscale servers can see your traffic
- Only authenticated devices can connect
🔒 **Access control:**
- Only devices you authorize can join
- Revoke access anytime from dashboard
- Set ACLs for fine-grained control
---
## Network Reference (Updated)
**Old (WireGuard):**
- VPS: `10.8.0.1`
- GPU: `10.8.0.2`
**New (Tailscale):**
- VPS: `100.101.102.103` (example - use your actual IP)
- GPU: `100.104.105.106` (example - use your actual IP)
**All services now accessible via Tailscale:**
**From VPS to GPU:**
- vLLM: `http://100.104.105.106:8000`
- ComfyUI: `http://100.104.105.106:8188`
- JupyterLab: `http://100.104.105.106:8888`
- Netdata: `http://100.104.105.106:19999`
**From GPU to VPS:**
- PostgreSQL: `100.101.102.103:5432`
- Redis: `100.101.102.103:6379`
- LiteLLM: `http://100.101.102.103:4000`
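To confirm each of these endpoints actually answers, a short curl sweep from the VPS is enough; the IP below is the example GPU Tailscale address used throughout this guide - substitute yours.
```bash
# From the VPS: print the HTTP status code for each GPU-side service
for url in http://100.104.105.106:8000/v1/models \
           http://100.104.105.106:8188/ \
           http://100.104.105.106:8888/ \
           http://100.104.105.106:19999/; do
  curl -s -o /dev/null -w "%{http_code}  $url\n" "$url"
done
```
Anything in the 200-302 range means the service is up; `000` means the port is unreachable over the tunnel.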
---
## Troubleshooting
### Can't ping between devices
**Check Tailscale status:**
```bash
tailscale status
```
Both devices should be listed; neither should show as offline.
**Check connectivity:**
```bash
tailscale ping 100.104.105.106
```
**Restart Tailscale:**
```bash
tailscale down && tailscale up
```
### PostgreSQL connection refused
**Check if postgres is listening on all interfaces:**
```bash
# On VPS
docker exec core_postgres cat /var/lib/postgresql/data/postgresql.conf | grep listen_addresses
```
Should show: `listen_addresses = '*'`
**Check pg_hba.conf allows Tailscale network:**
```bash
docker exec core_postgres cat /var/lib/postgresql/data/pg_hba.conf | grep 100
```
Should have line:
```
host all all 100.0.0.0/8 scram-sha-256
```
### Device not showing in network
**Re-authenticate:**
```bash
tailscale logout
tailscale up
# Click the new URL to re-authenticate
```
---
## Verification Checklist
Before proceeding:
- [ ] Tailscale account created
- [ ] Tailscale installed on VPS
- [ ] Tailscale installed on GPU server
- [ ] Both devices visible in `tailscale status`
- [ ] VPS can ping GPU server (via Tailscale IP)
- [ ] GPU server can ping VPS (via Tailscale IP)
- [ ] PostgreSQL accessible from GPU server
- [ ] .env file updated with Tailscale IPs
- [ ] LiteLLM config updated with GPU Tailscale IP
---
## Next Steps
**Network configured!** Proceed to Docker & GPU setup:
```bash
cat /home/valknar/Projects/docker-compose/ai/DOCKER_GPU_SETUP.md
```
**Your Tailscale IPs (save these!):**
- VPS: `__________________` (from `tailscale ip -4` on VPS)
- GPU: `__________________` (from `tailscale ip -4` on GPU server)
---
## Bonus: Add Your Local Machine
Want to access GPU server from your laptop?
```bash
# On your local machine
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up
# Now you can SSH directly via Tailscale:
ssh root@100.104.105.106
# Or access ComfyUI in browser:
# http://100.104.105.106:8188
```
No more port forwarding needed! 🎉

View File

@@ -1,393 +0,0 @@
# WireGuard VPN Setup - Connecting GPU Server to VPS
## Day 3-4: Network Configuration
This guide connects your RunPod GPU server to your VPS via WireGuard VPN, enabling secure, low-latency communication.
### Architecture
```
┌─────────────────────────────┐ ┌──────────────────────────────┐
│ VPS (pivoine.art) │ │ GPU Server (RunPod) │
│ 10.8.0.1 (WireGuard) │◄───────►│ 10.8.0.2 (WireGuard) │
├─────────────────────────────┤ ├──────────────────────────────┤
│ - LiteLLM Proxy │ │ - vLLM (10.8.0.2:8000) │
│ - Open WebUI │ │ - ComfyUI (10.8.0.2:8188) │
│ - PostgreSQL │ │ - Training │
└─────────────────────────────┘ └──────────────────────────────┘
```
### Prerequisites
- ✅ VPS with root access
- ✅ GPU server with root access
- ✅ Both servers have public IPs
---
## Method 1: Using Existing wg-easy (Recommended)
You already have `wg-easy` running on your VPS. Let's use it!
### Step 1: Access wg-easy Dashboard
**On your local machine:**
1. Open browser: https://vpn.pivoine.art (or whatever your wg-easy URL is)
2. Login with admin password
**Don't have wg-easy set up? Skip to Method 2.**
### Step 2: Create GPU Server Client
1. In wg-easy dashboard, click **"+ New Client"**
2. **Name**: `gpu-server-runpod`
3. Click **"Create"**
4. **Download** configuration file (or copy QR code data)
You'll get a file like: `gpu-server-runpod.conf`
### Step 3: Install WireGuard on GPU Server
**SSH into GPU server:**
```bash
ssh gpu-pivoine # or your SSH command
# Install WireGuard
apt update
apt install -y wireguard wireguard-tools
```
### Step 4: Configure WireGuard on GPU Server
**Upload the config file:**
```bash
# On your local machine, copy the config to GPU server
scp gpu-server-runpod.conf gpu-pivoine:/etc/wireguard/wg0.conf
# Or manually create it on GPU server:
nano /etc/wireguard/wg0.conf
# Paste the configuration from wg-easy
```
**Example config (yours will be different):**
```ini
[Interface]
PrivateKey = <PRIVATE_KEY_FROM_WG_EASY>
Address = 10.8.0.2/24
DNS = 10.8.0.1
[Peer]
PublicKey = <VPS_PUBLIC_KEY_FROM_WG_EASY>
PresharedKey = <PRESHARED_KEY>
AllowedIPs = 10.8.0.0/24
Endpoint = <VPS_PUBLIC_IP>:51820
PersistentKeepalive = 25
```
### Step 5: Start WireGuard
```bash
# Enable IP forwarding
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
# Set permissions
chmod 600 /etc/wireguard/wg0.conf
# Start WireGuard
systemctl enable wg-quick@wg0
systemctl start wg-quick@wg0
# Check status
systemctl status wg-quick@wg0
wg show
```
Expected output:
```
interface: wg0
public key: <GPU_SERVER_PUBLIC_KEY>
private key: (hidden)
listening port: 51820
peer: <VPS_PUBLIC_KEY>
endpoint: <VPS_IP>:51820
allowed ips: 10.8.0.0/24
latest handshake: 1 second ago
transfer: 1.2 KiB received, 892 B sent
persistent keepalive: every 25 seconds
```
### Step 6: Test Connectivity
**From GPU server, ping VPS:**
```bash
ping 10.8.0.1 -c 4
```
Expected output:
```
PING 10.8.0.1 (10.8.0.1) 56(84) bytes of data.
64 bytes from 10.8.0.1: icmp_seq=1 ttl=64 time=25.3 ms
64 bytes from 10.8.0.1: icmp_seq=2 ttl=64 time=24.8 ms
...
```
**From VPS, ping GPU server:**
```bash
ssh root@vps
ping 10.8.0.2 -c 4
```
**Test PostgreSQL access from GPU server:**
```bash
# On GPU server
apt install -y postgresql-client
# Try connecting to VPS postgres
psql -h 10.8.0.1 -U valknar -d openwebui -c "SELECT 1;"
# Should work if postgres allows 10.8.0.0/24
```
---
## Method 2: Manual WireGuard Setup (If no wg-easy)
### Step 1: Install WireGuard on Both Servers
**On VPS:**
```bash
ssh root@vps
apt update
apt install -y wireguard wireguard-tools
```
**On GPU Server:**
```bash
ssh gpu-pivoine
apt update
apt install -y wireguard wireguard-tools
```
### Step 2: Generate Keys
**On VPS:**
```bash
cd /etc/wireguard
umask 077
wg genkey | tee vps-private.key | wg pubkey > vps-public.key
```
**On GPU Server:**
```bash
cd /etc/wireguard
umask 077
wg genkey | tee gpu-private.key | wg pubkey > gpu-public.key
```
### Step 3: Create Config on VPS
**On VPS (`/etc/wireguard/wg0.conf`):**
```bash
cat > /etc/wireguard/wg0.conf << 'EOF'
[Interface]
PrivateKey = <VPS_PRIVATE_KEY>
Address = 10.8.0.1/24
ListenPort = 51820
SaveConfig = false
# GPU Server Peer
[Peer]
PublicKey = <GPU_PUBLIC_KEY>
AllowedIPs = 10.8.0.2/32
PersistentKeepalive = 25
EOF
```
Replace `<VPS_PRIVATE_KEY>` with contents of `vps-private.key`
Replace `<GPU_PUBLIC_KEY>` with contents from GPU server's `gpu-public.key`
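If you'd rather not paste the keys by hand, a hypothetical sed helper can fill in the placeholders; it assumes the GPU server's `gpu-public.key` has already been copied to the VPS (e.g. via `scp`) into `/etc/wireguard/`.
```bash
# Fill the placeholders in-place; '|' is used as the sed delimiter because keys contain '/'
sed -i "s|<VPS_PRIVATE_KEY>|$(cat /etc/wireguard/vps-private.key)|" /etc/wireguard/wg0.conf
sed -i "s|<GPU_PUBLIC_KEY>|$(cat /etc/wireguard/gpu-public.key)|" /etc/wireguard/wg0.conf
chmod 600 /etc/wireguard/wg0.conf
```
The same pattern works on the GPU server for its three placeholders in Step 4.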
### Step 4: Create Config on GPU Server
**On GPU Server (`/etc/wireguard/wg0.conf`):**
```bash
cat > /etc/wireguard/wg0.conf << 'EOF'
[Interface]
PrivateKey = <GPU_PRIVATE_KEY>
Address = 10.8.0.2/24
[Peer]
PublicKey = <VPS_PUBLIC_KEY>
AllowedIPs = 10.8.0.0/24
Endpoint = <VPS_PUBLIC_IP>:51820
PersistentKeepalive = 25
EOF
```
Replace:
- `<GPU_PRIVATE_KEY>` with contents of `gpu-private.key`
- `<VPS_PUBLIC_KEY>` with contents from VPS's `vps-public.key`
- `<VPS_PUBLIC_IP>` with your VPS's public IP address
### Step 5: Start WireGuard on Both
**On VPS:**
```bash
# Enable IP forwarding
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
# Start WireGuard
chmod 600 /etc/wireguard/wg0.conf
systemctl enable wg-quick@wg0
systemctl start wg-quick@wg0
```
**On GPU Server:**
```bash
# Enable IP forwarding
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
# Start WireGuard
chmod 600 /etc/wireguard/wg0.conf
systemctl enable wg-quick@wg0
systemctl start wg-quick@wg0
```
### Step 6: Configure Firewall
**On VPS:**
```bash
# Allow WireGuard port
ufw allow 51820/udp
ufw reload
# Or with iptables
iptables -A INPUT -p udp --dport 51820 -j ACCEPT
iptables-save > /etc/iptables/rules.v4
```
**On GPU Server (RunPod):**
```bash
# Allow WireGuard
ufw allow 51820/udp
ufw reload
```
### Step 7: Test Connection
Same as Method 1 Step 6.
---
## Troubleshooting
### No handshake
**Check:**
```bash
wg show
```
If "latest handshake" shows "never":
1. Verify public keys are correct (easy to swap them!)
2. Check firewall allows UDP 51820
3. Verify endpoint IP is correct
4. Check `systemctl status wg-quick@wg0` for errors
### Can ping but can't access services
**On VPS, check PostgreSQL allows 10.8.0.0/24** (the config files live inside the `core_postgres` container):
```bash
# Open a shell inside the postgres container
docker exec -it core_postgres bash

# Edit postgresql.conf (if nano is missing in the container, echo >> or sed works too)
nano /var/lib/postgresql/data/postgresql.conf
# Add or modify:
listen_addresses = '*'

# Edit pg_hba.conf
nano /var/lib/postgresql/data/pg_hba.conf
# Add:
host all all 10.8.0.0/24 scram-sha-256

# Leave the container and restart it
exit
docker restart core_postgres
```
### WireGuard won't start
```bash
# Check logs
journalctl -u wg-quick@wg0 -n 50
# Common issues:
# - Wrong permissions: chmod 600 /etc/wireguard/wg0.conf
# - Invalid keys: regenerate with wg genkey
# - Port already in use: lsof -i :51820
```
---
## Verification Checklist
Before proceeding to Day 5:
- [ ] WireGuard installed on both VPS and GPU server
- [ ] VPN tunnel established (wg show shows handshake)
- [ ] GPU server can ping VPS (10.8.0.1)
- [ ] VPS can ping GPU server (10.8.0.2)
- [ ] Firewall allows WireGuard (UDP 51820)
- [ ] PostgreSQL accessible from GPU server
- [ ] WireGuard starts on boot (systemctl enable)
---
## Network Reference
**VPN IPs:**
- VPS: `10.8.0.1`
- GPU Server: `10.8.0.2`
**Service Access from GPU Server:**
- PostgreSQL: `postgresql://valknar:password@10.8.0.1:5432/dbname`
- Redis: `10.8.0.1:6379`
- LiteLLM: `http://10.8.0.1:4000`
- Mailpit: `10.8.0.1:1025`
**Service Access from VPS:**
- vLLM: `http://10.8.0.2:8000`
- ComfyUI: `http://10.8.0.2:8188`
- JupyterLab: `http://10.8.0.2:8888`
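For the GPU-to-VPS direction, a minimal sketch using bash's built-in `/dev/tcp` (nothing extra to install on the GPU server); the ports follow the list above, and the same pattern works the other way around.
```bash
# From the GPU server: check that the VPS-side TCP ports answer over the VPN
for port in 5432 6379 4000 1025; do
  (echo > /dev/tcp/10.8.0.1/$port) 2>/dev/null \
    && echo "open   10.8.0.1:$port" \
    || echo "closed 10.8.0.1:$port"
done
```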
---
## Next: Docker & GPU Setup
Once VPN is working, proceed to **Day 5: Docker & NVIDIA Container Toolkit Setup**.
**Save connection info:**
```bash
# On GPU server
cat >> /workspace/SERVER_INFO.md << 'EOF'
## VPN Configuration
- VPN IP: 10.8.0.2
- VPS VPN IP: 10.8.0.1
- WireGuard Status: Active
- Latest Handshake: [Check with: wg show]
## Network Access
- Can reach VPS services: ✓
- VPS can reach GPU services: ✓
EOF
```

View File

@@ -15,7 +15,7 @@ services:
- ai_postgres_data:/var/lib/postgresql/data - ai_postgres_data:/var/lib/postgresql/data
- ./postgres/init:/docker-entrypoint-initdb.d - ./postgres/init:/docker-entrypoint-initdb.d
healthcheck: healthcheck:
test: ['CMD-SHELL', 'pg_isready -U ${AI_DB_USER}'] test: ["CMD-SHELL", "pg_isready -U ${AI_DB_USER}"]
interval: 30s interval: 30s
timeout: 10s timeout: 10s
retries: 3 retries: 3
@@ -38,6 +38,10 @@ services:
OPENAI_API_BASE_URLS: http://litellm:4000 OPENAI_API_BASE_URLS: http://litellm:4000
OPENAI_API_KEYS: ${AI_LITELLM_API_KEY} OPENAI_API_KEYS: ${AI_LITELLM_API_KEY}
# Disable Ollama (we only use LiteLLM)
ENABLE_OLLAMA_API: false
OLLAMA_BASE_URLS: ""
# WebUI configuration # WebUI configuration
WEBUI_NAME: ${AI_WEBUI_NAME:-Pivoine AI} WEBUI_NAME: ${AI_WEBUI_NAME:-Pivoine AI}
WEBUI_URL: https://${AI_TRAEFIK_HOST} WEBUI_URL: https://${AI_TRAEFIK_HOST}
@@ -62,102 +66,87 @@ services:
volumes: volumes:
- ai_webui_data:/app/backend/data - ai_webui_data:/app/backend/data
- ./functions:/app/backend/data/functions:ro
depends_on: depends_on:
- ai_postgres - ai_postgres
- litellm - litellm
networks: networks:
- compose_network - compose_network
labels: labels:
- 'traefik.enable=${AI_TRAEFIK_ENABLED}' - "traefik.enable=${AI_TRAEFIK_ENABLED}"
# HTTP to HTTPS redirect # HTTP to HTTPS redirect
- 'traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-redirect-web-secure.redirectscheme.scheme=https' - "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-redirect-web-secure.redirectscheme.scheme=https"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-redirect-web-secure' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-redirect-web-secure"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web.rule=Host(`${AI_TRAEFIK_HOST}`)' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web.rule=Host(`${AI_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web.entrypoints=web' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web.entrypoints=web"
# HTTPS router # HTTPS router
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.rule=Host(`${AI_TRAEFIK_HOST}`)' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.rule=Host(`${AI_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.tls.certresolver=resolver' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.tls.certresolver=resolver"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.entrypoints=web-secure' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.entrypoints=web-secure"
- 'traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-web-secure-compress.compress=true' - "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-web-secure-compress.compress=true"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-web-secure-compress,security-headers@file' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-web-secure-compress,security-headers@file"
# Service # Service
- 'traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-web-secure.loadbalancer.server.port=8080' - "traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-web-secure.loadbalancer.server.port=8080"
- 'traefik.docker.network=${NETWORK_NAME}' - "traefik.docker.network=${NETWORK_NAME}"
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
# LiteLLM - Proxy to convert Anthropic API to OpenAI-compatible format # LiteLLM - Proxy to convert Anthropic API to OpenAI-compatible format
litellm: litellm:
image: ghcr.io/berriai/litellm:main-latest image: ghcr.io/berriai/litellm:main-latest
container_name: ${AI_COMPOSE_PROJECT_NAME}_litellm container_name: ${AI_COMPOSE_PROJECT_NAME}_litellm
restart: unless-stopped restart: unless-stopped
dns:
- 100.100.100.100
- 8.8.8.8
environment: environment:
TZ: ${TIMEZONE:-Europe/Berlin} TZ: ${TIMEZONE:-Europe/Berlin}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY} ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
LITELLM_MASTER_KEY: ${AI_LITELLM_API_KEY} LITELLM_MASTER_KEY: ${AI_LITELLM_API_KEY}
DATABASE_URL: postgresql://${AI_DB_USER}:${AI_DB_PASSWORD}@ai_postgres:5432/litellm DATABASE_URL: postgresql://${AI_DB_USER}:${AI_DB_PASSWORD}@ai_postgres:5432/litellm
LITELLM_DROP_PARAMS: 'true' GPU_VLLM_LLAMA_URL: ${GPU_VLLM_LLAMA_URL}
NO_DOCS: 'true' GPU_VLLM_BGE_URL: ${GPU_VLLM_BGE_URL}
NO_REDOC: 'true' # LITELLM_DROP_PARAMS: 'true' # DISABLED: Was breaking streaming
NO_DOCS: "true"
NO_REDOC: "true"
# Performance optimizations # Performance optimizations
LITELLM_LOG: 'ERROR' # Only log errors LITELLM_LOG: "DEBUG" # Enable detailed logging for debugging streaming issues
LITELLM_MODE: 'PRODUCTION' # Production mode for better performance LITELLM_MODE: "PRODUCTION" # Production mode for better performance
volumes: volumes:
- ./litellm-config.yaml:/app/litellm-config.yaml:ro - ./litellm-config.yaml:/app/litellm-config.yaml:ro
command: command:
[ [
'--config', "--config",
'/app/litellm-config.yaml', "/app/litellm-config.yaml",
'--host', "--host",
'0.0.0.0', "0.0.0.0",
'--port', "--port",
'4000', "4000",
'--drop_params'
] ]
depends_on: depends_on:
- ai_postgres - ai_postgres
networks:
- compose_network
healthcheck: healthcheck:
disable: true disable: true
labels:
- 'traefik.enable=${AI_TRAEFIK_ENABLED}'
# HTTP to HTTPS redirect
- 'traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-litellm-redirect-web-secure.redirectscheme.scheme=https'
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-litellm-redirect-web-secure'
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web.rule=Host(`${AI_LITELLM_TRAEFIK_HOST}`)'
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web.entrypoints=web'
# HTTPS router
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.rule=Host(`${AI_LITELLM_TRAEFIK_HOST}`)'
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.tls.certresolver=resolver'
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.entrypoints=web-secure'
- 'traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure-compress.compress=true'
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure-compress,security-headers@file'
# Service
- 'traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.loadbalancer.server.port=4000'
- 'traefik.docker.network=${NETWORK_NAME}'
# Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}'
# Crawl4AI - Web scraping for LLMs (internal API, no public access)
crawl4ai:
image: ${AI_CRAWL4AI_IMAGE:-unclecode/crawl4ai:latest}
container_name: ${AI_COMPOSE_PROJECT_NAME}_crawl4ai
restart: unless-stopped
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
# API configuration
PORT: ${AI_CRAWL4AI_PORT:-11235}
volumes:
- ai_crawl4ai_data:/app/.crawl4ai
networks: networks:
- compose_network - compose_network
labels: labels:
# No Traefik exposure - internal only - "traefik.enable=${AI_TRAEFIK_ENABLED}"
- 'traefik.enable=false' # HTTP to HTTPS redirect
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-litellm-redirect-web-secure.redirectscheme.scheme=https"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-litellm-redirect-web-secure"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web.rule=Host(`${AI_LITELLM_TRAEFIK_HOST}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web.entrypoints=web"
# HTTPS router
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.rule=Host(`${AI_LITELLM_TRAEFIK_HOST}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.tls.certresolver=resolver"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.entrypoints=web-secure"
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure-compress.compress=true"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure-compress,security-headers@file"
# Service
- "traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-litellm-web-secure.loadbalancer.server.port=4000"
- "traefik.docker.network=${NETWORK_NAME}"
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
# Facefusion - AI face swapping and enhancement # Facefusion - AI face swapping and enhancement
facefusion: facefusion:
build: build:
@@ -167,7 +156,7 @@ services:
container_name: ${AI_COMPOSE_PROJECT_NAME}_facefusion container_name: ${AI_COMPOSE_PROJECT_NAME}_facefusion
restart: unless-stopped restart: unless-stopped
tty: true tty: true
command: ['python', '-u', 'facefusion.py', 'run'] command: ["python", "-u", "facefusion.py", "run"]
environment: environment:
TZ: ${TIMEZONE:-Europe/Berlin} TZ: ${TIMEZONE:-Europe/Berlin}
GRADIO_SERVER_NAME: "0.0.0.0" GRADIO_SERVER_NAME: "0.0.0.0"
@@ -177,30 +166,175 @@ services:
networks: networks:
- compose_network - compose_network
labels: labels:
- 'traefik.enable=${AI_FACEFUSION_TRAEFIK_ENABLED}' - "traefik.enable=${AI_FACEFUSION_TRAEFIK_ENABLED}"
# HTTP to HTTPS redirect # HTTP to HTTPS redirect
- 'traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-facefusion-redirect-web-secure.redirectscheme.scheme=https' - "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-facefusion-redirect-web-secure.redirectscheme.scheme=https"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-facefusion-redirect-web-secure' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-facefusion-redirect-web-secure"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web.rule=Host(`${AI_FACEFUSION_TRAEFIK_HOST}`)' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web.rule=Host(`${AI_FACEFUSION_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web.entrypoints=web' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web.entrypoints=web"
# HTTPS router with Authelia # HTTPS router with Authelia
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.rule=Host(`${AI_FACEFUSION_TRAEFIK_HOST}`)' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.rule=Host(`${AI_FACEFUSION_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.tls.certresolver=resolver' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.tls.certresolver=resolver"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.entrypoints=web-secure' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.entrypoints=web-secure"
- 'traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure-compress.compress=true' - "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure-compress.compress=true"
- 'traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure-compress,net-authelia,security-headers@file' - "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure-compress,net-authelia,security-headers@file"
# Service # Service
- 'traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.loadbalancer.server.port=7860' - "traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-facefusion-web-secure.loadbalancer.server.port=7860"
- 'traefik.docker.network=${NETWORK_NAME}' - "traefik.docker.network=${NETWORK_NAME}"
# Watchtower - disabled for custom local image # Watchtower - disabled for custom local image
- 'com.centurylinklabs.watchtower.enable=false' - "com.centurylinklabs.watchtower.enable=false"
# ComfyUI - Node-based UI for Flux image generation (proxies to RunPod GPU)
comfyui:
image: nginx:alpine
container_name: ${AI_COMPOSE_PROJECT_NAME}_comfyui
restart: unless-stopped
dns:
- 100.100.100.100
- 8.8.8.8
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
GPU_SERVICE_HOST: ${GPU_TAILSCALE_HOST:-runpod-ai-orchestrator}
GPU_SERVICE_PORT: ${COMFYUI_BACKEND_PORT:-8188}
volumes:
- ./nginx.conf.template:/etc/nginx/nginx.conf.template:ro
command: /bin/sh -c "envsubst '$${GPU_SERVICE_HOST},$${GPU_SERVICE_PORT}' < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && exec nginx -g 'daemon off;'"
networks:
- compose_network
labels:
- "traefik.enable=${AI_COMFYUI_TRAEFIK_ENABLED:-true}"
# HTTP to HTTPS redirect
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-comfyui-redirect-web-secure.redirectscheme.scheme=https"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-comfyui-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-comfyui-redirect-web-secure"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-comfyui-web.rule=Host(`${AI_COMFYUI_TRAEFIK_HOST:-comfy.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-comfyui-web.entrypoints=web"
# HTTPS router with Authelia SSO
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-comfyui-web-secure.rule=Host(`${AI_COMFYUI_TRAEFIK_HOST:-comfy.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-comfyui-web-secure.tls.certresolver=resolver"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-comfyui-web-secure.entrypoints=web-secure"
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-comfyui-web-secure-compress.compress=true"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-comfyui-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-comfyui-web-secure-compress,net-authelia,security-headers@file"
# Service
- "traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-comfyui-web-secure.loadbalancer.server.port=80"
- "traefik.docker.network=${NETWORK_NAME}"
# Watchtower
- "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
audiocraft:
image: nginx:alpine
container_name: ${AI_COMPOSE_PROJECT_NAME}_audiocraft
restart: unless-stopped
dns:
- 100.100.100.100
- 8.8.8.8
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
GPU_SERVICE_HOST: ${GPU_TAILSCALE_HOST:-runpod-ai-orchestrator}
GPU_SERVICE_PORT: ${AUDIOCRAFT_BACKEND_PORT:-7860}
volumes:
- ./nginx.conf.template:/etc/nginx/nginx.conf.template:ro
command: /bin/sh -c "envsubst '$${GPU_SERVICE_HOST},$${GPU_SERVICE_PORT}' < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && exec nginx -g 'daemon off;'"
networks:
- compose_network
labels:
- "traefik.enable=${AI_AUDIOCRAFT_TRAEFIK_ENABLED:-true}"
# HTTP to HTTPS redirect
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-audiocraft-redirect-web-secure.redirectscheme.scheme=https"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-audiocraft-redirect-web-secure"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web.rule=Host(`${AI_AUDIOCRAFT_TRAEFIK_HOST:-audiocraft.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web.entrypoints=web"
# HTTPS router with Authelia SSO
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web-secure.rule=Host(`${AI_AUDIOCRAFT_TRAEFIK_HOST:-audiocraft.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web-secure.tls.certresolver=resolver"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web-secure.entrypoints=web-secure"
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web-secure-compress.compress=true"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-audiocraft-web-secure-compress,net-authelia,security-headers@file"
# Service
- "traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-audiocraft-web-secure.loadbalancer.server.port=80"
- "traefik.docker.network=${NETWORK_NAME}"
# Watchtower
- "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
upscale:
image: nginx:alpine
container_name: ${AI_COMPOSE_PROJECT_NAME}_upscale
restart: unless-stopped
dns:
- 100.100.100.100
- 8.8.8.8
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
GPU_SERVICE_HOST: ${GPU_TAILSCALE_HOST:-runpod-ai-orchestrator}
GPU_SERVICE_PORT: ${UPSCALE_BACKEND_PORT:-8080}
volumes:
- ./nginx.conf.template:/etc/nginx/nginx.conf.template:ro
command: /bin/sh -c "envsubst '$${GPU_SERVICE_HOST},$${GPU_SERVICE_PORT}' < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && exec nginx -g 'daemon off;'"
networks:
- compose_network
labels:
- "traefik.enable=${AI_UPSCALE_TRAEFIK_ENABLED:-true}"
# HTTP to HTTPS redirect
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-upscale-redirect-web-secure.redirectscheme.scheme=https"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-upscale-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-upscale-redirect-web-secure"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-upscale-web.rule=Host(`${AI_UPSCALE_TRAEFIK_HOST:-upscale.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-upscale-web.entrypoints=web"
# HTTPS router with Authelia SSO
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-upscale-web-secure.rule=Host(`${AI_UPSCALE_TRAEFIK_HOST:-upscale.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-upscale-web-secure.tls.certresolver=resolver"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-upscale-web-secure.entrypoints=web-secure"
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-upscale-web-secure-compress.compress=true"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-upscale-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-upscale-web-secure-compress,net-authelia,security-headers@file"
# Service
- "traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-upscale-web-secure.loadbalancer.server.port=80"
- "traefik.docker.network=${NETWORK_NAME}"
# Watchtower
- "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
# Supervisor UI - Modern web interface for RunPod process management
supervisor:
image: dev.pivoine.art/valknar/supervisor-ui:latest
container_name: ${AI_COMPOSE_PROJECT_NAME}_supervisor_ui
restart: unless-stopped
dns:
- 100.100.100.100
- 8.8.8.8
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
NODE_ENV: production
# Connect to RunPod Supervisor via Tailscale (host Tailscale provides DNS)
SUPERVISOR_HOST: ${GPU_TAILSCALE_HOST:-runpod-ai-orchestrator}
SUPERVISOR_PORT: ${SUPERVISOR_BACKEND_PORT:-9001}
# No auth needed - Supervisor has auth disabled (protected by Authelia)
networks:
- compose_network
labels:
- "traefik.enable=${AI_SUPERVISOR_TRAEFIK_ENABLED:-true}"
# HTTP to HTTPS redirect
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-supervisor-redirect-web-secure.redirectscheme.scheme=https"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-supervisor-web.middlewares=${AI_COMPOSE_PROJECT_NAME}-supervisor-redirect-web-secure"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-supervisor-web.rule=Host(`${AI_SUPERVISOR_TRAEFIK_HOST:-supervisor.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-supervisor-web.entrypoints=web"
# HTTPS router with Authelia SSO
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-supervisor-web-secure.rule=Host(`${AI_SUPERVISOR_TRAEFIK_HOST:-supervisor.ai.pivoine.art}`)"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-supervisor-web-secure.tls.certresolver=resolver"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-supervisor-web-secure.entrypoints=web-secure"
- "traefik.http.middlewares.${AI_COMPOSE_PROJECT_NAME}-supervisor-web-secure-compress.compress=true"
- "traefik.http.routers.${AI_COMPOSE_PROJECT_NAME}-supervisor-web-secure.middlewares=${AI_COMPOSE_PROJECT_NAME}-supervisor-web-secure-compress,net-authelia,security-headers@file"
# Service (port 3000 for Next.js app)
- "traefik.http.services.${AI_COMPOSE_PROJECT_NAME}-supervisor-web-secure.loadbalancer.server.port=3000"
- "traefik.docker.network=${NETWORK_NAME}"
# Watchtower
- "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
volumes: volumes:
ai_postgres_data: ai_postgres_data:
name: ${AI_COMPOSE_PROJECT_NAME}_postgres_data name: ${AI_COMPOSE_PROJECT_NAME}_postgres_data
ai_webui_data: ai_webui_data:
name: ${AI_COMPOSE_PROJECT_NAME}_webui_data name: ${AI_COMPOSE_PROJECT_NAME}_webui_data
ai_crawl4ai_data:
name: ${AI_COMPOSE_PROJECT_NAME}_crawl4ai_data
ai_facefusion_data: ai_facefusion_data:
name: ${AI_COMPOSE_PROJECT_NAME}_facefusion_data name: ${AI_COMPOSE_PROJECT_NAME}_facefusion_data
networks:
compose_network:
name: ${NETWORK_NAME}
external: true

View File

@@ -1,229 +0,0 @@
#!/bin/bash
# GPU Stack Deployment Script
# Run this on the GPU server after SSH access is established
set -e # Exit on error
echo "=================================="
echo "GPU Stack Deployment Script"
echo "=================================="
echo ""
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Functions
print_success() {
echo -e "${GREEN}$1${NC}"
}
print_error() {
echo -e "${RED}$1${NC}"
}
print_info() {
echo -e "${YELLOW}$1${NC}"
}
# Check if running as root
if [[ $EUID -ne 0 ]]; then
print_error "This script must be run as root (use sudo)"
exit 1
fi
# Step 1: Check prerequisites
print_info "Checking prerequisites..."
if ! command -v docker &> /dev/null; then
print_error "Docker is not installed. Please run DOCKER_GPU_SETUP.md first."
exit 1
fi
print_success "Docker installed"
if ! command -v nvidia-smi &> /dev/null; then
print_error "nvidia-smi not found. Is this a GPU server?"
exit 1
fi
print_success "NVIDIA GPU detected"
if ! docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi &> /dev/null; then
print_error "Docker cannot access GPU. Please configure NVIDIA Container Toolkit."
exit 1
fi
print_success "Docker GPU access working"
# Step 2: Create directory structure
print_info "Creating directory structure..."
mkdir -p /workspace/gpu-stack/{vllm,comfyui,training/{configs,data,output},notebooks,monitoring}
cd /workspace/gpu-stack
print_success "Directory structure created"
# Step 3: Create .env file
if [ ! -f .env ]; then
print_info "Creating .env file..."
cat > .env << 'EOF'
# GPU Stack Environment Variables
# Timezone
TIMEZONE=Europe/Berlin
# VPN Network
VPS_IP=10.8.0.1
GPU_IP=10.8.0.2
# Model Storage (network volume)
MODELS_PATH=/workspace/models
# Hugging Face Token (optional, for gated models like Llama)
# Get from: https://huggingface.co/settings/tokens
HF_TOKEN=
# Weights & Biases (optional, for training logging)
# Get from: https://wandb.ai/authorize
WANDB_API_KEY=
# JupyterLab Access Token
JUPYTER_TOKEN=pivoine-ai-2025
# PostgreSQL (on VPS)
DB_HOST=10.8.0.1
DB_PORT=5432
DB_USER=valknar
DB_PASSWORD=ragnarok98
DB_NAME=openwebui
EOF
chmod 600 .env
print_success ".env file created (please edit with your tokens)"
else
print_success ".env file already exists"
fi
# Step 4: Download docker-compose.yaml
print_info "Downloading docker-compose.yaml..."
# In production, this would be copied from the repo
# For now, assume it's already in the current directory
if [ ! -f docker-compose.yaml ]; then
print_error "docker-compose.yaml not found. Please copy gpu-server-compose.yaml to docker-compose.yaml"
exit 1
fi
print_success "docker-compose.yaml found"
# Step 5: Pre-download models (optional but recommended)
print_info "Do you want to pre-download models? (y/n)"
read -r response
if [[ "$response" =~ ^[Yy]$ ]]; then
print_info "Downloading Llama 3.1 8B Instruct (this will take a while)..."
mkdir -p /workspace/models
# Use huggingface-cli to download
pip install -q huggingface-hub
huggingface-cli download \
meta-llama/Meta-Llama-3.1-8B-Instruct \
--local-dir /workspace/models/Meta-Llama-3.1-8B-Instruct \
--local-dir-use-symlinks False || print_error "Model download failed (may need HF_TOKEN)"
print_success "Model downloaded to /workspace/models"
fi
# Step 6: Start services
print_info "Starting GPU stack services..."
docker compose up -d vllm comfyui jupyter netdata
print_success "Services starting (this may take a few minutes)..."
# Step 7: Wait for services
print_info "Waiting for services to be ready..."
sleep 10
# Check service health
print_info "Checking service status..."
if docker ps | grep -q gpu_vllm; then
print_success "vLLM container running"
else
print_error "vLLM container not running"
fi
if docker ps | grep -q gpu_comfyui; then
print_success "ComfyUI container running"
else
print_error "ComfyUI container not running"
fi
if docker ps | grep -q gpu_jupyter; then
print_success "JupyterLab container running"
else
print_error "JupyterLab container not running"
fi
if docker ps | grep -q gpu_netdata; then
print_success "Netdata container running"
else
print_error "Netdata container not running"
fi
# Step 8: Display access information
echo ""
echo "=================================="
echo "Deployment Complete!"
echo "=================================="
echo ""
echo "Services accessible via VPN (from VPS):"
echo " - vLLM API: http://10.8.0.2:8000"
echo " - ComfyUI: http://10.8.0.2:8188"
echo " - JupyterLab: http://10.8.0.2:8888 (token: pivoine-ai-2025)"
echo " - Netdata: http://10.8.0.2:19999"
echo ""
echo "Local access (from GPU server):"
echo " - vLLM API: http://localhost:8000"
echo " - ComfyUI: http://localhost:8188"
echo " - JupyterLab: http://localhost:8888"
echo " - Netdata: http://localhost:19999"
echo ""
echo "Useful commands:"
echo " - View logs: docker compose logs -f"
echo " - Check status: docker compose ps"
echo " - Stop all: docker compose down"
echo " - Restart service: docker compose restart vllm"
echo " - Start training: docker compose --profile training up -d axolotl"
echo ""
echo "Next steps:"
echo " 1. Wait for vLLM to load model (check logs: docker compose logs -f vllm)"
echo " 2. Test vLLM: curl http://localhost:8000/v1/models"
echo " 3. Configure LiteLLM on VPS to use http://10.8.0.2:8000"
echo " 4. Download ComfyUI models via web interface"
echo ""
# Step 9: Create helpful aliases
print_info "Creating helpful aliases..."
cat >> ~/.bashrc << 'EOF'
# GPU Stack Aliases
alias gpu-logs='cd /workspace/gpu-stack && docker compose logs -f'
alias gpu-ps='cd /workspace/gpu-stack && docker compose ps'
alias gpu-restart='cd /workspace/gpu-stack && docker compose restart'
alias gpu-down='cd /workspace/gpu-stack && docker compose down'
alias gpu-up='cd /workspace/gpu-stack && docker compose up -d'
alias gpu-stats='watch -n 1 nvidia-smi'
alias gpu-top='nvtop'
EOF
print_success "Aliases added to ~/.bashrc (reload with: source ~/.bashrc)"
echo ""
print_success "All done! 🚀"

View File

@@ -1,237 +0,0 @@
# GPU Server Docker Compose Configuration
# Deploy on RunPod GPU server (10.8.0.2)
# Services accessible from VPS (10.8.0.1) via WireGuard VPN
version: '3.8'
services:
# =============================================================================
# vLLM - High-performance LLM Inference Server
# =============================================================================
vllm:
image: vllm/vllm-openai:latest
container_name: gpu_vllm
restart: unless-stopped
runtime: nvidia
environment:
NVIDIA_VISIBLE_DEVICES: all
CUDA_VISIBLE_DEVICES: "0"
HF_TOKEN: ${HF_TOKEN:-}
volumes:
- ${MODELS_PATH:-/workspace/models}:/root/.cache/huggingface
command:
- --model
- meta-llama/Meta-Llama-3.1-8B-Instruct # Change model here
- --host
- 0.0.0.0
- --port
- 8000
- --tensor-parallel-size
- "1"
- --gpu-memory-utilization
- "0.85" # Leave 15% for other tasks
- --max-model-len
- "8192"
- --dtype
- auto
- --trust-remote-code
ports:
- "8000:8000"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 120s # Model loading takes time
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
labels:
- "service=vllm"
- "stack=gpu-ai"
# =============================================================================
# ComfyUI - Advanced Stable Diffusion Interface
# =============================================================================
comfyui:
image: ghcr.io/ai-dock/comfyui:latest
container_name: gpu_comfyui
restart: unless-stopped
runtime: nvidia
environment:
NVIDIA_VISIBLE_DEVICES: all
TZ: ${TIMEZONE:-Europe/Berlin}
# ComfyUI auto-installs custom nodes on first run
COMFYUI_FLAGS: "--listen 0.0.0.0 --port 8188"
volumes:
- comfyui_data:/data
- ${MODELS_PATH:-/workspace/models}/comfyui:/opt/ComfyUI/models
- comfyui_output:/opt/ComfyUI/output
- comfyui_input:/opt/ComfyUI/input
ports:
- "8188:8188"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8188/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
labels:
- "service=comfyui"
- "stack=gpu-ai"
# =============================================================================
# Axolotl - LLM Fine-tuning Framework
# =============================================================================
# Note: This service uses "profiles" - only starts when explicitly requested
# Start with: docker compose --profile training up -d axolotl
axolotl:
image: winglian/axolotl:main-py3.11-cu121-2.2.2
container_name: gpu_training
runtime: nvidia
volumes:
- ./training/configs:/workspace/configs
- ./training/data:/workspace/data
- ./training/output:/workspace/output
- ${MODELS_PATH:-/workspace/models}:/workspace/models
- training_cache:/root/.cache
environment:
NVIDIA_VISIBLE_DEVICES: all
WANDB_API_KEY: ${WANDB_API_KEY:-}
HF_TOKEN: ${HF_TOKEN:-}
working_dir: /workspace
# Default command - override when running specific training
command: sleep infinity
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
profiles:
- training
labels:
- "service=axolotl"
- "stack=gpu-ai"
# =============================================================================
# JupyterLab - Interactive Development Environment
# =============================================================================
jupyter:
image: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-devel
container_name: gpu_jupyter
restart: unless-stopped
runtime: nvidia
volumes:
- ./notebooks:/workspace/notebooks
- ${MODELS_PATH:-/workspace/models}:/workspace/models
- jupyter_cache:/root/.cache
ports:
- "8888:8888"
environment:
NVIDIA_VISIBLE_DEVICES: all
JUPYTER_ENABLE_LAB: "yes"
JUPYTER_TOKEN: ${JUPYTER_TOKEN:-pivoine-ai-2025}
HF_TOKEN: ${HF_TOKEN:-}
command: |
bash -c "
pip install --quiet jupyterlab transformers datasets accelerate bitsandbytes peft trl sentencepiece protobuf &&
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser --NotebookApp.token='${JUPYTER_TOKEN:-pivoine-ai-2025}'
"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8888/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
labels:
- "service=jupyter"
- "stack=gpu-ai"
# =============================================================================
# Netdata - System & GPU Monitoring
# =============================================================================
netdata:
image: netdata/netdata:latest
container_name: gpu_netdata
restart: unless-stopped
runtime: nvidia
hostname: gpu-runpod
cap_add:
- SYS_PTRACE
- SYS_ADMIN
security_opt:
- apparmor:unconfined
environment:
NVIDIA_VISIBLE_DEVICES: all
TZ: ${TIMEZONE:-Europe/Berlin}
volumes:
- /sys:/host/sys:ro
- /proc:/host/proc:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- /etc/os-release:/host/etc/os-release:ro
- netdata_config:/etc/netdata
- netdata_cache:/var/cache/netdata
- netdata_lib:/var/lib/netdata
ports:
- "19999:19999"
labels:
- "service=netdata"
- "stack=gpu-ai"
# =============================================================================
# Volumes
# =============================================================================
volumes:
# ComfyUI data
comfyui_data:
driver: local
comfyui_output:
driver: local
comfyui_input:
driver: local
# Training data
training_cache:
driver: local
# Jupyter data
jupyter_cache:
driver: local
# Netdata data
netdata_config:
driver: local
netdata_cache:
driver: local
netdata_lib:
driver: local
# =============================================================================
# Networks
# =============================================================================
networks:
default:
driver: bridge
ipam:
config:
- subnet: 172.25.0.0/24

View File

@@ -1,199 +0,0 @@
# LiteLLM Configuration with GPU Server Integration
# This config includes both Anthropic Claude (API) and self-hosted models (vLLM on GPU server)
model_list:
# =============================================================================
# Anthropic Claude Models (API-based, for complex reasoning)
# =============================================================================
- model_name: claude-sonnet-4
litellm_params:
model: anthropic/claude-sonnet-4-20250514
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: claude-sonnet-4.5
litellm_params:
model: anthropic/claude-sonnet-4-5-20250929
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: claude-3-5-sonnet
litellm_params:
model: anthropic/claude-3-5-sonnet-20241022
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: claude-3-opus
litellm_params:
model: anthropic/claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: claude-3-haiku
litellm_params:
model: anthropic/claude-3-haiku-20240307
api_key: os.environ/ANTHROPIC_API_KEY
# =============================================================================
# Self-Hosted Models (vLLM on GPU server via WireGuard VPN)
# =============================================================================
# Llama 3.1 8B Instruct - Fast, general-purpose, good for routine tasks
- model_name: llama-3.1-8b
litellm_params:
model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
api_base: http://10.8.0.2:8000/v1
api_key: dummy # vLLM doesn't require auth
rpm: 1000 # Rate limit: requests per minute
tpm: 100000 # Rate limit: tokens per minute
# Alternative models (uncomment and configure on GPU server as needed)
# Qwen 2.5 14B Instruct - Excellent multilingual, stronger reasoning
# - model_name: qwen-2.5-14b
# litellm_params:
# model: openai/Qwen/Qwen2.5-14B-Instruct
# api_base: http://10.8.0.2:8000/v1
# api_key: dummy
# rpm: 800
# tpm: 80000
# Mistral 7B Instruct - Very fast, lightweight
# - model_name: mistral-7b
# litellm_params:
# model: openai/mistralai/Mistral-7B-Instruct-v0.3
# api_base: http://10.8.0.2:8000/v1
# api_key: dummy
# rpm: 1200
# tpm: 120000
# DeepSeek Coder 6.7B - Code generation specialist
# - model_name: deepseek-coder-6.7b
# litellm_params:
# model: openai/deepseek-ai/deepseek-coder-6.7b-instruct
# api_base: http://10.8.0.2:8000/v1
# api_key: dummy
# rpm: 1000
# tpm: 100000
# =============================================================================
# Router Settings - Intelligent Model Selection
# =============================================================================
# Model aliases for easy switching in Open WebUI
model_name_map:
# Default model (self-hosted, fast)
gpt-3.5-turbo: llama-3.1-8b
# Power users can use Claude for complex tasks
gpt-4: claude-sonnet-4.5
gpt-4-turbo: claude-sonnet-4.5
# LiteLLM Settings
litellm_settings:
drop_params: true
set_verbose: false # Disable verbose logging for better performance
# Enable caching with Redis for better performance
cache: true
cache_params:
type: redis
host: redis
port: 6379
ttl: 3600 # Cache for 1 hour
# Force strip specific parameters globally
allowed_fails: 0
# Modify params before sending to provider
modify_params: true
# Enable success and failure logging but minimize overhead
success_callback: [] # Disable all success callbacks to reduce DB writes
failure_callback: [] # Disable all failure callbacks
# Router Settings
router_settings:
allowed_fails: 0
# Routing strategy: Try self-hosted first, fallback to Claude on failure
routing_strategy: simple-shuffle
# Cooldown for failed models
cooldown_time: 30 # seconds
# Drop unsupported parameters
default_litellm_params:
drop_params: true
# General Settings
general_settings:
disable_responses_id_security: true
# Disable spend tracking to reduce database overhead
disable_spend_logs: false # Keep enabled to track API vs GPU costs
# Disable tag tracking
disable_tag_tracking: true
# Disable daily spend updates
disable_daily_spend_logs: false # Keep enabled for cost analysis
# Master key for authentication (set via env var)
master_key: os.environ/LITELLM_MASTER_KEY
# Database for logging (optional but recommended for cost tracking)
database_url: os.environ/DATABASE_URL
# Enable OpenAPI docs
docs_url: /docs
# =============================================================================
# Usage Guidelines (for Open WebUI users)
# =============================================================================
#
# Model Selection Guide:
#
# Use llama-3.1-8b for:
# - General chat and Q&A
# - Simple code generation
# - Data extraction
# - Summarization
# - Translation
# - Most routine tasks
# Cost: ~$0/month (self-hosted)
# Speed: ~50-80 tokens/second
#
# Use qwen-2.5-14b for:
# - Complex reasoning
# - Multi-step problems
# - Advanced code generation
# - Multilingual tasks
# Cost: ~$0/month (self-hosted)
# Speed: ~30-50 tokens/second
#
# Use claude-sonnet-4.5 for:
# - Very complex reasoning
# - Long documents (200K context)
# - Production-critical code
# - When quality matters most
# Cost: ~$3/million input tokens, ~$15/million output tokens
# Speed: ~30-40 tokens/second
#
# Use claude-3-haiku for:
# - API fallback (if the self-hosted models are down)
# - Very fast responses needed
# Cost: ~$0.25/million input tokens, ~$1.25/million output tokens
# Speed: ~60-80 tokens/second
#
# =============================================================================
# Health Check Configuration
health_check:
# Check vLLM health endpoint
enabled: true
interval: 30 # seconds
timeout: 5 # seconds
# Fallback Configuration
# If GPU server is down, automatically use Claude
fallback:
- ["llama-3.1-8b", "claude-3-haiku"]
- ["qwen-2.5-14b", "claude-sonnet-4.5"]


@@ -24,30 +24,57 @@ model_list:
model: anthropic/claude-3-haiku-20240307 model: anthropic/claude-3-haiku-20240307
api_key: os.environ/ANTHROPIC_API_KEY api_key: os.environ/ANTHROPIC_API_KEY
# ===========================================================================
# SELF-HOSTED MODELS - DIRECT vLLM SERVERS (GPU Server via Tailscale VPN)
# ===========================================================================
# Direct connections to dedicated vLLM servers (no orchestrator)
# Text Generation - Llama 3.1 8B (Port 8001)
- model_name: llama-3.1-8b
litellm_params:
model: hosted_vllm/meta-llama/Llama-3.1-8B-Instruct # hosted_vllm/ prefix (instead of openai/) for proper streaming
api_base: os.environ/GPU_VLLM_LLAMA_URL # Direct to vLLM Llama server
api_key: "EMPTY" # vLLM doesn't validate API keys
rpm: 1000
tpm: 100000
timeout: 600 # 10 minutes for generation
stream_timeout: 600
supports_system_messages: true # Llama supports system messages
stream: true # Enable streaming by default
# Embeddings - BGE Large (Port 8002)
- model_name: bge-large-en
litellm_params:
model: openai/BAAI/bge-large-en-v1.5
api_base: os.environ/GPU_VLLM_BGE_URL
api_key: "EMPTY"
rpm: 1000
tpm: 500000
litellm_settings: litellm_settings:
drop_params: true drop_params: false # DISABLED: Was breaking streaming
set_verbose: false # Disable verbose logging for better performance set_verbose: true # Enable verbose logging for debugging streaming issues
# Enable caching with Redis for better performance # Enable caching now that streaming is fixed
cache: true cache: true
cache_params: cache_params:
type: redis type: redis
host: redis host: core_redis
port: 6379 port: 6379
ttl: 3600 # Cache for 1 hour ttl: 3600 # Cache for 1 hour
# Force strip specific parameters globally # Force strip specific parameters globally
allowed_fails: 0 allowed_fails: 0
# Modify params before sending to provider # Modify params before sending to provider
modify_params: true modify_params: false # DISABLED: Was breaking streaming
# Enable success and failure logging but minimize overhead # Enable success and failure logging but minimize overhead
success_callback: [] # Disable all success callbacks to reduce DB writes success_callback: [] # Disable all success callbacks to reduce DB writes
failure_callback: [] # Disable all failure callbacks failure_callback: [] # Disable all failure callbacks
router_settings: router_settings:
allowed_fails: 0 allowed_fails: 0
# Drop unsupported parameters # Drop unsupported parameters
default_litellm_params: default_litellm_params:
drop_params: true drop_params: false # DISABLED: Was breaking streaming
general_settings: general_settings:
disable_responses_id_security: true disable_responses_id_security: true
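
A minimal sketch of using the new embeddings route through the proxy, assuming the same LiteLLM address and key as above (both placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-litellm-master-key")  # assumed proxy address/key
emb = client.embeddings.create(model="bge-large-en", input=["backrest backup schedule"])
print(len(emb.data[0].embedding))  # bge-large-en-v1.5 returns 1024-dimensional vectors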

ai/nginx.conf.template Normal file

@@ -0,0 +1,60 @@
events {
worker_connections 1024;
}
http {
# MIME types
include /etc/nginx/mime.types;
default_type application/octet-stream;
# DNS resolver for Tailscale MagicDNS
resolver 100.100.100.100 8.8.8.8 valid=30s;
resolver_timeout 5s;
# Proxy settings
proxy_http_version 1.1;
proxy_buffering off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts for long-running audio/image generation
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
server {
listen 80;
server_name _;
# Increase client body size for image uploads
client_max_body_size 100M;
location / {
# Proxy to service on RunPod via Tailscale
# Use variable to force runtime DNS resolution (not startup)
set $backend http://${GPU_SERVICE_HOST}:${GPU_SERVICE_PORT};
proxy_pass $backend;
# WebSocket upgrade
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Proxy headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Disable buffering for real-time updates
proxy_buffering off;
}
}
}
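
A small smoke test for this proxy, assuming a reachable hostname for the container (placeholder below): a streamed GET with connect/read timeouts matching the 600 s values above, so long-running generations are not cut off on the client side.

import requests

url = "http://gpu-proxy.internal/health"   # assumed; point at the proxy container or its Traefik host
with requests.get(url, stream=True, timeout=(10, 600)) as r:  # (connect, read) timeouts
    r.raise_for_status()
    for chunk in r.iter_content(chunk_size=None):
        print(chunk.decode(errors="replace"), end="")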


@@ -1,302 +0,0 @@
#!/usr/bin/env python3
"""
Simple vLLM server using AsyncLLMEngine directly
Bypasses the multiprocessing issues we hit with the default vLLM API server
OpenAI-compatible endpoints: /v1/models and /v1/completions
"""
import asyncio
import json
import logging
import os
from typing import AsyncIterator, Dict, List, Optional
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel, Field
from vllm import AsyncLLMEngine, AsyncEngineArgs, SamplingParams
from vllm.utils import random_uuid
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# FastAPI app
app = FastAPI(title="Simple vLLM Server", version="1.0.0")
# Global engine instance
engine: Optional[AsyncLLMEngine] = None
model_name: str = "Qwen/Qwen2.5-7B-Instruct"
# Request/Response models
class CompletionRequest(BaseModel):
"""OpenAI-compatible completion request"""
model: str = Field(default="qwen-2.5-7b")
prompt: str | List[str] = Field(..., description="Text prompt(s)")
max_tokens: int = Field(default=512, ge=1, le=4096)
temperature: float = Field(default=0.7, ge=0.0, le=2.0)
top_p: float = Field(default=1.0, ge=0.0, le=1.0)
n: int = Field(default=1, ge=1, le=10)
stream: bool = Field(default=False)
stop: Optional[str | List[str]] = None
presence_penalty: float = Field(default=0.0, ge=-2.0, le=2.0)
frequency_penalty: float = Field(default=0.0, ge=-2.0, le=2.0)
class ChatMessage(BaseModel):
"""Chat message format"""
role: str = Field(..., description="Role: system, user, or assistant")
content: str = Field(..., description="Message content")
class ChatCompletionRequest(BaseModel):
"""OpenAI-compatible chat completion request"""
model: str = Field(default="qwen-2.5-7b")
messages: List[ChatMessage] = Field(..., description="Chat messages")
max_tokens: int = Field(default=512, ge=1, le=4096)
temperature: float = Field(default=0.7, ge=0.0, le=2.0)
top_p: float = Field(default=1.0, ge=0.0, le=1.0)
n: int = Field(default=1, ge=1, le=10)
stream: bool = Field(default=False)
stop: Optional[str | List[str]] = None
@app.on_event("startup")
async def startup_event():
"""Initialize vLLM engine on startup"""
global engine, model_name
logger.info(f"Initializing vLLM AsyncLLMEngine with model: {model_name}")
# Configure engine
engine_args = AsyncEngineArgs(
model=model_name,
tensor_parallel_size=1, # Single GPU
gpu_memory_utilization=0.85, # Use 85% of GPU memory
max_model_len=4096, # Context length
dtype="auto", # Auto-detect dtype
download_dir="/workspace/huggingface_cache", # Large disk
trust_remote_code=True, # Some models require this
enforce_eager=False, # Use CUDA graphs for better performance
)
# Create async engine
engine = AsyncLLMEngine.from_engine_args(engine_args)
logger.info("vLLM AsyncLLMEngine initialized successfully")
@app.get("/")
async def root():
"""Health check endpoint"""
return {"status": "ok", "model": model_name}
@app.get("/health")
async def health():
"""Detailed health check"""
return {
"status": "healthy" if engine else "initializing",
"model": model_name,
"ready": engine is not None
}
@app.get("/v1/models")
async def list_models():
"""OpenAI-compatible models endpoint"""
return {
"object": "list",
"data": [
{
"id": "qwen-2.5-7b",
"object": "model",
"created": 1234567890,
"owned_by": "pivoine-gpu",
"permission": [],
"root": model_name,
"parent": None,
}
]
}
def messages_to_prompt(messages: List[ChatMessage]) -> str:
"""Convert chat messages to a single prompt string"""
# Qwen 2.5 chat template format
prompt_parts = []
for msg in messages:
role = msg.role
content = msg.content
if role == "system":
prompt_parts.append(f"<|im_start|>system\n{content}<|im_end|>")
elif role == "user":
prompt_parts.append(f"<|im_start|>user\n{content}<|im_end|>")
elif role == "assistant":
prompt_parts.append(f"<|im_start|>assistant\n{content}<|im_end|>")
# Add final assistant prompt
prompt_parts.append("<|im_start|>assistant\n")
return "\n".join(prompt_parts)
@app.post("/v1/completions")
async def create_completion(request: CompletionRequest):
"""OpenAI-compatible completion endpoint"""
if not engine:
return JSONResponse(
status_code=503,
content={"error": "Engine not initialized"}
)
# Handle both single prompt and batch prompts
prompts = [request.prompt] if isinstance(request.prompt, str) else request.prompt
# Configure sampling parameters
sampling_params = SamplingParams(
temperature=request.temperature,
top_p=request.top_p,
max_tokens=request.max_tokens,
n=request.n,
stop=request.stop if request.stop else [],
presence_penalty=request.presence_penalty,
frequency_penalty=request.frequency_penalty,
)
# Generate completions
results = []
for prompt in prompts:
request_id = random_uuid()
if request.stream:
# Streaming response
async def generate_stream():
async for output in engine.generate(prompt, sampling_params, request_id):
chunk = {
"id": request_id,
"object": "text_completion",
"created": 1234567890,
"model": request.model,
"choices": [
{
"text": output.outputs[0].text,
"index": 0,
"logprobs": None,
"finish_reason": output.outputs[0].finish_reason,
}
]
}
yield f"data: {json.dumps(chunk)}\n\n"
yield "data: [DONE]\n\n"
return StreamingResponse(generate_stream(), media_type="text/event-stream")
else:
# Non-streaming response
async for output in engine.generate(prompt, sampling_params, request_id):
final_output = output
results.append({
"text": final_output.outputs[0].text,
"index": len(results),
"logprobs": None,
"finish_reason": final_output.outputs[0].finish_reason,
})
return {
"id": random_uuid(),
"object": "text_completion",
"created": 1234567890,
"model": request.model,
"choices": results,
"usage": {
"prompt_tokens": 0, # vLLM doesn't expose this easily
"completion_tokens": 0,
"total_tokens": 0,
}
}
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest):
"""OpenAI-compatible chat completion endpoint"""
if not engine:
return JSONResponse(
status_code=503,
content={"error": "Engine not initialized"}
)
# Convert messages to prompt
prompt = messages_to_prompt(request.messages)
# Configure sampling parameters
sampling_params = SamplingParams(
temperature=request.temperature,
top_p=request.top_p,
max_tokens=request.max_tokens,
n=request.n,
stop=request.stop if request.stop else ["<|im_end|>"],
)
request_id = random_uuid()
if request.stream:
# Streaming response
async def generate_stream():
async for output in engine.generate(prompt, sampling_params, request_id):
chunk = {
"id": request_id,
"object": "chat.completion.chunk",
"created": 1234567890,
"model": request.model,
"choices": [
{
"index": 0,
"delta": {"content": output.outputs[0].text},
"finish_reason": output.outputs[0].finish_reason,
}
]
}
yield f"data: {json.dumps(chunk)}\n\n"
yield "data: [DONE]\n\n"
return StreamingResponse(generate_stream(), media_type="text/event-stream")
else:
# Non-streaming response
async for output in engine.generate(prompt, sampling_params, request_id):
final_output = output
return {
"id": request_id,
"object": "chat.completion",
"created": 1234567890,
"model": request.model,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": final_output.outputs[0].text,
},
"finish_reason": final_output.outputs[0].finish_reason,
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
}
}
if __name__ == "__main__":
import uvicorn
# Get configuration from environment
host = os.getenv("VLLM_HOST", "0.0.0.0")
port = int(os.getenv("VLLM_PORT", "8000"))
logger.info(f"Starting vLLM server on {host}:{port}")
uvicorn.run(
app,
host=host,
port=port,
log_level="info",
access_log=True,
)
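
For reference, a client-side sketch of how the endpoints this (now-removed) server exposed were consumed; host, port, and prompt are assumptions, and the server appears to send cumulative text in each chunk rather than a delta.

import json
import requests

resp = requests.post(
    "http://10.8.0.2:8000/v1/chat/completions",  # assumed server address
    json={
        "model": "qwen-2.5-7b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
    timeout=600,
)
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    # Each chunk carries the full text generated so far in choices[0].delta.content.
    print(chunk["choices"][0]["delta"]["content"], flush=True)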


@@ -78,16 +78,20 @@ envs:
UTIL_JOPLIN_DB_NAME: joplin UTIL_JOPLIN_DB_NAME: joplin
# PairDrop # PairDrop
UTIL_DROP_TRAEFIK_HOST: drop.pivoine.art UTIL_DROP_TRAEFIK_HOST: drop.pivoine.art
# Media Stack (Jellyfin, Filestash, Pinchflat) # Media Stack (Jellyfin, Pinchflat)
MEDIA_TRAEFIK_ENABLED: true MEDIA_TRAEFIK_ENABLED: true
MEDIA_COMPOSE_PROJECT_NAME: media MEDIA_COMPOSE_PROJECT_NAME: media
MEDIA_JELLYFIN_IMAGE: jellyfin/jellyfin:latest MEDIA_JELLYFIN_IMAGE: jellyfin/jellyfin:latest
MEDIA_JELLYFIN_TRAEFIK_HOST: jellyfin.media.pivoine.art MEDIA_JELLYFIN_TRAEFIK_HOST: jellyfin.media.pivoine.art
MEDIA_FILESTASH_IMAGE: machines/filestash:latest
MEDIA_FILESTASH_TRAEFIK_HOST: filestash.media.pivoine.art
MEDIA_FILESTASH_CANARY: true
MEDIA_PINCHFLAT_IMAGE: ghcr.io/kieraneglin/pinchflat:latest MEDIA_PINCHFLAT_IMAGE: ghcr.io/kieraneglin/pinchflat:latest
MEDIA_PINCHFLAT_TRAEFIK_HOST: pinchflat.media.pivoine.art MEDIA_PINCHFLAT_TRAEFIK_HOST: pinchflat.media.pivoine.art
# Immich - Photo and video management
MEDIA_IMMICH_SERVER_IMAGE: ghcr.io/immich-app/immich-server:release
MEDIA_IMMICH_ML_IMAGE: ghcr.io/immich-app/immich-machine-learning:release
MEDIA_IMMICH_POSTGRES_IMAGE: ghcr.io/immich-app/postgres:14-vectorchord0.4.3-pgvectors0.2.0
MEDIA_IMMICH_TRAEFIK_HOST: immich.media.pivoine.art
MEDIA_IMMICH_DB_NAME: immich
MEDIA_IMMICH_DB_USER: immich
# Dev (Gitea + Coolify) # Dev (Gitea + Coolify)
DEV_TRAEFIK_ENABLED: true DEV_TRAEFIK_ENABLED: true
DEV_COMPOSE_PROJECT_NAME: dev DEV_COMPOSE_PROJECT_NAME: dev
@@ -135,7 +139,6 @@ envs:
AI_COMPOSE_PROJECT_NAME: ai AI_COMPOSE_PROJECT_NAME: ai
AI_POSTGRES_IMAGE: pgvector/pgvector:pg16 AI_POSTGRES_IMAGE: pgvector/pgvector:pg16
AI_WEBUI_IMAGE: ghcr.io/open-webui/open-webui:main AI_WEBUI_IMAGE: ghcr.io/open-webui/open-webui:main
AI_CRAWL4AI_IMAGE: unclecode/crawl4ai:latest
AI_FACEFUSION_IMAGE: facefusion/facefusion:3.5.0-cpu AI_FACEFUSION_IMAGE: facefusion/facefusion:3.5.0-cpu
AI_FACEFUSION_TRAEFIK_ENABLED: true AI_FACEFUSION_TRAEFIK_ENABLED: true
AI_FACEFUSION_TRAEFIK_HOST: facefusion.ai.pivoine.art AI_FACEFUSION_TRAEFIK_HOST: facefusion.ai.pivoine.art
@@ -263,3 +266,57 @@ scripts:
docker restart sexy_api && docker restart sexy_api &&
echo "✓ Directus API restarted" echo "✓ Directus API restarted"
net/create: docker network create "$NETWORK_NAME" net/create: docker network create "$NETWORK_NAME"
# Setup iptables NAT for Docker containers to reach Tailscale network
# Requires Tailscale installed on host: curl -fsSL https://tailscale.com/install.sh | sh
tailscale/setup: |
echo "Setting up iptables for Docker-to-Tailscale routing..."
# Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
grep -q "net.ipv4.ip_forward=1" /etc/sysctl.conf || echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf
# Get Docker network CIDR
DOCKER_CIDR=$(docker network inspect ${NETWORK_NAME} --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}' 2>/dev/null || echo "172.18.0.0/16")
echo "Docker network CIDR: $DOCKER_CIDR"
# Add NAT rule (check if already exists)
if ! sudo iptables -t nat -C POSTROUTING -s "$DOCKER_CIDR" -o tailscale0 -j MASQUERADE 2>/dev/null; then
sudo iptables -t nat -A POSTROUTING -s "$DOCKER_CIDR" -o tailscale0 -j MASQUERADE
echo "✓ iptables NAT rule added"
else
echo "✓ iptables NAT rule already exists"
fi
# Persist rules
sudo netfilter-persistent save 2>/dev/null || echo "Install iptables-persistent to persist rules: sudo apt install iptables-persistent"
echo "✓ Tailscale routing configured"
# Install and configure Tailscale on host with persistent state
tailscale/install: |
echo "Installing Tailscale..."
# Install Tailscale if not present
if ! command -v tailscale &> /dev/null; then
curl -fsSL https://tailscale.com/install.sh | sh
else
echo "✓ Tailscale already installed"
fi
# Create state directory for persistence
TAILSCALE_STATE="/var/lib/tailscale"
sudo mkdir -p "$TAILSCALE_STATE"
# Start and enable tailscaled service
sudo systemctl enable --now tailscaled
# Connect to Tailscale network
echo "Connecting to Tailscale..."
sudo tailscale up --authkey="$TAILSCALE_AUTHKEY" --hostname=vps
# Show status
echo ""
tailscale status
echo ""
echo "✓ Tailscale installed and connected"
echo " Run 'arty tailscale/setup' to configure iptables routing for Docker"

core/backrest/config.json Normal file

@@ -0,0 +1,370 @@
{
"modno": 1,
"version": 4,
"instance": "falcon",
"repos": [
{
"id": "hidrive-backup",
"uri": "/repos",
"guid": "df03886ea215b0a3ff9730190d906d7034032bf0f1906ed4ad00f2c4f1748215",
"password": "falcon-backup-2025",
"prunePolicy": {
"schedule": {
"cron": "0 2 * * 0"
}
},
"checkPolicy": {
"schedule": {
"cron": "0 3 * * 0"
}
},
"autoUnlock": true
}
],
"plans": [
{
"id": "ai-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/ai_postgres_data",
"/volumes/ai_webui_data"
],
"schedule": {
"cron": "0 3 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "asciinema-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/asciinema_data"
],
"schedule": {
"cron": "0 11 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "coolify-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/dev_coolify_data"
],
"schedule": {
"cron": "0 0 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "directus-bundle-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/directus_bundle"
],
"schedule": {
"cron": "0 4 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 3
}
}
},
{
"id": "directus-uploads-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/directus_uploads"
],
"schedule": {
"cron": "0 4 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "gitea-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/dev_gitea_config",
"/volumes/dev_gitea_data",
"/volumes/dev_gitea_runner_data"
],
"schedule": {
"cron": "0 11 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "jellyfin-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/jelly_config"
],
"schedule": {
"cron": "0 9 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "joplin-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/joplin_data"
],
"schedule": {
"cron": "0 2 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "letsencrypt-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/letsencrypt_data"
],
"schedule": {
"cron": "0 8 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 12,
"yearly": 3
}
}
},
{
"id": "linkwarden-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/linkwarden_data",
"/volumes/linkwarden_meili_data"
],
"schedule": {
"cron": "0 7 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6
}
}
},
{
"id": "mattermost-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/mattermost_config",
"/volumes/mattermost_data",
"/volumes/mattermost_plugins"
],
"schedule": {
"cron": "0 5 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "n8n-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/n8n_data"
],
"schedule": {
"cron": "0 6 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6
}
}
},
{
"id": "netdata-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/netdata_config"
],
"schedule": {
"cron": "0 10 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 3
}
}
},
{
"id": "postgres-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/core_postgres_data"
],
"schedule": {
"cron": "0 2 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
},
{
"id": "redis-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/core_redis_data"
],
"schedule": {
"cron": "0 3 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 3
}
}
},
{
"id": "scrapy-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/scrapy_code",
"/volumes/scrapyd_data"
],
"schedule": {
"cron": "0 6 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 3
}
}
},
{
"id": "tandoor-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/tandoor_mediafiles",
"/volumes/tandoor_staticfiles"
],
"schedule": {
"cron": "0 5 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6
}
}
},
{
"id": "vaultwarden-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/vaultwarden_data"
],
"schedule": {
"cron": "0 8 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 12,
"yearly": 3
}
}
},
{
"id": "immich-backup",
"repo": "hidrive-backup",
"paths": [
"/volumes/immich_postgres_data",
"/volumes/immich_upload"
],
"schedule": {
"cron": "0 12 * * *"
},
"retention": {
"policyTimeBucketed": {
"daily": 7,
"weekly": 4,
"monthly": 6,
"yearly": 2
}
}
}
]
}
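
A small sanity-check sketch for this file (path assumed to match the repo layout): confirm every plan references a defined repo and carries a cron schedule and retention policy, so a typo does not silently drop a backup.

import json

with open("core/backrest/config.json") as fh:   # assumed path, as laid out in this repo
    cfg = json.load(fh)

repo_ids = {r["id"] for r in cfg["repos"]}
for plan in cfg["plans"]:
    assert plan["repo"] in repo_ids, f"{plan['id']}: unknown repo {plan['repo']}"
    assert plan["schedule"].get("cron"), f"{plan['id']}: missing cron schedule"
    assert plan.get("retention"), f"{plan['id']}: missing retention policy"
print(f"{len(cfg['plans'])} plans OK against repos: {sorted(repo_ids)}")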


@@ -56,7 +56,7 @@ services:
volumes: volumes:
# Backrest application data # Backrest application data
- backrest_data:/data - backrest_data:/data
- backrest_config:/config - ./backrest:/config
- backrest_cache:/cache - backrest_cache:/cache
- backrest_tmp:/tmp - backrest_tmp:/tmp
@@ -74,7 +74,6 @@ services:
- backup_util_tandoor_staticfiles:/volumes/tandoor_staticfiles:ro - backup_util_tandoor_staticfiles:/volumes/tandoor_staticfiles:ro
- backup_util_tandoor_mediafiles:/volumes/tandoor_mediafiles:ro - backup_util_tandoor_mediafiles:/volumes/tandoor_mediafiles:ro
- backup_n8n_data:/volumes/n8n_data:ro - backup_n8n_data:/volumes/n8n_data:ro
- backup_filestash_data:/volumes/filestash_data:ro
- backup_util_linkwarden_data:/volumes/linkwarden_data:ro - backup_util_linkwarden_data:/volumes/linkwarden_data:ro
- backup_util_linkwarden_meili_data:/volumes/linkwarden_meili_data:ro - backup_util_linkwarden_meili_data:/volumes/linkwarden_meili_data:ro
- backup_letsencrypt_data:/volumes/letsencrypt_data:ro - backup_letsencrypt_data:/volumes/letsencrypt_data:ro
@@ -84,12 +83,13 @@ services:
- backup_netdata_config:/volumes/netdata_config:ro - backup_netdata_config:/volumes/netdata_config:ro
- backup_ai_postgres_data:/volumes/ai_postgres_data:ro - backup_ai_postgres_data:/volumes/ai_postgres_data:ro
- backup_ai_webui_data:/volumes/ai_webui_data:ro - backup_ai_webui_data:/volumes/ai_webui_data:ro
- backup_ai_crawl4ai_data:/volumes/ai_crawl4ai_data:ro
- backup_asciinema_data:/volumes/asciinema_data:ro - backup_asciinema_data:/volumes/asciinema_data:ro
- backup_dev_gitea_data:/volumes/dev_gitea_data:ro - backup_dev_gitea_data:/volumes/dev_gitea_data:ro
- backup_dev_gitea_config:/volumes/dev_gitea_config:ro - backup_dev_gitea_config:/volumes/dev_gitea_config:ro
- backup_dev_gitea_runner_data:/volumes/dev_gitea_runner_data:ro - backup_dev_gitea_runner_data:/volumes/dev_gitea_runner_data:ro
- backup_dev_coolify_data:/volumes/dev_coolify_data:ro - backup_dev_coolify_data:/volumes/dev_coolify_data:ro
- backup_media_immich_postgres_data:/volumes/immich_postgres_data:ro
- backup_media_immich_upload:/volumes/immich_upload:ro
environment: environment:
TZ: ${TIMEZONE:-Europe/Berlin} TZ: ${TIMEZONE:-Europe/Berlin}
@@ -124,8 +124,6 @@ volumes:
name: ${CORE_COMPOSE_PROJECT_NAME}_redis_data name: ${CORE_COMPOSE_PROJECT_NAME}_redis_data
backrest_data: backrest_data:
name: ${CORE_COMPOSE_PROJECT_NAME}_backrest_data name: ${CORE_COMPOSE_PROJECT_NAME}_backrest_data
backrest_config:
name: ${CORE_COMPOSE_PROJECT_NAME}_backrest_config
backrest_cache: backrest_cache:
name: ${CORE_COMPOSE_PROJECT_NAME}_backrest_cache name: ${CORE_COMPOSE_PROJECT_NAME}_backrest_cache
backrest_tmp: backrest_tmp:
@@ -162,9 +160,6 @@ volumes:
backup_n8n_data: backup_n8n_data:
name: dev_n8n_data name: dev_n8n_data
external: true external: true
backup_filestash_data:
name: stash_filestash_data
external: true
backup_util_linkwarden_data: backup_util_linkwarden_data:
name: util_linkwarden_data name: util_linkwarden_data
external: true external: true
@@ -192,9 +187,6 @@ volumes:
backup_ai_webui_data: backup_ai_webui_data:
name: ai_webui_data name: ai_webui_data
external: true external: true
backup_ai_crawl4ai_data:
name: ai_crawl4ai_data
external: true
backup_asciinema_data: backup_asciinema_data:
name: dev_asciinema_data name: dev_asciinema_data
external: true external: true
@@ -210,3 +202,9 @@ volumes:
backup_dev_coolify_data: backup_dev_coolify_data:
name: dev_coolify_data name: dev_coolify_data
external: true external: true
backup_media_immich_postgres_data:
name: media_immich_postgres_data
external: true
backup_media_immich_upload:
name: media_immich_upload
external: true


@@ -33,36 +33,6 @@ services:
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}'
# Filestash - Web-based file manager
filestash:
image: ${MEDIA_FILESTASH_IMAGE:-machines/filestash:latest}
container_name: ${MEDIA_COMPOSE_PROJECT_NAME}_filestash
restart: unless-stopped
volumes:
- filestash_data:/app/data/state/
tmpfs:
- /tmp:exec
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
APPLICATION_URL: ${MEDIA_FILESTASH_TRAEFIK_HOST}
CANARY: ${MEDIA_FILESTASH_CANARY:-true}
networks:
- compose_network
labels:
- 'traefik.enable=${MEDIA_TRAEFIK_ENABLED}'
- 'traefik.http.middlewares.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-redirect-web-secure.redirectscheme.scheme=https'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web.middlewares=${MEDIA_COMPOSE_PROJECT_NAME}-filestash-redirect-web-secure'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web.rule=Host(`${MEDIA_FILESTASH_TRAEFIK_HOST}`)'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web.entrypoints=web'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web-secure.rule=Host(`${MEDIA_FILESTASH_TRAEFIK_HOST}`)'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web-secure.tls.certresolver=resolver'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web-secure.entrypoints=web-secure'
- 'traefik.http.middlewares.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web-secure-compress.compress=true'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web-secure.middlewares=${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web-secure-compress'
- 'traefik.http.services.${MEDIA_COMPOSE_PROJECT_NAME}-filestash-web-secure.loadbalancer.server.port=8334'
- 'traefik.docker.network=${NETWORK_NAME}'
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}'
# Pinchflat - YouTube download manager # Pinchflat - YouTube download manager
pinchflat: pinchflat:
image: ${MEDIA_PINCHFLAT_IMAGE:-ghcr.io/kieraneglin/pinchflat:latest} image: ${MEDIA_PINCHFLAT_IMAGE:-ghcr.io/kieraneglin/pinchflat:latest}
@@ -95,15 +65,98 @@ services:
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}'
# Immich PostgreSQL - Dedicated database with vector extensions
immich_postgres:
image: ${MEDIA_IMMICH_POSTGRES_IMAGE:-ghcr.io/immich-app/postgres:14-vectorchord0.4.3-pgvectors0.2.0}
container_name: ${MEDIA_COMPOSE_PROJECT_NAME}_immich_postgres
restart: unless-stopped
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
POSTGRES_USER: ${MEDIA_IMMICH_DB_USER:-immich}
POSTGRES_PASSWORD: ${MEDIA_IMMICH_DB_PASSWORD}
POSTGRES_DB: ${MEDIA_IMMICH_DB_NAME:-immich}
POSTGRES_INITDB_ARGS: --data-checksums
volumes:
- immich_postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${MEDIA_IMMICH_DB_USER:-immich} -d ${MEDIA_IMMICH_DB_NAME:-immich}"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
networks:
- compose_network
# Immich Server - Main application
immich_server:
image: ${MEDIA_IMMICH_SERVER_IMAGE:-ghcr.io/immich-app/immich-server:release}
container_name: ${MEDIA_COMPOSE_PROJECT_NAME}_immich_server
restart: unless-stopped
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
DB_HOSTNAME: ${MEDIA_COMPOSE_PROJECT_NAME}_immich_postgres
DB_PORT: 5432
DB_USERNAME: ${MEDIA_IMMICH_DB_USER:-immich}
DB_PASSWORD: ${MEDIA_IMMICH_DB_PASSWORD}
DB_DATABASE_NAME: ${MEDIA_IMMICH_DB_NAME:-immich}
REDIS_HOSTNAME: ${CORE_COMPOSE_PROJECT_NAME}_redis
REDIS_PORT: ${CORE_REDIS_PORT:-6379}
IMMICH_MACHINE_LEARNING_URL: http://${MEDIA_COMPOSE_PROJECT_NAME}_immich_ml:3003
volumes:
- immich_upload:/usr/src/app/upload
- /etc/localtime:/etc/localtime:ro
depends_on:
immich_postgres:
condition: service_healthy
networks:
- compose_network
labels:
- 'traefik.enable=${MEDIA_TRAEFIK_ENABLED}'
# HTTP to HTTPS redirect
- 'traefik.http.middlewares.${MEDIA_COMPOSE_PROJECT_NAME}-immich-redirect-web-secure.redirectscheme.scheme=https'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web.middlewares=${MEDIA_COMPOSE_PROJECT_NAME}-immich-redirect-web-secure'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web.rule=Host(`${MEDIA_IMMICH_TRAEFIK_HOST}`)'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web.entrypoints=web'
# HTTPS router
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web-secure.rule=Host(`${MEDIA_IMMICH_TRAEFIK_HOST}`)'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web-secure.tls.certresolver=resolver'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web-secure.entrypoints=web-secure'
- 'traefik.http.middlewares.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web-secure-compress.compress=true'
- 'traefik.http.routers.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web-secure.middlewares=${MEDIA_COMPOSE_PROJECT_NAME}-immich-web-secure-compress,security-headers@file'
# Service
- 'traefik.http.services.${MEDIA_COMPOSE_PROJECT_NAME}-immich-web-secure.loadbalancer.server.port=2283'
- 'traefik.docker.network=${NETWORK_NAME}'
# Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}'
# Immich Machine Learning - AI inference for faces, search, etc.
immich_ml:
image: ${MEDIA_IMMICH_ML_IMAGE:-ghcr.io/immich-app/immich-machine-learning:release}
container_name: ${MEDIA_COMPOSE_PROJECT_NAME}_immich_ml
restart: unless-stopped
environment:
TZ: ${TIMEZONE:-Europe/Berlin}
volumes:
- immich_model_cache:/cache
networks:
- compose_network
labels:
# Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}'
volumes: volumes:
jellyfin_config: jellyfin_config:
name: ${MEDIA_COMPOSE_PROJECT_NAME}_jellyfin_config name: ${MEDIA_COMPOSE_PROJECT_NAME}_jellyfin_config
jellyfin_cache: jellyfin_cache:
name: ${MEDIA_COMPOSE_PROJECT_NAME}_jellyfin_cache name: ${MEDIA_COMPOSE_PROJECT_NAME}_jellyfin_cache
filestash_data:
name: ${MEDIA_COMPOSE_PROJECT_NAME}_filestash_data
pinchflat_config: pinchflat_config:
name: ${MEDIA_COMPOSE_PROJECT_NAME}_pinchflat_config name: ${MEDIA_COMPOSE_PROJECT_NAME}_pinchflat_config
immich_postgres_data:
name: ${MEDIA_COMPOSE_PROJECT_NAME}_immich_postgres_data
immich_upload:
name: ${MEDIA_COMPOSE_PROJECT_NAME}_immich_upload
immich_model_cache:
name: ${MEDIA_COMPOSE_PROJECT_NAME}_immich_model_cache
networks: networks:
compose_network: compose_network:
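
A hedged post-deploy check for the dedicated Immich database, run from the compose network once Immich has initialized it; the host matches the container name above, while the password and the expected extension names are assumptions.

import psycopg2

conn = psycopg2.connect(
    host="media_immich_postgres",      # container name when run on the compose network
    dbname="immich",
    user="immich",
    password="CHANGE_ME",              # MEDIA_IMMICH_DB_PASSWORD
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT extname FROM pg_extension ORDER BY extname;")
    installed = [row[0] for row in cur.fetchall()]
print(installed)  # expect entries like 'vchord' / 'vectors' alongside the defaults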


@@ -74,23 +74,26 @@ access_control:
- "admin.asciinema.dev.pivoine.art" - "admin.asciinema.dev.pivoine.art"
- "facefusion.ai.pivoine.art" - "facefusion.ai.pivoine.art"
- "pinchflat.media.pivoine.art" - "pinchflat.media.pivoine.art"
- "comfy.ai.pivoine.art"
- "supervisor.ai.pivoine.art"
- "audiocraft.ai.pivoine.art"
- "upscale.ai.pivoine.art"
policy: one_factor policy: one_factor
# session secret set via environment variable: AUTHELIA_SESSION_SECRET # session secret set via environment variable: AUTHELIA_SESSION_SECRET
session: session:
name: 'authelia_session' name: "authelia_session"
same_site: 'lax' same_site: "lax"
expiration: '1h' expiration: "1h"
inactivity: '5m' inactivity: "15m"
remember_me: '1M' remember_me: "1M"
cookies: cookies:
- domain: 'pivoine.art' - domain: "pivoine.art"
authelia_url: 'https://auth.pivoine.art' authelia_url: "https://auth.pivoine.art"
same_site: 'lax' same_site: "lax"
expiration: '1h' expiration: "1h"
inactivity: '5m' inactivity: "5m"
remember_me: '1M' remember_me: "1M"
regulation: regulation:
max_retries: 3 max_retries: 3
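
A hedged probe of the added one_factor rules: an unauthenticated request to one of the protected hosts should be intercepted by the forward-auth middleware rather than reach the backend (typically a redirect to the Authelia portal for browser-style requests, or a 401).

import requests

r = requests.get(
    "https://upscale.ai.pivoine.art/",   # one of the hosts added above
    allow_redirects=False,
    headers={"Accept": "text/html"},
    timeout=10,
)
print(r.status_code, r.headers.get("Location", ""))
# Anything other than a 2xx from the backend here suggests the forward-auth middleware is applied.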


@@ -6,49 +6,49 @@ services:
restart: unless-stopped restart: unless-stopped
command: command:
# API & Dashboard # API & Dashboard
- '--api.dashboard=true' - "--api.dashboard=true"
- '--api.insecure=false' - "--api.insecure=false"
# Ping endpoint for healthcheck # Ping endpoint for healthcheck
- '--ping=true' - "--ping=true"
# Experimental plugins # Experimental plugins
- '--experimental.plugins.sablier.modulename=github.com/acouvreur/sablier' - "--experimental.plugins.sablier.modulename=github.com/acouvreur/sablier"
- '--experimental.plugins.sablier.version=v1.8.0' - "--experimental.plugins.sablier.version=v1.8.0"
# Logging # Logging
- '--log.level=${NET_PROXY_LOG_LEVEL:-INFO}' - "--log.level=${NET_PROXY_LOG_LEVEL:-INFO}"
- '--accesslog=true' - "--accesslog=true"
# Global # Global
- '--global.sendAnonymousUsage=false' - "--global.sendAnonymousUsage=false"
- '--global.checkNewVersion=true' - "--global.checkNewVersion=true"
# Docker Provider # Docker Provider
- '--providers.docker=true' - "--providers.docker=true"
- '--providers.docker.exposedbydefault=false' - "--providers.docker.exposedbydefault=false"
- '--providers.docker.network=${NETWORK_NAME}' - "--providers.docker.network=${NETWORK_NAME}"
# File Provider for dynamic configuration # File Provider for dynamic configuration
- '--providers.file.directory=/etc/traefik/dynamic' - "--providers.file.directory=/etc/traefik/dynamic"
- '--providers.file.watch=true' - "--providers.file.watch=true"
# Entrypoints # Entrypoints
- '--entrypoints.web.address=:${NET_PROXY_PORT_HTTP:-80}' - "--entrypoints.web.address=:${NET_PROXY_PORT_HTTP:-80}"
- '--entrypoints.web-secure.address=:${NET_PROXY_PORT_HTTPS:-443}' - "--entrypoints.web-secure.address=:${NET_PROXY_PORT_HTTPS:-443}"
# Global HTTP to HTTPS redirect # Global HTTP to HTTPS redirect
- '--entrypoints.web.http.redirections.entryPoint.to=web-secure' - "--entrypoints.web.http.redirections.entryPoint.to=web-secure"
- '--entrypoints.web.http.redirections.entryPoint.scheme=https' - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
- '--entrypoints.web.http.redirections.entryPoint.permanent=true' - "--entrypoints.web.http.redirections.entryPoint.permanent=true"
# Security Headers (applied globally) # Security Headers (applied globally)
- '--entrypoints.web-secure.http.middlewares=security-headers@file' - "--entrypoints.web-secure.http.middlewares=security-headers@file"
# Let's Encrypt # Let's Encrypt
- '--certificatesresolvers.resolver.acme.tlschallenge=true' - "--certificatesresolvers.resolver.acme.tlschallenge=true"
- '--certificatesresolvers.resolver.acme.email=${ADMIN_EMAIL}' - "--certificatesresolvers.resolver.acme.email=${ADMIN_EMAIL}"
- '--certificatesresolvers.resolver.acme.storage=/letsencrypt/acme.json' - "--certificatesresolvers.resolver.acme.storage=/letsencrypt/acme.json"
healthcheck: healthcheck:
test: ["CMD", "traefik", "healthcheck", "--ping"] test: ["CMD", "traefik", "healthcheck", "--ping"]
@@ -74,20 +74,20 @@ services:
- ./dynamic:/etc/traefik/dynamic:ro - ./dynamic:/etc/traefik/dynamic:ro
labels: labels:
- 'traefik.enable=true' - "traefik.enable=true"
# HTTP to HTTPS redirect # HTTP to HTTPS redirect
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-traefik-redirect-web-secure.redirectscheme.scheme=https' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-traefik-redirect-web-secure.redirectscheme.scheme=https"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-traefik-redirect-web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-traefik-redirect-web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web.rule=Host(`${NET_PROXY_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web.rule=Host(`${NET_PROXY_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web.entrypoints=web' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web.entrypoints=web"
# HTTPS router with auth # HTTPS router with auth
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.rule=Host(`${NET_PROXY_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.rule=Host(`${NET_PROXY_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.tls.certresolver=resolver' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.tls.certresolver=resolver"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.entrypoints=web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.entrypoints=web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.service=api@internal' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.service=api@internal"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.middlewares=${NET_COMPOSE_PROJECT_NAME}-authelia,security-headers@file' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.middlewares=${NET_COMPOSE_PROJECT_NAME}-authelia,security-headers@file"
- 'traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.loadbalancer.server.port=8080' - "traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-traefik-web-secure.loadbalancer.server.port=8080"
- 'traefik.docker.network=${NETWORK_NAME}' - "traefik.docker.network=${NETWORK_NAME}"
# Netdata - Real-time monitoring # Netdata - Real-time monitoring
netdata: netdata:
@@ -128,23 +128,23 @@ services:
networks: networks:
- compose_network - compose_network
labels: labels:
- 'traefik.enable=${NET_TRAEFIK_ENABLED}' - "traefik.enable=${NET_TRAEFIK_ENABLED}"
# HTTP to HTTPS redirect # HTTP to HTTPS redirect
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-netdata-redirect-web-secure.redirectscheme.scheme=https' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-netdata-redirect-web-secure.redirectscheme.scheme=https"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-netdata-redirect-web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-netdata-redirect-web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web.rule=Host(`${NET_NETDATA_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web.rule=Host(`${NET_NETDATA_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web.entrypoints=web' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web.entrypoints=web"
# HTTPS router # HTTPS router
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.rule=Host(`${NET_NETDATA_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.rule=Host(`${NET_NETDATA_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.tls.certresolver=resolver' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.tls.certresolver=resolver"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.entrypoints=web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.entrypoints=web-secure"
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-netdata-compress.compress=true' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-netdata-compress.compress=true"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.middlewares=${NET_COMPOSE_PROJECT_NAME}-netdata-compress,${NET_COMPOSE_PROJECT_NAME}-authelia,security-headers@file' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-netdata-web-secure.middlewares=${NET_COMPOSE_PROJECT_NAME}-netdata-compress,${NET_COMPOSE_PROJECT_NAME}-authelia,security-headers@file"
# Service # Service
- 'traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-netdata.loadbalancer.server.port=19999' - "traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-netdata.loadbalancer.server.port=19999"
- 'traefik.docker.network=${NETWORK_NAME}' - "traefik.docker.network=${NETWORK_NAME}"
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
# Watchtower - Automatic container updates # Watchtower - Automatic container updates
watchtower: watchtower:
@@ -202,7 +202,8 @@ services:
- compose_network - compose_network
healthcheck: healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:3000/api/heartbeat || exit 1"] test:
["CMD-SHELL", "curl -f http://localhost:3000/api/heartbeat || exit 1"]
interval: 30s interval: 30s
timeout: 10s timeout: 10s
retries: 5 retries: 5
@@ -210,21 +211,21 @@ services:
labels: labels:
# Traefik Configuration # Traefik Configuration
- 'traefik.enable=${NET_TRAEFIK_ENABLED}' - "traefik.enable=${NET_TRAEFIK_ENABLED}"
# HTTP to HTTPS redirect # HTTP to HTTPS redirect
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-umami-redirect-web-secure.redirectscheme.scheme=https' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-umami-redirect-web-secure.redirectscheme.scheme=https"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-umami-redirect-web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-umami-redirect-web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web.rule=Host(`${NET_TRACK_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web.rule=Host(`${NET_TRACK_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web.entrypoints=web' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web.entrypoints=web"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.rule=Host(`${NET_TRACK_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.rule=Host(`${NET_TRACK_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.tls.certresolver=resolver' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.tls.certresolver=resolver"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.entrypoints=web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.entrypoints=web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.middlewares=security-headers@file' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.middlewares=security-headers@file"
- 'traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.loadbalancer.server.port=3000' - "traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-umami-web-secure.loadbalancer.server.port=3000"
- 'traefik.docker.network=${NETWORK_NAME}' - "traefik.docker.network=${NETWORK_NAME}"
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
# Mailpit - SMTP server with web UI # Mailpit - SMTP server with web UI
mailpit: mailpit:
@@ -250,22 +251,22 @@ services:
networks: networks:
- compose_network - compose_network
labels: labels:
- 'traefik.enable=${NET_TRAEFIK_ENABLED}' - "traefik.enable=${NET_TRAEFIK_ENABLED}"
# HTTP to HTTPS redirect # HTTP to HTTPS redirect
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-mailpit-redirect-web-secure.redirectscheme.scheme=https' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-mailpit-redirect-web-secure.redirectscheme.scheme=https"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-mailpit-redirect-web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-mailpit-redirect-web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web.rule=Host(`${NET_MAILPIT_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web.rule=Host(`${NET_MAILPIT_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web.entrypoints=web' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web.entrypoints=web"
# HTTPS router with auth # HTTPS router with auth
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.rule=Host(`${NET_MAILPIT_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.rule=Host(`${NET_MAILPIT_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.tls.certresolver=resolver' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.tls.certresolver=resolver"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.entrypoints=web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.entrypoints=web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.middlewares=${NET_COMPOSE_PROJECT_NAME}-authelia,security-headers@file' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.middlewares=${NET_COMPOSE_PROJECT_NAME}-authelia,security-headers@file"
# Service # Service
- 'traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.loadbalancer.server.port=8025' - "traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-mailpit-web-secure.loadbalancer.server.port=8025"
- 'traefik.docker.network=${NETWORK_NAME}' - "traefik.docker.network=${NETWORK_NAME}"
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
# Authelia - SSO and authentication portal # Authelia - SSO and authentication portal
authelia: authelia:
@@ -285,27 +286,27 @@ services:
networks: networks:
- compose_network - compose_network
labels: labels:
- 'traefik.enable=${NET_TRAEFIK_ENABLED}' - "traefik.enable=${NET_TRAEFIK_ENABLED}"
# HTTP to HTTPS redirect # HTTP to HTTPS redirect
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia-redirect-web-secure.redirectscheme.scheme=https' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia-redirect-web-secure.redirectscheme.scheme=https"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-authelia-redirect-web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web.middlewares=${NET_COMPOSE_PROJECT_NAME}-authelia-redirect-web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web.rule=Host(`${NET_AUTHELIA_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web.rule=Host(`${NET_AUTHELIA_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web.entrypoints=web' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web.entrypoints=web"
# HTTPS router # HTTPS router
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.rule=Host(`${NET_AUTHELIA_TRAEFIK_HOST}`)' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.rule=Host(`${NET_AUTHELIA_TRAEFIK_HOST}`)"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.tls.certresolver=resolver' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.tls.certresolver=resolver"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.entrypoints=web-secure' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.entrypoints=web-secure"
- 'traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.middlewares=security-headers@file' - "traefik.http.routers.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.middlewares=security-headers@file"
# Service # Service
- 'traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.loadbalancer.server.port=9091' - "traefik.http.services.${NET_COMPOSE_PROJECT_NAME}-authelia-web-secure.loadbalancer.server.port=9091"
- 'traefik.docker.network=${NETWORK_NAME}' - "traefik.docker.network=${NETWORK_NAME}"
# ForwardAuth middleware for other services # ForwardAuth middleware for other services
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.address=http://net_authelia:9091/api/authz/forward-auth' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.address=http://net_authelia:9091/api/authz/forward-auth"
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.trustForwardHeader=true' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.trustForwardHeader=true"
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.authResponseHeaders=Remote-User,Remote-Groups,Remote-Name,Remote-Email' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.authResponseHeaders=Remote-User,Remote-Groups,Remote-Name,Remote-Email"
- 'traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.authResponseHeadersRegex=^Remote-' - "traefik.http.middlewares.${NET_COMPOSE_PROJECT_NAME}-authelia.forwardAuth.authResponseHeadersRegex=^Remote-"
# Watchtower # Watchtower
- 'com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}' - "com.centurylinklabs.watchtower.enable=${WATCHTOWER_LABEL_ENABLE}"
volumes: volumes:
letsencrypt_data: letsencrypt_data: