
# AI Infrastructure
This directory contains AI-related configurations for the VPS deployment.
## Multi-Modal GPU Infrastructure (Migrated)
**The multi-modal AI orchestration stack (text, image, music generation) has been moved to a dedicated repository:**

**Repository**: https://dev.pivoine.art/valknar/runpod

The RunPod repository contains:
- Model orchestrator for automatic switching between text, image, and music models
- vLLM + Qwen 2.5 7B (text generation)
- Flux.1 Schnell (image generation)
- MusicGen Medium (music generation)
- RunPod template creation scripts
- Complete deployment documentation

This separation allows for independent management of:
- **VPS Services** (this repo): Open WebUI, Crawl4AI, AI database
- **GPU Services** (runpod repo): Model inference, orchestration, RunPod templates
## VPS AI Services (ai/compose.yaml)
This compose stack manages the VPS-side AI infrastructure that integrates with the GPU server:
### Services
#### ai_postgres
Dedicated PostgreSQL 16 instance with pgvector extension for AI workloads:
- Vector similarity search support
- Isolated from core database for performance
- Used by Open WebUI for RAG and embeddings
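
To confirm the extension is actually enabled, you can query the catalog from inside the container (a minimal check; the `postgres` user and default database are assumptions, adjust to the compose configuration):

```bash
# Verify the pgvector extension is installed (user/database names are assumptions)
docker exec ai_postgres psql -U postgres -c \
  "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
```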
#### webui (Open WebUI)
ChatGPT-like interface exposed at `ai.pivoine.art:8080`:
- Claude API integration via Anthropic
- RAG support with document upload
- Vector storage via pgvector
- Web search capability
- SMTP email via IONOS
- User signup enabled
#### crawl4ai
Internal web scraping service for LLM content preparation:
- API on port 11235 (not exposed publicly)
- Optimized for AI/RAG workflows
- Integration with Open WebUI and n8n
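
A quick smoke test from another container on the same Docker network might look like this (a sketch; the `crawl4ai` hostname and request shape are assumptions, verify against the Crawl4AI version actually deployed):

```bash
# Submit a crawl request to the internal API (hostname and payload are assumptions)
curl -X POST http://crawl4ai:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"]}'
```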
## Integration with GPU Server
The VPS AI services connect to the GPU server via Tailscale VPN:
- **VPS Tailscale IP**: 100.102.217.79
- **GPU Tailscale IP**: 100.100.108.13
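
Connectivity between the two nodes can be checked from the VPS with the Tailscale CLI (assuming `tailscale` is installed on the VPS host itself):

```bash
# Confirm the GPU server answers over the tailnet
tailscale ping 100.100.108.13
```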

**LiteLLM Proxy** (port 4000 on the VPS) routes requests to:
- Claude API for chat completions
- GPU orchestrator for self-hosted models (text, image, music)

See `../litellm-config.yaml` for routing configuration.
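
Because LiteLLM exposes an OpenAI-compatible API, routing can be tested with a plain curl call (the `claude` model alias is an assumption; the real aliases live in `../litellm-config.yaml`):

```bash
# OpenAI-compatible chat completion via the LiteLLM proxy on the VPS
# (add -H "Authorization: Bearer $LITELLM_MASTER_KEY" if a master key is configured)
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude", "messages": [{"role": "user", "content": "Hello"}]}'
```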
## Environment Variables
Required in `.env`:
```bash
# AI Database
AI_DB_PASSWORD=<password>
# Open WebUI
AI_WEBUI_SECRET_KEY=<secret>
# Claude API
ANTHROPIC_API_KEY=<api_key>
# Email (IONOS SMTP)
ADMIN_EMAIL=<email>
SMTP_HOST=smtp.ionos.com
SMTP_PORT=587
SMTP_USER=<smtp_user>
SMTP_PASSWORD=<smtp_password>
```
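
Secrets such as `AI_WEBUI_SECRET_KEY` and `AI_DB_PASSWORD` can be generated locally before the first deployment, for example:

```bash
# Generate random secrets (any sufficiently long random string works)
openssl rand -hex 32     # AI_WEBUI_SECRET_KEY
openssl rand -base64 24  # AI_DB_PASSWORD
```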
## Backup Configuration
AI services are backed up daily via Restic:
- **ai_postgres_data**: 3 AM (7 daily, 4 weekly, 6 monthly, 2 yearly)
- **ai_webui_data**: 3 AM (same retention)
- **ai_crawl4ai_data**: 3 AM (same retention)

Repository: `/mnt/hidrive/users/valknar/Backup`
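
To verify that backups are landing, restic can list the most recent snapshots (assuming `RESTIC_PASSWORD` or `RESTIC_PASSWORD_FILE` is available in the shell):

```bash
# Show the five most recent snapshots in the backup repository
restic -r /mnt/hidrive/users/valknar/Backup snapshots --latest 5
```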
## Management Commands
```bash
# Start AI stack
pnpm arty up ai_postgres webui crawl4ai
# View logs
docker logs -f ai_webui
docker logs -f ai_postgres
docker logs -f ai_crawl4ai
# Check Open WebUI
curl http://ai.pivoine.art:8080/health
# Restart AI services
pnpm arty restart ai_postgres webui crawl4ai
```
## GPU Server Management
For GPU server operations (model orchestration, template creation, etc.):
```bash
# Clone the dedicated repository
git clone ssh://git@dev.pivoine.art:2222/valknar/runpod.git
# See runpod repository for:
# - Model orchestration setup
# - RunPod template creation
# - GPU deployment guides
```
## Documentation
### VPS AI Services
- [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) - VPS AI deployment history
### GPU Server (Separate Repository)
- [runpod/README.md](https://dev.pivoine.art/valknar/runpod) - Main GPU documentation
- [runpod/DEPLOYMENT.md](https://dev.pivoine.art/valknar/runpod) - Deployment guide
- [runpod/RUNPOD_TEMPLATE.md](https://dev.pivoine.art/valknar/runpod) - Template creation
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│                 VPS (Tailscale: 100.102.217.79)                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                 LiteLLM Proxy (Port 4000)                 │  │
│  │         Routes to: Claude API + GPU Orchestrator          │  │
│  └───────┬───────────────────────────────────────────────────┘  │
│          │                                                      │
│  ┌───────▼─────────┐  ┌──────────────┐  ┌─────────────────┐     │
│  │   Open WebUI    │  │   Crawl4AI   │  │  AI PostgreSQL  │     │
│  │   Port: 8080    │  │ Port: 11235  │  │   + pgvector    │     │
│  └─────────────────┘  └──────────────┘  └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
                               │ Tailscale VPN
┌──────────────────────────────┼──────────────────────────────────┐
│          RunPod GPU Server (Tailscale: 100.100.108.13)          │
│  ┌───────────────────────────▼──────────────────────────────┐   │
│  │                 Orchestrator (Port 9000)                 │   │
│  │             Manages sequential model loading             │   │
│  └─────┬──────────────┬──────────────────┬──────────────────┘   │
│        │              │                  │                      │
│  ┌─────▼──────┐  ┌────▼────────┐  ┌──────▼───────┐              │
│  │vLLM        │  │Flux.1       │  │MusicGen      │              │
│  │Qwen 2.5 7B │  │Schnell      │  │Medium        │              │
│  │Port: 8001  │  │Port: 8002   │  │Port: 8003    │              │
│  └────────────┘  └─────────────┘  └──────────────┘              │
└─────────────────────────────────────────────────────────────────┘
```
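
To sanity-check the paths in this diagram from the VPS without assuming anything about the HTTP APIs, the GPU-side ports can be probed over the tailnet (a sketch using netcat):

```bash
# Check that the orchestrator and model ports answer over Tailscale
for port in 9000 8001 8002 8003; do
  nc -zv -w 3 100.100.108.13 "$port"
done
```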
## Support
For issues:
- **VPS AI services**: Check logs via `docker logs`
- **GPU server**: See runpod repository documentation
- **LiteLLM routing**: Review `../litellm-config.yaml`