
# AI Infrastructure
This directory contains AI-related configurations for the VPS deployment.
## Multi-Modal GPU Infrastructure (Migrated)
**The multi-modal AI orchestration stack (text, image, music generation) has been moved to a dedicated repository:**

**Repository**: https://dev.pivoine.art/valknar/runpod

The RunPod repository contains:
- Model orchestrator for automatic switching between text, image, and music models
- vLLM + Qwen 2.5 7B (text generation)
- Flux.1 Schnell (image generation)
- MusicGen Medium (music generation)
- RunPod template creation scripts
- Complete deployment documentation

This separation allows for independent management of:
- **VPS Services** (this repo): Open WebUI, Crawl4AI, AI database
- **GPU Services** (runpod repo): Model inference, orchestration, RunPod templates
## VPS AI Services (ai/compose.yaml)
This compose stack manages the VPS-side AI infrastructure that integrates with the GPU server:
### Services
#### ai_postgres
Dedicated PostgreSQL 16 instance with pgvector extension for AI workloads:
- Vector similarity search support
- Isolated from core database for performance
- Used by Open WebUI for RAG and embeddings
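
To confirm the extension is actually enabled, you can query the catalog from inside the container (a minimal check; the `postgres` user and default database are assumptions, adjust to the compose configuration):

```bash
# Verify the pgvector extension is installed (user/database names are assumptions)
docker exec ai_postgres psql -U postgres -c \
  "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
```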
#### webui (Open WebUI)
ChatGPT-like interface exposed at `ai.pivoine.art:8080`:
- Claude API integration via Anthropic
- RAG support with document upload
- Vector storage via pgvector
- Web search capability
- SMTP email via IONOS
- User signup enabled
#### crawl4ai
Internal web scraping service for LLM content preparation:
- API on port 11235 (not exposed publicly)
- Optimized for AI/RAG workflows
- Integration with Open WebUI and n8n
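
A quick smoke test from another container on the same Docker network might look like this (a sketch; the `crawl4ai` hostname and request shape are assumptions, verify against the Crawl4AI version actually deployed):

```bash
# Submit a crawl request to the internal API (hostname and payload are assumptions)
curl -X POST http://crawl4ai:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"]}'
```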
## Integration with GPU Server
The VPS AI services connect to the GPU server via Tailscale VPN:
- **VPS Tailscale IP**: 100.102.217.79
- **GPU Tailscale IP**: 100.100.108.13
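
Connectivity between the two nodes can be checked from the VPS with the Tailscale CLI (assuming `tailscale` is installed on the VPS host itself):

```bash
# Confirm the GPU server answers over the tailnet
tailscale ping 100.100.108.13
```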

**LiteLLM Proxy** (port 4000 on the VPS) routes requests to:
- Claude API for chat completions
- GPU orchestrator for self-hosted models (text, image, music)

See `../litellm-config.yaml` for routing configuration.
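
Because LiteLLM exposes an OpenAI-compatible API, routing can be tested with a plain curl call (the `claude` model alias is an assumption; the real aliases live in `../litellm-config.yaml`):

```bash
# OpenAI-compatible chat completion via the LiteLLM proxy on the VPS
# (add -H "Authorization: Bearer $LITELLM_MASTER_KEY" if a master key is configured)
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude", "messages": [{"role": "user", "content": "Hello"}]}'
```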
## Environment Variables
Required in `.env`:
```bash
# AI Database
AI_DB_PASSWORD=<password>
# Open WebUI
AI_WEBUI_SECRET_KEY=<secret>
# Claude API
ANTHROPIC_API_KEY=<api_key>
# Email (IONOS SMTP)
ADMIN_EMAIL=<email>
SMTP_HOST=smtp.ionos.com
SMTP_PORT=587
SMTP_USER=<smtp_user>
SMTP_PASSWORD=<smtp_password>
```
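
Secrets such as `AI_WEBUI_SECRET_KEY` and `AI_DB_PASSWORD` can be generated locally before the first deployment, for example:

```bash
# Generate random secrets (any sufficiently long random string works)
openssl rand -hex 32     # AI_WEBUI_SECRET_KEY
openssl rand -base64 24  # AI_DB_PASSWORD
```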
## Backup Configuration
AI services are backed up daily via Restic:
- **ai_postgres_data**: 3 AM (7 daily, 4 weekly, 6 monthly, 2 yearly)
- **ai_webui_data**: 3 AM (same retention)
- **ai_crawl4ai_data**: 3 AM (same retention)

Repository: `/mnt/hidrive/users/valknar/Backup`
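
To verify that backups are landing, restic can list the most recent snapshots (assuming `RESTIC_PASSWORD` or `RESTIC_PASSWORD_FILE` is available in the shell):

```bash
# Show the five most recent snapshots in the backup repository
restic -r /mnt/hidrive/users/valknar/Backup snapshots --latest 5
```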
## Management Commands
```bash
# Start AI stack
pnpm arty up ai_postgres webui crawl4ai
# View logs
docker logs -f ai_webui
docker logs -f ai_postgres
docker logs -f ai_crawl4ai
# Check Open WebUI
curl http://ai.pivoine.art:8080/health
# Restart AI services
pnpm arty restart ai_postgres webui crawl4ai
```
## GPU Server Management
For GPU server operations (model orchestration, template creation, etc.):
```bash
# Clone the dedicated repository
git clone ssh://git@dev.pivoine.art:2222/valknar/runpod.git
# See runpod repository for:
# - Model orchestration setup
# - RunPod template creation
# - GPU deployment guides
```
## Documentation
### VPS AI Services
- [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) - VPS AI deployment history
### GPU Server (Separate Repository)
- [runpod/README.md](https://dev.pivoine.art/valknar/runpod) - Main GPU documentation
- [runpod/DEPLOYMENT.md](https://dev.pivoine.art/valknar/runpod) - Deployment guide
- [runpod/RUNPOD_TEMPLATE.md](https://dev.pivoine.art/valknar/runpod) - Template creation
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│                 VPS (Tailscale: 100.102.217.79)                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                 LiteLLM Proxy (Port 4000)                 │  │
│  │         Routes to: Claude API + GPU Orchestrator          │  │
│  └───────┬───────────────────────────────────────────────────┘  │
│          │                                                      │
│  ┌───────▼─────────┐  ┌──────────────┐  ┌─────────────────┐     │
│  │   Open WebUI    │  │   Crawl4AI   │  │  AI PostgreSQL  │     │
│  │   Port: 8080    │  │ Port: 11235  │  │   + pgvector    │     │
│  └─────────────────┘  └──────────────┘  └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
                               │ Tailscale VPN
┌──────────────────────────────┼──────────────────────────────────┐
│          RunPod GPU Server (Tailscale: 100.100.108.13)          │
│  ┌───────────────────────────▼──────────────────────────────┐   │
│  │                 Orchestrator (Port 9000)                 │   │
│  │             Manages sequential model loading             │   │
│  └─────┬──────────────┬──────────────────┬──────────────────┘   │
│        │              │                  │                      │
│  ┌─────▼──────┐  ┌────▼────────┐  ┌──────▼───────┐              │
│  │vLLM        │  │Flux.1       │  │MusicGen      │              │
│  │Qwen 2.5 7B │  │Schnell      │  │Medium        │              │
│  │Port: 8001  │  │Port: 8002   │  │Port: 8003    │              │
│  └────────────┘  └─────────────┘  └──────────────┘              │
└─────────────────────────────────────────────────────────────────┘
```
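
To sanity-check the paths in this diagram from the VPS without assuming anything about the HTTP APIs, the GPU-side ports can be probed over the tailnet (a sketch using netcat):

```bash
# Check that the orchestrator and model ports answer over Tailscale
for port in 9000 8001 8002 8003; do
  nc -zv -w 3 100.100.108.13 "$port"
done
```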
## Support
For issues:
- **VPS AI services**: Check logs via `docker logs`
- **GPU server**: See runpod repository documentation
- **LiteLLM routing**: Review `../litellm-config.yaml`