AI Infrastructure

This directory contains AI-related configurations for the VPS deployment.

Multi-Modal GPU Infrastructure (Migrated)

The multi-modal AI orchestration stack (text, image, music generation) has been moved to a dedicated repository:

Repository: https://dev.pivoine.art/valknar/runpod

The RunPod repository contains:

  • Model orchestrator for automatic switching between text, image, and music models
  • vLLM + Qwen 2.5 7B (text generation)
  • Flux.1 Schnell (image generation)
  • MusicGen Medium (music generation)
  • RunPod template creation scripts
  • Complete deployment documentation

This separation allows for independent management of:

  • VPS Services (this repo): Open WebUI, Crawl4AI, AI database
  • GPU Services (runpod repo): Model inference, orchestration, RunPod templates

VPS AI Services (ai/compose.yaml)

This compose stack manages the VPS-side AI infrastructure that integrates with the GPU server:

Services

ai_postgres

Dedicated PostgreSQL 16 instance with the pgvector extension for AI workloads:

  • Vector similarity search support
  • Isolated from the core database for performance
  • Used by Open WebUI for RAG and embeddings
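
A quick way to confirm the extension is active (container name from this stack; the postgres superuser is an assumption, substitute the user from .env):

# Check that the pgvector extension is installed in the AI database
docker exec ai_postgres psql -U postgres -c \
  "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"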

webui (Open WebUI)

ChatGPT-like interface exposed at ai.pivoine.art:8080:

  • Claude API integration via Anthropic
  • RAG support with document upload
  • Vector storage via pgvector
  • Web search capability
  • SMTP email via IONOS
  • User signup enabled
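
Besides the web interface, Open WebUI also exposes an OpenAI-compatible API; a minimal sketch, assuming an API key generated in the WebUI account settings and a model alias served through LiteLLM (both placeholders):

# Chat completion via Open WebUI's OpenAI-compatible API (key and model are placeholders)
curl http://ai.pivoine.art:8080/api/chat/completions \
  -H "Authorization: Bearer <webui_api_key>" \
  -H "Content-Type: application/json" \
  -d '{"model": "<model_alias>", "messages": [{"role": "user", "content": "Hello"}]}'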

crawl4ai

Internal web scraping service for LLM content preparation:

  • API on port 11235 (not exposed publicly)
  • Optimized for AI/RAG workflows
  • Integration with Open WebUI and n8n
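
Because the port is not published, calls come from containers on the same Docker network; a minimal sketch, assuming the service is reachable as ai_crawl4ai and the upstream /crawl endpoint shape:

# Scrape a URL via the internal Crawl4AI API (hostname and endpoint are assumptions)
curl http://ai_crawl4ai:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"]}'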

Integration with GPU Server

The VPS AI services connect to the GPU server via Tailscale VPN:

  • VPS Tailscale IP: 100.102.217.79
  • GPU Tailscale IP: 100.100.108.13
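
Connectivity between the two hosts can be checked from the VPS with the Tailscale CLI:

# Confirm the GPU server is reachable over the tailnet
tailscale ping 100.100.108.13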

LiteLLM Proxy (port 4000 on the VPS) routes requests to:

  • Claude API for chat completions
  • GPU orchestrator for self-hosted models (text, image, music)

See ../litellm-config.yaml for routing configuration.
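
Clients use the standard OpenAI request shape against the proxy; a minimal sketch, assuming a model alias defined in litellm-config.yaml and the proxy's master key (both placeholders):

# Chat completion routed by the LiteLLM proxy (model alias and key are placeholders)
curl http://100.102.217.79:4000/v1/chat/completions \
  -H "Authorization: Bearer <litellm_master_key>" \
  -H "Content-Type: application/json" \
  -d '{"model": "<model_alias>", "messages": [{"role": "user", "content": "Hello"}]}'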

Environment Variables

Required in .env:

# AI Database
AI_DB_PASSWORD=<password>

# Open WebUI
AI_WEBUI_SECRET_KEY=<secret>

# Claude API
ANTHROPIC_API_KEY=<api_key>

# Email (IONOS SMTP)
ADMIN_EMAIL=<email>
SMTP_HOST=smtp.ionos.com
SMTP_PORT=587
SMTP_USER=<smtp_user>
SMTP_PASSWORD=<smtp_password>
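
The generated secrets can be produced locally before first start; a minimal sketch (the generation method is a suggestion, not a requirement of the stack):

# Append freshly generated secrets to .env
echo "AI_DB_PASSWORD=$(openssl rand -hex 24)" >> .env
echo "AI_WEBUI_SECRET_KEY=$(openssl rand -hex 32)" >> .env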

Backup Configuration

AI services are backed up daily via Restic:

  • ai_postgres_data: 3 AM (7 daily, 4 weekly, 6 monthly, 2 yearly)
  • ai_webui_data: 3 AM (same retention)
  • ai_crawl4ai_data: 3 AM (same retention)

Repository: /mnt/hidrive/users/valknar/Backup
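
To verify backups are landing, restic can query the repository directly; a minimal sketch (assumes the repository password is supplied via RESTIC_PASSWORD or a password file):

# List the most recent snapshots in the backup repository
restic -r /mnt/hidrive/users/valknar/Backup snapshots --latest 5

# For reference, the stated retention corresponds to these restic forget flags
restic -r /mnt/hidrive/users/valknar/Backup forget --prune \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --keep-yearly 2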

Management Commands

# Start AI stack
pnpm arty up ai_postgres webui crawl4ai

# View logs
docker logs -f ai_webui
docker logs -f ai_postgres
docker logs -f ai_crawl4ai

# Check Open WebUI
curl http://ai.pivoine.art:8080/health

# Restart AI services
pnpm arty restart ai_postgres webui crawl4ai

GPU Server Management

For GPU server operations (model orchestration, template creation, etc.):

# Clone the dedicated repository
git clone ssh://git@dev.pivoine.art:2222/valknar/runpod.git

# See runpod repository for:
# - Model orchestration setup
# - RunPod template creation
# - GPU deployment guides

Documentation

  • VPS AI Services: this README
  • GPU Server (separate repository): https://dev.pivoine.art/valknar/runpod

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                 VPS (Tailscale: 100.102.217.79)                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ LiteLLM Proxy (Port 4000)                                 │  │
│  │ Routes to: Claude API + GPU Orchestrator                  │  │
│  └───────┬───────────────────────────────────────────────────┘  │
│          │                                                      │
│  ┌───────▼─────────┐  ┌──────────────┐  ┌─────────────────┐     │
│  │ Open WebUI      │  │ Crawl4AI     │  │ AI PostgreSQL   │     │
│  │ Port: 8080      │  │ Port: 11235  │  │ + pgvector      │     │
│  └─────────────────┘  └──────────────┘  └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
                               │ Tailscale VPN
┌──────────────────────────────┼──────────────────────────────────┐
│          RunPod GPU Server (Tailscale: 100.100.108.13)          │
│  ┌───────────────────────────▼──────────────────────────────┐   │
│  │ Orchestrator (Port 9000)                                 │   │
│  │ Manages sequential model loading                         │   │
│  └─────┬──────────────┬──────────────────┬──────────────────┘   │
│        │              │                  │                      │
│  ┌─────▼──────┐  ┌────▼────────┐  ┌──────▼───────┐              │
│  │vLLM        │  │Flux.1       │  │MusicGen      │              │
│  │Qwen 2.5 7B │  │Schnell      │  │Medium        │              │
│  │Port: 8001  │  │Port: 8002   │  │Port: 8003    │              │
│  └────────────┘  └─────────────┘  └──────────────┘              │
└─────────────────────────────────────────────────────────────────┘

Support

For issues:

  • VPS AI services: Check logs via docker logs
  • GPU server: See runpod repository documentation
  • LiteLLM routing: Review ../litellm-config.yaml