# AI Infrastructure

This directory contains AI-related configurations for the VPS deployment.

## Multi-Modal GPU Infrastructure (Migrated)

**The multi-modal AI orchestration stack (text, image, music generation) has been moved to a dedicated repository:**

**Repository**: https://dev.pivoine.art/valknar/runpod
The RunPod repository contains:

- Model orchestrator for automatic switching between text, image, and music models
- vLLM + Qwen 2.5 7B (text generation)
- Flux.1 Schnell (image generation)
- MusicGen Medium (music generation)
- RunPod template creation scripts
- Complete deployment documentation

This separation allows for independent management of:

- **VPS Services** (this repo): Open WebUI, Crawl4AI, AI database
- **GPU Services** (runpod repo): Model inference, orchestration, RunPod templates
## VPS AI Services (ai/compose.yaml)

This compose stack manages the VPS-side AI infrastructure that integrates with the GPU server.

### Services

#### ai_postgres

Dedicated PostgreSQL 16 instance with the pgvector extension for AI workloads:

- Vector similarity search support
- Isolated from the core database for performance
- Used by Open WebUI for RAG and embeddings
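To make "vector similarity search" concrete: Open WebUI's RAG lookups against this instance reduce to pgvector's distance operators. The sketch below reproduces the cosine distance that pgvector's `<=>` operator computes; the SQL shown in the comment uses hypothetical table and column names, not the actual Open WebUI schema.

```python
# Sketch of the cosine distance behind pgvector's <=> operator.
# In SQL this would look roughly like (table/column names hypothetical):
#   SELECT id FROM documents ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 5;
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, matching pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

query = [1.0, 0.0, 0.0]
docs = {"doc_a": [1.0, 0.0, 0.0], "doc_b": [0.0, 1.0, 0.0]}
# Nearest neighbour = smallest distance
best = min(docs, key=lambda k: cosine_distance(query, docs[k]))
print(best)  # doc_a (identical direction, distance 0.0)
```

Smaller distance means a closer match, which is why the SQL orders ascending by `<=>`.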
#### webui (Open WebUI)

ChatGPT-like interface exposed at `ai.pivoine.art:8080`:

- Claude API integration via Anthropic
- RAG support with document upload
- Vector storage via pgvector
- Web search capability
- SMTP email via IONOS
- User signup enabled
#### crawl4ai

Internal web scraping service for LLM content preparation:

- API on port 11235 (not exposed publicly)
- Optimized for AI/RAG workflows
- Integration with Open WebUI and n8n
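Since the API is internal-only, callers like n8n reach it over the Docker network rather than a public URL. A minimal sketch of such a call follows; the `ai_crawl4ai` hostname, the `/crawl` endpoint, and the payload shape are assumptions based on typical Crawl4AI deployments, so verify them against the deployed version.

```python
# Sketch of an internal crawl request (e.g. from n8n). The hostname,
# endpoint path, and payload shape are assumptions -- check the deployed
# Crawl4AI version's docs before relying on them.
import json

CRAWL4AI_URL = "http://ai_crawl4ai:11235"  # internal only, not exposed publicly

def build_crawl_request(urls: list[str]) -> tuple[str, dict]:
    """Return (endpoint, JSON payload) for a crawl job."""
    return f"{CRAWL4AI_URL}/crawl", {"urls": urls}

endpoint, payload = build_crawl_request(["https://example.com"])
print(endpoint, json.dumps(payload))
# On the VPS this would be sent with e.g.:
#   requests.post(endpoint, json=payload, timeout=60)
```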
## Integration with GPU Server

The VPS AI services connect to the GPU server via Tailscale VPN:

- **VPS Tailscale IP**: 100.102.217.79
- **GPU Tailscale IP**: 100.100.108.13
**LiteLLM Proxy** (port 4000 on the VPS) routes requests to:

- Claude API for chat completions
- GPU orchestrator for self-hosted models (text, image, music)

See `../litellm-config.yaml` for routing configuration.
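Because LiteLLM exposes an OpenAI-compatible API, clients send the same request shape regardless of backend; the model name alone selects the route. A sketch, where the model aliases are hypothetical stand-ins for whatever `../litellm-config.yaml` actually defines:

```python
# Sketch of a chat request through the LiteLLM proxy. The request shape is
# identical for both backends; only the model name selects the route.
# The model aliases below are assumptions -- the real ones live in
# ../litellm-config.yaml.
LITELLM_URL = "http://100.102.217.79:4000/v1/chat/completions"

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

# Routed to the Claude API (hypothetical alias):
claude_req = chat_request("claude-sonnet", "Summarize this document.")
# Routed to the GPU orchestrator's text model (hypothetical alias):
local_req = chat_request("qwen-2.5-7b", "Summarize this document.")
# On the VPS this would be sent with e.g.:
#   requests.post(LITELLM_URL, json=claude_req,
#                 headers={"Authorization": "Bearer <key>"})
print(claude_req["model"], local_req["model"])
```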
## Environment Variables

Required in `.env`:

```bash
# AI Database
AI_DB_PASSWORD=<password>

# Open WebUI
AI_WEBUI_SECRET_KEY=<secret>

# Claude API
ANTHROPIC_API_KEY=<api_key>

# Email (IONOS SMTP)
ADMIN_EMAIL=<email>
SMTP_HOST=smtp.ionos.com
SMTP_PORT=587
SMTP_USER=<smtp_user>
SMTP_PASSWORD=<smtp_password>
```
## Backup Configuration

AI services are backed up daily via Restic:

- **ai_postgres_data**: 3 AM (7 daily, 4 weekly, 6 monthly, 2 yearly)
- **ai_webui_data**: 3 AM (same retention)
- **ai_crawl4ai_data**: 3 AM (same retention)

Repository: `/mnt/hidrive/users/valknar/Backup`
## Management Commands

```bash
# Start AI stack
pnpm arty up ai_postgres webui crawl4ai

# View logs
docker logs -f ai_webui
docker logs -f ai_postgres
docker logs -f ai_crawl4ai

# Check Open WebUI
curl http://ai.pivoine.art:8080/health

# Restart AI services
pnpm arty restart ai_postgres webui crawl4ai
```
## GPU Server Management

For GPU server operations (model orchestration, template creation, etc.):

```bash
# Clone the dedicated repository
git clone ssh://git@dev.pivoine.art:2222/valknar/runpod.git

# See the runpod repository for:
# - Model orchestration setup
# - RunPod template creation
# - GPU deployment guides
```
## Documentation

### VPS AI Services

- [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) - VPS AI deployment history

### GPU Server (Separate Repository)

- [runpod/README.md](https://dev.pivoine.art/valknar/runpod) - Main GPU documentation
- [runpod/DEPLOYMENT.md](https://dev.pivoine.art/valknar/runpod) - Deployment guide
- [runpod/RUNPOD_TEMPLATE.md](https://dev.pivoine.art/valknar/runpod) - Template creation
## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│ VPS (Tailscale: 100.102.217.79)                                 │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ LiteLLM Proxy (Port 4000)                                 │   │
│ │ Routes to: Claude API + GPU Orchestrator                  │   │
│ └───────┬───────────────────────────────────────────────────┘   │
│         │                                                       │
│ ┌───────▼─────────┐ ┌──────────────┐ ┌─────────────────┐        │
│ │ Open WebUI      │ │ Crawl4AI     │ │ AI PostgreSQL   │        │
│ │ Port: 8080      │ │ Port: 11235  │ │ + pgvector      │        │
│ └─────────────────┘ └──────────────┘ └─────────────────┘        │
└───────────────────────────────┬─────────────────────────────────┘
                                │ Tailscale VPN
┌───────────────────────────────┼─────────────────────────────────┐
│ RunPod GPU Server (Tailscale: 100.100.108.13)                   │
│ ┌─────────────────────────────▼────────────────────────────┐    │
│ │ Orchestrator (Port 9000)                                 │    │
│ │ Manages sequential model loading                         │    │
│ └─────┬──────────────┬──────────────────┬──────────────────┘    │
│       │              │                  │                       │
│ ┌─────▼──────┐ ┌─────▼───────┐ ┌───────▼──────┐                 │
│ │ vLLM       │ │ Flux.1      │ │ MusicGen     │                 │
│ │ Qwen 2.5 7B│ │ Schnell     │ │ Medium       │                 │
│ │ Port: 8001 │ │ Port: 8002  │ │ Port: 8003   │                 │
│ └────────────┘ └─────────────┘ └──────────────┘                 │
└─────────────────────────────────────────────────────────────────┘
```
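"Sequential model loading" in the orchestrator means only one model occupies the GPU at a time: a request for a different modality unloads the resident model before loading the target. A conceptual sketch of that policy, with class and method names that are illustrative rather than the orchestrator's actual API:

```python
# Conceptual sketch of sequential model loading: one model resident on the
# GPU at a time, swapped on demand. Names are illustrative, not the real
# orchestrator API.
class Orchestrator:
    MODELS = {
        "text": "vLLM/Qwen 2.5 7B",    # port 8001
        "image": "Flux.1 Schnell",     # port 8002
        "music": "MusicGen Medium",    # port 8003
    }

    def __init__(self):
        self.active: str | None = None  # modality currently loaded

    def handle(self, modality: str) -> str:
        if modality not in self.MODELS:
            raise ValueError(f"unknown modality: {modality}")
        if self.active != modality:
            # Unload whatever is resident, then load the requested model.
            self.active = modality
        return self.MODELS[modality]

orch = Orchestrator()
print(orch.handle("text"))   # vLLM/Qwen 2.5 7B
print(orch.handle("image"))  # unloads the text model first: Flux.1 Schnell
```

The trade-off is latency on modality switches in exchange for fitting three models into one GPU's memory budget.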
## Support

For issues:

- **VPS AI services**: Check logs via `docker logs`
- **GPU server**: See runpod repository documentation
- **LiteLLM routing**: Review `../litellm-config.yaml`