Commit Graph

436 Commits

e2e0927291 feat: update LiteLLM to use RunPod GPU via Tailscale
- Update api_base URLs from 100.100.108.13 to 100.121.199.88 (RunPod Tailscale IP)
- All self-hosted models (qwen-2.5-7b, flux-schnell, musicgen-medium) now route through Tailscale VPN
- Tested and verified connectivity between VPS and RunPod GPU orchestrator
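
In practice the update is a find-and-replace across the api_base entries; a minimal sketch (exact YAML layout of litellm-config.yaml assumed):

```bash
# Swap the old orchestrator IP for the RunPod Tailscale IP in every api_base
sed -i 's|http://100\.100\.108\.13:|http://100.121.199.88:|g' litellm-config.yaml

# Confirm nothing still points at the old host (grep exits non-zero on no match)
grep -n '100.100.108.13' litellm-config.yaml || echo "all api_base entries updated"
```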

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:42:27 +01:00
a5ed2be933 docs: remove outdated ai/README.md
Removed outdated AI infrastructure README that referenced GPU services.
VPS AI services (Open WebUI, Crawl4AI, facefusion) are documented in compose.yaml comments.
GPU infrastructure docs are now in dedicated runpod repository.
2025-11-21 14:42:23 +01:00
d5e37dbd3f cleanup: remove GPU/RunPod files from docker-compose repository
Removed GPU orchestration files migrated to dedicated runpod repository:
- Model orchestrator, vLLM, Flux, MusicGen services
- GPU Docker Compose files and configs
- GPU deployment scripts and documentation

Kept VPS AI services and facefusion:
- compose.yaml (VPS AI + facefusion)
- litellm-config.yaml (VPS LiteLLM)
- postgres/ (VPS PostgreSQL init)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)
- README.md (updated with runpod reference)

GPU infrastructure now maintained at: ssh://git@dev.pivoine.art:2222/valknar/runpod.git
2025-11-21 14:41:10 +01:00
abcebd1d9b docs: migrate multi-modal AI orchestration to dedicated runpod repository
Multi-modal AI stack (text/image/music generation) has been moved to:
Repository: ssh://git@dev.pivoine.art:2222/valknar/runpod.git

Updated ai/README.md to document:
- VPS AI services (Open WebUI, Crawl4AI, AI PostgreSQL)
- Reference to new runpod repository for GPU infrastructure
- Clear separation between VPS and GPU deployments
- Integration architecture via Tailscale VPN
2025-11-21 14:36:36 +01:00
3ed3e68271 feat(ai): add multi-modal orchestration system for text, image, and music generation
Implemented a cost-optimized AI infrastructure running on a single RTX 4090 GPU with
automatic model switching based on request type. This enables text, image, and
music generation on the same hardware via sequential model loading.

## New Components

**Model Orchestrator** (ai/model-orchestrator/):
- FastAPI service managing model lifecycle
- Automatic model detection and switching based on request type
- OpenAI-compatible API proxy for all models
- Simple YAML configuration for adding new models
- Docker SDK integration for service management
- Endpoints: /v1/chat/completions, /v1/images/generations, /v1/audio/generations

**Text Generation** (ai/vllm/):
- Reorganized existing vLLM server into proper structure
- Qwen 2.5 7B Instruct (14GB VRAM, ~50 tok/sec)
- Docker containerized with CUDA 12.4 support

**Image Generation** (ai/flux/):
- Flux.1 Schnell for fast, high-quality images
- 14GB VRAM, 4-5 sec per image
- OpenAI DALL-E compatible API
- Pre-built image: ghcr.io/matatonic/openedai-images-flux

**Music Generation** (ai/musicgen/):
- Meta's MusicGen Medium (facebook/musicgen-medium)
- Text-to-music generation (11GB VRAM)
- 60-90 seconds for 30s audio clips
- Custom FastAPI wrapper with AudioCraft

## Architecture

```
VPS (LiteLLM) → Tailscale VPN → GPU Orchestrator (Port 9000)
                                       ↓
                       ┌───────────────┼───────────────┐
                  vLLM (8001)    Flux (8002)    MusicGen (8003)
                   [Only ONE active at a time - sequential loading]
```

## Configuration Files

- docker-compose.gpu.yaml: Main orchestration file for RunPod deployment
- model-orchestrator/models.yaml: Model registry (easy to add new models)
- .env.example: Environment variable template
- README.md: Comprehensive deployment and usage guide

## Updated Files

- litellm-config.yaml: Updated to route through orchestrator (port 9000)
- GPU_DEPLOYMENT_LOG.md: Documented multi-modal architecture

## Features

- Automatic model switching (30-120s latency)
- Cost-optimized single-GPU deployment (~$0.50/hr vs ~$0.75/hr multi-GPU)
- Easy model addition via YAML configuration
- OpenAI-compatible APIs for all model types
- Centralized routing through LiteLLM proxy
- GPU memory safety (only one model loaded at a time)

## Usage

Deploy to RunPod:
```bash
scp -r ai/* gpu-pivoine:/workspace/ai/
ssh gpu-pivoine "cd /workspace/ai && docker compose -f docker-compose.gpu.yaml up -d orchestrator"
```

Test models:
```bash
# Text
curl http://100.100.108.13:9000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen-2.5-7b","messages":[...]}'

# Image
curl http://100.100.108.13:9000/v1/images/generations \
  -H 'Content-Type: application/json' \
  -d '{"model":"flux-schnell","prompt":"..."}'

# Music
curl http://100.100.108.13:9000/v1/audio/generations \
  -H 'Content-Type: application/json' \
  -d '{"model":"musicgen-medium","prompt":"..."}'
```

All models available via Open WebUI at https://ai.pivoine.art

## Adding New Models

1. Add entry to models.yaml
2. Define Docker service in docker-compose.gpu.yaml
3. Restart orchestrator

That's it! The orchestrator automatically detects and manages the new model.
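
For illustration, a hypothetical registry entry; the key names here are assumptions, the authoritative schema is whatever model-orchestrator/models.yaml already uses:

```bash
# Hypothetical model entry (key names assumed, not the confirmed schema)
cat >> ai/model-orchestrator/models.yaml <<'EOF'
whisper-large-v3:
  type: audio/transcriptions   # request type the orchestrator matches on
  service: whisper             # compose service to start in docker-compose.gpu.yaml
  port: 8004                   # internal port the orchestrator proxies to
  vram_gb: 10                  # must fit within the RTX 4090's 24GB budget
EOF

# Step 3: restart so the registry is re-read
docker compose -f docker-compose.gpu.yaml restart orchestrator
```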

## Performance

| Model | VRAM | Startup | Speed |
|-------|------|---------|-------|
| Qwen 2.5 7B | 14GB | 120s | ~50 tok/sec |
| Flux.1 Schnell | 14GB | 60s | 4-5s/image |
| MusicGen Medium | 11GB | 45s | 60-90s for 30s audio |

Model switching overhead: 30-120 seconds

## License Notes

- vLLM: Apache 2.0
- Flux.1: Apache 2.0
- AudioCraft: MIT (code), CC-BY-NC (pre-trained weights - non-commercial)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 14:12:13 +01:00
bb3dabcba7 feat(ai): complete GPU deployment with self-hosted Qwen 2.5 7B model
This commit finalizes the GPU infrastructure deployment on RunPod:

- Added qwen-2.5-7b model to LiteLLM configuration
  - Self-hosted on RunPod RTX 4090 GPU server
  - Connected via Tailscale VPN (100.100.108.13:8000)
  - OpenAI-compatible API endpoint
  - Rate limits: 1000 RPM, 100k TPM

- Marked GPU deployment as COMPLETE in deployment log
  - vLLM 0.6.4.post1 with custom AsyncLLMEngine server
  - Qwen/Qwen2.5-7B-Instruct model (14.25 GB)
  - 85% GPU memory utilization, 4096 context length
  - Successfully integrated with Open WebUI at ai.pivoine.art

Infrastructure:
- Provider: RunPod Spot Instance (~$0.50/hr)
- GPU: NVIDIA RTX 4090 24GB
- Disk: 50GB local SSD + 922GB network volume
- VPN: Tailscale (replaces WireGuard due to RunPod UDP restrictions)

Model now visible and accessible in Open WebUI for end users.
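
The corresponding litellm-config.yaml entry looks roughly like this, assuming LiteLLM's standard OpenAI-compatible passthrough (values as stated above):

```bash
cat >> litellm-config.yaml <<'EOF'
model_list:
  - model_name: qwen-2.5-7b
    litellm_params:
      model: openai/Qwen/Qwen2.5-7B-Instruct   # vLLM serves an OpenAI-compatible API
      api_base: http://100.100.108.13:8000/v1  # RunPod GPU over Tailscale
      api_key: none                            # endpoint is unauthenticated inside the VPN
      rpm: 1000
      tpm: 100000
EOF
```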

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 13:18:17 +01:00
8de88d96ac docs(ai): add comprehensive GPU setup documentation and configs
- Add setup guides (SETUP_GUIDE, TAILSCALE_SETUP, DOCKER_GPU_SETUP, etc.)
- Add deployment configurations (litellm-config-gpu.yaml, gpu-server-compose.yaml)
- Add GPU_DEPLOYMENT_LOG.md with current infrastructure details
- Add GPU_EXPANSION_PLAN.md with complete provider comparison
- Add deploy-gpu-stack.sh automation script

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 12:57:06 +01:00
c0b1308ffe feat(ai): add GPU server deployment with vLLM and Tailscale
- Add simple_vllm_server.py: Custom AsyncLLMEngine FastAPI server
  - Bypasses multiprocessing issues on RunPod
  - OpenAI-compatible API (/v1/models, /v1/completions, /v1/chat/completions)
  - Uses Qwen 2.5 7B Instruct model
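
A quick smoke test against the server (listening port assumed to be 8000, matching the LiteLLM api_base above):

```bash
curl http://localhost:8000/v1/models

curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"Qwen/Qwen2.5-7B-Instruct","messages":[{"role":"user","content":"ping"}]}'
```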

- Add comprehensive setup guides:
  - SETUP_GUIDE.md: RunPod account and GPU server setup
  - TAILSCALE_SETUP.md: VPN configuration (replaces WireGuard)
  - DOCKER_GPU_SETUP.md: Docker + NVIDIA Container Toolkit
  - README_GPU_SETUP.md: Main documentation hub

- Add deployment configurations:
  - litellm-config-gpu.yaml: LiteLLM config with GPU endpoints
  - gpu-server-compose.yaml: Docker Compose for GPU services
  - deploy-gpu-stack.sh: Automated deployment script

- Add GPU_DEPLOYMENT_LOG.md: Current deployment documentation
  - Network: Tailscale IP 100.100.108.13
  - Infrastructure: RunPod RTX 4090, 50GB disk
  - Known issues and troubleshooting guide

- Add GPU_EXPANSION_PLAN.md: 70-page comprehensive expansion plan

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 12:56:57 +01:00
e22936ecbe fix: set Docker API version for Watchtower compatibility
Add the DOCKER_API_VERSION=1.44 environment variable to Watchtower
to ensure compatibility with the upgraded Docker daemon.

The Watchtower image (v1.7.1) has an older Docker client that
defaults to API version 1.25, which is incompatible with the
new Docker daemon requiring API version 1.44+.
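
To verify the mismatch and the fix from the CLI (both commands are standard Docker):

```bash
docker version --format '{{.Server.APIVersion}}'   # daemon's negotiated API version
DOCKER_API_VERSION=1.44 docker version             # pin the client, as Watchtower now does
```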

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 19:24:57 +01:00
7cdab58018 feat: enable Watchtower auto-updates for all application services
Add missing Watchtower labels to:
- net_umami: Analytics service
- dev_gitea_runner: CI/CD runner
- sexy_api: Directus CMS backend
- util_linkwarden_meilisearch: Search engine

All application services now have automatic updates enabled.
Critical infrastructure (postgres, redis, traefik) intentionally
excluded from auto-updates for stability.
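
Each listed service gains Watchtower's documented opt-in label; verification sketch against one of them:

```bash
# compose.yaml fragment per service:
#   labels:
#     - com.centurylinklabs.watchtower.enable=true
docker inspect net_umami \
  --format '{{ index .Config.Labels "com.centurylinklabs.watchtower.enable" }}'   # prints: true
```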

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 18:45:38 +01:00
d583015d2b Revert "perf: use local volume for Pinchflat downloads instead of WebDAV"
This reverts commit 5f2fb12436.
2025-11-20 15:34:11 +01:00
5f2fb12436 perf: use local volume for Pinchflat downloads instead of WebDAV
The HiDrive WebDAV mount was causing severe performance issues:
- High latency for directory listings and file checks
- Slow UI page loads (multi-second delays)
- Database query idle times of 600-1600ms

Changed to use local Docker volume for /downloads, which provides:
- Fast filesystem operations
- Responsive UI
- No database connection delays

Note: Downloads are now stored locally. Set up rsync/rclone
to sync to HiDrive if remote storage is needed.
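
A possible rclone job for that (volume path and remote name are assumptions, not the deployed values):

```bash
rclone sync /var/lib/docker/volumes/media_pinchflat_downloads/_data \
  hidrive:users/valknar/Downloads/pinchflat \
  --transfers 4 --log-file /var/log/pinchflat-sync.log
```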

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 15:33:24 +01:00
92c3125773 fix: add JOURNAL_MODE=delete for Pinchflat SQLite on network share
SQLite was experiencing connection timeouts and errors because the
downloads folder is on a HiDrive network mount. Setting JOURNAL_MODE
to delete fixes SQLite locking issues on network filesystems.

Fixes: database connection timeouts and "Sqlite3 was invoked incorrectly" errors
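
Background: WAL journaling, the usual default for app databases, relies on a shared-memory file that network filesystems cannot map coherently; DELETE journaling sidesteps it. The env var boils down to (database path hypothetical):

```bash
sqlite3 /config/db/pinchflat.db 'PRAGMA journal_mode=delete;'   # prints: delete
```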

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 15:26:33 +01:00
6c3f4bb186 feat: add Pinchflat to Authelia access control
Add pinchflat.media.pivoine.art to protected services requiring
one-factor authentication via Authelia SSO.
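
The rule itself, in Authelia's standard access_control schema:

```bash
cat <<'EOF'
access_control:
  rules:
    - domain: pinchflat.media.pivoine.art
      policy: one_factor
EOF
```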

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 15:10:48 +01:00
f2f85ae236 feat: add Pinchflat YouTube download manager to media stack
- Add Pinchflat service with Authelia SSO protection
- Configure download folder at /mnt/hidrive/users/valknar/Downloads/pinchflat
- Expose on pinchflat.media.pivoine.art
- Port 8945 with WebSocket support
- Protected by net-authelia middleware for secure access
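
Sketch of the Traefik labels this wires up (router/service names assumed):

```bash
cat <<'EOF'
labels:
  - traefik.enable=true
  - traefik.http.routers.pinchflat.rule=Host(`pinchflat.media.pivoine.art`)
  - traefik.http.routers.pinchflat.middlewares=net-authelia
  - traefik.http.services.pinchflat.loadbalancer.server.port=8945
EOF
```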

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 15:06:15 +01:00
256ee786b2 refactor: remove unused terminal subdomain routing
The terminal.coolify.dev.pivoine.art subdomain is not needed since:
- Browser connects to wss://coolify.dev.pivoine.art/terminal/ws
- Terminal server only provides /ready health check endpoint
- Health checks are handled by Docker's internal healthcheck

Final routing configuration:
- realtime.coolify.dev.pivoine.art → port 6001 (soketi)
- coolify.dev.pivoine.art/terminal/ws → port 6002 (terminal path)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:59:08 +01:00
c561914f49 fix: route /terminal/ws path on main domain to realtime:6002
The browser connects to wss://coolify.dev.pivoine.art/terminal/ws,
not the terminal subdomain. Add path-based router with priority 100
to intercept /terminal/ws and route to coolify_realtime port 6002.

Routes configured:
- realtime.coolify.dev.pivoine.art → port 6001 (soketi)
- terminal.coolify.dev.pivoine.art → port 6002 (terminal)
- coolify.dev.pivoine.art/terminal/ws → port 6002 (terminal path)
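
The path-based router as Traefik labels (router/service names assumed; syntax is standard Traefik v2):

```bash
cat <<'EOF'
labels:
  - traefik.http.routers.terminal-ws.rule=Host(`coolify.dev.pivoine.art`) && PathPrefix(`/terminal/ws`)
  - traefik.http.routers.terminal-ws.priority=100
  - traefik.http.routers.terminal-ws.service=terminal-ws
  - traefik.http.services.terminal-ws.loadbalancer.server.port=6002
EOF
```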

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:43:22 +01:00
96407fb57a fix: use single coolify-realtime container for both services
Based on Coolify's official docker-compose.prod.yml:
- Combine soketi and terminal into single coolify_realtime service
- Mount SSH keys at /data/coolify/ssh for terminal access
- Expose both port 6001 (realtime) and 6002 (terminal)
- Use combined health check for both ports
- Create separate Traefik services and routers for each subdomain
- Remove non-existent TERMINAL_HOST/TERMINAL_PORT variables
- realtime.coolify.dev.pivoine.art → port 6001
- terminal.coolify.dev.pivoine.art → port 6002

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:38:21 +01:00
45ea016aaa feat: expose terminal server on terminal.coolify.dev.pivoine.art
- Add Traefik labels to expose terminal server publicly
- Configure terminal server on terminal.coolify.dev.pivoine.art
- Update Coolify app to use public terminal hostname
- Change TERMINAL_HOST to terminal.coolify.dev.pivoine.art
- Change TERMINAL_PORT to 443 for HTTPS WebSocket connections

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:33:02 +01:00
438bbccadf feat: configure Coolify to connect to internal terminal server
- Add TERMINAL_HOST and TERMINAL_PORT environment variables to Coolify app
- Configure Coolify to use dev_coolify_terminal container on port 6002
- Add dependency on coolify_terminal service with health check
- Keep terminal server internal-only without direct Traefik routing
- Coolify app will proxy /terminal/ws to internal terminal server

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 14:29:43 +01:00
2b5d4d527d fix: use coolify-realtime image without path stripping for terminal 2025-11-17 14:21:41 +01:00
7fd0199e1a feat: strip /terminal/ws prefix before routing to soketi 2025-11-17 14:18:25 +01:00
0e5b539936 fix: remove path stripping from terminal router 2025-11-17 14:15:51 +01:00
f95a3ff143 fix: use standard soketi image for terminal on port 6002 2025-11-17 14:13:39 +01:00
710222e705 feat: add dedicated terminal service on port 6002 with path stripping 2025-11-17 14:10:29 +01:00
48fd6f87fe revert: restore working soketi configuration 2025-11-17 14:04:48 +01:00
eb10348988 fix: merge terminal into single coolify_soketi container with dual ports 2025-11-17 13:40:33 +01:00
417fbb6ff1 feat: configure Coolify to use terminal server internally 2025-11-17 13:35:23 +01:00
3050bbb859 feat: add dedicated coolify_terminal service for port 6002 2025-11-17 13:31:00 +01:00
6f1cce8c88 fix: remove unnecessary volumes and env vars from soketi 2025-11-17 13:28:09 +01:00
8e6c73f82d feat: use coolify-realtime image for port 6002 support 2025-11-17 13:27:24 +01:00
85ef8ecb36 feat: add terminal WebSocket router on port 6002 2025-11-17 13:25:48 +01:00
d812ede999 revert: restore original soketi configuration 2025-11-17 13:23:59 +01:00
fc23e22112 fix: use CMD-SHELL for soketi healthcheck with && 2025-11-17 13:21:13 +01:00
84c9d91bcf fix: remove explicit service link from soketi router 2025-11-17 13:19:34 +01:00
96004a38c2 fix: add path prefix stripping for terminal WebSocket
- Add stripprefix middleware to remove /terminal prefix
- Route /terminal/ws to /ws on terminal server (port 6002)
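
As labels (middleware name assumed; stripprefix is Traefik's standard middleware):

```bash
cat <<'EOF'
labels:
  - traefik.http.middlewares.terminal-strip.stripprefix.prefixes=/terminal
  - traefik.http.routers.terminal.middlewares=terminal-strip
EOF
```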

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 13:13:21 +01:00
cd47bce06b fix: use coolify-realtime image with terminal WebSocket support
- Switch from standard soketi to coolify-realtime:1.0.10 image
- Add SSH volume mount for terminal functionality
- Update health check to verify both ports 6001 and 6002
- Add explicit service link for realtime HTTPS router

This fixes both realtime WebSocket and terminal/ws functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 13:10:08 +01:00
d90f0179df feat: route Coolify terminal WebSocket to Soketi port 6002
- Move /terminal/ws routing from main Coolify container to Soketi
- Configure Traefik to route terminal WebSocket traffic to port 6002
- Add high priority (100) to ensure path matching
- Based on official Coolify docker-compose.prod.yml configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 13:04:08 +01:00
27c3218784 fix: map /terminal/ws path to port 6002
Route terminal WebSocket to port 6002 on Coolify container
as requested.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:58:35 +01:00
1af4ec5fca fix: add dedicated router for terminal WebSocket without compression
The terminal WebSocket is served by the main Coolify app on port 8080.
Create a separate router with priority 100 for the /terminal/ws path,
without the compression middleware, which blocks WebSocket upgrades.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:56:34 +01:00
4dee03dd86 fix: use direct container URL for terminal WebSocket routing
Route to dev_coolify_soketi container via URL instead of port-only,
which allows Traefik to reach the correct container.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:49:27 +01:00
d1357206e8 fix: route terminal WebSocket to Soketi container port 6001
Terminal WebSocket should connect through the Soketi/realtime
container which handles Pusher protocol on port 6001.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:49:07 +01:00
f36c10a5b4 feat: add Traefik route for terminal WebSocket path
Route /terminal/ws to port 6002 on Coolify container
Set priority 100 to take precedence over main router

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:47:02 +01:00
41841f800e fix: remove terminal-specific routing (handled by main router)
The /terminal/ws endpoint is part of the main Coolify application
on port 8080, not a separate service. WebSocket requests should go
through the main router automatically.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:41:44 +01:00
251ea6b775 feat: add Traefik route for Coolify terminal WebSocket
- Route /terminal/ws path to port 6002 on Coolify container
- Enable WebSocket terminal functionality in Coolify UI
- Path-based routing on main domain

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:39:56 +01:00
22deecdbe8 revert: remove terminal port 6002 configuration
Port 6002 is not active in default Coolify deployment.
Terminal functionality appears to work through main port 8080
or requires additional configuration not documented.

Need to investigate Coolify terminal enablement further.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:37:08 +01:00
46105b1f25 feat: enable Coolify terminal interface
- Add Traefik routing for terminal service on port 6002
- Accessible at terminal.coolify.dev.pivoine.art
- Enable web-based terminal access for deployments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:35:08 +01:00
94a8df8fa1 refactor: simplify Coolify realtime subdomain
Change from coolify-realtime.coolify.dev.pivoine.art
to realtime.coolify.dev.pivoine.art for cleaner URLs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 12:28:36 +01:00
102484d88c fix: remove unused Coolify mail env vars, use database config
Coolify stores SMTP settings in the database (instance_settings table)
rather than reading from environment variables.

SMTP settings configured directly in database:
- smtp_enabled: true
- smtp_host: net_mailpit
- smtp_port: 1025
- smtp_from_address: hi@pivoine.art
- smtp_from_name: Coolify
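
A sketch of applying these directly (container and database names are assumptions):

```bash
docker exec dev_coolify_db psql -U coolify -d coolify -c \
  "UPDATE instance_settings SET smtp_enabled = true, smtp_host = 'net_mailpit',
   smtp_port = 1025, smtp_from_address = 'hi@pivoine.art', smtp_from_name = 'Coolify';"
```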

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 11:48:32 +01:00
ab1d350af3 feat: enable email notifications in Coolify
- Add MAIL_MAILER=smtp to use SMTP transport
- Configure MAIL_HOST and MAIL_PORT to use Mailpit relay
- Set MAIL_FROM_ADDRESS and MAIL_FROM_NAME for sender info
- No encryption/auth needed for internal Mailpit relay
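
As an environment block on the Coolify service (values from the bullets above; Mailpit hostname taken from the smtp_host noted in the commit above):

```bash
cat <<'EOF'
environment:
  - MAIL_MAILER=smtp
  - MAIL_HOST=net_mailpit
  - MAIL_PORT=1025
  - MAIL_FROM_ADDRESS=hi@pivoine.art
  - MAIL_FROM_NAME=Coolify
EOF
```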

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 11:40:55 +01:00