Commit Graph

12 Commits

SHA1 Message Date
5af3eeb333 feat: add BGE embedding service and reorganize supervisor groups
- Add vLLM embedding server for BAAI/bge-large-en-v1.5 (port 8002)
- Reorganize supervisor into two logical groups:
  - comfyui-services: comfyui, webdav-sync
  - vllm-services: vllm-qwen, vllm-llama, vllm-embedding
- Update arty.yml service management scripts for new group structure
- Add individual service control scripts for all vLLM models

Note: the embedding server currently uses a placeholder implementation.
For production use, switch to sentence-transformers or native vLLM embedding mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:32:01 +01:00
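A minimal supervisord sketch of the two-group layout listed in the commit above; only the group and program names come from the commit, while the `[program:...]` section and its command are illustrative assumptions:

```ini
; Sketch of the two logical groups described in this commit.
[group:comfyui-services]
programs=comfyui,webdav-sync

[group:vllm-services]
programs=vllm-qwen,vllm-llama,vllm-embedding

[program:vllm-embedding]
; Hypothetical command; the actual server script and flags are not in the log.
command=python vllm/embedding_server.py --port 8002
autorestart=true
```

Grouping turns whole-stack operations into one command, e.g. `supervisorctl restart vllm-services:*`.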
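For the production path the note suggests, here is a minimal sketch using sentence-transformers behind an OpenAI-style endpoint; the FastAPI wrapper and request shape are assumptions, and only the model name and port 8002 come from the commit:

```python
# Minimal sketch: serve BAAI/bge-large-en-v1.5 via sentence-transformers.
# The endpoint path and request/response shapes are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

class EmbedRequest(BaseModel):
    input: list[str]

@app.post("/v1/embeddings")
def embed(req: EmbedRequest):
    # encode() returns a numpy array; normalizing matches BGE's
    # recommended cosine-similarity usage.
    vectors = model.encode(req.input, normalize_embeddings=True)
    return {"data": [{"index": i, "embedding": v.tolist()}
                     for i, v in enumerate(vectors)]}
```

Run with e.g. `uvicorn embedding_server:app --port 8002` to keep the port the commit assigns.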
1ad99cdb53 refactor: replace orchestrator with dedicated vLLM servers for Qwen and Llama 2025-11-23 16:00:03 +01:00
cc0f55df38 fix: reduce max_model_len to 20000 to fit in 24GB VRAM 2025-11-23 15:43:37 +01:00
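Rough arithmetic behind this reduction, assuming Qwen2.5-7B-Instruct's published attention shape (28 layers, 4 KV heads, head dim 128) and an fp16 cache; this is a back-of-envelope sketch, not vLLM's exact memory accounting:

```python
# Approximate KV-cache footprint per context length for Qwen2.5-7B (fp16).
layers, kv_heads, head_dim, bytes_per_val = 28, 4, 128, 2
per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V planes
for ctx in (32768, 20000):
    print(f"{ctx:>6} tokens -> {ctx * per_token / 2**30:.2f} GiB KV cache")
# ~1.75 GiB at 32768 vs ~1.07 GiB at 20000, on top of roughly 15 GB of
# fp16 weights -- the shorter window leaves the 24 GB card some headroom.
```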
5cfd03f1ef fix: improve streaming with proper delta format and increase max_model_len to 32768 2025-11-23 15:38:18 +01:00
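The "proper delta format" refers to the OpenAI chat-completions streaming protocol: server-sent events whose chunks carry incremental `delta` fields rather than full messages. The shape of one chunk, with illustrative values:

```python
# One streaming chunk, serialized on the wire as "data: <json>\n\n";
# the stream ends with a literal "data: [DONE]\n\n" sentinel.
chunk = {
    "id": "chatcmpl-abc123",            # illustrative id
    "object": "chat.completion.chunk",
    "created": 1732373898,              # illustrative timestamp
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "choices": [
        {"index": 0, "delta": {"content": "Hello"}, "finish_reason": None},
    ],
}
```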
fdd724298a fix: increase max_tokens limit from 4096 to 32768 for LLMX CLI support 2025-11-23 15:10:06 +01:00
a8c2ee1b90 fix: make model name and port configurable via environment variables 2025-11-23 13:45:01 +01:00
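A minimal sketch of the pattern this commit describes; the variable names and defaults are assumptions, since the log only says the model name and port became environment-configurable:

```python
import os

# Hypothetical variable names and defaults.
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
PORT = int(os.environ.get("PORT", "8000"))
```

This lets one server script back both vllm-qwen and vllm-llama, each launched with its own MODEL_NAME and PORT.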
16112e50f6 fix: relax dependency version constraints for vllm compatibility 2025-11-23 13:33:46 +01:00
e0a43259d4 fix: update pydantic version constraint to match vllm requirements 2025-11-23 13:33:22 +01:00
b94df17845 feat: add requirements.txt for vLLM models
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 13:25:03 +01:00
897dcb175a refactor: reorganize directory structure and remove hardcoded paths
Move comfyui and vllm out of the models/ directory to the top level for
better organization. Replace all hardcoded /workspace paths with relative
paths to make the configuration portable across different environments.

Changes:
- Move models/comfyui/ → comfyui/
- Move models/vllm/ → vllm/
- Remove models/ directory (empty)
- Update arty.yml: replace /workspace with environment variables
- Update supervisord.conf: use relative paths from /workspace/ai
- Update all script references to use new paths
- Maintain TQDM_DISABLE=1 to fix BrokenPipeError

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 20:49:27 +01:00
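A minimal sketch of what the path change can look like in Python; the AI_ROOT variable name and its fallback are illustrative assumptions:

```python
import os
from pathlib import Path

# Resolve the root from the environment, falling back to this file's
# directory, so the same config also works outside /workspace/ai.
ROOT = Path(os.environ.get("AI_ROOT", Path(__file__).resolve().parent))
COMFYUI_DIR = ROOT / "comfyui"  # previously hardcoded under /workspace
VLLM_DIR = ROOT / "vllm"
```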
9a637cc4fc refactor: clean Docker files and restore standalone model services
- Remove all Docker-related files (Dockerfiles, compose.yaml)
- Remove documentation files (README, ARCHITECTURE, docs/)
- Remove old core/ directory (base_service, service_manager)
- Update models.yaml with correct service_script paths (models/*/server.py)
- Simplify vLLM requirements.txt to let vLLM manage dependencies
- Restore original standalone vLLM server (no base_service dependency)
- Remove obsolete vllm/, musicgen/, flux/ directories

Process-based architecture is now fully functional on RunPod.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:17:38 +01:00
277f1c95bd Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
2025-11-21 14:34:55 +01:00
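Cost-optimized sequential loading means at most one model occupies the single RTX 4090 at a time; a minimal sketch of that idea follows (class and names are assumptions, not the stack's actual orchestrator):

```python
import gc
import torch

class SequentialLoader:
    """Keep at most one model resident on the GPU at a time."""

    def __init__(self, factories):
        self.factories = factories  # name -> zero-arg callable building a model
        self.name = None
        self.model = None

    def get(self, name):
        if name != self.name:
            # Release the previous model and reclaim VRAM before switching.
            self.model = None
            gc.collect()
            torch.cuda.empty_cache()
            self.model = self.factories[name]()
            self.name = name
        return self.model
```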