1ad99cdb53
refactor: replace orchestrator with dedicated vLLM servers for Qwen and Llama
2025-11-23 16:00:03 +01:00
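A minimal sketch of what "dedicated vLLM servers" could look like: one OpenAI-compatible vLLM process per model on its own port. The model names, ports, and memory split are assumptions, not the repository's actual values, and whether both models fit concurrently on a 24 GB card depends on quantization and the `--gpu-memory-utilization` split.

```python
# Sketch: launch one vLLM OpenAI-compatible server per model (hypothetical names/ports).
# Assumes the `vllm` package is installed; both fitting on 24 GB at once is NOT a given.
import subprocess
import sys

SERVERS = [
    {"model": "Qwen/Qwen2.5-7B-Instruct", "port": 8000, "gpu_frac": 0.45},
    {"model": "meta-llama/Llama-3.1-8B-Instruct", "port": 8001, "gpu_frac": 0.45},
]

procs = []
for s in SERVERS:
    procs.append(subprocess.Popen([
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", s["model"],
        "--port", str(s["port"]),
        "--max-model-len", "20000",
        "--gpu-memory-utilization", str(s["gpu_frac"]),
    ]))

for p in procs:
    p.wait()
```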
cc0f55df38
fix: reduce max_model_len to 20000 to fit in 24GB VRAM
2025-11-23 15:43:37 +01:00
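For context, `max_model_len` bounds the KV cache the engine pre-allocates, so lowering it is the usual lever when weights plus cache would exceed VRAM. A minimal sketch with vLLM's offline `LLM` class, assuming the Qwen model named in the initial commit:

```python
from vllm import LLM, SamplingParams

# Cap the context window so the KV cache fits alongside the weights on a 24 GB GPU.
# 20000 is the value from this commit; gpu_memory_utilization=0.90 is vLLM's default.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=20000,
    gpu_memory_utilization=0.90,
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```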
5cfd03f1ef
fix: improve streaming with proper delta format and increase max_model_len to 32768
2025-11-23 15:38:18 +01:00
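"Proper delta format" presumably means OpenAI-style `chat.completion.chunk` SSE events, where each chunk carries only the newly generated text in `choices[0].delta`. A hedged sketch of emitting such chunks (the field layout follows the OpenAI spec; the token source is a placeholder):

```python
import json
import time
import uuid

def sse_chunks(model: str, tokens):
    """Yield OpenAI-style chat.completion.chunk events, one delta per token."""
    chunk_id = f"chatcmpl-{uuid.uuid4().hex}"
    for tok in tokens:
        payload = {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [{"index": 0, "delta": {"content": tok}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    # Final chunk carries an empty delta plus a finish_reason, then the sentinel.
    payload = {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"
```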
fdd724298a
fix: increase max_tokens limit from 4096 to 32768 for LLMX CLI support
2025-11-23 15:10:06 +01:00
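One common place such a cap lives is the request model's validation bounds. A sketch using pydantic; the field names mirror the OpenAI chat API, and only the 32768 ceiling comes from this commit:

```python
from pydantic import BaseModel, Field

class ChatCompletionRequest(BaseModel):
    model: str
    messages: list[dict]
    # Raised from 4096 to 32768 so long generations from the CLI are not rejected.
    max_tokens: int = Field(default=1024, ge=1, le=32768)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)
    stream: bool = False
```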
a8c2ee1b90
fix: make model name and port configurable via environment variables
2025-11-23 13:45:01 +01:00
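A minimal sketch of the environment-variable configuration this commit describes; the variable names `MODEL_NAME`, `PORT`, and `HOST` and their defaults are assumptions.

```python
import os

# Hypothetical variable names; the defaults apply when the env vars are unset.
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
PORT = int(os.environ.get("PORT", "8000"))
HOST = os.environ.get("HOST", "0.0.0.0")

if __name__ == "__main__":
    print(f"Serving {MODEL_NAME} on {HOST}:{PORT}")
```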
16112e50f6
fix: relax dependency version constraints for vllm compatibility
2025-11-23 13:33:46 +01:00
e0a43259d4
fix: update pydantic version constraint to match vllm requirements
2025-11-23 13:33:22 +01:00
b94df17845
feat: add requirements.txt for vLLM models
...
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 13:25:03 +01:00
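The three dependency commits above point in the same direction: pin vLLM loosely and let it dictate its transitive constraints (pydantic in particular) rather than over-pinning. A hedged sketch of what such a requirements.txt could look like; the exact version ranges are assumptions, not the repository's actual pins:

```
# requirements.txt (sketch) -- keep constraints loose and let vLLM resolve the rest
vllm>=0.6.0,<0.7
# pydantic must satisfy vLLM's own requirement (v2 at the time of writing)
pydantic>=2.9
fastapi
uvicorn[standard]
```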
897dcb175a
refactor: reorganize directory structure and remove hardcoded paths
...
Move comfyui and vllm out of the models/ directory to the top level for better
organization. Replace all hardcoded /workspace paths with relative paths so the
configuration is portable across environments (see the path-handling sketch after this entry).
Changes:
- Move models/comfyui/ → comfyui/
- Move models/vllm/ → vllm/
- Remove models/ directory (empty)
- Update arty.yml: replace /workspace with environment variables
- Update supervisord.conf: use relative paths from /workspace/ai
- Update all script references to use new paths
- Maintain TQDM_DISABLE=1 to fix BrokenPipeError
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 20:49:27 +01:00
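A sketch of the portability pattern the commit describes: derive the base directory from an environment variable with a script-relative fallback instead of hardcoding /workspace, and keep TQDM_DISABLE set so background processes do not hit BrokenPipeError. The variable name `AI_ROOT` is an assumption.

```python
import os
from pathlib import Path

# Keep tqdm quiet when stdout is a supervisord pipe (avoids BrokenPipeError);
# set this before importing any library that pulls in tqdm.
os.environ.setdefault("TQDM_DISABLE", "1")

# Resolve everything relative to an overridable base instead of hardcoding /workspace.
# AI_ROOT is a hypothetical variable name.
BASE_DIR = Path(os.environ.get("AI_ROOT", Path(__file__).resolve().parent))
COMFYUI_DIR = BASE_DIR / "comfyui"
VLLM_DIR = BASE_DIR / "vllm"
```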
9a637cc4fc
refactor: clean Docker files and restore standalone model services
...
- Remove all Docker-related files (Dockerfiles, compose.yaml)
- Remove documentation files (README, ARCHITECTURE, docs/)
- Remove old core/ directory (base_service, service_manager)
- Update models.yaml with correct service_script paths (models/*/server.py); see the loader sketch after this entry
- Simplify vLLM requirements.txt to let vLLM manage dependencies
- Restore original standalone vLLM server (no base_service dependency)
- Remove obsolete vllm/, musicgen/, flux/ directories
Process-based architecture is now fully functional on RunPod.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:17:38 +01:00
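A hedged sketch of how a models.yaml entry with a service_script path might be consumed by the process-based setup; the YAML schema shown in the comment is an assumption inferred from the commit message, not the repository's actual format.

```python
# Sketch: spawn each model service listed in models.yaml (assumed schema:
#   services: [{name: ..., service_script: models/<name>/server.py}, ...]).
import subprocess
import sys
from pathlib import Path

import yaml  # pip install pyyaml

BASE_DIR = Path(__file__).resolve().parent
config = yaml.safe_load((BASE_DIR / "models.yaml").read_text())

for svc in config.get("services", []):
    script = BASE_DIR / svc["service_script"]
    print(f"starting {svc['name']} -> {script}")
    subprocess.Popen([sys.executable, str(script)])
```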
277f1c95bd
Initial commit: RunPod multi-modal AI orchestration stack
...
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on a single GPU (see the sketch below)
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
2025-11-21 14:34:55 +01:00
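A minimal sketch of the "sequential loading" idea: only one model occupies the GPU at a time, and VRAM is explicitly released before the next one is loaded. The model identifiers and the transformers pipeline API are illustrative stand-ins, not the stack's actual orchestrator code.

```python
import gc

import torch
from transformers import pipeline

def run_once(task: str, model_id: str, prompt: str) -> str:
    """Load a model, run a single job, then release the GPU for the next model."""
    pipe = pipeline(task, model=model_id, device=0)
    result = pipe(prompt)
    # Drop references and flush the CUDA caching allocator before the next load.
    del pipe
    gc.collect()
    torch.cuda.empty_cache()
    return str(result)

# One GPU, one model at a time (illustrative model ids).
print(run_once("text-generation", "Qwen/Qwen2.5-7B-Instruct", "Hello"))
print(run_once("text-to-audio", "facebook/musicgen-medium", "calm piano"))
```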