feat: add Supervisor process manager for service management

- Add supervisord.conf with ComfyUI and orchestrator services
- Update Ansible playbook with supervisor installation tag
- Rewrite start-all.sh and stop-all.sh to use Supervisor
- Add status.sh script for checking service status
- Update arty.yml with supervisor commands and shortcuts
- Update CLAUDE.md with Supervisor documentation and troubleshooting
- Services now auto-restart on crashes with centralized logging

Benefits:
- Better process control than manual pkill/background jobs
- Auto-restart on service crashes
- Centralized log management in /workspace/logs/
- Web interface for monitoring (port 9001)
- Works in RunPod containers, where systemd is not available

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:22:16 +01:00
parent 2207d60f98
commit 664da9f4ea
7 changed files with 306 additions and 29 deletions


@@ -1,22 +1,49 @@
 #!/bin/bash
 #
 # Stop AI Services
-# Gracefully stops all running AI services
+# Gracefully stops all services managed by Supervisor
 #
 set -e
+WORKSPACE_DIR="${WORKSPACE_DIR:-/workspace}"
+SUPERVISORD_CONF="${WORKSPACE_DIR}/supervisord.conf"
 echo "========================================="
 echo " Stopping AI Services"
 echo "========================================="
 echo ""
-# Kill orchestrator and model processes
-echo "Stopping orchestrator..."
-pkill -f "orchestrator_subprocess.py" || echo "Orchestrator not running"
+# Check if supervisord is running
+if [ ! -f "${WORKSPACE_DIR}/supervisord.pid" ]; then
+    echo "Supervisor is not running (no PID file found)"
+    echo "Cleaning up any stray processes..."
+    pkill -f "orchestrator_subprocess.py" || echo " - Orchestrator not running"
+    pkill -f "ComfyUI.*main.py" || echo " - ComfyUI not running"
+    echo ""
+    echo "All services stopped"
+    exit 0
+fi
-echo "Stopping model services..."
-pkill -f "models/vllm/server.py" || echo "vLLM not running"
+PID=$(cat "${WORKSPACE_DIR}/supervisord.pid")
+if ! ps -p "$PID" > /dev/null 2>&1; then
+    echo "Supervisor PID file exists but process is not running"
+    echo "Removing stale PID file..."
+    rm -f "${WORKSPACE_DIR}/supervisord.pid"
+    echo ""
+    echo "All services stopped"
+    exit 0
+fi
+# Stop all supervised services
+echo "Stopping all supervised services..."
+supervisorctl -c "${SUPERVISORD_CONF}" stop all
+sleep 2
+# Shutdown supervisord
+echo "Shutting down Supervisor daemon..."
+supervisorctl -c "${SUPERVISORD_CONF}" shutdown
 echo ""
 echo "All services stopped"
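
The new script decides what to do from the state of the PID file: absent, present but pointing at a dead process, or present and live. The sketch below isolates that three-way check in one function so it can be exercised on its own. It is illustrative only: `pid_file_state` is a hypothetical helper that is not part of this commit, and it uses `kill -0` in place of the script's `ps -p` for the liveness test.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): classify a PID file.
# Prints one of: missing, stale, running.
pid_file_state() {
    local pid_file="$1"
    local pid

    # No PID file at all: the daemon was never started or exited cleanly.
    if [ ! -f "$pid_file" ]; then
        echo "missing"
        return 0
    fi

    pid=$(cat "$pid_file")

    # kill -0 delivers no signal; it only reports whether the PID can be
    # signalled. Assumption: the caller owns the process (or is root),
    # otherwise a live process would be reported as stale.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

A caller could branch on the result with `case "$(pid_file_state "${WORKSPACE_DIR}/supervisord.pid")" in ...`, running the stray-process cleanup for `missing`, removing the file for `stale`, and calling `supervisorctl` only for `running`.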