refactor: reorganize directory structure and remove hardcoded paths

Move comfyui and vllm out of models/ directory to top level for better organization. Replace all hardcoded /workspace paths with relative paths to make the configuration portable across different environments.

Changes:
- Move models/comfyui/ → comfyui/
- Move models/vllm/ → vllm/
- Remove models/ directory (empty)
- Update arty.yml: replace /workspace with environment variables
- Update supervisord.conf: use relative paths from /workspace/ai
- Update all script references to use new paths
- Maintain TQDM_DISABLE=1 to fix BrokenPipeError

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

comfyui/workflows/WORKFLOW_STANDARDS.md (new file, 657 lines)
@@ -0,0 +1,657 @@

# ComfyUI Workflow Development Standards

Production standards and best practices for creating ComfyUI workflows in the RunPod AI Model Orchestrator.

## Table of Contents

- [Naming Conventions](#naming-conventions)
- [Workflow Structure](#workflow-structure)
- [API Integration](#api-integration)
- [Error Handling](#error-handling)
- [VRAM Optimization](#vram-optimization)
- [Quality Assurance](#quality-assurance)
- [Documentation Requirements](#documentation-requirements)
- [Testing Guidelines](#testing-guidelines)
- [Version Control](#version-control)
- [Best Practices](#best-practices)
- [Resources](#resources)
- [Support](#support)

## Naming Conventions

### Workflow Files

Format: `{category}-{model}-{type}-{environment}-v{version}.json`

**Components:**
- `category`: Descriptive category (flux, sdxl, cogvideox, musicgen, etc.)
- `model`: Specific model variant (schnell, dev, small, medium, large)
- `type`: Operation type (t2i, i2i, i2v, t2m, upscale)
- `environment`: `production` (stable) or `experimental` (testing)
- `version`: Integer version number (1, 2, 3, etc.)

**Examples:**
- `flux-schnell-t2i-production-v1.json` - FLUX Schnell text-to-image, production version 1
- `sdxl-refiner-t2i-production-v2.json` - SDXL with refiner, production version 2
- `musicgen-large-t2m-experimental-v1.json` - MusicGen large, experimental version 1

### Node Naming

**Descriptive names for all nodes:**
```json
{
  "title": "FLUX Schnell Checkpoint Loader",
  "type": "CheckpointLoaderSimple",
  "properties": {
    "Node name for S&R": "CheckpointLoaderSimple"
  }
}
```

**Naming patterns:**
- Loaders: `{Model} Checkpoint Loader`, `{Model} VAE Loader`
- Samplers: `{Model} KSampler`, `{Model} Advanced Sampler`
- Inputs: `API Text Input`, `API Image Input`, `API Seed Input`
- Outputs: `API Image Output`, `Preview Output`, `Save Output`
- Processing: `VAE Encode`, `VAE Decode`, `CLIP Text Encode`

## Workflow Structure

### Required Node Groups

Every production workflow MUST include these node groups:

#### 1. Input Group
```
Purpose: Receive parameters from API or UI
Nodes:
- Text input nodes (prompts, negative prompts)
- Numeric input nodes (seed, steps, CFG scale)
- Image input nodes (for i2i, i2v workflows)
- Model selection nodes (if multiple models supported)
```

#### 2. Model Loading Group
```
Purpose: Load required models and components
Nodes:
- Checkpoint/Diffuser loaders
- VAE loaders
- CLIP text encoders
- ControlNet loaders (if applicable)
- IP-Adapter loaders (if applicable)
```

#### 3. Processing Group
```
Purpose: Main generation/transformation logic
Nodes:
- Samplers (KSampler, Advanced KSampler)
- Encoders (CLIP, VAE)
- Conditioning nodes
- ControlNet application (if applicable)
```

#### 4. Post-Processing Group
```
Purpose: Refinement and enhancement
Nodes:
- VAE decoding
- Upscaling (if applicable)
- Face enhancement (Impact-Pack)
- Image adjustments
```

#### 5. Output Group
```
Purpose: Save and return results
Nodes:
- SaveImage nodes (for file output)
- Preview nodes (for UI feedback)
- API output nodes (for orchestrator)
```

#### 6. Error Handling Group (Optional but Recommended)
```
Purpose: Validation and fallback
Nodes:
- Validation nodes
- Fallback nodes
- Error logging nodes
```

### Node Organization

**Logical flow (left to right, top to bottom):**
```
[Inputs] → [Model Loading] → [Processing] → [Post-Processing] → [Outputs]
                                  ↓
                          [Error Handling]
```

**Visual grouping:**
- Use node positions to create visual separation
- Group related nodes together
- Align nodes for readability
- Use consistent spacing
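
For orientation, here is a minimal sketch of how these groups typically chain together in ComfyUI's API (prompt) JSON format. It is an illustration only: the node IDs, checkpoint filename, and prompt text are placeholders, not values taken from this repository's workflows.

```json
{
  "1": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": { "ckpt_name": "flux1-schnell.safetensors" },
    "title": "FLUX Schnell Checkpoint Loader"
  },
  "2": {
    "class_type": "CLIPTextEncode",
    "inputs": { "text": "a serene mountain landscape", "clip": ["1", 1] },
    "title": "API Prompt Input"
  },
  "3": {
    "class_type": "EmptyLatentImage",
    "inputs": { "width": 1024, "height": 1024, "batch_size": 1 },
    "title": "Latent Size"
  },
  "4": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["1", 0], "positive": ["2", 0], "negative": ["2", 0],
      "latent_image": ["3", 0], "seed": 42, "steps": 4, "cfg": 1.0,
      "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0
    },
    "title": "FLUX Schnell KSampler"
  },
  "5": {
    "class_type": "VAEDecode",
    "inputs": { "samples": ["4", 0], "vae": ["1", 2] },
    "title": "VAE Decode"
  },
  "6": {
    "class_type": "SaveImage",
    "inputs": { "images": ["5", 0], "filename_prefix": "ComfyUI" },
    "title": "API Image Output"
  }
}
```

Nodes 1-3 cover the Input and Model Loading groups, node 4 the Processing group, and nodes 5-6 the Post-Processing and Output groups.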

## API Integration

### Input Nodes

**Required for API compatibility:**

1. **Text Inputs** (prompts, negative prompts)
```json
{
  "inputs": {
    "text": "A beautiful sunset over mountains",
    "default": ""
  },
  "class_type": "CLIPTextEncode",
  "title": "API Prompt Input"
}
```

2. **Numeric Inputs** (seed, steps, CFG, etc.)
```json
{
  "inputs": {
    "seed": 42,
    "steps": 20,
    "cfg": 7.5,
    "sampler_name": "euler_ancestral",
    "scheduler": "normal"
  },
  "class_type": "KSampler",
  "title": "API Sampler Config"
}
```

3. **Image Inputs** (for i2i workflows)
```json
{
  "inputs": {
    "image": "",
    "upload": "image"
  },
  "class_type": "LoadImage",
  "title": "API Image Input"
}
```

### Output Nodes

**Required for orchestrator return:**

```json
{
  "inputs": {
    "images": ["node_id", 0],
    "filename_prefix": "ComfyUI"
  },
  "class_type": "SaveImage",
  "title": "API Image Output"
}
```

### Parameter Validation

**Include validation for critical parameters:**

```json
{
  "inputs": {
    "value": "seed",
    "min": 0,
    "max": 4294967295,
    "default": 42
  },
  "class_type": "IntegerInput",
  "title": "Seed Validator"
}
```

## Error Handling

### Required Validations

1. **Model Availability** (a pre-flight sketch follows this list)
   - Check if checkpoint files exist
   - Validate model paths
   - Provide fallback to default models

2. **Parameter Bounds**
   - Validate numeric ranges (seed, steps, CFG)
   - Check dimension constraints (width, height)
   - Validate string inputs (sampler names, scheduler types)

3. **VRAM Limits**
   - Check batch size against VRAM
   - Validate resolution against VRAM
   - Enable tiling for large images

4. **Input Validation**
   - Verify required inputs are provided
   - Check image formats and dimensions
   - Validate prompt lengths
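
These checks have no single built-in node; one lightweight approach is to validate before queueing the workflow. A minimal pre-flight sketch, assuming a local checkpoint directory layout and a default model name (both illustrative):

```bash
#!/bin/bash
# Pre-flight validation before queueing a workflow (paths and defaults are illustrative).
CHECKPOINT_DIR="comfyui/models/checkpoints"
REQUESTED_MODEL="${1:-flux1-schnell.safetensors}"
STEPS="${2:-4}"

# Model availability: fall back to a known-good default if the requested file is missing.
if [ ! -f "$CHECKPOINT_DIR/$REQUESTED_MODEL" ]; then
  echo "WARN: $REQUESTED_MODEL not found, falling back to flux1-schnell.safetensors" >&2
  REQUESTED_MODEL="flux1-schnell.safetensors"
fi

# Parameter bounds: reject steps outside a sane range.
if [ "$STEPS" -lt 1 ] || [ "$STEPS" -gt 50 ]; then
  echo "ERROR: steps=$STEPS out of range (1-50)" >&2
  exit 1
fi

echo "OK: model=$REQUESTED_MODEL steps=$STEPS"
```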

### Fallback Strategies

**Default values for missing inputs:**
```json
{
  "inputs": {
    "text": "{{prompt | default('A beautiful landscape')}}",
    "seed": "{{seed | default(42)}}",
    "steps": "{{steps | default(20)}}"
  }
}
```

**Graceful degradation:**
- If refiner unavailable, skip refinement step
- If upscaler fails, return base resolution
- If face enhancement errors, return unenhanced image

## VRAM Optimization

### Model Unloading

**Explicit model cleanup between stages:**

```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0]
  },
  "class_type": "FreeModel",
  "title": "Unload Base Model"
}
```

**When to unload:**
- After base generation, before refinement
- After refinement, before upscaling
- Between different model types (diffusion → CLIP → VAE)

### VAE Tiling

**Enable for high-resolution processing:**

```json
{
  "inputs": {
    "samples": ["sampler", 0],
    "vae": ["vae_loader", 0],
    "tile_size": 512,
    "overlap": 64
  },
  "class_type": "VAEDecodeTiled",
  "title": "Tiled VAE Decode"
}
```

**Tiling thresholds:**
- Use tiled VAE for images >1024x1024
- Tile size: 512 for 24GB VRAM, 256 for lower
- Overlap: 64px minimum for seamless tiles

### Attention Slicing

**Reduce memory for large models:**

```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0],
    "attention_mode": "sliced"
  },
  "class_type": "ModelOptimization",
  "title": "Enable Attention Slicing"
}
```

### Batch Processing

**VRAM-safe batch sizes:**
- FLUX models: batch_size=1
- SDXL: batch_size=1-2
- SD3.5: batch_size=1
- Upscaling: batch_size=1

**Sequential batching:**
```json
{
  "inputs": {
    "mode": "sequential",
    "batch_size": 1
  },
  "class_type": "BatchProcessor"
}
```

## Quality Assurance

### Preview Nodes

**Include preview at key stages:**

```json
{
  "inputs": {
    "images": ["vae_decode", 0]
  },
  "class_type": "PreviewImage",
  "title": "Preview Base Generation"
}
```

**Preview locations:**
- After base generation (before refinement)
- After refinement (before upscaling)
- After upscaling (final check)
- After face enhancement

### Quality Gates

**Checkpoints for validation:**

1. **Resolution Check**
```json
{
  "inputs": {
    "image": ["input", 0],
    "min_width": 512,
    "min_height": 512,
    "max_width": 2048,
    "max_height": 2048
  },
  "class_type": "ImageSizeValidator"
}
```

2. **Quality Metrics**
```json
{
  "inputs": {
    "image": ["vae_decode", 0],
    "min_quality_score": 0.7
  },
  "class_type": "QualityChecker"
}
```

### Save Points

**Save intermediate results:**

```json
{
  "inputs": {
    "images": ["base_generation", 0],
    "filename_prefix": "intermediate/base_"
  },
  "class_type": "SaveImage",
  "title": "Save Base Generation"
}
```

**When to save:**
- Base generation (before refinement)
- After each major processing stage
- Before potentially destructive operations

## Documentation Requirements

### Workflow Metadata

**Include in workflow JSON:**

```json
{
  "workflow_info": {
    "name": "FLUX Schnell Text-to-Image Production",
    "version": "1.0.0",
    "author": "RunPod AI Model Orchestrator",
    "description": "Fast text-to-image generation using FLUX.1-schnell (4 steps)",
    "category": "text-to-image",
    "tags": ["flux", "fast", "production"],
    "requirements": {
      "models": ["FLUX.1-schnell"],
      "custom_nodes": [],
      "vram_min": "16GB",
      "vram_recommended": "24GB"
    },
    "parameters": {
      "prompt": {
        "type": "string",
        "required": true,
        "description": "Text description of desired image"
      },
      "seed": {
        "type": "integer",
        "required": false,
        "default": 42,
        "min": 0,
        "max": 4294967295
      },
      "steps": {
        "type": "integer",
        "required": false,
        "default": 4,
        "min": 1,
        "max": 20
      }
    },
    "outputs": {
      "image": {
        "type": "image",
        "format": "PNG",
        "resolution": "1024x1024"
      }
    }
  }
}
```

### Node Comments

**Document complex nodes:**

```json
{
  "title": "FLUX KSampler - Main Generation",
  "notes": "Using euler_ancestral sampler with 4 steps for FLUX Schnell. CFG=1.0 is optimal for this model. Seed controls reproducibility.",
  "inputs": {
    "seed": 42,
    "steps": 4,
    "cfg": 1.0
  }
}
```

### Usage Examples

**Include in workflow or README:**

````markdown
## Example Usage

### ComfyUI Web Interface
1. Load workflow: `text-to-image/flux-schnell-t2i-production-v1.json`
2. Set prompt: "A serene mountain landscape at sunset"
3. Adjust seed: 42 (optional)
4. Click "Queue Prompt"

### Orchestrator API
```bash
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "A serene mountain landscape"}}'
```
````

## Testing Guidelines

### Manual Testing

**Required tests before production:**

1. **UI Test**
   - Load in ComfyUI web interface
   - Execute with default parameters
   - Verify output quality
   - Check preview nodes
   - Confirm save locations

2. **API Test**
   - Call via orchestrator API
   - Test with various parameter combinations
   - Verify JSON response format
   - Check error handling

3. **Edge Cases**
   - Missing optional parameters
   - Invalid parameter values
   - Out-of-range inputs
   - Missing models (graceful failure)

### Automated Testing

**Test script template:**

```bash
#!/bin/bash
# Test workflow: flux-schnell-t2i-production-v1.json

WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

# Test 1: Default parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test image\"}}" \
  | jq '.status'  # Should return "success"

# Test 2: Custom parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"seed\": 123, \"steps\": 8}}" \
  | jq '.status'

# Test 3: Missing prompt (should use default)
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {}}" \
  | jq '.status'
```
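
The template above covers the happy path. A hedged addition for one of the edge cases listed under Manual Testing; the expected `"error"` status is an assumption about the orchestrator's response format and should be confirmed against the actual API:

```bash
WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

# Test 4: Out-of-range steps (should fail gracefully, not crash the worker)
curl -s -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"steps\": 9999}}" \
  | jq '.status'  # Expected: "error" (or a clamped value, depending on validator behavior)
```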

### Performance Testing

**Measure key metrics:**

```bash
# Generation time
time curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}'

# VRAM usage
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1

# GPU utilization
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1
```

**Performance baselines (24GB VRAM):**
- FLUX Schnell (1024x1024, 4 steps): ~5-8 seconds
- FLUX Dev (1024x1024, 20 steps): ~25-35 seconds
- SDXL + Refiner (1024x1024): ~40-60 seconds
- CogVideoX (6s video): ~120-180 seconds

### Load Testing

**Concurrent request handling:**

```bash
# Test 5 concurrent generations
for i in {1..5}; do
  curl -X POST http://localhost:9000/api/comfyui/generate \
    -d "{\"workflow\": \"flux-schnell-t2i-production-v1.json\", \"inputs\": {\"prompt\": \"test $i\", \"seed\": $i}}" &
done
wait
```

## Version Control

### Semantic Versioning

**Version increments:**
- `v1` → `v2`: Major changes (different models, restructured workflow)
- Internal iterations: Keep same version, document changes in git commits

### Change Documentation

**Changelog format:**

```markdown
## flux-schnell-t2i-production-v2.json

### Changes from v1
- Added API input validation
- Optimized VRAM usage with model unloading
- Added preview node after generation
- Updated default steps from 4 to 6

### Breaking Changes
- Changed output node structure (requires orchestrator update)

### Migration Guide
- Update API calls to use new parameter names
- Clear ComfyUI cache before loading v2
```

### Deprecation Process

**Sunsetting old versions:**

1. Mark old version as deprecated in README
2. Keep deprecated version for 2 releases
3. Add deprecation warning in workflow metadata (see the sketch below)
4. Document migration path to new version
5. Archive deprecated workflows in `archive/` directory
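
A hedged sketch of what step 3 could look like inside the `workflow_info` block; the `deprecated`, `deprecation_notice`, and `replacement` fields are illustrative, not an existing orchestrator contract:

```json
{
  "workflow_info": {
    "name": "FLUX Schnell Text-to-Image Production",
    "version": "1.0.0",
    "deprecated": true,
    "deprecation_notice": "Superseded by flux-schnell-t2i-production-v2.json; will be archived after two releases.",
    "replacement": "flux-schnell-t2i-production-v2.json"
  }
}
```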

## Best Practices

### DO

- Use descriptive node names
- Include preview nodes at key stages
- Validate all inputs
- Optimize for VRAM efficiency
- Document all parameters
- Test with both UI and API
- Version your workflows
- Include error handling
- Save intermediate results
- Use semantic naming

### DON'T

- Hardcode file paths
- Assume unlimited VRAM
- Skip input validation
- Omit documentation
- Create overly complex workflows
- Use experimental nodes in production
- Ignore VRAM optimization
- Skip testing edge cases
- Use unclear node names
- Forget to version

## Resources

- **ComfyUI Wiki**: https://github.com/comfyanonymous/ComfyUI/wiki
- **Custom Nodes List**: https://github.com/ltdrdata/ComfyUI-Manager
- **VRAM Optimization Guide**: `/workspace/ai/CLAUDE.md`
- **Model Documentation**: `/workspace/ai/COMFYUI_MODELS.md`

## Support

For questions or issues:

1. Review this standards document
2. Check ComfyUI logs: `supervisorctl tail -f comfyui`
3. Test workflow in UI before API
4. Validate JSON syntax (see below)
5. Check model availability