# ComfyUI Workflow Development Standards
Production standards and best practices for creating ComfyUI workflows in the RunPod AI Model Orchestrator.

## Table of Contents

- [Naming Conventions](#naming-conventions)
- [Workflow Structure](#workflow-structure)
- [API Integration](#api-integration)
- [Error Handling](#error-handling)
- [VRAM Optimization](#vram-optimization)
- [Quality Assurance](#quality-assurance)
- [Documentation Requirements](#documentation-requirements)
- [Testing Guidelines](#testing-guidelines)
- [Version Control](#version-control)
- [Best Practices](#best-practices)
- [Resources](#resources)
- [Support](#support)
## Naming Conventions
### Workflow Files

Format: `{category}-{model}-{type}-{environment}-v{version}.json`

**Components:**

- `category`: Descriptive category (flux, sdxl, cogvideox, musicgen, etc.)
- `model`: Specific model variant (schnell, dev, small, medium, large)
- `type`: Operation type (t2i, i2i, i2v, t2m, upscale)
- `environment`: `production` (stable) or `experimental` (testing)
- `version`: Semantic versioning (1, 2, 3, etc.)

**Examples:**

- `flux-schnell-t2i-production-v1.json` - FLUX Schnell text-to-image, production version 1
- `sdxl-refiner-t2i-production-v2.json` - SDXL with refiner, production version 2
- `musicgen-large-t2m-experimental-v1.json` - MusicGen large, experimental version 1
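
A quick way to catch misnamed files before they reach production is to check each filename against the convention. The following is a minimal sketch using a shell regex; the directory glob and the type/environment lists are illustrative assumptions to extend for the models actually deployed.

```bash
#!/bin/bash
# Sketch: flag workflow files whose names do not match
# {category}-{model}-{type}-{environment}-v{version}.json
# The glob and the type list below are assumptions, not an exhaustive set.
shopt -s nullglob
PATTERN='^[a-z0-9.]+-[a-z0-9.]+-(t2i|i2i|i2v|t2m|upscale)-(production|experimental)-v[0-9]+\.json$'

for f in */*.json; do
  name=$(basename "$f")
  if [[ ! "$name" =~ $PATTERN ]]; then
    echo "Non-conforming workflow filename: $f"
  fi
done
```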
### Node Naming

**Descriptive names for all nodes:**

```json
{
  "title": "FLUX Schnell Checkpoint Loader",
  "type": "CheckpointLoaderSimple",
  "properties": {
    "Node name for S&R": "CheckpointLoaderSimple"
  }
}
```

**Naming patterns:**

- Loaders: `{Model} Checkpoint Loader`, `{Model} VAE Loader`
- Samplers: `{Model} KSampler`, `{Model} Advanced Sampler`
- Inputs: `API Text Input`, `API Image Input`, `API Seed Input`
- Outputs: `API Image Output`, `Preview Output`, `Save Output`
- Processing: `VAE Encode`, `VAE Decode`, `CLIP Text Encode`
## Workflow Structure

### Required Node Groups

Every production workflow MUST include these node groups:

#### 1. Input Group
```
Purpose: Receive parameters from API or UI
Nodes:
- Text input nodes (prompts, negative prompts)
- Numeric input nodes (seed, steps, CFG scale)
- Image input nodes (for i2i, i2v workflows)
- Model selection nodes (if multiple models supported)
```

#### 2. Model Loading Group
```
Purpose: Load required models and components
Nodes:
- Checkpoint/Diffuser loaders
- VAE loaders
- CLIP text encoders
- ControlNet loaders (if applicable)
- IP-Adapter loaders (if applicable)
```

#### 3. Processing Group
```
Purpose: Main generation/transformation logic
Nodes:
- Samplers (KSampler, Advanced KSampler)
- Encoders (CLIP, VAE)
- Conditioning nodes
- ControlNet application (if applicable)
```

#### 4. Post-Processing Group
```
Purpose: Refinement and enhancement
Nodes:
- VAE decoding
- Upscaling (if applicable)
- Face enhancement (Impact-Pack)
- Image adjustments
```

#### 5. Output Group
```
Purpose: Save and return results
Nodes:
- SaveImage nodes (for file output)
- Preview nodes (for UI feedback)
- API output nodes (for orchestrator)
```

#### 6. Error Handling Group (Optional but Recommended)
```
Purpose: Validation and fallback
Nodes:
- Validation nodes
- Fallback nodes
- Error logging nodes
```

### Node Organization

**Logical flow (left to right, top to bottom):**
```
[Inputs] → [Model Loading] → [Processing] → [Post-Processing] → [Outputs]
                                                    ↓
                                            [Error Handling]
```

**Visual grouping:**

- Use node positions to create visual separation
- Group related nodes together
- Align nodes for readability
- Use consistent spacing
## API Integration

### Input Nodes

**Required for API compatibility:**

1. **Text Inputs** (prompts, negative prompts)
   ```json
   {
     "inputs": {
       "text": "A beautiful sunset over mountains",
       "default": ""
     },
     "class_type": "CLIPTextEncode",
     "title": "API Prompt Input"
   }
   ```

2. **Numeric Inputs** (seed, steps, CFG, etc.)
   ```json
   {
     "inputs": {
       "seed": 42,
       "steps": 20,
       "cfg": 7.5,
       "sampler_name": "euler_ancestral",
       "scheduler": "normal"
     },
     "class_type": "KSampler",
     "title": "API Sampler Config"
   }
   ```

3. **Image Inputs** (for i2i workflows)
   ```json
   {
     "inputs": {
       "image": "",
       "upload": "image"
     },
     "class_type": "LoadImage",
     "title": "API Image Input"
   }
   ```

### Output Nodes

**Required for orchestrator return:**

```json
{
  "inputs": {
    "images": ["node_id", 0],
    "filename_prefix": "ComfyUI"
  },
  "class_type": "SaveImage",
  "title": "API Image Output"
}
```
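
Once a workflow is exported in API format, it can be smoke-tested against ComfyUI's own HTTP API before being wired into the orchestrator. A minimal sketch, assuming the stock ComfyUI server is listening on its default port 8188 and the exported graph is stored alongside the workflow files:

```bash
# Sketch: queue an API-format workflow graph directly on ComfyUI (default port 8188).
# The file path is illustrative; the /prompt payload shape is ComfyUI's native API.
WORKFLOW_JSON="text-to-image/flux-schnell-t2i-production-v1.json"

jq '{prompt: .}' "$WORKFLOW_JSON" \
  | curl -s -X POST http://localhost:8188/prompt \
      -H "Content-Type: application/json" \
      -d @- \
  | jq '.prompt_id'   # a prompt_id in the response means the graph was accepted
```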

### Parameter Validation

**Include validation for critical parameters:**

```json
{
  "inputs": {
    "value": "seed",
    "min": 0,
    "max": 4294967295,
    "default": 42
  },
  "class_type": "IntegerInput",
  "title": "Seed Validator"
}
```
## Error Handling

### Required Validations

1. **Model Availability** (see the pre-flight sketch after this list)
   - Check if checkpoint files exist
   - Validate model paths
   - Provide fallback to default models

2. **Parameter Bounds**
   - Validate numeric ranges (seed, steps, CFG)
   - Check dimension constraints (width, height)
   - Validate string inputs (sampler names, scheduler types)

3. **VRAM Limits**
   - Check batch size against VRAM
   - Validate resolution against VRAM
   - Enable tiling for large images

4. **Input Validation**
   - Verify required inputs are provided
   - Check image formats and dimensions
   - Validate prompt lengths
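
Model availability and basic parameter bounds can be checked before a job is ever queued. The following pre-flight sketch illustrates the idea; the model directory, checkpoint filename, and bounds are assumptions to adapt per deployment:

```bash
#!/bin/bash
# Pre-flight sketch: verify model availability and parameter bounds before submission.
# COMFYUI_MODEL_DIR default and the checkpoint name are hypothetical placeholders.
MODEL_DIR="${COMFYUI_MODEL_DIR:-comfyui/models/checkpoints}"
CHECKPOINT="flux1-schnell.safetensors"   # hypothetical filename
SEED="${1:-42}"
STEPS="${2:-4}"

if [ ! -f "$MODEL_DIR/$CHECKPOINT" ]; then
  echo "ERROR: checkpoint $CHECKPOINT not found in $MODEL_DIR" >&2
  exit 1
fi

if [ "$SEED" -lt 0 ] || [ "$SEED" -gt 4294967295 ]; then
  echo "ERROR: seed out of range [0, 4294967295]" >&2
  exit 1
fi

if [ "$STEPS" -lt 1 ] || [ "$STEPS" -gt 20 ]; then
  echo "ERROR: steps out of range [1, 20]" >&2
  exit 1
fi

echo "Pre-flight checks passed"
```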

### Fallback Strategies

**Default values for missing inputs:**

```json
{
  "inputs": {
    "text": "{{prompt | default('A beautiful landscape')}}",
    "seed": "{{seed | default(42)}}",
    "steps": "{{steps | default(20)}}"
  }
}
```

**Graceful degradation:**

- If refiner unavailable, skip refinement step
- If upscaler fails, return base resolution
- If face enhancement errors, return unenhanced image
## VRAM Optimization

### Model Unloading

**Explicit model cleanup between stages:**

```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0]
  },
  "class_type": "FreeModel",
  "title": "Unload Base Model"
}
```

**When to unload:**

- After base generation, before refinement
- After refinement, before upscaling
- Between different model types (diffusion → CLIP → VAE)

### VAE Tiling

**Enable for high-resolution processing:**

```json
{
  "inputs": {
    "samples": ["sampler", 0],
    "vae": ["vae_loader", 0],
    "tile_size": 512,
    "overlap": 64
  },
  "class_type": "VAEDecodeTiled",
  "title": "Tiled VAE Decode"
}
```

**Tiling thresholds:**

- Use tiled VAE for images >1024x1024
- Tile size: 512 for 24GB VRAM, 256 for lower
- Overlap: 64px minimum for seamless tiles

### Attention Slicing

**Reduce memory for large models:**

```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0],
    "attention_mode": "sliced"
  },
  "class_type": "ModelOptimization",
  "title": "Enable Attention Slicing"
}
```

### Batch Processing

**VRAM-safe batch sizes:**

- FLUX models: batch_size=1
- SDXL: batch_size=1-2
- SD3.5: batch_size=1
- Upscaling: batch_size=1

**Sequential batching:**

```json
{
  "inputs": {
    "mode": "sequential",
    "batch_size": 1
  },
  "class_type": "BatchProcessor"
}
```
## Quality Assurance

### Preview Nodes

**Include preview at key stages:**

```json
{
  "inputs": {
    "images": ["vae_decode", 0]
  },
  "class_type": "PreviewImage",
  "title": "Preview Base Generation"
}
```

**Preview locations:**

- After base generation (before refinement)
- After refinement (before upscaling)
- After upscaling (final check)
- After face enhancement

### Quality Gates

**Checkpoints for validation:**

1. **Resolution Check**
   ```json
   {
     "inputs": {
       "image": ["input", 0],
       "min_width": 512,
       "min_height": 512,
       "max_width": 2048,
       "max_height": 2048
     },
     "class_type": "ImageSizeValidator"
   }
   ```

2. **Quality Metrics**
   ```json
   {
     "inputs": {
       "image": ["vae_decode", 0],
       "min_quality_score": 0.7
     },
     "class_type": "QualityChecker"
   }
   ```

### Save Points

**Save intermediate results:**

```json
{
  "inputs": {
    "images": ["base_generation", 0],
    "filename_prefix": "intermediate/base_"
  },
  "class_type": "SaveImage",
  "title": "Save Base Generation"
}
```

**When to save:**

- Base generation (before refinement)
- After each major processing stage
- Before potentially destructive operations
## Documentation Requirements

### Workflow Metadata

**Include in workflow JSON:**

```json
{
  "workflow_info": {
    "name": "FLUX Schnell Text-to-Image Production",
    "version": "1.0.0",
    "author": "RunPod AI Model Orchestrator",
    "description": "Fast text-to-image generation using FLUX.1-schnell (4 steps)",
    "category": "text-to-image",
    "tags": ["flux", "fast", "production"],
    "requirements": {
      "models": ["FLUX.1-schnell"],
      "custom_nodes": [],
      "vram_min": "16GB",
      "vram_recommended": "24GB"
    },
    "parameters": {
      "prompt": {
        "type": "string",
        "required": true,
        "description": "Text description of desired image"
      },
      "seed": {
        "type": "integer",
        "required": false,
        "default": 42,
        "min": 0,
        "max": 4294967295
      },
      "steps": {
        "type": "integer",
        "required": false,
        "default": 4,
        "min": 1,
        "max": 20
      }
    },
    "outputs": {
      "image": {
        "type": "image",
        "format": "PNG",
        "resolution": "1024x1024"
      }
    }
  }
}
```
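
Because this block is plain JSON, its presence can be checked mechanically. A minimal sketch, assuming the metadata sits under the `workflow_info` key as shown above:

```bash
# Sketch: verify a workflow file carries the expected metadata fields.
# Assumes the block lives under "workflow_info"; adjust the field list as needed.
WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

jq -e '.workflow_info
       | has("name") and has("version") and has("description")
         and has("requirements") and has("parameters")' "$WORKFLOW" > /dev/null \
  && echo "metadata OK" \
  || echo "metadata missing or incomplete in $WORKFLOW"
```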

### Node Comments

**Document complex nodes:**

```json
{
  "title": "FLUX KSampler - Main Generation",
  "notes": "Using euler_ancestral sampler with 4 steps for FLUX Schnell. CFG=1.0 is optimal for this model. Seed controls reproducibility.",
  "inputs": {
    "seed": 42,
    "steps": 4,
    "cfg": 1.0
  }
}
```

### Usage Examples

**Include in workflow or README:**

````markdown
## Example Usage

### ComfyUI Web Interface
1. Load workflow: `text-to-image/flux-schnell-t2i-production-v1.json`
2. Set prompt: "A serene mountain landscape at sunset"
3. Adjust seed: 42 (optional)
4. Click "Queue Prompt"

### Orchestrator API
```bash
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "A serene mountain landscape"}}'
```
````
## Testing Guidelines

### Manual Testing

**Required tests before production:**

1. **UI Test**
   - Load in ComfyUI web interface
   - Execute with default parameters
   - Verify output quality
   - Check preview nodes
   - Confirm save locations

2. **API Test**
   - Call via orchestrator API
   - Test with various parameter combinations
   - Verify JSON response format
   - Check error handling

3. **Edge Cases**
   - Missing optional parameters
   - Invalid parameter values
   - Out-of-range inputs
   - Missing models (graceful failure)

### Automated Testing

**Test script template:**

```bash
#!/bin/bash
# Test workflow: flux-schnell-t2i-production-v1.json

WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

# Test 1: Default parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test image\"}}" \
  | jq '.status'  # Should return "success"

# Test 2: Custom parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"seed\": 123, \"steps\": 8}}" \
  | jq '.status'

# Test 3: Missing prompt (should use default)
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {}}" \
  | jq '.status'
```
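
The template above covers the happy path; an out-of-range edge case can be appended in the same style. The exact error shape is an assumption here, since it depends on how the orchestrator reports validation failures:

```bash
# Test 4 (sketch): out-of-range steps should be rejected, not crash the worker.
# The ".error" field is an assumption; adjust to the orchestrator's response format.
curl -s -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"steps\": 999}}" \
  | jq '.status, .error?'  # Should report an error status rather than "success"
```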

### Performance Testing

**Measure key metrics:**

```bash
# Generation time
time curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}'

# VRAM usage
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1

# GPU utilization
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1
```

**Performance baselines (24GB VRAM):**

- FLUX Schnell (1024x1024, 4 steps): ~5-8 seconds
- FLUX Dev (1024x1024, 20 steps): ~25-35 seconds
- SDXL + Refiner (1024x1024): ~40-60 seconds
- CogVideoX (6s video): ~120-180 seconds
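
To compare a run against these baselines, duration and peak VRAM are easiest to read when captured together. A minimal wrapper sketch that combines the commands above:

```bash
#!/bin/bash
# Sketch: time one generation while sampling VRAM usage in the background,
# then report the peak. Uses only the commands shown above.
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > /tmp/vram.log &
SMI_PID=$!

time curl -s -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}' > /dev/null

kill "$SMI_PID"
echo "Peak VRAM (MiB): $(sort -n /tmp/vram.log | tail -1)"
```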

### Load Testing

**Concurrent request handling:**

```bash
# Test 5 concurrent generations
for i in {1..5}; do
  curl -X POST http://localhost:9000/api/comfyui/generate \
    -d "{\"workflow\": \"flux-schnell-t2i-production-v1.json\", \"inputs\": {\"prompt\": \"test $i\", \"seed\": $i}}" &
done
wait
```
## Version Control

### Semantic Versioning

**Version increments:**

- `v1` → `v2`: Major changes (different models, restructured workflow)
- Internal iterations: Keep same version, document changes in git commits

### Change Documentation

**Changelog format:**

```markdown
## flux-schnell-t2i-production-v2.json

### Changes from v1
- Added API input validation
- Optimized VRAM usage with model unloading
- Added preview node after generation
- Updated default steps from 4 to 6

### Breaking Changes
- Changed output node structure (requires orchestrator update)

### Migration Guide
- Update API calls to use new parameter names
- Clear ComfyUI cache before loading v2
```

### Deprecation Process

**Sunsetting old versions:**

1. Mark old version as deprecated in README
2. Keep deprecated version for 2 releases
3. Add deprecation warning in workflow metadata
4. Document migration path to new version
5. Archive deprecated workflows in `archive/` directory (see the sketch after this list)
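
Step 5 can be scripted. A minimal sketch, assuming workflows live in category directories and git tracks the move (paths are illustrative):

```bash
# Sketch: archive a deprecated workflow while preserving its category layout.
# The workflow path is illustrative; adjust to the repository's actual structure.
OLD="text-to-image/flux-schnell-t2i-production-v1.json"

mkdir -p "archive/$(dirname "$OLD")"
git mv "$OLD" "archive/$OLD"
git commit -m "Deprecate $(basename "$OLD"); superseded by v2"
```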

## Best Practices

### DO

- Use descriptive node names
- Include preview nodes at key stages
- Validate all inputs
- Optimize for VRAM efficiency
- Document all parameters
- Test with both UI and API
- Version your workflows
- Include error handling
- Save intermediate results
- Use semantic naming

### DON'T

- Hardcode file paths
- Assume unlimited VRAM
- Skip input validation
- Omit documentation
- Create overly complex workflows
- Use experimental nodes in production
- Ignore VRAM optimization
- Skip testing edge cases
- Use unclear node names
- Forget to version
## Resources

- **ComfyUI Wiki**: https://github.com/comfyanonymous/ComfyUI/wiki
- **Custom Nodes List**: https://github.com/ltdrdata/ComfyUI-Manager
- **VRAM Optimization Guide**: `/workspace/ai/CLAUDE.md`
- **Model Documentation**: `/workspace/ai/COMFYUI_MODELS.md`

## Support

For questions or issues:

1. Review this standards document
2. Check ComfyUI logs: `supervisorctl tail -f comfyui`
3. Test workflow in UI before API
4. Validate JSON syntax (see the sketch below)
5. Check model availability
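
For step 4, `jq` (already used in the test scripts above) gives a quick pass/fail on syntax:

```bash
# Quick JSON syntax check: jq prints nothing and exits non-zero on invalid JSON.
jq empty text-to-image/flux-schnell-t2i-production-v1.json \
  && echo "valid JSON" \
  || echo "invalid JSON"
```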