# ComfyUI Workflow Development Standards
Production standards and best practices for creating ComfyUI workflows in the RunPod AI Model Orchestrator.
## Table of Contents
- Naming Conventions
- Workflow Structure
- API Integration
- Error Handling
- VRAM Optimization
- Quality Assurance
- Documentation Requirements
- Testing Guidelines
- Version Control
- Best Practices
- Resources
- Support
## Naming Conventions

### Workflow Files

**Format:** `{category}-{model}-{type}-{environment}-v{version}.json`

**Components:**
- `category`: Descriptive category (`flux`, `sdxl`, `cogvideox`, `musicgen`, etc.)
- `model`: Specific model variant (`schnell`, `dev`, `small`, `medium`, `large`)
- `type`: Operation type (`t2i`, `i2i`, `i2v`, `t2m`, `upscale`)
- `environment`: `production` (stable) or `experimental` (testing)
- `version`: Integer version number (1, 2, 3, etc.)

**Examples:**
- `flux-schnell-t2i-production-v1.json` - FLUX Schnell text-to-image, production version 1
- `sdxl-refiner-t2i-production-v2.json` - SDXL with refiner, production version 2
- `musicgen-large-t2m-experimental-v1.json` - MusicGen large, experimental version 1
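A simple way to enforce this convention in CI is a filename check. A minimal sketch: the `type` and `environment` vocabularies come from this document, while the category/model character classes are assumptions left deliberately open:

```python
import re

# Illustrative check for the naming convention above; type and environment
# values come from this document, the rest of the pattern is an assumption.
WORKFLOW_NAME = re.compile(
    r"^(?P<category>[a-z0-9]+)"
    r"-(?P<model>[a-z0-9.]+)"
    r"-(?P<type>t2i|i2i|i2v|t2m|upscale)"
    r"-(?P<environment>production|experimental)"
    r"-v(?P<version>\d+)\.json$"
)

def is_valid_workflow_name(filename: str) -> bool:
    return WORKFLOW_NAME.match(filename) is not None

assert is_valid_workflow_name("flux-schnell-t2i-production-v1.json")
assert not is_valid_workflow_name("FluxSchnell_final.json")
```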
### Node Naming

Descriptive names for all nodes:
```json
{
  "title": "FLUX Schnell Checkpoint Loader",
  "type": "CheckpointLoaderSimple",
  "properties": {
    "Node name for S&R": "CheckpointLoaderSimple"
  }
}
```
**Naming patterns:**
- Loaders: `{Model} Checkpoint Loader`, `{Model} VAE Loader`
- Samplers: `{Model} KSampler`, `{Model} Advanced Sampler`
- Inputs: `API Text Input`, `API Image Input`, `API Seed Input`
- Outputs: `API Image Output`, `Preview Output`, `Save Output`
- Processing: `VAE Encode`, `VAE Decode`, `CLIP Text Encode`
## Workflow Structure

### Required Node Groups

Every production workflow MUST include these node groups:
#### 1. Input Group
**Purpose:** Receive parameters from API or UI
**Nodes:**
- Text input nodes (prompts, negative prompts)
- Numeric input nodes (seed, steps, CFG scale)
- Image input nodes (for i2i, i2v workflows)
- Model selection nodes (if multiple models supported)
#### 2. Model Loading Group
**Purpose:** Load required models and components
**Nodes:**
- Checkpoint/Diffuser loaders
- VAE loaders
- CLIP text encoders
- ControlNet loaders (if applicable)
- IP-Adapter loaders (if applicable)
#### 3. Processing Group
**Purpose:** Main generation/transformation logic
**Nodes:**
- Samplers (KSampler, Advanced KSampler)
- Encoders (CLIP, VAE)
- Conditioning nodes
- ControlNet application (if applicable)
#### 4. Post-Processing Group
**Purpose:** Refinement and enhancement
**Nodes:**
- VAE decoding
- Upscaling (if applicable)
- Face enhancement (Impact-Pack)
- Image adjustments
#### 5. Output Group
**Purpose:** Save and return results
**Nodes:**
- SaveImage nodes (for file output)
- Preview nodes (for UI feedback)
- API output nodes (for orchestrator)
#### 6. Error Handling Group (Optional but Recommended)
**Purpose:** Validation and fallback
**Nodes:**
- Validation nodes
- Fallback nodes
- Error logging nodes
### Node Organization

**Logical flow** (left to right, top to bottom):
```
[Inputs] → [Model Loading] → [Processing] → [Post-Processing] → [Outputs]
                                  ↓
                          [Error Handling]
```

**Visual grouping:**
- Use node positions to create visual separation
- Group related nodes together
- Align nodes for readability
- Use consistent spacing
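Because group membership is encoded in node titles and class types, a small lint script can confirm a workflow contains the required groups before it is promoted to production. A minimal sketch, assuming the API-format JSON used throughout this document (a dict of nodes, each carrying a `class_type` and a `title`):

```python
import json

def lint_workflow(path: str) -> list[str]:
    """Report missing node groups; an empty list means the check passed."""
    with open(path) as f:
        workflow = json.load(f)
    nodes = [n for n in workflow.values()
             if isinstance(n, dict) and "class_type" in n]
    titles = [n.get("title", "") for n in nodes]
    class_types = [n["class_type"] for n in nodes]

    problems = []
    if not any(t.startswith("API ") for t in titles):
        problems.append("missing Input/Output groups (no 'API ...' titled nodes)")
    if not any("Loader" in c for c in class_types):
        problems.append("missing Model Loading group (no loader nodes)")
    if not any("Sampler" in c for c in class_types):
        problems.append("missing Processing group (no sampler nodes)")
    if not any(c in ("SaveImage", "PreviewImage") for c in class_types):
        problems.append("missing Output group (no save/preview nodes)")
    return problems
```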
## API Integration

### Input Nodes

**Required for API compatibility:**
- **Text inputs** (prompts, negative prompts):
  ```json
  {
    "inputs": {
      "text": "A beautiful sunset over mountains",
      "default": ""
    },
    "class_type": "CLIPTextEncode",
    "title": "API Prompt Input"
  }
  ```
- **Numeric inputs** (seed, steps, CFG, etc.):
  ```json
  {
    "inputs": {
      "seed": 42,
      "steps": 20,
      "cfg": 7.5,
      "sampler_name": "euler_ancestral",
      "scheduler": "normal"
    },
    "class_type": "KSampler",
    "title": "API Sampler Config"
  }
  ```
- **Image inputs** (for i2i workflows):
  ```json
  {
    "inputs": {
      "image": "",
      "upload": "image"
    },
    "class_type": "LoadImage",
    "title": "API Image Input"
  }
  ```
### Output Nodes

**Required for orchestrator return:**
```json
{
  "inputs": {
    "images": ["node_id", 0],
    "filename_prefix": "ComfyUI"
  },
  "class_type": "SaveImage",
  "title": "API Image Output"
}
```
### Parameter Validation

Include validation for critical parameters:
```json
{
  "inputs": {
    "value": "seed",
    "min": 0,
    "max": 4294967295,
    "default": 42
  },
  "class_type": "IntegerInput",
  "title": "Seed Validator"
}
```
## Error Handling

### Required Validations

1. **Model Availability**
   - Check that checkpoint files exist
   - Validate model paths
   - Provide a fallback to default models (see the sketch after this list)
2. **Parameter Bounds**
   - Validate numeric ranges (seed, steps, CFG)
   - Check dimension constraints (width, height)
   - Validate string inputs (sampler names, scheduler types)
3. **VRAM Limits**
   - Check batch size against VRAM
   - Validate resolution against VRAM
   - Enable tiling for large images
4. **Input Validation**
   - Verify required inputs are provided
   - Check image formats and dimensions
   - Validate prompt lengths
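For the model-availability check, a hedged sketch of the fallback pattern; the checkpoint directory and the default model name are assumptions for illustration:

```python
import os

CHECKPOINT_DIR = "models/checkpoints"             # assumed ComfyUI layout
DEFAULT_CHECKPOINT = "flux1-schnell.safetensors"  # assumed default model

def resolve_checkpoint(requested: str) -> str:
    """Return the requested checkpoint if it exists, else fall back."""
    if os.path.isfile(os.path.join(CHECKPOINT_DIR, requested)):
        return requested
    if os.path.isfile(os.path.join(CHECKPOINT_DIR, DEFAULT_CHECKPOINT)):
        return DEFAULT_CHECKPOINT
    raise FileNotFoundError(
        f"neither {requested!r} nor the default checkpoint exists"
    )
```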
### Fallback Strategies

**Default values for missing inputs:**
```json
{
  "inputs": {
    "text": "{{prompt | default('A beautiful landscape')}}",
    "seed": "{{seed | default(42)}}",
    "steps": "{{steps | default(20)}}"
  }
}
```

**Graceful degradation:**
- If refiner unavailable, skip refinement step
- If upscaler fails, return base resolution
- If face enhancement errors, return unenhanced image
## VRAM Optimization

### Model Unloading

Explicit model cleanup between stages:
```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0]
  },
  "class_type": "FreeModel",
  "title": "Unload Base Model"
}
```

**When to unload:**
- After base generation, before refinement
- After refinement, before upscaling
- Between different model types (diffusion → CLIP → VAE)
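If no unload node is available in your node set, recent ComfyUI builds expose a `/free` endpoint the orchestrator can call between stages; the endpoint's availability and the port are assumptions to verify on your build:

```python
import json
import urllib.request

def free_models(base_url: str = "http://127.0.0.1:8188") -> None:
    """Ask ComfyUI to drop cached models and free VRAM between stages."""
    # Endpoint and payload assume a recent ComfyUI build; verify on yours.
    body = json.dumps({"unload_models": True, "free_memory": True}).encode()
    req = urllib.request.Request(
        base_url + "/free",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()
```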
### VAE Tiling

Enable for high-resolution processing:
```json
{
  "inputs": {
    "samples": ["sampler", 0],
    "vae": ["vae_loader", 0],
    "tile_size": 512,
    "overlap": 64
  },
  "class_type": "VAEDecodeTiled",
  "title": "Tiled VAE Decode"
}
```

**Tiling thresholds:**
- Use tiled VAE for images >1024x1024
- Tile size: 512 for 24GB VRAM, 256 for lower
- Overlap: 64px minimum for seamless tiles
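The tile-size rule can be applied programmatically from measured free VRAM. A sketch using PyTorch; the 16 GB cutoff approximates the "24GB vs. lower" split above and is an assumption, not a measured value:

```python
import torch

def pick_tile_size(device: int = 0) -> int:
    """Choose the VAE tile size from currently free VRAM."""
    free_bytes, _total = torch.cuda.mem_get_info(device)
    # Assumed cutoff: roomy cards get 512px tiles, constrained cards 256px.
    return 512 if free_bytes >= 16 * 1024**3 else 256
```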
### Attention Slicing

Reduce memory for large models:
```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0],
    "attention_mode": "sliced"
  },
  "class_type": "ModelOptimization",
  "title": "Enable Attention Slicing"
}
```

### Batch Processing

**VRAM-safe batch sizes:**
- FLUX models: batch_size=1
- SDXL: batch_size=1-2
- SD3.5: batch_size=1
- Upscaling: batch_size=1
**Sequential batching:**
```json
{
  "inputs": {
    "mode": "sequential",
    "batch_size": 1
  },
  "class_type": "BatchProcessor"
}
```
## Quality Assurance

### Preview Nodes

Include previews at key stages:
```json
{
  "inputs": {
    "images": ["vae_decode", 0]
  },
  "class_type": "PreviewImage",
  "title": "Preview Base Generation"
}
```

**Preview locations:**
- After base generation (before refinement)
- After refinement (before upscaling)
- After upscaling (final check)
- After face enhancement
### Quality Gates

**Checkpoints for validation:**
- **Resolution Check**
  ```json
  {
    "inputs": {
      "image": ["input", 0],
      "min_width": 512,
      "min_height": 512,
      "max_width": 2048,
      "max_height": 2048
    },
    "class_type": "ImageSizeValidator"
  }
  ```
- **Quality Metrics**
  ```json
  {
    "inputs": {
      "image": ["vae_decode", 0],
      "min_quality_score": 0.7
    },
    "class_type": "QualityChecker"
  }
  ```
### Save Points

Save intermediate results:
```json
{
  "inputs": {
    "images": ["base_generation", 0],
    "filename_prefix": "intermediate/base_"
  },
  "class_type": "SaveImage",
  "title": "Save Base Generation"
}
```

**When to save:**
- Base generation (before refinement)
- After each major processing stage
- Before potentially destructive operations
## Documentation Requirements

### Workflow Metadata

Include in the workflow JSON:
```json
{
  "workflow_info": {
    "name": "FLUX Schnell Text-to-Image Production",
    "version": "1.0.0",
    "author": "RunPod AI Model Orchestrator",
    "description": "Fast text-to-image generation using FLUX.1-schnell (4 steps)",
    "category": "text-to-image",
    "tags": ["flux", "fast", "production"],
    "requirements": {
      "models": ["FLUX.1-schnell"],
      "custom_nodes": [],
      "vram_min": "16GB",
      "vram_recommended": "24GB"
    },
    "parameters": {
      "prompt": {
        "type": "string",
        "required": true,
        "description": "Text description of desired image"
      },
      "seed": {
        "type": "integer",
        "required": false,
        "default": 42,
        "min": 0,
        "max": 4294967295
      },
      "steps": {
        "type": "integer",
        "required": false,
        "default": 4,
        "min": 1,
        "max": 20
      }
    },
    "outputs": {
      "image": {
        "type": "image",
        "format": "PNG",
        "resolution": "1024x1024"
      }
    }
  }
}
```
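With this metadata in place, the orchestrator can validate requests against the workflow's own parameter schema instead of hardcoding limits. A minimal sketch, assuming only the fields shown above (`required`, `default`, `min`, `max`):

```python
def validate_request(workflow: dict, inputs: dict) -> dict:
    """Resolve request inputs against the workflow_info parameter schema."""
    schema = workflow.get("workflow_info", {}).get("parameters", {})
    resolved = {}
    for name, spec in schema.items():
        if name not in inputs:
            if spec.get("required"):
                raise ValueError(f"missing required parameter: {name}")
            if "default" in spec:
                resolved[name] = spec["default"]
            continue
        value = inputs[name]
        if "min" in spec and value < spec["min"]:
            raise ValueError(f"{name} must be >= {spec['min']}")
        if "max" in spec and value > spec["max"]:
            raise ValueError(f"{name} must be <= {spec['max']}")
        resolved[name] = value
    return resolved
```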
### Node Comments

Document complex nodes:
```json
{
  "title": "FLUX KSampler - Main Generation",
  "notes": "Using euler_ancestral sampler with 4 steps for FLUX Schnell. CFG=1.0 is optimal for this model. Seed controls reproducibility.",
  "inputs": {
    "seed": 42,
    "steps": 4,
    "cfg": 1.0
  }
}
```
### Usage Examples

Include in the workflow or README:
````markdown
## Example Usage

### ComfyUI Web Interface
1. Load workflow: `text-to-image/flux-schnell-t2i-production-v1.json`
2. Set prompt: "A serene mountain landscape at sunset"
3. Adjust seed: 42 (optional)
4. Click "Queue Prompt"

### Orchestrator API
```bash
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "A serene mountain landscape"}}'
```
````
## Testing Guidelines
### Manual Testing
**Required tests before production:**
1. **UI Test**
   - Load in ComfyUI web interface
   - Execute with default parameters
   - Verify output quality
   - Check preview nodes
   - Confirm save locations
2. **API Test**
   - Call via orchestrator API
   - Test with various parameter combinations
   - Verify JSON response format
   - Check error handling
3. **Edge Cases**
   - Missing optional parameters
   - Invalid parameter values
   - Out-of-range inputs
   - Missing models (graceful failure)
### Automated Testing
**Test script template:**
```bash
#!/bin/bash
# Test workflow: flux-schnell-t2i-production-v1.json
WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

# Test 1: Default parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test image\"}}" \
  | jq '.status'  # Should return "success"

# Test 2: Custom parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"seed\": 123, \"steps\": 8}}" \
  | jq '.status'

# Test 3: Missing prompt (should use default)
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {}}" \
  | jq '.status'
```
### Performance Testing

**Measure key metrics:**
```bash
# Generation time
time curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}'

# VRAM usage
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1

# GPU utilization
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1
```

**Performance baselines (24GB VRAM):**
- FLUX Schnell (1024x1024, 4 steps): ~5-8 seconds
- FLUX Dev (1024x1024, 20 steps): ~25-35 seconds
- SDXL + Refiner (1024x1024): ~40-60 seconds
- CogVideoX (6s video): ~120-180 seconds
### Load Testing

**Concurrent request handling:**
```bash
# Test 5 concurrent generations
for i in {1..5}; do
  curl -X POST http://localhost:9000/api/comfyui/generate \
    -d "{\"workflow\": \"flux-schnell-t2i-production-v1.json\", \"inputs\": {\"prompt\": \"test $i\", \"seed\": $i}}" &
done
wait
```
## Version Control

### Semantic Versioning

**Version increments:**
- `v1` → `v2`: Major changes (different models, restructured workflow)
- Internal iterations: Keep the same version; document changes in git commits
### Change Documentation

**Changelog format:**
```markdown
## flux-schnell-t2i-production-v2.json

### Changes from v1
- Added API input validation
- Optimized VRAM usage with model unloading
- Added preview node after generation
- Updated default steps from 4 to 6

### Breaking Changes
- Changed output node structure (requires orchestrator update)

### Migration Guide
- Update API calls to use new parameter names
- Clear ComfyUI cache before loading v2
```
### Deprecation Process

**Sunsetting old versions:**
- Mark the old version as deprecated in the README
- Keep the deprecated version for 2 releases
- Add a deprecation warning in the workflow metadata
- Document the migration path to the new version
- Archive deprecated workflows in the `archive/` directory
## Best Practices

### DO
- Use descriptive node names
- Include preview nodes at key stages
- Validate all inputs
- Optimize for VRAM efficiency
- Document all parameters
- Test with both UI and API
- Version your workflows
- Include error handling
- Save intermediate results
- Use semantic naming
### DON'T
- Hardcode file paths
- Assume unlimited VRAM
- Skip input validation
- Omit documentation
- Create overly complex workflows
- Use experimental nodes in production
- Ignore VRAM optimization
- Skip testing edge cases
- Use unclear node names
- Forget to version
## Resources

- ComfyUI Wiki: https://github.com/comfyanonymous/ComfyUI/wiki
- Custom Nodes List: https://github.com/ltdrdata/ComfyUI-Manager
- VRAM Optimization Guide: `/workspace/ai/CLAUDE.md`
- Model Documentation: `/workspace/ai/COMFYUI_MODELS.md`
## Support

For questions or issues:
- Review this standards document
- Check ComfyUI logs: `supervisorctl tail -f comfyui`
- Test the workflow in the UI before the API
- Validate JSON syntax
- Check model availability