ComfyUI Workflow Development Standards

Production standards and best practices for creating ComfyUI workflows in the RunPod AI Model Orchestrator.

Table of Contents

  • Naming Conventions
  • Workflow Structure
  • API Integration
  • Error Handling
  • VRAM Optimization
  • Quality Assurance
  • Documentation Requirements
  • Testing Guidelines
  • Version Control
  • Best Practices
  • Resources

Naming Conventions

Workflow Files

Format: {category}-{model}-{type}-{environment}-v{version}.json

Components:

  • category: Descriptive category (flux, sdxl, cogvideox, musicgen, etc.)
  • model: Specific model variant (schnell, dev, small, medium, large)
  • type: Operation type (t2i, i2i, i2v, t2m, upscale)
  • environment: production (stable) or experimental (testing)
  • version: Integer version, incremented for major changes (1, 2, 3, etc.)

Examples:

  • flux-schnell-t2i-production-v1.json - FLUX Schnell text-to-image, production version 1
  • sdxl-refiner-t2i-production-v2.json - SDXL with refiner, production version 2
  • musicgen-large-t2m-experimental-v1.json - MusicGen large, experimental version 1
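
The convention can be enforced with a quick filename check. A minimal sketch (the glob assumes category subfolders such as text-to-image/; the regex mirrors the format above and is illustrative, not part of the orchestrator):

# Flag workflow files that do not match {category}-{model}-{type}-{environment}-v{version}.json
for f in */*.json; do
  name=$(basename "$f")
  if ! echo "$name" | grep -Eq '^[a-z0-9]+-[a-z0-9.]+-(t2i|i2i|i2v|t2m|upscale)-(production|experimental)-v[0-9]+\.json$'; then
    echo "non-conforming name: $f"
  fi
done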

Node Naming

Descriptive names for all nodes:

{
  "title": "FLUX Schnell Checkpoint Loader",
  "type": "CheckpointLoaderSimple",
  "properties": {
    "Node name for S&R": "CheckpointLoaderSimple"
  }
}

Naming patterns:

  • Loaders: {Model} Checkpoint Loader, {Model} VAE Loader
  • Samplers: {Model} KSampler, {Model} Advanced Sampler
  • Inputs: API Text Input, API Image Input, API Seed Input
  • Outputs: API Image Output, Preview Output, Save Output
  • Processing: VAE Encode, VAE Decode, CLIP Text Encode
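
Node titles can be linted before a workflow is committed. A minimal sketch, assuming the UI-export JSON format with a top-level "nodes" array in which only renamed nodes carry a "title" field:

# List nodes that still use their default name (no custom "title")
jq -r '.nodes[] | select(.title == null) | "node \(.id): \(.type) has no descriptive title"' workflow.json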

Workflow Structure

Required Node Groups

Every production workflow MUST include these node groups:

1. Input Group

Purpose: Receive parameters from API or UI
Nodes:
  - Text input nodes (prompts, negative prompts)
  - Numeric input nodes (seed, steps, CFG scale)
  - Image input nodes (for i2i, i2v workflows)
  - Model selection nodes (if multiple models supported)

2. Model Loading Group

Purpose: Load required models and components
Nodes:
  - Checkpoint/Diffuser loaders
  - VAE loaders
  - CLIP text encoders
  - ControlNet loaders (if applicable)
  - IP-Adapter loaders (if applicable)

3. Processing Group

Purpose: Main generation/transformation logic
Nodes:
  - Samplers (KSampler, Advanced KSampler)
  - Encoders (CLIP, VAE)
  - Conditioning nodes
  - ControlNet application (if applicable)

4. Post-Processing Group

Purpose: Refinement and enhancement
Nodes:
  - VAE decoding
  - Upscaling (if applicable)
  - Face enhancement (Impact-Pack)
  - Image adjustments

5. Output Group

Purpose: Save and return results
Nodes:
  - SaveImage nodes (for file output)
  - Preview nodes (for UI feedback)
  - API output nodes (for orchestrator)

6. Error Handling Group

Purpose: Validation and fallback
Nodes:
  - Validation nodes
  - Fallback nodes
  - Error logging nodes

Node Organization

Logical flow (left to right, top to bottom):

[Inputs] → [Model Loading] → [Processing] → [Post-Processing] → [Outputs]
                                    ↓
                            [Error Handling]

Visual grouping:

  • Use node positions to create visual separation
  • Group related nodes together
  • Align nodes for readability
  • Use consistent spacing

API Integration

Input Nodes

Required for API compatibility:

  1. Text Inputs (prompts, negative prompts)
{
  "inputs": {
    "text": "A beautiful sunset over mountains",
    "default": ""
  },
  "class_type": "CLIPTextEncode",
  "title": "API Prompt Input"
}
  2. Numeric Inputs (seed, steps, CFG, etc.)
{
  "inputs": {
    "seed": 42,
    "steps": 20,
    "cfg": 7.5,
    "sampler_name": "euler_ancestral",
    "scheduler": "normal"
  },
  "class_type": "KSampler",
  "title": "API Sampler Config"
}
  3. Image Inputs (for i2i workflows)
{
  "inputs": {
    "image": "",
    "upload": "image"
  },
  "class_type": "LoadImage",
  "title": "API Image Input"
}

Output Nodes

Required for orchestrator return:

{
  "inputs": {
    "images": ["node_id", 0],
    "filename_prefix": "ComfyUI"
  },
  "class_type": "SaveImage",
  "title": "API Image Output"
}

Parameter Validation

Include validation for critical parameters:

{
  "inputs": {
    "value": "seed",
    "min": 0,
    "max": 4294967295,
    "default": 42
  },
  "class_type": "IntegerInput",
  "title": "Seed Validator"
}

Error Handling

Required Validations

  1. Model Availability

    • Check if checkpoint files exist
    • Validate model paths
    • Provide fallback to default models
  2. Parameter Bounds

    • Validate numeric ranges (seed, steps, CFG)
    • Check dimension constraints (width, height)
    • Validate string inputs (sampler names, scheduler types)
  3. VRAM Limits

    • Check batch size against VRAM
    • Validate resolution against VRAM
    • Enable tiling for large images
  4. Input Validation

    • Verify required inputs are provided
    • Check image formats and dimensions
    • Validate prompt lengths
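
These checks can also run as a pre-flight script before a job is queued. A minimal sketch covering checks 1, 2, and 4 (the checkpoint path, filename, and bounds are illustrative):

#!/bin/bash
CKPT="models/checkpoints/flux1-schnell.safetensors"
SEED=${1:-42}
WIDTH=${2:-1024}

# 1. Model availability
[ -f "$CKPT" ] || { echo "missing checkpoint: $CKPT"; exit 1; }

# 2. Parameter bounds (seed and width)
[ "$SEED" -ge 0 ] && [ "$SEED" -le 4294967295 ] || { echo "seed out of range"; exit 1; }
[ "$WIDTH" -ge 512 ] && [ "$WIDTH" -le 2048 ] || { echo "width out of range"; exit 1; }

echo "pre-flight OK"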

Fallback Strategies

Default values for missing inputs:

{
  "inputs": {
    "text": "{{prompt | default('A beautiful landscape')}}",
    "seed": "{{seed | default(42)}}",
    "steps": "{{steps | default(20)}}"
  }
}
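
One way to implement these defaults on the orchestrator side is to merge the user's inputs over a defaults object. A sketch using jq (file names and keys are illustrative):

# defaults.json: {"prompt": "A beautiful landscape", "seed": 42, "steps": 20}
# user.json:     {"prompt": "A castle at night"}
jq -s '.[0] * .[1]' defaults.json user.json
# → {"prompt": "A castle at night", "seed": 42, "steps": 20}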

Graceful degradation:

  • If refiner unavailable, skip refinement step
  • If upscaler fails, return base resolution
  • If face enhancement errors, return unenhanced image

VRAM Optimization

Model Unloading

Explicit model cleanup between stages:

{
  "inputs": {
    "model": ["checkpoint_loader", 0]
  },
  "class_type": "FreeModel",
  "title": "Unload Base Model"
}

When to unload:

  • After base generation, before refinement
  • After refinement, before upscaling
  • Between different model types (diffusion → CLIP → VAE)

VAE Tiling

Enable for high-resolution processing:

{
  "inputs": {
    "samples": ["sampler", 0],
    "vae": ["vae_loader", 0],
    "tile_size": 512,
    "overlap": 64
  },
  "class_type": "VAEDecodeTiled",
  "title": "Tiled VAE Decode"
}

Tiling thresholds:

  • Use tiled VAE for images >1024x1024
  • Tile size: 512 for 24GB VRAM, 256 for lower
  • Overlap: 64px minimum for seamless tiles
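
The tile-size choice can be automated from the GPU's total memory; an illustrative sketch applying the thresholds above:

# Pick a VAE tile size from total VRAM (MiB)
vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n1)
if [ "$vram" -ge 24000 ]; then tile=512; else tile=256; fi
echo "tile_size=$tile"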

Attention Slicing

Reduce memory for large models:

{
  "inputs": {
    "model": ["checkpoint_loader", 0],
    "attention_mode": "sliced"
  },
  "class_type": "ModelOptimization",
  "title": "Enable Attention Slicing"
}

Batch Processing

VRAM-safe batch sizes:

  • FLUX models: batch_size=1
  • SDXL: batch_size=1-2
  • SD3.5: batch_size=1
  • Upscaling: batch_size=1

Sequential batching:

{
  "inputs": {
    "mode": "sequential",
    "batch_size": 1
  },
  "class_type": "BatchProcessor"
}

Quality Assurance

Preview Nodes

Include preview at key stages:

{
  "inputs": {
    "images": ["vae_decode", 0]
  },
  "class_type": "PreviewImage",
  "title": "Preview Base Generation"
}

Preview locations:

  • After base generation (before refinement)
  • After refinement (before upscaling)
  • After upscaling (final check)
  • After face enhancement

Quality Gates

Checkpoints for validation:

  1. Resolution Check
{
  "inputs": {
    "image": ["input", 0],
    "min_width": 512,
    "min_height": 512,
    "max_width": 2048,
    "max_height": 2048
  },
  "class_type": "ImageSizeValidator"
}
  2. Quality Metrics
{
  "inputs": {
    "image": ["vae_decode", 0],
    "min_quality_score": 0.7
  },
  "class_type": "QualityChecker"
}

Save Points

Save intermediate results:

{
  "inputs": {
    "images": ["base_generation", 0],
    "filename_prefix": "intermediate/base_"
  },
  "class_type": "SaveImage",
  "title": "Save Base Generation"
}

When to save:

  • Base generation (before refinement)
  • After each major processing stage
  • Before potentially destructive operations

Documentation Requirements

Workflow Metadata

Include in workflow JSON:

{
  "workflow_info": {
    "name": "FLUX Schnell Text-to-Image Production",
    "version": "1.0.0",
    "author": "RunPod AI Model Orchestrator",
    "description": "Fast text-to-image generation using FLUX.1-schnell (4 steps)",
    "category": "text-to-image",
    "tags": ["flux", "fast", "production"],
    "requirements": {
      "models": ["FLUX.1-schnell"],
      "custom_nodes": [],
      "vram_min": "16GB",
      "vram_recommended": "24GB"
    },
    "parameters": {
      "prompt": {
        "type": "string",
        "required": true,
        "description": "Text description of desired image"
      },
      "seed": {
        "type": "integer",
        "required": false,
        "default": 42,
        "min": 0,
        "max": 4294967295
      },
      "steps": {
        "type": "integer",
        "required": false,
        "default": 4,
        "min": 1,
        "max": 20
      }
    },
    "outputs": {
      "image": {
        "type": "image",
        "format": "PNG",
        "resolution": "1024x1024"
      }
    }
  }
}
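
Because the metadata block is machine-readable, the orchestrator (or a test script) can check requirements before loading a workflow. A sketch assuming the workflow_info structure above:

# Print the VRAM requirement and required models for a workflow
jq -r '.workflow_info.requirements | "needs \(.vram_min) VRAM, models: \(.models | join(", "))"' \
  text-to-image/flux-schnell-t2i-production-v1.json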

Node Comments

Document complex nodes:

{
  "title": "FLUX KSampler - Main Generation",
  "notes": "Using euler_ancestral sampler with 4 steps for FLUX Schnell. CFG=1.0 is optimal for this model. Seed controls reproducibility.",
  "inputs": {
    "seed": 42,
    "steps": 4,
    "cfg": 1.0
  }
}

Usage Examples

Include in workflow or README:

## Example Usage

### ComfyUI Web Interface
1. Load workflow: `text-to-image/flux-schnell-t2i-production-v1.json`
2. Set prompt: "A serene mountain landscape at sunset"
3. Adjust seed: 42 (optional)
4. Click "Queue Prompt"

### Orchestrator API
```bash
curl -X POST http://localhost:9000/api/comfyui/generate \
  -H "Content-Type: application/json" \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "A serene mountain landscape"}}'
```

Testing Guidelines

Manual Testing

Required tests before production:

  1. UI Test

    • Load in ComfyUI web interface
    • Execute with default parameters
    • Verify output quality
    • Check preview nodes
    • Confirm save locations
  2. API Test

    • Call via orchestrator API
    • Test with various parameter combinations
    • Verify JSON response format
    • Check error handling
  3. Edge Cases

    • Missing optional parameters
    • Invalid parameter values
    • Out-of-range inputs
    • Missing models (graceful failure)

Automated Testing

Test script template:

#!/bin/bash
# Test workflow: flux-schnell-t2i-production-v1.json

WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

# Test 1: Default parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -H "Content-Type: application/json" \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test image\"}}" \
  | jq '.status' # Should return "success"

# Test 2: Custom parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -H "Content-Type: application/json" \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"seed\": 123, \"steps\": 8}}" \
  | jq '.status'

# Test 3: Missing prompt (should fall back to the default)
curl -X POST http://localhost:9000/api/comfyui/generate \
  -H "Content-Type: application/json" \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {}}" \
  | jq '.status'

Performance Testing

Measure key metrics:

# Generation time
time curl -X POST http://localhost:9000/api/comfyui/generate \
  -H "Content-Type: application/json" \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}'

# VRAM usage
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1

# GPU utilization
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1

Performance baselines (24GB VRAM):

  • FLUX Schnell (1024x1024, 4 steps): ~5-8 seconds
  • FLUX Dev (1024x1024, 20 steps): ~25-35 seconds
  • SDXL + Refiner (1024x1024): ~40-60 seconds
  • CogVideoX (6s video): ~120-180 seconds

Load Testing

Concurrent request handling:

# Test 5 concurrent generations
for i in {1..5}; do
  curl -X POST http://localhost:9000/api/comfyui/generate \
    -H "Content-Type: application/json" \
    -d "{\"workflow\": \"flux-schnell-t2i-production-v1.json\", \"inputs\": {\"prompt\": \"test $i\", \"seed\": $i}}" &
done
wait

Version Control

Semantic Versioning

Version increments:

  • v1 → v2: Major changes (different models, restructured workflow)
  • Internal iterations: Keep same version, document changes in git commits

Change Documentation

Changelog format:

## flux-schnell-t2i-production-v2.json

### Changes from v1
- Added API input validation
- Optimized VRAM usage with model unloading
- Added preview node after generation
- Updated default steps from 4 to 6

### Breaking Changes
- Changed output node structure (requires orchestrator update)

### Migration Guide
- Update API calls to use new parameter names
- Clear ComfyUI cache before loading v2

Deprecation Process

Sunsetting old versions:

  1. Mark old version as deprecated in README
  2. Keep deprecated version for 2 releases
  3. Add deprecation warning in workflow metadata
  4. Document migration path to new version
  5. Archive deprecated workflows in archive/ directory
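
Step 5 might look like this in practice (illustrative paths; assumes workflows are tracked in git):

# Archive a deprecated workflow, keeping its history intact
mkdir -p archive/text-to-image
git mv text-to-image/flux-schnell-t2i-production-v1.json archive/text-to-image/
git commit -m "chore: archive deprecated flux-schnell t2i v1 (superseded by v2)"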

Best Practices

DO

  • Use descriptive node names
  • Include preview nodes at key stages
  • Validate all inputs
  • Optimize for VRAM efficiency
  • Document all parameters
  • Test with both UI and API
  • Version your workflows
  • Include error handling
  • Save intermediate results
  • Use semantic naming

DON'T

  • Hardcode file paths
  • Assume unlimited VRAM
  • Skip input validation
  • Omit documentation
  • Create overly complex workflows
  • Use experimental nodes in production
  • Ignore VRAM optimization
  • Skip testing edge cases
  • Use unclear node names
  • Forget to version

Resources

Support

For questions or issues:

  1. Review this standards document
  2. Check ComfyUI logs: supervisorctl tail -f comfyui
  3. Test workflow in UI before API
  4. Validate JSON syntax
  5. Check model availability
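
For step 4, a quick syntax check:

# Exits non-zero (with a parse error) if the JSON is malformed
jq empty text-to-image/flux-schnell-t2i-production-v1.json && echo "JSON OK"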