# ComfyUI Workflow Development Standards
Production standards and best practices for creating ComfyUI workflows in the RunPod AI Model Orchestrator.

## Table of Contents

- [Naming Conventions](#naming-conventions)
- [Workflow Structure](#workflow-structure)
- [API Integration](#api-integration)
- [Error Handling](#error-handling)
- [VRAM Optimization](#vram-optimization)
- [Quality Assurance](#quality-assurance)
- [Documentation Requirements](#documentation-requirements)
- [Testing Guidelines](#testing-guidelines)
- [Version Control](#version-control)
- [Best Practices](#best-practices)
- [Resources](#resources)
- [Support](#support)
## Naming Conventions
### Workflow Files

Format: `{category}-{model}-{type}-{environment}-v{version}.json`

**Components:**

- `category`: Descriptive category (flux, sdxl, cogvideox, musicgen, etc.)
- `model`: Specific model variant (schnell, dev, small, medium, large)
- `type`: Operation type (t2i, i2i, i2v, t2m, upscale)
- `environment`: `production` (stable) or `experimental` (testing)
- `version`: Semantic versioning (1, 2, 3, etc.)

**Examples:**

- `flux-schnell-t2i-production-v1.json` - FLUX Schnell text-to-image, production version 1
- `sdxl-refiner-t2i-production-v2.json` - SDXL with refiner, production version 2
- `musicgen-large-t2m-experimental-v1.json` - MusicGen large, experimental version 1
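
A quick way to catch misnamed files before they reach production is to check each filename against the convention. The following is a minimal sketch using a shell regex; the directory glob and the type/environment lists are illustrative assumptions to extend for the models actually deployed.

```bash
#!/bin/bash
# Sketch: flag workflow files whose names do not match
# {category}-{model}-{type}-{environment}-v{version}.json
# The glob and the type list below are assumptions, not an exhaustive set.
shopt -s nullglob
PATTERN='^[a-z0-9.]+-[a-z0-9.]+-(t2i|i2i|i2v|t2m|upscale)-(production|experimental)-v[0-9]+\.json$'

for f in */*.json; do
  name=$(basename "$f")
  if [[ ! "$name" =~ $PATTERN ]]; then
    echo "Non-conforming workflow filename: $f"
  fi
done
```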
### Node Naming

**Descriptive names for all nodes:**

```json
{
  "title": "FLUX Schnell Checkpoint Loader",
  "type": "CheckpointLoaderSimple",
  "properties": {
    "Node name for S&R": "CheckpointLoaderSimple"
  }
}
```

**Naming patterns:**

- Loaders: `{Model} Checkpoint Loader`, `{Model} VAE Loader`
- Samplers: `{Model} KSampler`, `{Model} Advanced Sampler`
- Inputs: `API Text Input`, `API Image Input`, `API Seed Input`
- Outputs: `API Image Output`, `Preview Output`, `Save Output`
- Processing: `VAE Encode`, `VAE Decode`, `CLIP Text Encode`
## Workflow Structure

### Required Node Groups

Every production workflow MUST include these node groups:

#### 1. Input Group
```
Purpose: Receive parameters from API or UI
Nodes:
- Text input nodes (prompts, negative prompts)
- Numeric input nodes (seed, steps, CFG scale)
- Image input nodes (for i2i, i2v workflows)
- Model selection nodes (if multiple models supported)
```

#### 2. Model Loading Group
```
Purpose: Load required models and components
Nodes:
- Checkpoint/Diffuser loaders
- VAE loaders
- CLIP text encoders
- ControlNet loaders (if applicable)
- IP-Adapter loaders (if applicable)
```

#### 3. Processing Group
```
Purpose: Main generation/transformation logic
Nodes:
- Samplers (KSampler, Advanced KSampler)
- Encoders (CLIP, VAE)
- Conditioning nodes
- ControlNet application (if applicable)
```

#### 4. Post-Processing Group
```
Purpose: Refinement and enhancement
Nodes:
- VAE decoding
- Upscaling (if applicable)
- Face enhancement (Impact-Pack)
- Image adjustments
```

#### 5. Output Group
```
Purpose: Save and return results
Nodes:
- SaveImage nodes (for file output)
- Preview nodes (for UI feedback)
- API output nodes (for orchestrator)
```

#### 6. Error Handling Group (Optional but Recommended)
```
Purpose: Validation and fallback
Nodes:
- Validation nodes
- Fallback nodes
- Error logging nodes
```

### Node Organization

**Logical flow (left to right, top to bottom):**
```
[Inputs] → [Model Loading] → [Processing] → [Post-Processing] → [Outputs]
                                                    ↓
                                            [Error Handling]
```

**Visual grouping:**

- Use node positions to create visual separation
- Group related nodes together
- Align nodes for readability
- Use consistent spacing
## API Integration

### Input Nodes

**Required for API compatibility:**

1. **Text Inputs** (prompts, negative prompts)
   ```json
   {
     "inputs": {
       "text": "A beautiful sunset over mountains",
       "default": ""
     },
     "class_type": "CLIPTextEncode",
     "title": "API Prompt Input"
   }
   ```

2. **Numeric Inputs** (seed, steps, CFG, etc.)
   ```json
   {
     "inputs": {
       "seed": 42,
       "steps": 20,
       "cfg": 7.5,
       "sampler_name": "euler_ancestral",
       "scheduler": "normal"
     },
     "class_type": "KSampler",
     "title": "API Sampler Config"
   }
   ```

3. **Image Inputs** (for i2i workflows)
   ```json
   {
     "inputs": {
       "image": "",
       "upload": "image"
     },
     "class_type": "LoadImage",
     "title": "API Image Input"
   }
   ```

### Output Nodes

**Required for orchestrator return:**

```json
{
  "inputs": {
    "images": ["node_id", 0],
    "filename_prefix": "ComfyUI"
  },
  "class_type": "SaveImage",
  "title": "API Image Output"
}
```
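
Once a workflow is exported in API format, it can be smoke-tested against ComfyUI's own HTTP API before being wired into the orchestrator. A minimal sketch, assuming the stock ComfyUI server is listening on its default port 8188 and the exported graph is stored alongside the workflow files:

```bash
# Sketch: queue an API-format workflow graph directly on ComfyUI (default port 8188).
# The file path is illustrative; the /prompt payload shape is ComfyUI's native API.
WORKFLOW_JSON="text-to-image/flux-schnell-t2i-production-v1.json"

jq '{prompt: .}' "$WORKFLOW_JSON" \
  | curl -s -X POST http://localhost:8188/prompt \
      -H "Content-Type: application/json" \
      -d @- \
  | jq '.prompt_id'   # a prompt_id in the response means the graph was accepted
```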

### Parameter Validation

**Include validation for critical parameters:**

```json
{
  "inputs": {
    "value": "seed",
    "min": 0,
    "max": 4294967295,
    "default": 42
  },
  "class_type": "IntegerInput",
  "title": "Seed Validator"
}
```
## Error Handling

### Required Validations

1. **Model Availability** (see the pre-flight sketch after this list)
   - Check if checkpoint files exist
   - Validate model paths
   - Provide fallback to default models

2. **Parameter Bounds**
   - Validate numeric ranges (seed, steps, CFG)
   - Check dimension constraints (width, height)
   - Validate string inputs (sampler names, scheduler types)

3. **VRAM Limits**
   - Check batch size against VRAM
   - Validate resolution against VRAM
   - Enable tiling for large images

4. **Input Validation**
   - Verify required inputs are provided
   - Check image formats and dimensions
   - Validate prompt lengths
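
Model availability and basic parameter bounds can be checked before a job is ever queued. The following pre-flight sketch illustrates the idea; the model directory, checkpoint filename, and bounds are assumptions to adapt per deployment:

```bash
#!/bin/bash
# Pre-flight sketch: verify model availability and parameter bounds before submission.
# COMFYUI_MODEL_DIR default and the checkpoint name are hypothetical placeholders.
MODEL_DIR="${COMFYUI_MODEL_DIR:-comfyui/models/checkpoints}"
CHECKPOINT="flux1-schnell.safetensors"   # hypothetical filename
SEED="${1:-42}"
STEPS="${2:-4}"

if [ ! -f "$MODEL_DIR/$CHECKPOINT" ]; then
  echo "ERROR: checkpoint $CHECKPOINT not found in $MODEL_DIR" >&2
  exit 1
fi

if [ "$SEED" -lt 0 ] || [ "$SEED" -gt 4294967295 ]; then
  echo "ERROR: seed out of range [0, 4294967295]" >&2
  exit 1
fi

if [ "$STEPS" -lt 1 ] || [ "$STEPS" -gt 20 ]; then
  echo "ERROR: steps out of range [1, 20]" >&2
  exit 1
fi

echo "Pre-flight checks passed"
```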

### Fallback Strategies

**Default values for missing inputs:**

```json
{
  "inputs": {
    "text": "{{prompt | default('A beautiful landscape')}}",
    "seed": "{{seed | default(42)}}",
    "steps": "{{steps | default(20)}}"
  }
}
```

**Graceful degradation:**

- If refiner unavailable, skip refinement step
- If upscaler fails, return base resolution
- If face enhancement errors, return unenhanced image
## VRAM Optimization

### Model Unloading

**Explicit model cleanup between stages:**

```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0]
  },
  "class_type": "FreeModel",
  "title": "Unload Base Model"
}
```

**When to unload:**

- After base generation, before refinement
- After refinement, before upscaling
- Between different model types (diffusion → CLIP → VAE)

### VAE Tiling

**Enable for high-resolution processing:**

```json
{
  "inputs": {
    "samples": ["sampler", 0],
    "vae": ["vae_loader", 0],
    "tile_size": 512,
    "overlap": 64
  },
  "class_type": "VAEDecodeTiled",
  "title": "Tiled VAE Decode"
}
```

**Tiling thresholds:**

- Use tiled VAE for images >1024x1024
- Tile size: 512 for 24GB VRAM, 256 for lower
- Overlap: 64px minimum for seamless tiles

### Attention Slicing

**Reduce memory for large models:**

```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0],
    "attention_mode": "sliced"
  },
  "class_type": "ModelOptimization",
  "title": "Enable Attention Slicing"
}
```

### Batch Processing

**VRAM-safe batch sizes:**

- FLUX models: batch_size=1
- SDXL: batch_size=1-2
- SD3.5: batch_size=1
- Upscaling: batch_size=1

**Sequential batching:**

```json
{
  "inputs": {
    "mode": "sequential",
    "batch_size": 1
  },
  "class_type": "BatchProcessor"
}
```
## Quality Assurance

### Preview Nodes

**Include preview at key stages:**

```json
{
  "inputs": {
    "images": ["vae_decode", 0]
  },
  "class_type": "PreviewImage",
  "title": "Preview Base Generation"
}
```

**Preview locations:**

- After base generation (before refinement)
- After refinement (before upscaling)
- After upscaling (final check)
- After face enhancement

### Quality Gates

**Checkpoints for validation:**

1. **Resolution Check**
   ```json
   {
     "inputs": {
       "image": ["input", 0],
       "min_width": 512,
       "min_height": 512,
       "max_width": 2048,
       "max_height": 2048
     },
     "class_type": "ImageSizeValidator"
   }
   ```

2. **Quality Metrics**
   ```json
   {
     "inputs": {
       "image": ["vae_decode", 0],
       "min_quality_score": 0.7
     },
     "class_type": "QualityChecker"
   }
   ```

### Save Points

**Save intermediate results:**

```json
{
  "inputs": {
    "images": ["base_generation", 0],
    "filename_prefix": "intermediate/base_"
  },
  "class_type": "SaveImage",
  "title": "Save Base Generation"
}
```

**When to save:**

- Base generation (before refinement)
- After each major processing stage
- Before potentially destructive operations
## Documentation Requirements

### Workflow Metadata

**Include in workflow JSON:**

```json
{
  "workflow_info": {
    "name": "FLUX Schnell Text-to-Image Production",
    "version": "1.0.0",
    "author": "RunPod AI Model Orchestrator",
    "description": "Fast text-to-image generation using FLUX.1-schnell (4 steps)",
    "category": "text-to-image",
    "tags": ["flux", "fast", "production"],
    "requirements": {
      "models": ["FLUX.1-schnell"],
      "custom_nodes": [],
      "vram_min": "16GB",
      "vram_recommended": "24GB"
    },
    "parameters": {
      "prompt": {
        "type": "string",
        "required": true,
        "description": "Text description of desired image"
      },
      "seed": {
        "type": "integer",
        "required": false,
        "default": 42,
        "min": 0,
        "max": 4294967295
      },
      "steps": {
        "type": "integer",
        "required": false,
        "default": 4,
        "min": 1,
        "max": 20
      }
    },
    "outputs": {
      "image": {
        "type": "image",
        "format": "PNG",
        "resolution": "1024x1024"
      }
    }
  }
}
```
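
Because this block is plain JSON, its presence can be checked mechanically. A minimal sketch, assuming the metadata sits under the `workflow_info` key as shown above:

```bash
# Sketch: verify a workflow file carries the expected metadata fields.
# Assumes the block lives under "workflow_info"; adjust the field list as needed.
WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

jq -e '.workflow_info
       | has("name") and has("version") and has("description")
         and has("requirements") and has("parameters")' "$WORKFLOW" > /dev/null \
  && echo "metadata OK" \
  || echo "metadata missing or incomplete in $WORKFLOW"
```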

### Node Comments

**Document complex nodes:**

```json
{
  "title": "FLUX KSampler - Main Generation",
  "notes": "Using euler_ancestral sampler with 4 steps for FLUX Schnell. CFG=1.0 is optimal for this model. Seed controls reproducibility.",
  "inputs": {
    "seed": 42,
    "steps": 4,
    "cfg": 1.0
  }
}
```

### Usage Examples

**Include in workflow or README:**

````markdown
## Example Usage

### ComfyUI Web Interface
1. Load workflow: `text-to-image/flux-schnell-t2i-production-v1.json`
2. Set prompt: "A serene mountain landscape at sunset"
3. Adjust seed: 42 (optional)
4. Click "Queue Prompt"

### Orchestrator API
```bash
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "A serene mountain landscape"}}'
```
````
## Testing Guidelines

### Manual Testing

**Required tests before production:**

1. **UI Test**
   - Load in ComfyUI web interface
   - Execute with default parameters
   - Verify output quality
   - Check preview nodes
   - Confirm save locations

2. **API Test**
   - Call via orchestrator API
   - Test with various parameter combinations
   - Verify JSON response format
   - Check error handling

3. **Edge Cases**
   - Missing optional parameters
   - Invalid parameter values
   - Out-of-range inputs
   - Missing models (graceful failure)

### Automated Testing

**Test script template:**

```bash
#!/bin/bash
# Test workflow: flux-schnell-t2i-production-v1.json

WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

# Test 1: Default parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test image\"}}" \
  | jq '.status'  # Should return "success"

# Test 2: Custom parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"seed\": 123, \"steps\": 8}}" \
  | jq '.status'

# Test 3: Missing prompt (should use default)
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {}}" \
  | jq '.status'
```
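
The template above covers the happy path; an out-of-range edge case can be appended in the same style. The exact error shape is an assumption here, since it depends on how the orchestrator reports validation failures:

```bash
# Test 4 (sketch): out-of-range steps should be rejected, not crash the worker.
# The ".error" field is an assumption; adjust to the orchestrator's response format.
curl -s -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"steps\": 999}}" \
  | jq '.status, .error?'  # Should report an error status rather than "success"
```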

### Performance Testing

**Measure key metrics:**

```bash
# Generation time
time curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}'

# VRAM usage
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1

# GPU utilization
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1
```

**Performance baselines (24GB VRAM):**

- FLUX Schnell (1024x1024, 4 steps): ~5-8 seconds
- FLUX Dev (1024x1024, 20 steps): ~25-35 seconds
- SDXL + Refiner (1024x1024): ~40-60 seconds
- CogVideoX (6s video): ~120-180 seconds
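
To compare a run against these baselines, duration and peak VRAM are easiest to read when captured together. A minimal wrapper sketch that combines the commands above:

```bash
#!/bin/bash
# Sketch: time one generation while sampling VRAM usage in the background,
# then report the peak. Uses only the commands shown above.
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > /tmp/vram.log &
SMI_PID=$!

time curl -s -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}' > /dev/null

kill "$SMI_PID"
echo "Peak VRAM (MiB): $(sort -n /tmp/vram.log | tail -1)"
```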

### Load Testing

**Concurrent request handling:**

```bash
# Test 5 concurrent generations
for i in {1..5}; do
  curl -X POST http://localhost:9000/api/comfyui/generate \
    -d "{\"workflow\": \"flux-schnell-t2i-production-v1.json\", \"inputs\": {\"prompt\": \"test $i\", \"seed\": $i}}" &
done
wait
```
## Version Control

### Semantic Versioning

**Version increments:**

- `v1` → `v2`: Major changes (different models, restructured workflow)
- Internal iterations: Keep same version, document changes in git commits

### Change Documentation

**Changelog format:**

```markdown
## flux-schnell-t2i-production-v2.json

### Changes from v1
- Added API input validation
- Optimized VRAM usage with model unloading
- Added preview node after generation
- Updated default steps from 4 to 6

### Breaking Changes
- Changed output node structure (requires orchestrator update)

### Migration Guide
- Update API calls to use new parameter names
- Clear ComfyUI cache before loading v2
```

### Deprecation Process

**Sunsetting old versions:**

1. Mark old version as deprecated in README
2. Keep deprecated version for 2 releases
3. Add deprecation warning in workflow metadata
4. Document migration path to new version
5. Archive deprecated workflows in `archive/` directory (see the sketch after this list)
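
Step 5 can be scripted. A minimal sketch, assuming workflows live in category directories and git tracks the move (paths are illustrative):

```bash
# Sketch: archive a deprecated workflow while preserving its category layout.
# The workflow path is illustrative; adjust to the repository's actual structure.
OLD="text-to-image/flux-schnell-t2i-production-v1.json"

mkdir -p "archive/$(dirname "$OLD")"
git mv "$OLD" "archive/$OLD"
git commit -m "Deprecate $(basename "$OLD"); superseded by v2"
```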

## Best Practices

### DO

- Use descriptive node names
- Include preview nodes at key stages
- Validate all inputs
- Optimize for VRAM efficiency
- Document all parameters
- Test with both UI and API
- Version your workflows
- Include error handling
- Save intermediate results
- Use semantic naming

### DON'T

- Hardcode file paths
- Assume unlimited VRAM
- Skip input validation
- Omit documentation
- Create overly complex workflows
- Use experimental nodes in production
- Ignore VRAM optimization
- Skip testing edge cases
- Use unclear node names
- Forget to version
## Resources

- **ComfyUI Wiki**: https://github.com/comfyanonymous/ComfyUI/wiki
- **Custom Nodes List**: https://github.com/ltdrdata/ComfyUI-Manager
- **VRAM Optimization Guide**: `/workspace/ai/CLAUDE.md`
- **Model Documentation**: `/workspace/ai/COMFYUI_MODELS.md`

## Support

For questions or issues:

1. Review this standards document
2. Check ComfyUI logs: `supervisorctl tail -f comfyui`
3. Test workflow in UI before API
4. Validate JSON syntax (see the sketch below)
5. Check model availability
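
For step 4, `jq` (already used in the test scripts above) gives a quick pass/fail on syntax:

```bash
# Quick JSON syntax check: jq prints nothing and exits non-zero on invalid JSON.
jq empty text-to-image/flux-schnell-t2i-production-v1.json \
  && echo "valid JSON" \
  || echo "invalid JSON"
```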