# ComfyUI Workflow Development Standards
Production standards and best practices for creating ComfyUI workflows in the RunPod AI Model Orchestrator.
## Table of Contents
- Naming Conventions
- Workflow Structure
- API Integration
- Error Handling
- VRAM Optimization
- Quality Assurance
- Documentation Requirements
- Testing Guidelines
- Version Control
- Best Practices
- Resources
- Support
## Naming Conventions

### Workflow Files

**Format:** `{category}-{model}-{type}-{environment}-v{version}.json`

**Components:**
- `category`: Descriptive category (`flux`, `sdxl`, `cogvideox`, `musicgen`, etc.)
- `model`: Specific model variant (`schnell`, `dev`, `small`, `medium`, `large`)
- `type`: Operation type (`t2i`, `i2i`, `i2v`, `t2m`, `upscale`)
- `environment`: `production` (stable) or `experimental` (testing)
- `version`: Integer version number (1, 2, 3, etc.)

**Examples:**
- `flux-schnell-t2i-production-v1.json` - FLUX Schnell text-to-image, production version 1
- `sdxl-refiner-t2i-production-v2.json` - SDXL with refiner, production version 2
- `musicgen-large-t2m-experimental-v1.json` - MusicGen large, experimental version 1
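A simple way to enforce this convention in CI is a filename check. A minimal sketch: the `type` and `environment` vocabularies come from this document, while the category/model character classes are assumptions left deliberately open:

```python
import re

# Illustrative check for the naming convention above; type and environment
# values come from this document, the rest of the pattern is an assumption.
WORKFLOW_NAME = re.compile(
    r"^(?P<category>[a-z0-9]+)"
    r"-(?P<model>[a-z0-9.]+)"
    r"-(?P<type>t2i|i2i|i2v|t2m|upscale)"
    r"-(?P<environment>production|experimental)"
    r"-v(?P<version>\d+)\.json$"
)

def is_valid_workflow_name(filename: str) -> bool:
    return WORKFLOW_NAME.match(filename) is not None

assert is_valid_workflow_name("flux-schnell-t2i-production-v1.json")
assert not is_valid_workflow_name("FluxSchnell_final.json")
```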
### Node Naming

Descriptive names for all nodes:
```json
{
  "title": "FLUX Schnell Checkpoint Loader",
  "type": "CheckpointLoaderSimple",
  "properties": {
    "Node name for S&R": "CheckpointLoaderSimple"
  }
}
```
**Naming patterns:**
- Loaders: `{Model} Checkpoint Loader`, `{Model} VAE Loader`
- Samplers: `{Model} KSampler`, `{Model} Advanced Sampler`
- Inputs: `API Text Input`, `API Image Input`, `API Seed Input`
- Outputs: `API Image Output`, `Preview Output`, `Save Output`
- Processing: `VAE Encode`, `VAE Decode`, `CLIP Text Encode`
## Workflow Structure

### Required Node Groups

Every production workflow MUST include these node groups:
#### 1. Input Group
**Purpose:** Receive parameters from API or UI
**Nodes:**
- Text input nodes (prompts, negative prompts)
- Numeric input nodes (seed, steps, CFG scale)
- Image input nodes (for i2i, i2v workflows)
- Model selection nodes (if multiple models supported)
#### 2. Model Loading Group
**Purpose:** Load required models and components
**Nodes:**
- Checkpoint/Diffuser loaders
- VAE loaders
- CLIP text encoders
- ControlNet loaders (if applicable)
- IP-Adapter loaders (if applicable)
#### 3. Processing Group
**Purpose:** Main generation/transformation logic
**Nodes:**
- Samplers (KSampler, Advanced KSampler)
- Encoders (CLIP, VAE)
- Conditioning nodes
- ControlNet application (if applicable)
#### 4. Post-Processing Group
**Purpose:** Refinement and enhancement
**Nodes:**
- VAE decoding
- Upscaling (if applicable)
- Face enhancement (Impact-Pack)
- Image adjustments
#### 5. Output Group
**Purpose:** Save and return results
**Nodes:**
- SaveImage nodes (for file output)
- Preview nodes (for UI feedback)
- API output nodes (for orchestrator)
#### 6. Error Handling Group (Optional but Recommended)
**Purpose:** Validation and fallback
**Nodes:**
- Validation nodes
- Fallback nodes
- Error logging nodes
### Node Organization

**Logical flow** (left to right, top to bottom):
```
[Inputs] → [Model Loading] → [Processing] → [Post-Processing] → [Outputs]
                                  ↓
                          [Error Handling]
```

**Visual grouping:**
- Use node positions to create visual separation
- Group related nodes together
- Align nodes for readability
- Use consistent spacing
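Because group membership is encoded in node titles and class types, a small lint script can confirm a workflow contains the required groups before it is promoted to production. A minimal sketch, assuming the API-format JSON used throughout this document (a dict of nodes, each carrying a `class_type` and a `title`):

```python
import json

def lint_workflow(path: str) -> list[str]:
    """Report missing node groups; an empty list means the check passed."""
    with open(path) as f:
        workflow = json.load(f)
    nodes = [n for n in workflow.values()
             if isinstance(n, dict) and "class_type" in n]
    titles = [n.get("title", "") for n in nodes]
    class_types = [n["class_type"] for n in nodes]

    problems = []
    if not any(t.startswith("API ") for t in titles):
        problems.append("missing Input/Output groups (no 'API ...' titled nodes)")
    if not any("Loader" in c for c in class_types):
        problems.append("missing Model Loading group (no loader nodes)")
    if not any("Sampler" in c for c in class_types):
        problems.append("missing Processing group (no sampler nodes)")
    if not any(c in ("SaveImage", "PreviewImage") for c in class_types):
        problems.append("missing Output group (no save/preview nodes)")
    return problems
```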
## API Integration

### Input Nodes

**Required for API compatibility:**
- **Text inputs** (prompts, negative prompts):
  ```json
  {
    "inputs": {
      "text": "A beautiful sunset over mountains",
      "default": ""
    },
    "class_type": "CLIPTextEncode",
    "title": "API Prompt Input"
  }
  ```
- **Numeric inputs** (seed, steps, CFG, etc.):
  ```json
  {
    "inputs": {
      "seed": 42,
      "steps": 20,
      "cfg": 7.5,
      "sampler_name": "euler_ancestral",
      "scheduler": "normal"
    },
    "class_type": "KSampler",
    "title": "API Sampler Config"
  }
  ```
- **Image inputs** (for i2i workflows):
  ```json
  {
    "inputs": {
      "image": "",
      "upload": "image"
    },
    "class_type": "LoadImage",
    "title": "API Image Input"
  }
  ```
### Output Nodes

**Required for orchestrator return:**
```json
{
  "inputs": {
    "images": ["node_id", 0],
    "filename_prefix": "ComfyUI"
  },
  "class_type": "SaveImage",
  "title": "API Image Output"
}
```
### Parameter Validation

Include validation for critical parameters:
```json
{
  "inputs": {
    "value": "seed",
    "min": 0,
    "max": 4294967295,
    "default": 42
  },
  "class_type": "IntegerInput",
  "title": "Seed Validator"
}
```
## Error Handling

### Required Validations

1. **Model Availability**
   - Check that checkpoint files exist
   - Validate model paths
   - Provide a fallback to default models (see the sketch after this list)
2. **Parameter Bounds**
   - Validate numeric ranges (seed, steps, CFG)
   - Check dimension constraints (width, height)
   - Validate string inputs (sampler names, scheduler types)
3. **VRAM Limits**
   - Check batch size against VRAM
   - Validate resolution against VRAM
   - Enable tiling for large images
4. **Input Validation**
   - Verify required inputs are provided
   - Check image formats and dimensions
   - Validate prompt lengths
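For the model-availability check, a hedged sketch of the fallback pattern; the checkpoint directory and the default model name are assumptions for illustration:

```python
import os

CHECKPOINT_DIR = "models/checkpoints"             # assumed ComfyUI layout
DEFAULT_CHECKPOINT = "flux1-schnell.safetensors"  # assumed default model

def resolve_checkpoint(requested: str) -> str:
    """Return the requested checkpoint if it exists, else fall back."""
    if os.path.isfile(os.path.join(CHECKPOINT_DIR, requested)):
        return requested
    if os.path.isfile(os.path.join(CHECKPOINT_DIR, DEFAULT_CHECKPOINT)):
        return DEFAULT_CHECKPOINT
    raise FileNotFoundError(
        f"neither {requested!r} nor the default checkpoint exists"
    )
```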
### Fallback Strategies

**Default values for missing inputs:**
```json
{
  "inputs": {
    "text": "{{prompt | default('A beautiful landscape')}}",
    "seed": "{{seed | default(42)}}",
    "steps": "{{steps | default(20)}}"
  }
}
```

**Graceful degradation:**
- If refiner unavailable, skip refinement step
- If upscaler fails, return base resolution
- If face enhancement errors, return unenhanced image
## VRAM Optimization

### Model Unloading

Explicit model cleanup between stages:
```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0]
  },
  "class_type": "FreeModel",
  "title": "Unload Base Model"
}
```

**When to unload:**
- After base generation, before refinement
- After refinement, before upscaling
- Between different model types (diffusion → CLIP → VAE)
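If no unload node is available in your node set, recent ComfyUI builds expose a `/free` endpoint the orchestrator can call between stages; the endpoint's availability and the port are assumptions to verify on your build:

```python
import json
import urllib.request

def free_models(base_url: str = "http://127.0.0.1:8188") -> None:
    """Ask ComfyUI to drop cached models and free VRAM between stages."""
    # Endpoint and payload assume a recent ComfyUI build; verify on yours.
    body = json.dumps({"unload_models": True, "free_memory": True}).encode()
    req = urllib.request.Request(
        base_url + "/free",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()
```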
### VAE Tiling

Enable for high-resolution processing:
```json
{
  "inputs": {
    "samples": ["sampler", 0],
    "vae": ["vae_loader", 0],
    "tile_size": 512,
    "overlap": 64
  },
  "class_type": "VAEDecodeTiled",
  "title": "Tiled VAE Decode"
}
```

**Tiling thresholds:**
- Use tiled VAE for images >1024x1024
- Tile size: 512 for 24GB VRAM, 256 for lower
- Overlap: 64px minimum for seamless tiles
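The tile-size rule can be applied programmatically from measured free VRAM. A sketch using PyTorch; the 16 GB cutoff approximates the "24GB vs. lower" split above and is an assumption, not a measured value:

```python
import torch

def pick_tile_size(device: int = 0) -> int:
    """Choose the VAE tile size from currently free VRAM."""
    free_bytes, _total = torch.cuda.mem_get_info(device)
    # Assumed cutoff: roomy cards get 512px tiles, constrained cards 256px.
    return 512 if free_bytes >= 16 * 1024**3 else 256
```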
### Attention Slicing

Reduce memory for large models:
```json
{
  "inputs": {
    "model": ["checkpoint_loader", 0],
    "attention_mode": "sliced"
  },
  "class_type": "ModelOptimization",
  "title": "Enable Attention Slicing"
}
```

### Batch Processing

**VRAM-safe batch sizes:**
- FLUX models: batch_size=1
- SDXL: batch_size=1-2
- SD3.5: batch_size=1
- Upscaling: batch_size=1
**Sequential batching:**
```json
{
  "inputs": {
    "mode": "sequential",
    "batch_size": 1
  },
  "class_type": "BatchProcessor"
}
```
## Quality Assurance

### Preview Nodes

Include previews at key stages:
```json
{
  "inputs": {
    "images": ["vae_decode", 0]
  },
  "class_type": "PreviewImage",
  "title": "Preview Base Generation"
}
```

**Preview locations:**
- After base generation (before refinement)
- After refinement (before upscaling)
- After upscaling (final check)
- After face enhancement
### Quality Gates

**Checkpoints for validation:**
- **Resolution Check**
  ```json
  {
    "inputs": {
      "image": ["input", 0],
      "min_width": 512,
      "min_height": 512,
      "max_width": 2048,
      "max_height": 2048
    },
    "class_type": "ImageSizeValidator"
  }
  ```
- **Quality Metrics**
  ```json
  {
    "inputs": {
      "image": ["vae_decode", 0],
      "min_quality_score": 0.7
    },
    "class_type": "QualityChecker"
  }
  ```
### Save Points

Save intermediate results:
```json
{
  "inputs": {
    "images": ["base_generation", 0],
    "filename_prefix": "intermediate/base_"
  },
  "class_type": "SaveImage",
  "title": "Save Base Generation"
}
```

**When to save:**
- Base generation (before refinement)
- After each major processing stage
- Before potentially destructive operations
## Documentation Requirements

### Workflow Metadata

Include in the workflow JSON:
```json
{
  "workflow_info": {
    "name": "FLUX Schnell Text-to-Image Production",
    "version": "1.0.0",
    "author": "RunPod AI Model Orchestrator",
    "description": "Fast text-to-image generation using FLUX.1-schnell (4 steps)",
    "category": "text-to-image",
    "tags": ["flux", "fast", "production"],
    "requirements": {
      "models": ["FLUX.1-schnell"],
      "custom_nodes": [],
      "vram_min": "16GB",
      "vram_recommended": "24GB"
    },
    "parameters": {
      "prompt": {
        "type": "string",
        "required": true,
        "description": "Text description of desired image"
      },
      "seed": {
        "type": "integer",
        "required": false,
        "default": 42,
        "min": 0,
        "max": 4294967295
      },
      "steps": {
        "type": "integer",
        "required": false,
        "default": 4,
        "min": 1,
        "max": 20
      }
    },
    "outputs": {
      "image": {
        "type": "image",
        "format": "PNG",
        "resolution": "1024x1024"
      }
    }
  }
}
```
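With this metadata in place, the orchestrator can validate requests against the workflow's own parameter schema instead of hardcoding limits. A minimal sketch, assuming only the fields shown above (`required`, `default`, `min`, `max`):

```python
def validate_request(workflow: dict, inputs: dict) -> dict:
    """Resolve request inputs against the workflow_info parameter schema."""
    schema = workflow.get("workflow_info", {}).get("parameters", {})
    resolved = {}
    for name, spec in schema.items():
        if name not in inputs:
            if spec.get("required"):
                raise ValueError(f"missing required parameter: {name}")
            if "default" in spec:
                resolved[name] = spec["default"]
            continue
        value = inputs[name]
        if "min" in spec and value < spec["min"]:
            raise ValueError(f"{name} must be >= {spec['min']}")
        if "max" in spec and value > spec["max"]:
            raise ValueError(f"{name} must be <= {spec['max']}")
        resolved[name] = value
    return resolved
```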
### Node Comments

Document complex nodes:
```json
{
  "title": "FLUX KSampler - Main Generation",
  "notes": "Using euler_ancestral sampler with 4 steps for FLUX Schnell. CFG=1.0 is optimal for this model. Seed controls reproducibility.",
  "inputs": {
    "seed": 42,
    "steps": 4,
    "cfg": 1.0
  }
}
```
### Usage Examples

Include in the workflow or README:
````markdown
## Example Usage

### ComfyUI Web Interface
1. Load workflow: `text-to-image/flux-schnell-t2i-production-v1.json`
2. Set prompt: "A serene mountain landscape at sunset"
3. Adjust seed: 42 (optional)
4. Click "Queue Prompt"

### Orchestrator API
```bash
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "A serene mountain landscape"}}'
```
````
## Testing Guidelines
### Manual Testing
**Required tests before production:**
1. **UI Test**
   - Load in ComfyUI web interface
   - Execute with default parameters
   - Verify output quality
   - Check preview nodes
   - Confirm save locations
2. **API Test**
   - Call via orchestrator API
   - Test with various parameter combinations
   - Verify JSON response format
   - Check error handling
3. **Edge Cases**
   - Missing optional parameters
   - Invalid parameter values
   - Out-of-range inputs
   - Missing models (graceful failure)
### Automated Testing
**Test script template:**
```bash
#!/bin/bash
# Test workflow: flux-schnell-t2i-production-v1.json
WORKFLOW="text-to-image/flux-schnell-t2i-production-v1.json"

# Test 1: Default parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test image\"}}" \
  | jq '.status'  # Should return "success"

# Test 2: Custom parameters
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {\"prompt\": \"test\", \"seed\": 123, \"steps\": 8}}" \
  | jq '.status'

# Test 3: Missing prompt (should use default)
curl -X POST http://localhost:9000/api/comfyui/generate \
  -d "{\"workflow\": \"$WORKFLOW\", \"inputs\": {}}" \
  | jq '.status'
```
### Performance Testing

**Measure key metrics:**
```bash
# Generation time
time curl -X POST http://localhost:9000/api/comfyui/generate \
  -d '{"workflow": "flux-schnell-t2i-production-v1.json", "inputs": {"prompt": "benchmark"}}'

# VRAM usage
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1

# GPU utilization
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1
```

**Performance baselines (24GB VRAM):**
- FLUX Schnell (1024x1024, 4 steps): ~5-8 seconds
- FLUX Dev (1024x1024, 20 steps): ~25-35 seconds
- SDXL + Refiner (1024x1024): ~40-60 seconds
- CogVideoX (6s video): ~120-180 seconds
### Load Testing

**Concurrent request handling:**
```bash
# Test 5 concurrent generations
for i in {1..5}; do
  curl -X POST http://localhost:9000/api/comfyui/generate \
    -d "{\"workflow\": \"flux-schnell-t2i-production-v1.json\", \"inputs\": {\"prompt\": \"test $i\", \"seed\": $i}}" &
done
wait
```
## Version Control

### Semantic Versioning

**Version increments:**
- `v1` → `v2`: Major changes (different models, restructured workflow)
- Internal iterations: Keep the same version; document changes in git commits
### Change Documentation

**Changelog format:**
```markdown
## flux-schnell-t2i-production-v2.json

### Changes from v1
- Added API input validation
- Optimized VRAM usage with model unloading
- Added preview node after generation
- Updated default steps from 4 to 6

### Breaking Changes
- Changed output node structure (requires orchestrator update)

### Migration Guide
- Update API calls to use new parameter names
- Clear ComfyUI cache before loading v2
```
### Deprecation Process

**Sunsetting old versions:**
- Mark the old version as deprecated in the README
- Keep the deprecated version for 2 releases
- Add a deprecation warning in the workflow metadata
- Document the migration path to the new version
- Archive deprecated workflows in the `archive/` directory
## Best Practices

### DO
- Use descriptive node names
- Include preview nodes at key stages
- Validate all inputs
- Optimize for VRAM efficiency
- Document all parameters
- Test with both UI and API
- Version your workflows
- Include error handling
- Save intermediate results
- Use semantic naming
### DON'T
- Hardcode file paths
- Assume unlimited VRAM
- Skip input validation
- Omit documentation
- Create overly complex workflows
- Use experimental nodes in production
- Ignore VRAM optimization
- Skip testing edge cases
- Use unclear node names
- Forget to version
## Resources

- ComfyUI Wiki: https://github.com/comfyanonymous/ComfyUI/wiki
- Custom Nodes List: https://github.com/ltdrdata/ComfyUI-Manager
- VRAM Optimization Guide: `/workspace/ai/CLAUDE.md`
- Model Documentation: `/workspace/ai/COMFYUI_MODELS.md`
## Support

For questions or issues:
- Review this standards document
- Check ComfyUI logs: `supervisorctl tail -f comfyui`
- Test the workflow in the UI before the API
- Validate JSON syntax
- Check model availability