GitHub Actions Workflows
This document describes the automated workflows for building and managing the Awesome database.
Overview
Two workflows automate database management:
- Build Database - Creates a fresh database daily
- Cleanup Artifacts - Removes old artifacts to save storage
Build Database Workflow
File: .github/workflows/build-database.yml
Schedule
- Automatic: Daily at 02:00 UTC
- Manual: Can be triggered via GitHub Actions UI or CLI
Features
Automatic Daily Builds
- Fetches sindresorhus/awesome
- Recursively indexes all awesome lists
- Collects GitHub metadata (stars, forks, last commit)
- Generates full-text search index
- Compresses and uploads as artifact
Build Modes
Full Mode (default):
- Indexes all awesome lists
- Takes ~2-3 hours
- Produces comprehensive database
Sample Mode:
- Indexes random sample of 10 lists
- Takes ~5-10 minutes
- Good for testing
GitHub Token Integration
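- Uses the GITHUB_TOKEN secret for API access
- Raises the API rate limit well above the 60 requests/hour allowed without auth (5,000/hour for a personal access token; GitHub documents 1,000/hour per repository for the built-in GITHUB_TOKEN)
- Automatically configured during build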
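Because a full build makes many API calls, it can help to check the remaining quota before starting one. The following is a minimal sketch, assuming the GitHub CLI is installed and authenticated; the function names and the 500-request default are illustrative and not part of the workflow.

```shell
#!/usr/bin/env bash
# Sketch: check remaining REST API quota before a long indexing run.
# Assumes an authenticated GitHub CLI (gh); function names are hypothetical.

# Print the number of core REST API requests left in the current window.
remaining_requests() {
  gh api rate_limit --jq '.resources.core.remaining'
}

# Succeed only if at least $1 requests remain (default 500, an arbitrary margin).
has_quota() {
  local needed="${1:-500}" remaining
  remaining="$(remaining_requests)" || return 2
  [ "$remaining" -ge "$needed" ]
}
```

Run locally, this reports the quota of the account gh is logged in as. Inside a workflow step, the same call with GH_TOKEN set to the workflow token reports the quota the build itself will consume.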
Manual Triggering
Via GitHub CLI
# Trigger full build
gh workflow run build-database.yml -f index_mode=full
# Trigger sample build (for testing)
gh workflow run build-database.yml -f index_mode=sample
# Check workflow status
gh run list --workflow=build-database.yml
# View specific run
gh run view <run-id>
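The commands above can be combined into one helper that validates the mode, triggers the build, and follows the run. This is a sketch under stated assumptions: `trigger_build` and `TRIGGER_WAIT_SECONDS` are hypothetical names, and the newest run in the list is assumed to be the one just triggered, which may not hold if another run starts at the same moment.

```shell
#!/usr/bin/env bash
# Sketch: trigger a build in a given mode and follow it to completion.
# Assumes an authenticated gh; trigger_build is a hypothetical helper name.

trigger_build() {
  local mode="${1:-full}" run_id
  case "$mode" in
    full|sample) ;;
    *) echo "index_mode must be 'full' or 'sample', got '$mode'" >&2; return 64 ;;
  esac

  gh workflow run build-database.yml -f index_mode="$mode" || return 1

  # The new run takes a moment to appear in the run list.
  sleep "${TRIGGER_WAIT_SECONDS:-5}"
  run_id="$(gh run list --workflow=build-database.yml --limit 1 \
    --json databaseId --jq '.[0].databaseId')" || return 1

  # --exit-status makes gh return non-zero if the run fails.
  gh run watch "$run_id" --exit-status
}
```

For example, `trigger_build sample` starts a sample build and blocks until it finishes.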
Via GitHub UI
- Go to repository → Actions tab
- Select "Build Awesome Database" workflow
- Click "Run workflow" button
- Choose index mode (full/sample)
- Click "Run workflow"
Outputs
Artifacts Uploaded
- awesome-{timestamp}.db - Timestamped database file
- awesome-latest.db - Always points to newest build
- metadata.json - Build information
Artifact Naming: awesome-database-{run_id}
Retention: 90 days (the cleanup workflow removes artifacts earlier, after 30 days by default)
Metadata Structure
{
  "build_date": "2025-10-26 02:15:43 UTC",
  "build_timestamp": 1761444943,
  "git_sha": "abc123...",
  "workflow_run_id": "12345678",
  "total_lists": 450,
  "total_repos": 15000,
  "total_readmes": 12500,
  "size_mb": 156.42,
  "node_version": "v22.0.0",
  "index_mode": "full"
}
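Once metadata.json has been downloaded, its fields can be inspected before installing the database. The sketch below assumes jq is installed and uses the field names from the structure above; the function names and the minimum list count are illustrative assumptions, not project rules.

```shell
#!/usr/bin/env bash
# Sketch: summarize and sanity-check a downloaded metadata.json.
# Field names follow the structure shown above; requires jq.

# Print a one-line summary of the build.
summarize_metadata() {
  local file="${1:-metadata.json}"
  jq -r '"\(.build_date) | mode=\(.index_mode) | lists=\(.total_lists) repos=\(.total_repos) readmes=\(.total_readmes) | \(.size_mb) MB"' "$file"
}

# Fail if a full build indexed suspiciously few lists.
# The threshold is an illustrative assumption.
check_metadata() {
  local file="${1:-metadata.json}" min_lists="${2:-100}"
  jq -e --argjson min "$min_lists" \
    '.index_mode != "full" or .total_lists >= $min' "$file" >/dev/null
}
```

Sample builds always pass `check_metadata`, since they index only a handful of lists by design.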
Build Summary
Each run generates a summary with:
- Statistics (lists, repos, READMEs, size)
- Build timing information
- Download instructions
- Direct artifact link
Monitoring
Check Recent Runs
# List last 10 runs
gh run list --workflow=build-database.yml --limit 10
# Show only failed runs
gh run list --workflow=build-database.yml --status failure
# Watch current run
gh run watch
View Build Logs
# Show logs for specific run
gh run view <run-id> --log
# Show only failed steps
gh run view <run-id> --log-failed
Cleanup Artifacts Workflow
File: .github/workflows/cleanup-artifacts.yml
Schedule
- Automatic: Daily at 03:00 UTC (one hour after the database build is scheduled to start)
- Manual: Can be triggered with custom settings
Features
Automatic Cleanup
- Removes artifacts older than 30 days (default)
- Cleans up old workflow runs (>30 days, keeping last 50)
- Generates detailed cleanup report
- Dry-run mode available
Configurable Retention
- Default: 30 days
- Can be customized per run
- Artifacts within retention period are preserved
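The retention rule can be illustrated with a small helper that decides whether a single artifact would be deleted. This is a sketch of the rule as described here, not the workflow's actual implementation; it assumes GNU date (for `date -d`), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the retention rule: an artifact is deleted once its age in whole
# days exceeds retention_days. Assumes GNU date (date -d); the workflow's
# actual implementation may differ.

# Age in whole days of an ISO 8601 timestamp, relative to $2
# (epoch seconds, defaults to now).
age_days() {
  local created_at="$1" now="${2:-$(date -u +%s)}" created
  created="$(date -u -d "$created_at" +%s)" || return 1
  echo $(( (now - created) / 86400 ))
}

# Print "delete" or "keep" for one artifact.
retention_decision() {
  local created_at="$1" retention_days="${2:-30}" now="${3:-$(date -u +%s)}" age
  age="$(age_days "$created_at" "$now")" || return 1
  if [ "$age" -gt "$retention_days" ]; then echo delete; else echo keep; fi
}
```

An artifact that is exactly 30 days old is kept under the default setting; it becomes eligible for deletion once it is 31 days old.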
Manual Triggering
Via GitHub CLI
# Standard cleanup (30 days)
gh workflow run cleanup-artifacts.yml
# Custom retention period (60 days)
gh workflow run cleanup-artifacts.yml -f retention_days=60
# Dry run (preview only, no deletions)
gh workflow run cleanup-artifacts.yml -f dry_run=true -f retention_days=30
# Aggressive cleanup (7 days)
gh workflow run cleanup-artifacts.yml -f retention_days=7
Via GitHub UI
- Go to repository → Actions tab
- Select "Cleanup Old Artifacts" workflow
- Click "Run workflow" button
- Configure options:
  - retention_days: Days to keep (default: 30)
  - dry_run: Preview mode (default: false)
- Click "Run workflow"
Cleanup Report
Each run generates a detailed report showing:
Summary Statistics
- Total artifacts scanned
- Number deleted
- Number kept
- Storage space freed (MB)
Deleted Artifacts Table
- Artifact name
- Size
- Creation date
- Age (in days)
Kept Artifacts Table
- Recently created artifacts
- Artifacts within retention period
- Limited to first 10 for brevity
Storage Management
Checking Storage Usage
# List all artifacts with sizes (--paginate fetches every page, not just the first 30)
gh api --paginate repos/{owner}/{repo}/actions/artifacts \
  --jq '.artifacts[] | "\(.name) - \(.size_in_bytes / 1024 / 1024 | floor)MB - \(.created_at)"'
# Calculate total storage in MB
gh api --paginate repos/{owner}/{repo}/actions/artifacts \
  --jq '.artifacts[].size_in_bytes' | jq -s '(add // 0) / 1024 / 1024 | floor'
Retention Strategy
Recommended settings:
- Production: 30-60 days retention
- Development: 14-30 days retention
- Testing: 7-14 days retention
Storage limits:
- Free GitHub: Limited artifact storage
- GitHub Pro: More generous limits
- GitHub Team/Enterprise: Higher limits
Downloading Databases
Method 1: Interactive Script (Recommended)
./scripts/download-db.sh
Features:
- Lists all available builds
- Shows metadata (date, size, commit)
- Interactive selection
- Automatic backup of existing database
- Progress indication
Usage:
# Interactive mode
./scripts/download-db.sh
# Specify repository
./scripts/download-db.sh --repo owner/awesome
# Download latest automatically
./scripts/download-db.sh --repo owner/awesome --latest
Method 2: GitHub CLI Direct
# List available artifacts
gh api repos/OWNER/REPO/actions/artifacts | jq -r '.artifacts[].name'
# Download specific run
gh run download <run-id> -n awesome-database-<run-id>
# Install the downloaded database (gh run download already unpacks the artifact)
mkdir -p ~/.awesome
cp awesome-latest.db ~/.awesome/awesome.db
Method 3: GitHub API
# Get latest successful run
RUN_ID=$(gh api "repos/OWNER/REPO/actions/workflows/build-database.yml/runs?status=success&per_page=1" \
  --jq '.workflow_runs[0].id')
# Download artifact
gh run download "$RUN_ID" --repo OWNER/REPO -n "awesome-database-$RUN_ID"
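The two steps of this method can be wrapped into one function that also backs up the existing database before replacing it. A minimal sketch, assuming an authenticated gh and that the artifact contains awesome-latest.db as listed under "Artifacts Uploaded"; `install_latest_db` is a hypothetical helper, not part of the repository's scripts.

```shell
#!/usr/bin/env bash
# Sketch: download the newest successful build and install it, keeping a
# backup of any existing database. Assumes an authenticated gh and that the
# artifact contains awesome-latest.db. install_latest_db is a hypothetical name.

install_latest_db() {
  local repo="$1" dest="${2:-$HOME/.awesome/awesome.db}" run_id tmp
  run_id="$(gh api "repos/$repo/actions/workflows/build-database.yml/runs?status=success&per_page=1" \
    --jq '.workflow_runs[0].id')" || return 1
  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "no successful run found" >&2
    return 1
  fi

  tmp="$(mktemp -d)" || return 1
  gh run download "$run_id" --repo "$repo" -n "awesome-database-$run_id" --dir "$tmp" || return 1

  mkdir -p "$(dirname "$dest")"
  # Keep the previous database next to the new one.
  if [ -f "$dest" ]; then cp "$dest" "$dest.bak"; fi
  cp "$tmp/awesome-latest.db" "$dest" && rm -rf "$tmp"
}
```

For example, `install_latest_db owner/awesome` installs to ~/.awesome/awesome.db and leaves the previous file as awesome.db.bak.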
Troubleshooting
Build Failures
Problem: Workflow fails during indexing
Solutions:
- Check API rate limits
- Review build logs: gh run view <run-id> --log-failed
- Try sample mode for testing
- Check GitHub status page
Common Issues:
- GitHub API rate limiting
- Network timeouts
- Invalid awesome list URLs
Download Issues
Problem: Cannot download artifacts
Solutions:
- Ensure GitHub CLI is authenticated: gh auth status
- Check artifact exists: gh run list --workflow=build-database.yml
- Verify the artifact hasn't expired (90 days) or been removed by the cleanup workflow (30 days by default)
- Try alternative download method
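To see whether a run's artifact is still downloadable, the artifacts endpoint reports an `expired` flag and an `expires_at` timestamp for each artifact. The sketch below assumes an authenticated gh run from inside a clone of the repository, so that the `{owner}/{repo}` placeholders resolve; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list a run's artifacts and whether each has expired.
# Assumes an authenticated gh inside a clone of the repository, so the
# {owner}/{repo} placeholders resolve. Helper names are hypothetical.

artifact_status() {
  local run_id="$1"
  gh api "repos/{owner}/{repo}/actions/runs/$run_id/artifacts" \
    --jq '.artifacts[] | "\(.name)\texpired=\(.expired)\texpires_at=\(.expires_at)"'
}

# Succeed if the run still has at least one downloadable artifact.
has_live_artifact() {
  artifact_status "$1" | grep -q 'expired=false'
}
```

An empty listing means the artifact was deleted outright, for example by the cleanup workflow, rather than expired.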
Storage Issues
Problem: Running out of artifact storage
Solutions:
- Reduce retention period: gh workflow run cleanup-artifacts.yml -f retention_days=14
- Run manual cleanup: gh workflow run cleanup-artifacts.yml
- Check current usage with GitHub API
- Consider upgrading GitHub plan
Permission Issues
Problem: Workflow lacks permissions
Solutions:
- Verify GITHUB_TOKEN has required scopes
- Check workflow permissions in the .yml file
- Review repository settings → Actions → General
Best Practices
For Maintainers
- Monitor Build Success Rate
  - Set up notifications for failed builds
  - Review logs regularly
  - Keep dependencies updated
- Optimize Build Times
  - Use sample mode for development
  - Cache dependencies when possible
  - Monitor for slow API responses
- Manage Storage
  - Run cleanups regularly
  - Adjust retention based on usage
  - Archive important builds
- Documentation
  - Keep artifact metadata updated
  - Document any custom configurations
  - Update README with changes
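For the first point above, monitoring the build success rate, a quick count of recent failures can be scripted. This is a rough sketch, assuming an authenticated gh; the helper names and the default window of 10 runs are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: report how many of the last N build runs failed.
# Assumes an authenticated gh; helper names and the default of 10 are
# illustrative.

recent_failures() {
  local limit="${1:-10}"
  gh run list --workflow=build-database.yml --limit "$limit" \
    --json conclusion --jq '[.[] | select(.conclusion == "failure")] | length'
}

# Print a summary and return non-zero if any recent run failed.
report_failures() {
  local limit="${1:-10}" failed
  failed="$(recent_failures "$limit")" || return 1
  echo "$failed of the last $limit build runs failed"
  [ "$failed" -eq 0 ]
}
```

Because `report_failures` returns non-zero when any run failed, it can gate a cron job or a notification script.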
For Users
- Download Strategy
  - Use latest builds for current data
  - Check metadata before downloading
  - Keep local backup of preferred versions
- Update Frequency
  - Daily builds provide fresh data
  - Weekly downloads usually sufficient
  - On-demand for specific needs
- Storage Management
  - Clean old local databases
  - Use compression for backups
  - Verify database integrity after download
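Verifying integrity after download can be as simple as checking the file header and, where available, running a built-in consistency check. The sketch below assumes the .db file is a SQLite database, which this document does not state; if the format differs, the checks need adjusting. Function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: basic sanity checks on a downloaded database.
# ASSUMPTION: the .db file is a SQLite database; this document does not state
# the format, so adjust the checks if it is something else.

# Succeed if the file starts with the 15-byte SQLite header string.
looks_like_sqlite() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

verify_db() {
  local db="${1:-$HOME/.awesome/awesome.db}"
  [ -s "$db" ] || { echo "missing or empty: $db" >&2; return 1; }
  looks_like_sqlite "$db" || { echo "not a SQLite file: $db" >&2; return 1; }
  # Run the full integrity check only when the sqlite3 CLI is installed.
  if command -v sqlite3 >/dev/null 2>&1; then
    [ "$(sqlite3 "$db" 'PRAGMA integrity_check;')" = "ok" ]
  fi
}
```

The header check catches truncated or mislabeled downloads cheaply; the `PRAGMA integrity_check` pass is slower but also detects corruption inside the file.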
Advanced Usage
Custom Build Scripts
You can create custom workflows based on the provided templates:
# Example: Weekly comprehensive build
name: Weekly Full Index
on:
  schedule:
    - cron: '0 0 * * 0' # Sundays at midnight UTC
  workflow_dispatch:
jobs:
  build:
    # Requires build-database.yml to declare an `on: workflow_call` trigger
    # with an index_mode input
    uses: ./.github/workflows/build-database.yml
    with:
      index_mode: full
Notification Integration
Add a notification step to the workflow:
- name: Notify on completion
  if: always()
  env:
    WEBHOOK_URL: ${{ secrets.WEBHOOK_URL }} # assumed secret name
  run: |
    # Send to Slack, Discord, email, etc.
    curl -X POST "$WEBHOOK_URL" -d "Build completed: ${{ job.status }}"
Multi-Platform Builds
Extend workflow for different platforms:
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node-version: [22, 20, 18]
Resources
Support
For issues or questions:
- Check this documentation
- Review workflow logs
- Open an issue in the repository
- Consult GitHub Actions documentation