Files
awesome/WORKFLOWS.md

429 lines
9.5 KiB
Markdown
Raw Normal View History

2025-10-26 13:48:23 +01:00
# GitHub Actions Workflows
This document describes the automated workflows for building and managing the Awesome database.
## Overview
Two workflows automate database management:
1. **Build Database** - Creates a fresh database daily
2. **Cleanup Artifacts** - Removes old artifacts to save storage
## Build Database Workflow
**File:** `.github/workflows/build-database.yml`
### Schedule
- **Automatic:** Daily at 02:00 UTC
- **Manual:** Can be triggered via GitHub Actions UI or CLI
### Features
#### Automatic Daily Builds
- Fetches [sindresorhus/awesome](https://github.com/sindresorhus/awesome)
- Recursively indexes all awesome lists
- Collects GitHub metadata (stars, forks, last commit)
- Generates full-text search index
- Compresses and uploads as artifact
#### Build Modes
**Full Mode** (default):
- Indexes all awesome lists
- Takes ~2-3 hours
- Produces comprehensive database
**Sample Mode**:
- Indexes random sample of 10 lists
- Takes ~5-10 minutes
- Good for testing
#### GitHub Token Integration
- Uses `GITHUB_TOKEN` secret for API access
- Provides 5,000 requests/hour (vs 60 without auth)
- Automatically configured during build
### Manual Triggering
#### Via GitHub CLI
```bash
# Trigger full build
gh workflow run build-database.yml -f index_mode=full
# Trigger sample build (for testing)
gh workflow run build-database.yml -f index_mode=sample
# Check workflow status
gh run list --workflow=build-database.yml
# View specific run
gh run view <run-id>
```
#### Via GitHub UI
1. Go to repository → Actions tab
2. Select "Build Awesome Database" workflow
3. Click "Run workflow" button
4. Choose index mode (full/sample)
5. Click "Run workflow"
### Outputs
#### Artifacts Uploaded
- `awesome-{timestamp}.db` - Timestamped database file
- `awesome-latest.db` - Always points to newest build
- `metadata.json` - Build information
**Artifact Naming:** `awesome-database-{run_id}`
**Retention:** 90 days
#### Metadata Structure
```json
{
"build_date": "2025-10-26 02:15:43 UTC",
"build_timestamp": 1730000143,
"git_sha": "abc123...",
"workflow_run_id": "12345678",
"total_lists": 450,
"total_repos": 15000,
"total_readmes": 12500,
"size_mb": 156.42,
"node_version": "v22.0.0",
"index_mode": "full"
}
```
#### Build Summary
Each run generates a summary with:
- Statistics (lists, repos, READMEs, size)
- Build timing information
- Download instructions
- Direct artifact link
### Monitoring
#### Check Recent Runs
```bash
# List last 10 runs
gh run list --workflow=build-database.yml --limit 10
# Show only failed runs
gh run list --workflow=build-database.yml --status failure
# Watch current run
gh run watch
```
#### View Build Logs
```bash
# Show logs for specific run
gh run view <run-id> --log
# Show only failed steps
gh run view <run-id> --log-failed
```
## Cleanup Artifacts Workflow
**File:** `.github/workflows/cleanup-artifacts.yml`
### Schedule
- **Automatic:** Daily at 03:00 UTC (after database build)
- **Manual:** Can be triggered with custom settings
### Features
#### Automatic Cleanup
- Removes artifacts older than 30 days (default)
- Cleans up old workflow runs (>30 days, keeping last 50)
- Generates detailed cleanup report
- Dry-run mode available
#### Configurable Retention
- Default: 30 days
- Can be customized per run
- Artifacts within retention period are preserved
### Manual Triggering
#### Via GitHub CLI
```bash
# Standard cleanup (30 days)
gh workflow run cleanup-artifacts.yml
# Custom retention period (60 days)
gh workflow run cleanup-artifacts.yml -f retention_days=60
# Dry run (preview only, no deletions)
gh workflow run cleanup-artifacts.yml -f dry_run=true -f retention_days=30
# Aggressive cleanup (7 days)
gh workflow run cleanup-artifacts.yml -f retention_days=7
```
#### Via GitHub UI
1. Go to repository → Actions tab
2. Select "Cleanup Old Artifacts" workflow
3. Click "Run workflow" button
4. Configure options:
- **retention_days**: Days to keep (default: 30)
- **dry_run**: Preview mode (default: false)
5. Click "Run workflow"
### Cleanup Report
Each run generates a detailed report showing:
#### Summary Statistics
- Total artifacts scanned
- Number deleted
- Number kept
- Storage space freed (MB)
#### Deleted Artifacts Table
- Artifact name
- Size
- Creation date
- Age (in days)
#### Kept Artifacts Table
- Recently created artifacts
- Artifacts within retention period
- Limited to first 10 for brevity
### Storage Management
#### Checking Storage Usage
```bash
# List all artifacts with sizes
gh api repos/:owner/:repo/actions/artifacts \
| jq -r '.artifacts[] | "\(.name) - \(.size_in_bytes / 1024 / 1024 | floor)MB - \(.created_at)"'
# Calculate total storage
gh api repos/:owner/:repo/actions/artifacts \
| jq '[.artifacts[].size_in_bytes] | add / 1024 / 1024 | floor'
```
#### Retention Strategy
**Recommended settings:**
- **Production:** 30-60 days retention
- **Development:** 14-30 days retention
- **Testing:** 7-14 days retention
**Storage limits:**
- Free GitHub: Limited artifact storage
- GitHub Pro: More generous limits
- GitHub Team/Enterprise: Higher limits
## Downloading Databases
### Method 1: Interactive Script (Recommended)
```bash
./scripts/download-db.sh
```
**Features:**
- Lists all available builds
- Shows metadata (date, size, commit)
- Interactive selection
- Automatic backup of existing database
- Progress indication
**Usage:**
```bash
# Interactive mode
./scripts/download-db.sh
# Specify repository
./scripts/download-db.sh --repo owner/awesome
# Download latest automatically
./scripts/download-db.sh --repo owner/awesome --latest
```
### Method 2: GitHub CLI Direct
```bash
# List available artifacts
gh api repos/OWNER/REPO/actions/artifacts | jq -r '.artifacts[].name'
# Download specific run
gh run download <run-id> -n awesome-database-<run-id>
# Extract and install
mkdir -p ~/.awesome
cp awesome-*.db ~/.awesome/awesome.db
```
### Method 3: GitHub API
```bash
# Get latest successful run
RUN_ID=$(gh api repos/OWNER/REPO/actions/workflows/build-database.yml/runs \
| jq -r '.workflow_runs[0].id')
# Download artifact
gh run download $RUN_ID -n awesome-database-$RUN_ID
```
## Troubleshooting
### Build Failures
**Problem:** Workflow fails during indexing
**Solutions:**
1. Check API rate limits
2. Review build logs: `gh run view <run-id> --log-failed`
3. Try sample mode for testing
4. Check GitHub status page
**Common Issues:**
- GitHub API rate limiting
- Network timeouts
- Invalid awesome list URLs
### Download Issues
**Problem:** Cannot download artifacts
**Solutions:**
1. Ensure GitHub CLI is authenticated: `gh auth status`
2. Check artifact exists: `gh run list --workflow=build-database.yml`
3. Verify artifact hasn't expired (90 days)
4. Try alternative download method
### Storage Issues
**Problem:** Running out of artifact storage
**Solutions:**
1. Reduce retention period: `gh workflow run cleanup-artifacts.yml -f retention_days=14`
2. Run manual cleanup: `gh workflow run cleanup-artifacts.yml`
3. Check current usage with GitHub API
4. Consider upgrading GitHub plan
### Permission Issues
**Problem:** Workflow lacks permissions
**Solutions:**
1. Verify `GITHUB_TOKEN` has required scopes
2. Check workflow permissions in `.yml` file
3. Review repository settings → Actions → General
## Best Practices
### For Maintainers
1. **Monitor Build Success Rate**
- Set up notifications for failed builds
- Review logs regularly
- Keep dependencies updated
2. **Optimize Build Times**
- Use sample mode for development
- Cache dependencies when possible
- Monitor for slow API responses
3. **Manage Storage**
- Run cleanups regularly
- Adjust retention based on usage
- Archive important builds
4. **Documentation**
- Keep artifact metadata updated
- Document any custom configurations
- Update README with changes
### For Users
1. **Download Strategy**
- Use latest builds for current data
- Check metadata before downloading
- Keep local backup of preferred versions
2. **Update Frequency**
- Daily builds provide fresh data
- Weekly downloads usually sufficient
- On-demand for specific needs
3. **Storage Management**
- Clean old local databases
- Use compression for backups
- Verify database integrity after download
## Advanced Usage
### Custom Build Scripts
You can create custom workflows based on the provided templates:
```yaml
# Example: Weekly comprehensive build
name: Weekly Full Index
on:
schedule:
- cron: '0 0 * * 0' # Sundays at midnight
workflow_dispatch:
jobs:
build:
uses: ./.github/workflows/build-database.yml
with:
index_mode: full
```
### Notification Integration
Add notifications to workflow:
```yaml
- name: Notify on completion
if: always()
run: |
# Send to Slack, Discord, email, etc.
curl -X POST $WEBHOOK_URL -d "Build completed: ${{ job.status }}"
```
### Multi-Platform Builds
Extend workflow for different platforms:
```yaml
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
node-version: [22, 20, 18]
```
## Resources
- [GitHub Actions Documentation](https://docs.github.com/en/actions)
- [GitHub CLI Manual](https://cli.github.com/manual/)
- [Artifact Storage Limits](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions)
- [Workflow Syntax](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions)
## Support
For issues or questions:
1. Check this documentation
2. Review workflow logs
3. Open an issue in the repository
4. Consult GitHub Actions documentation