feat: github workflow
428
WORKFLOWS.md
Normal file
@@ -0,0 +1,428 @@
# GitHub Actions Workflows

This document describes the automated workflows for building and managing the Awesome database.

## Overview

Two workflows automate database management:

1. **Build Database** - Creates a fresh database daily
2. **Cleanup Artifacts** - Removes old artifacts to save storage

## Build Database Workflow

**File:** `.github/workflows/build-database.yml`

### Schedule

- **Automatic:** Daily at 02:00 UTC
- **Manual:** Can be triggered via GitHub Actions UI or CLI

### Features

#### Automatic Daily Builds
- Fetches [sindresorhus/awesome](https://github.com/sindresorhus/awesome)
- Recursively indexes all awesome lists
- Collects GitHub metadata (stars, forks, last commit)
- Generates full-text search index
- Compresses and uploads as artifact

#### Build Modes

**Full Mode** (default):
- Indexes all awesome lists
- Takes ~2-3 hours
- Produces comprehensive database

**Sample Mode**:
- Indexes random sample of 10 lists
- Takes ~5-10 minutes
- Good for testing
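
#### GitHub Token Integration
- Uses `GITHUB_TOKEN` secret for API access
- Raises the API rate limit well above the 60 requests/hour allowed without authentication (GitHub documents 1,000 requests/hour per repository for the built-in `GITHUB_TOKEN` and 5,000 requests/hour for a personal access token)
- Automatically configured during build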
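
Before starting a long full build it can help to check how much API quota is left. The following is a minimal sketch, assuming the GitHub CLI is installed and authenticated; `has_quota`, `remaining_core_requests`, and the figure of 800 requests are hypothetical and not part of the workflow.

```bash
# Pure helper: succeeds (exit 0) when the remaining quota covers the need.
has_quota() {
  local remaining=$1 needed=$2
  [ "$remaining" -ge "$needed" ]
}

# Reads the live quota from the REST rate_limit endpoint (requires gh auth).
remaining_core_requests() {
  gh api rate_limit --jq '.resources.core.remaining'
}

# Usage (not executed here; 800 is an illustrative estimate):
#   has_quota "$(remaining_core_requests)" 800 || echo "quota too low" >&2
```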

### Manual Triggering

#### Via GitHub CLI

```bash
# Trigger full build
gh workflow run build-database.yml -f index_mode=full

# Trigger sample build (for testing)
gh workflow run build-database.yml -f index_mode=sample

# Check workflow status
gh run list --workflow=build-database.yml

# View specific run
gh run view <run-id>
```

#### Via GitHub UI

1. Go to repository → Actions tab
2. Select "Build Awesome Database" workflow
3. Click "Run workflow" button
4. Choose index mode (full/sample)
5. Click "Run workflow"

### Outputs

#### Artifacts Uploaded

- `awesome-{timestamp}.db` - Timestamped database file
- `awesome-latest.db` - Always points to newest build
- `metadata.json` - Build information

**Artifact Naming:** `awesome-database-{run_id}`

**Retention:** 90 days
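
The naming convention above makes it possible to derive an artifact name from a run ID. A small sketch, assuming the GitHub CLI is available; `artifact_name` and `download_build` are hypothetical helpers, not scripts shipped with the repository.

```bash
# Build the artifact name for a workflow run, following the documented
# `awesome-database-{run_id}` convention.
artifact_name() {
  printf 'awesome-database-%s\n' "$1"
}

# Download that artifact into a target directory (requires gh auth).
# The destination defaults to the current directory.
download_build() {
  local run_id=$1 dest=${2:-.}
  gh run download "$run_id" -n "$(artifact_name "$run_id")" -D "$dest"
}

# Usage (not executed here):
#   download_build 12345678 ./downloads
```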

#### Metadata Structure

```json
{
  "build_date": "2025-10-26 02:15:43 UTC",
  "build_timestamp": 1761444943,
  "git_sha": "abc123...",
  "workflow_run_id": "12345678",
  "total_lists": 450,
  "total_repos": 15000,
  "total_readmes": 12500,
  "size_mb": 156.42,
  "node_version": "v22.0.0",
  "index_mode": "full"
}
```
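
The two time fields in the metadata are presumably meant to describe the same instant, once as text and once as epoch seconds. The sketch below checks that, assuming GNU `date` (as found on Linux runners); `format_build_date` and `timestamps_agree` are hypothetical helper names.

```bash
# Render an epoch timestamp in the same layout as `build_date`.
# Assumes GNU date; BSD/macOS date uses `-r <epoch>` instead of `-d @<epoch>`.
format_build_date() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Succeeds when the epoch value and the date string name the same instant.
timestamps_agree() {
  [ "$(format_build_date "$1")" = "$2" ]
}

# Usage (values would come from metadata.json):
#   timestamps_agree "$build_timestamp" "$build_date" || echo "metadata mismatch" >&2
```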

#### Build Summary

Each run generates a summary with:
- Statistics (lists, repos, READMEs, size)
- Build timing information
- Download instructions
- Direct artifact link

### Monitoring

#### Check Recent Runs

```bash
# List last 10 runs
gh run list --workflow=build-database.yml --limit 10

# Show only failed runs
gh run list --workflow=build-database.yml --status failure

# Watch current run
gh run watch
```

#### View Build Logs

```bash
# Show logs for specific run
gh run view <run-id> --log

# Show only failed steps
gh run view <run-id> --log-failed
```

## Cleanup Artifacts Workflow

**File:** `.github/workflows/cleanup-artifacts.yml`

### Schedule

- **Automatic:** Daily at 03:00 UTC (one hour after the database build is scheduled to start)
- **Manual:** Can be triggered with custom settings

### Features

#### Automatic Cleanup
- Removes artifacts older than 30 days (default)
- Cleans up old workflow runs (>30 days, keeping last 50)
- Generates detailed cleanup report
- Dry-run mode available

#### Configurable Retention
- Default: 30 days
- Can be customized per run
- Artifacts within retention period are preserved
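
The retention rule can be pictured as a simple age comparison. This is a sketch of that idea, not the workflow's actual implementation; `is_expired` is a hypothetical helper, and counting age in whole days with a strict "older than" comparison is an assumption.

```bash
# Succeeds when an artifact is older than the retention window.
# Arguments: creation time and current time as epoch seconds,
# then the retention in days (defaults to the documented 30).
is_expired() {
  local created=$1 now=$2 retention_days=${3:-30}
  local age_days=$(( (now - created) / 86400 ))
  [ "$age_days" -gt "$retention_days" ]
}

# Usage (not executed here; GNU date converts the API's ISO 8601 created_at):
#   is_expired "$(date -d "$created_at" +%s)" "$(date +%s)" 30 && echo "would delete"
```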

### Manual Triggering

#### Via GitHub CLI

```bash
# Standard cleanup (30 days)
gh workflow run cleanup-artifacts.yml

# Custom retention period (60 days)
gh workflow run cleanup-artifacts.yml -f retention_days=60

# Dry run (preview only, no deletions)
gh workflow run cleanup-artifacts.yml -f dry_run=true -f retention_days=30

# Aggressive cleanup (7 days)
gh workflow run cleanup-artifacts.yml -f retention_days=7
```

#### Via GitHub UI

1. Go to repository → Actions tab
2. Select "Cleanup Old Artifacts" workflow
3. Click "Run workflow" button
4. Configure options:
   - **retention_days**: Days to keep (default: 30)
   - **dry_run**: Preview mode (default: false)
5. Click "Run workflow"

### Cleanup Report

Each run generates a detailed report showing:

#### Summary Statistics
- Total artifacts scanned
- Number deleted
- Number kept
- Storage space freed (MB)

#### Deleted Artifacts Table
- Artifact name
- Size
- Creation date
- Age (in days)

#### Kept Artifacts Table
- Recently created artifacts
- Artifacts within retention period
- Limited to first 10 for brevity

### Storage Management

#### Checking Storage Usage

```bash
# List all artifacts with sizes
# (the API returns 30 artifacts per page by default; per_page=100 is the maximum)
gh api "repos/{owner}/{repo}/actions/artifacts?per_page=100" \
  | jq -r '.artifacts[] | "\(.name) - \(.size_in_bytes / 1024 / 1024 | floor)MB - \(.created_at)"'

# Calculate total storage in MB (covers the first page of results only)
gh api "repos/{owner}/{repo}/actions/artifacts?per_page=100" \
  | jq '[.artifacts[].size_in_bytes] | add / 1024 / 1024 | floor'
```

#### Retention Strategy

**Recommended settings:**
- **Production:** 30-60 days retention
- **Development:** 14-30 days retention
- **Testing:** 7-14 days retention

**Storage limits** (included artifact storage for private repositories; public repositories are not billed):
- GitHub Free: 500 MB
- GitHub Pro: 1 GB
- GitHub Team: 2 GB; GitHub Enterprise Cloud: 50 GB
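
To compare a retention setting against a storage quota, a rough steady-state estimate is the size of one build multiplied by the number of days it is kept. The sketch below assumes one retained build per day and uses the `size_mb` value from the example metadata; `projected_storage_mb` is a hypothetical helper, and compressed artifacts may be smaller than the raw database size.

```bash
# Estimate steady-state artifact storage in MB:
# one build per day, each kept for `days` days.
projected_storage_mb() {
  awk -v size="$1" -v days="$2" 'BEGIN { printf "%.1f\n", size * days }'
}

# Example: the 156.42 MB build from the sample metadata, kept for 30 days.
projected_storage_mb 156.42 30
```

With those inputs the estimate is about 4,693 MB, so shortening the retention period is the most direct way to stay under a small quota.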

## Downloading Databases

### Method 1: Interactive Script (Recommended)

```bash
./scripts/download-db.sh
```

**Features:**
- Lists all available builds
- Shows metadata (date, size, commit)
- Interactive selection
- Automatic backup of existing database
- Progress indication

**Usage:**
```bash
# Interactive mode
./scripts/download-db.sh

# Specify repository
./scripts/download-db.sh --repo owner/awesome

# Download latest automatically
./scripts/download-db.sh --repo owner/awesome --latest
```

### Method 2: GitHub CLI Direct

```bash
# List available artifacts
gh api repos/OWNER/REPO/actions/artifacts | jq -r '.artifacts[].name'

# Download specific run
gh run download <run-id> -n awesome-database-<run-id>

# Install the downloaded database
# (a bare awesome-*.db glob would match both the timestamped and the latest file)
mkdir -p ~/.awesome
cp awesome-latest.db ~/.awesome/awesome.db
```

### Method 3: GitHub API

```bash
# Get latest successful run (status=success filters out failed and in-progress runs)
RUN_ID=$(gh api "repos/OWNER/REPO/actions/workflows/build-database.yml/runs?status=success&per_page=1" \
  | jq -r '.workflow_runs[0].id')

# Download artifact
gh run download "$RUN_ID" -n "awesome-database-$RUN_ID"
```

## Troubleshooting

### Build Failures

**Problem:** Workflow fails during indexing

**Solutions:**
1. Check API rate limits
2. Review build logs: `gh run view <run-id> --log-failed`
3. Try sample mode for testing
4. Check GitHub status page

**Common Issues:**
- GitHub API rate limiting
- Network timeouts
- Invalid awesome list URLs

### Download Issues

**Problem:** Cannot download artifacts

**Solutions:**
1. Ensure GitHub CLI is authenticated: `gh auth status`
2. Check artifact exists: `gh run list --workflow=build-database.yml`
3. Verify the artifact hasn't expired (90 days) or been removed earlier by the cleanup workflow (30 days by default)
4. Try alternative download method

### Storage Issues

**Problem:** Running out of artifact storage

**Solutions:**
1. Reduce retention period: `gh workflow run cleanup-artifacts.yml -f retention_days=14`
2. Run manual cleanup: `gh workflow run cleanup-artifacts.yml`
3. Check current usage with GitHub API
4. Consider upgrading GitHub plan

### Permission Issues

**Problem:** Workflow lacks permissions

**Solutions:**
1. Verify `GITHUB_TOKEN` has the required permissions (for example, deleting artifacts requires `actions: write`)
2. Check the `permissions` block in the workflow `.yml` file
3. Review repository settings → Actions → General

## Best Practices

### For Maintainers

1. **Monitor Build Success Rate**
   - Set up notifications for failed builds
   - Review logs regularly
   - Keep dependencies updated

2. **Optimize Build Times**
   - Use sample mode for development
   - Cache dependencies when possible
   - Monitor for slow API responses

3. **Manage Storage**
   - Run cleanups regularly
   - Adjust retention based on usage
   - Archive important builds

4. **Documentation**
   - Keep artifact metadata updated
   - Document any custom configurations
   - Update README with changes

### For Users

1. **Download Strategy**
   - Use latest builds for current data
   - Check metadata before downloading
   - Keep local backup of preferred versions

2. **Update Frequency**
   - Daily builds provide fresh data
   - Weekly downloads usually sufficient
   - On-demand for specific needs

3. **Storage Management**
   - Clean old local databases
   - Use compression for backups
   - Verify database integrity after download
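
One way to verify a downloaded database is sketched below. It assumes the `.db` file is a SQLite database, which this document does not state explicitly, and that the `sqlite3` CLI is installed for the full check; `looks_like_sqlite` and `check_integrity` are hypothetical helper names.

```bash
# Succeeds when the file starts with the SQLite 3 magic header.
# Assumption: the Awesome database is stored as a SQLite file.
looks_like_sqlite() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Full integrity check (requires the sqlite3 CLI).
# PRAGMA integrity_check prints a single "ok" line for a healthy database.
check_integrity() {
  looks_like_sqlite "$1" && [ "$(sqlite3 "$1" 'PRAGMA integrity_check;')" = "ok" ]
}

# Usage (not executed here):
#   check_integrity ~/.awesome/awesome.db || echo "database looks damaged" >&2
```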

## Advanced Usage

### Custom Build Scripts

You can create custom workflows based on the provided templates:

```yaml
# Example: Weekly comprehensive build
# Note: calling build-database.yml this way requires it to declare
# `on: workflow_call` with an `index_mode` input.
name: Weekly Full Index
on:
  schedule:
    - cron: '0 0 * * 0'  # Sundays at 00:00 UTC
  workflow_dispatch:

jobs:
  build:
    uses: ./.github/workflows/build-database.yml
    with:
      index_mode: full
```

### Notification Integration

Add a notification step to the workflow:

```yaml
- name: Notify on completion
  if: always()
  env:
    # Assumption: the webhook URL is stored as a repository secret named WEBHOOK_URL
    WEBHOOK_URL: ${{ secrets.WEBHOOK_URL }}
  run: |
    # Send to Slack, Discord, email, etc.
    curl -X POST "$WEBHOOK_URL" -d "Build completed: ${{ job.status }}"
```

### Multi-Platform Builds

Extend workflow for different platforms:

```yaml
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node-version: [22, 20, 18]
```

## Resources

- [GitHub Actions Documentation](https://docs.github.com/en/actions)
- [GitHub CLI Manual](https://cli.github.com/manual/)
- [Artifact Storage Limits](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions)
- [Workflow Syntax](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions)

## Support

For issues or questions:
1. Check this documentation
2. Review workflow logs
3. Open an issue in the repository
4. Consult GitHub Actions documentation