# Scrapy UI

A modern, stylish web interface for managing and monitoring [Scrapyd](https://scrapyd.readthedocs.io/) instances. Built with Next.js 15, shadcn/ui, and Tailwind CSS 4.

## Features

- **Real-time Monitoring** - Live dashboard with job statistics and system status
- **Project Management** - Upload, list, and delete Scrapy projects and versions
- **Spider Management** - Browse spiders and schedule jobs with custom arguments
- **Job Control** - Monitor running/pending/finished jobs with filtering and cancellation
- **System Status** - View Scrapyd daemon health and metrics
- **Modern UI** - Clean, responsive design with dark/light theme support
- **Secure** - Server-side authentication with environment variables
- **Docker Ready** - Multi-stage builds for production deployment

## Tech Stack

- **Next.js 15** (App Router, Server Components)
- **React 19** with Server Actions
- **TypeScript** for type safety
- **Tailwind CSS 4** for styling
- **shadcn/ui** for UI components
- **TanStack Query** for data fetching
- **Zod** for runtime validation
- **Docker** for containerization

## Prerequisites

- Node.js 22+ or Docker
- A running Scrapyd instance
- Basic auth credentials for Scrapyd

## Quick Start

### 1. Clone the repository

```bash
git clone <repository-url>
cd scrapy-ui
```

### 2. Configure environment variables

Copy `.env.example` to `.env.local` and update it with your credentials:

```bash
cp .env.example .env.local
```

Edit `.env.local`:

```env
SCRAPYD_URL=https://scrapy.pivoine.art
SCRAPYD_USERNAME=your_username
SCRAPYD_PASSWORD=your_password
```

### 3. Run locally

#### Using pnpm:

```bash
pnpm install
pnpm dev
```

#### Using Docker Compose (development):

```bash
docker-compose -f docker-compose.dev.yml up
```

#### Using Docker Compose (production):

```bash
docker-compose up -d
```

Visit [http://localhost:3000/ui](http://localhost:3000/ui)

## Project Structure

```
scrapy-ui/
├── app/                        # Next.js App Router
│   ├── (dashboard)/            # Dashboard route group
│   │   ├── page.tsx            # Dashboard (/)
│   │   ├── projects/           # Projects management
│   │   ├── spiders/            # Spiders listing & scheduling
│   │   ├── jobs/               # Jobs monitoring & control
│   │   └── system/             # System status
│   ├── api/scrapyd/            # API routes (server-side)
│   │   ├── daemon/             # GET /api/scrapyd/daemon
│   │   ├── projects/           # GET/DELETE /api/scrapyd/projects
│   │   ├── versions/           # GET/POST/DELETE /api/scrapyd/versions
│   │   ├── spiders/            # GET /api/scrapyd/spiders
│   │   └── jobs/               # GET/POST/DELETE /api/scrapyd/jobs
│   └── layout.tsx              # Root layout with theme provider
├── components/                 # React components
│   ├── ui/                     # shadcn/ui components
│   ├── sidebar.tsx             # Navigation sidebar
│   ├── header.tsx              # Page header
│   └── theme-toggle.tsx        # Dark/light mode toggle
├── lib/                        # Utilities & API client
│   ├── scrapyd-client.ts       # Scrapyd API wrapper
│   ├── types.ts                # TypeScript types & Zod schemas
│   └── utils.ts                # Helper functions
├── Dockerfile                  # Production build
├── Dockerfile.dev              # Development build
├── docker-compose.yml          # Production deployment
└── docker-compose.dev.yml      # Development deployment
```

## API Endpoints

All Scrapyd endpoints are proxied through Next.js API routes with server-side authentication:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/scrapyd/daemon` | GET | Daemon status |
| `/api/scrapyd/projects` | GET | List projects |
| `/api/scrapyd/projects` | DELETE | Delete project |
| `/api/scrapyd/versions` | GET | List versions |
| `/api/scrapyd/versions` | POST | Upload version |
| `/api/scrapyd/versions` | DELETE | Delete version |
| `/api/scrapyd/spiders` | GET | List spiders |
| `/api/scrapyd/jobs` | GET | List jobs |
| `/api/scrapyd/jobs` | POST | Schedule job |
| `/api/scrapyd/jobs` | DELETE | Cancel job |
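For example, a client component can schedule a spider run through the proxied jobs route. The sketch below is illustrative only: the exact request and response shapes are defined by the Zod schemas in `lib/types.ts`, and the field names here are assumptions. Note that because the app is served under `basePath: "/ui"` (see Configuration below), client-side fetches must include that prefix.

```typescript
// Hypothetical client-side helper -- field names are assumptions, not the
// project's actual schema (see lib/types.ts for the real Zod definitions).
export async function scheduleJob(
  project: string,
  spider: string,
  args: Record<string, string> = {}
) {
  // The app is served under basePath "/ui", so API routes live there too.
  const res = await fetch("/ui/api/scrapyd/jobs", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ project, spider, args }),
  });
  if (!res.ok) throw new Error(`Failed to schedule job: ${res.status}`);
  return res.json(); // Scrapyd's schedule.json replies with a status and job id
}
```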
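On the server side, these routes delegate to the wrapper in `lib/scrapyd-client.ts`, which injects the basic-auth credentials from the environment before forwarding the request to Scrapyd. A minimal sketch of that pattern (the function name and shape are illustrative, not the actual implementation):

```typescript
// Sketch of server-side credential injection -- illustrative only; the real
// wrapper lives in lib/scrapyd-client.ts and validates responses with Zod.
const auth = Buffer.from(
  `${process.env.SCRAPYD_USERNAME}:${process.env.SCRAPYD_PASSWORD}`
).toString("base64");

// Fetch a Scrapyd JSON endpoint such as "daemonstatus.json" or "listprojects.json".
export async function scrapydGet<T>(endpoint: string): Promise<T> {
  const res = await fetch(`${process.env.SCRAPYD_URL}/${endpoint}`, {
    headers: { Authorization: `Basic ${auth}` },
    cache: "no-store", // status data must be live, never cached
  });
  if (!res.ok) throw new Error(`Scrapyd responded with ${res.status}`);
  return res.json() as Promise<T>;
}
```

Because the credentials never leave the server, the browser only ever talks to the `/api/scrapyd/*` routes.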
## Deployment

### Docker

#### Build production image:

```bash
docker build -t scrapy-ui:latest .
```

#### Run container:

```bash
docker run -d \
  -p 3000:3000 \
  -e SCRAPYD_URL=https://scrapy.pivoine.art \
  -e SCRAPYD_USERNAME=your_username \
  -e SCRAPYD_PASSWORD=your_password \
  --name scrapy-ui \
  scrapy-ui:latest
```

### GitHub Actions

The project includes a GitHub Actions workflow (`.github/workflows/docker-build.yml`) that automatically builds and pushes Docker images to GitHub Container Registry on pushes to `main` and on tagged releases. To use it:

1. Ensure GitHub Actions has write permissions to packages
2. Push code to trigger the workflow
3. Images will be available at `ghcr.io/<owner>/scrapy-ui`

### Reverse Proxy (Nginx)

To deploy under the `/ui` path with Nginx:

```nginx
location /ui/ {
    proxy_pass http://localhost:3000/ui/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
}
```

## Configuration

### Environment Variables

| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| `SCRAPYD_URL` | Scrapyd base URL | Yes | `https://scrapy.pivoine.art` |
| `SCRAPYD_USERNAME` | Basic auth username | Yes | - |
| `SCRAPYD_PASSWORD` | Basic auth password | Yes | - |
| `NODE_ENV` | Environment mode | No | `production` |

### Next.js Configuration

The `next.config.ts` includes:

- `basePath: "/ui"` - Serves the app under the `/ui` path
- `output: "standalone"` - Optimized Docker builds
- Optimized imports for `lucide-react`

## Development

### Install dependencies:

```bash
pnpm install
```

### Run development server:

```bash
pnpm dev
```

### Build for production:

```bash
pnpm build
pnpm start
```

### Lint code:

```bash
pnpm lint
```

### Add shadcn/ui components:

```bash
pnpm dlx shadcn@latest add <component-name>
```

## Features in Detail

### Dashboard

- Real-time job statistics (running, pending, finished)
- System health indicators
- Quick project overview
- Auto-refresh every 30 seconds

### Projects

- List all Scrapy projects
- Upload new versions (.egg files)
- View version history
- Delete projects/versions
- Drag & drop file upload

### Spiders

- Browse spiders by project
- Schedule jobs with custom arguments
- JSON argument validation
- Quick schedule dialog

### Jobs

- Filter by status (pending/running/finished)
- Real-time status updates (5-second refresh)
- Cancel running/pending jobs
- View job logs and items
- Detailed job information

### System

- Daemon status monitoring
- Job queue statistics
- Environment information
- Auto-refresh every 10 seconds

## Security

- **Server-side authentication**: Credentials are stored in environment variables and never exposed to the client
- **API proxy**: All Scrapyd requests go through Next.js API routes
- **Basic auth**: Credentials are injected automatically into request headers
- **No client-side secrets**: Zero credential exposure in the browser

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - feel free to use this project for personal or commercial purposes.

## Acknowledgments

- Built with [Next.js](https://nextjs.org/)
- UI components from [shadcn/ui](https://ui.shadcn.com/)
- Icons from [Lucide](https://lucide.dev/)
- Designed for [Scrapyd](https://scrapyd.readthedocs.io/)