A full-stack World Cup statistics web app covering every tournament from 1930 to 2026. Built with Next.js 16, Tailwind CSS 4, GraphQL, and PostgreSQL. Historical data is scraped from English Wikipedia and committed to the repo; live 2026 results are synced from Wikipedia on a schedule, so scores appear within minutes of the final whistle.
- **Group standings** — computed from match results for every tournament, with 0-row entries seeded so all groups appear even before any matches are played
- **Deep-linked pages** — every tournament, team, and player has a permanent URL (`/tournaments/1966`, `/teams/brazil`, `/players/Pelé`) with server-side metadata for SEO
```bash
pnpm scrape                  # all years (1930–2022), matches + squads
pnpm scrape 2002             # single year
pnpm scrape 2002 --matches   # matches, meta, stadiums, groups only
pnpm scrape 2002 --squads    # squads only
```
Fetches structured match data from English Wikipedia using the [MediaWiki parse API](https://en.wikipedia.org/w/api.php) and writes JSON files to `data/{year}/`. These files are **committed to git** so the production build never needs to hit Wikipedia for historical data.
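As a rough illustration of the request involved, the sketch below builds a MediaWiki parse-API URL for one tournament page. The helper name `buildParseUrl`, the `{year}_FIFA_World_Cup` title pattern for historical years, and the choice of `prop=text` are assumptions made for this example; the scraper itself may request different properties.

```typescript
// Hypothetical helper, not taken from the scraper source.
const API_ENDPOINT = "https://en.wikipedia.org/w/api.php";

function buildParseUrl(year: number): string {
  const params = new URLSearchParams({
    action: "parse",                // MediaWiki "parse" module
    page: `${year}_FIFA_World_Cup`, // assumed page-title pattern
    prop: "text",                   // assumption: rendered HTML is requested
    format: "json",
  });
  return `${API_ENDPOINT}?${params.toString()}`;
}

console.log(buildParseUrl(1966));
```

The parsed result would then be written under `data/1966/`, matching the `data/{year}/` layout described above.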
Each year produces up to five files:
| File | Content |
|---|---|
| `worldcup.json` | Matches with scores (FT/HT/ET/P) and goal-scorer events |
| `worldcup.stadiums.json` | Stadium names and cities |
| `worldcup.groups.json` | Group compositions (teams per group) |
| `worldcup.squads.json` | Player rosters (where available on Wikipedia) |
The scraper has built-in rate-limit handling: it detects Wikipedia's plain-text `"You are making too many requests"` response, waits 30 seconds, and retries with a delay that grows linearly with the attempt number (15 s × attempt, up to 6 attempts). Group sub-pages are fetched with a 3-second delay between requests.
```bash
DATABASE_URL="..." pnpm seed --force   # drop and re-seed from scratch
```
Reads the committed `data/{year}/` JSON files and loads them into the database. Also creates all tables (if they do not exist). Intended for first-time setup and for re-seeding after schema changes. Covers **1930–2022 only** — 2026 data is handled by sync.
Seed is **idempotent** and skips silently if data is already present (unless `--force` is passed).
### 3. Sync — scheduled live updates (2026 only)
```bash
DATABASE_URL="..." pnpm sync           # normal run
DATABASE_URL="..." pnpm sync --force   # clear and re-fetch all 2026 data
```
Fetches the current state of the 2026 Wikipedia pages and upserts everything into the database. Historical years (1930–2022) are not touched — they come from the committed JSON files via seed.
What sync does on each run:
1. Fetches `2026_FIFA_World_Cup` via the MediaWiki API
2. Determines which groups are fully complete (all matches have FT scores) and skips their sub-pages to save requests
3. Upserts matches, scores, and goal events
4. Fetches `2026_FIFA_World_Cup_squads` and upserts squad rosters
5. Recomputes group standings from match results
6. Seeds 0-row standing entries for groups with no played matches yet (so all groups appear in the UI)
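Step 2 in the list above can be sketched as a small filter. The `Match` shape and the function name `groupsToFetch` are hypothetical; the only rule taken from the description is that a group counts as complete when all of its matches have a full-time score.

```typescript
interface Match {
  group: string;                     // e.g. "A"
  fullTime: [number, number] | null; // null until an FT score exists
}

// Returns the groups whose sub-pages still need to be fetched.
function groupsToFetch(allGroups: string[], matches: Match[]): string[] {
  return allGroups.filter((group) => {
    const own = matches.filter((m) => m.group === group);
    // A group with no known matches is treated as incomplete.
    const complete = own.length > 0 && own.every((m) => m.fullTime !== null);
    return !complete;
  });
}

const sample: Match[] = [
  { group: "A", fullTime: [2, 0] },
  { group: "A", fullTime: [1, 1] },
  { group: "B", fullTime: [0, 3] },
  { group: "B", fullTime: null },
];
console.log(groupsToFetch(["A", "B", "C"], sample)); // group A is skipped
```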
Coolify builds the Docker image via `docker compose up` and attaches the container to the Traefik network automatically. TLS certificates are issued through the Traefik certificate resolver named `resolver`.
**Live match detection** — A match is considered live when its date equals today and the current time falls between 5 minutes before kick-off and 125 minutes after it. Kick-off times are stored as `"HH:MM UTC±N"` strings; the resolver computes the UTC timestamp at query time using PostgreSQL interval arithmetic. Apollo's `pollInterval: 60_000` re-queries `liveMatches` and `recentMatches` every minute.
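The time window can be expressed in a few lines of TypeScript. This is only a sketch of the rule: the real check runs in SQL inside the resolver, and the function name `isLive` and the sample kick-off time are made up for illustration. It assumes the kick-off has already been converted to a UTC `Date` and leaves out the separate date-equals-today condition.

```typescript
const PRE_KICKOFF_MS = 5 * 60_000;    // window opens 5 minutes before kick-off
const POST_KICKOFF_MS = 125 * 60_000; // and closes 125 minutes after it

function isLive(kickoffUtc: Date, now: Date = new Date()): boolean {
  const elapsed = now.getTime() - kickoffUtc.getTime();
  return elapsed >= -PRE_KICKOFF_MS && elapsed <= POST_KICKOFF_MS;
}

const kickoff = new Date("2026-06-11T19:00:00Z"); // illustrative value only
console.log(isLive(kickoff, new Date("2026-06-11T18:56:00Z"))); // 4 minutes before: inside the window
console.log(isLive(kickoff, new Date("2026-06-11T21:06:00Z"))); // 126 minutes after: outside the window
```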
**UTC kickoff ordering** — Both `upcomingMatches` (ascending) and `recentMatches` (descending) sort by computed UTC kickoff time using a `CASE` expression that parses the `time_local` string and subtracts the UTC offset as an interval. This keeps the order correct across time zones: matches are ranked by their actual kick-off instant, not by local clock time or database ID, so a match played further west is never listed ahead of one that really starts earlier.
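The same ordering logic, written in TypeScript rather than SQL, might look like the sketch below. The `Fixture` shape, the `YYYY-MM-DD` date format, whole-hour offsets, and the tie-break on `id` are assumptions for the example; the resolvers do this work with a `CASE` expression in PostgreSQL.

```typescript
interface Fixture {
  id: number;
  date: string;      // "YYYY-MM-DD" (assumed storage format)
  timeLocal: string; // "HH:MM UTC±N", e.g. "19:00 UTC-7"
}

function kickoffUtcMs(f: Fixture): number {
  // Accept "+", ASCII "-", and the Unicode minus sign often used on Wikipedia.
  const m = /^(\d{1,2}):(\d{2}) UTC([+\-\u2212])(\d{1,2})$/.exec(f.timeLocal);
  if (!m) throw new Error(`Unrecognised kick-off time: ${f.timeLocal}`);
  const [year, month, day] = f.date.split("-").map(Number);
  const offsetHours = (m[3] === "+" ? 1 : -1) * Number(m[4]);
  // Local wall-clock time minus the UTC offset gives the UTC instant;
  // Date.UTC normalises hour values that roll over midnight.
  return Date.UTC(year, month - 1, day, Number(m[1]) - offsetHours, Number(m[2]));
}

const byKickoffAsc = (a: Fixture, b: Fixture): number =>
  kickoffUtcMs(a) - kickoffUtcMs(b) || a.id - b.id;
const byKickoffDesc = (a: Fixture, b: Fixture): number => byKickoffAsc(b, a);

const fixtures: Fixture[] = [
  { id: 1, date: "2026-06-12", timeLocal: "19:00 UTC-7" }, // 02:00 UTC next day
  { id: 2, date: "2026-06-12", timeLocal: "20:00 UTC-4" }, // 00:00 UTC next day
];
console.log([...fixtures].sort(byKickoffAsc).map((f) => f.id));
```

Sorting by the local clock alone would put fixture 1 first (19:00 before 20:00), even though fixture 2 kicks off two hours earlier in UTC.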
**Server/client split** — All pages use a server wrapper `page.tsx` that exports `metadata` (or `generateMetadata`) and a `client.tsx` that contains the Apollo query and interactive rendering. This lets Next.js generate accurate `<title>`, OpenGraph, and Twitter card tags for each route without requiring server-side data fetching in client components.
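A minimal sketch of the metadata half of such a server wrapper follows. To keep it self-contained, `Metadata` is a local stand-in for the type exported by `next`, and the title format is invented; in a real `page.tsx` the function would be exported and the page component would render the matching `client.tsx`. It also assumes a Next.js version in which route `params` arrive as a Promise.

```typescript
// Stand-in for the `Metadata` type from "next"; only the fields used here.
type Metadata = {
  title?: string;
  openGraph?: { title?: string; url?: string };
};

// Shape of the function a wrapper such as app/tournaments/[year]/page.tsx
// (hypothetical path) would export.
async function generateMetadata(
  { params }: { params: Promise<{ year: string }> },
): Promise<Metadata> {
  const { year } = await params;
  const title = `${year} FIFA World Cup`; // illustrative title format
  return {
    title,
    // Relative URL, left for Next.js to resolve against metadataBase.
    openGraph: { title, url: `/tournaments/${year}` },
  };
}

generateMetadata({ params: Promise.resolve({ year: "1966" }) })
  .then((meta) => console.log(meta));
```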
**`NEXT_PUBLIC_SITE_URL`** — The public hostname is read from this environment variable in `sitemap.ts`, `robots.ts`, and `layout.tsx` (`metadataBase`). All per-page `openGraph.url` values use relative paths (`/groups`, `/tournaments/2026`, etc.) which Next.js resolves against `metadataBase` automatically. The sitemap is marked `export const dynamic = 'force-dynamic'` so it runs at request time when the database is reachable, not at build time.
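Next.js performs the `metadataBase` resolution internally, but the effect is roughly that of the standard `URL` constructor, which is also a convenient way to build absolute sitemap entries. The helper name `absoluteUrl` and the `https://example.com` host below are placeholders, not values from this project.

```typescript
// Hypothetical helper: resolve a relative route against the public site URL.
function absoluteUrl(path: string, siteUrl: string): string {
  return new URL(path, siteUrl).toString();
}

// In the app, siteUrl would come from the NEXT_PUBLIC_SITE_URL variable.
const siteUrl = "https://example.com";
console.log(absoluteUrl("/groups", siteUrl));
console.log(absoluteUrl("/tournaments/2026", siteUrl));
```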
**Apollo Client v4** — This project uses Apollo Client 4, which moved hooks to `@apollo/client/react` and core utilities to `@apollo/client/core`. A thin wrapper in `lib/graphql/hooks.ts` re-exports `useQuery` with its result data typed as `Record<string, any>`, so that the v4 `TData = {}` default does not break every field access.
**Standalone Docker output** — `next.config.ts` sets `output: 'standalone'` which produces a self-contained `server.js`. The `scripts/`, `lib/`, and `data/` directories are copied separately into the runner stage so `pnpm seed` and `pnpm sync` work inside the container without needing a full Node/TypeScript toolchain reinstall.
**Group standings** — Standings are computed live from match results via a SQL `GROUP BY` query in the `groupStandings` resolver. After each sync, 0-row standing entries are inserted for all teams in all 2026 groups, ensuring every group appears in the UI even before its first match is played.
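The aggregation that the `GROUP BY` query performs can be mirrored in TypeScript to show how zero-value rows and computed rows fit together. Everything here is a sketch: the type names, the 3-points-for-a-win rule (older tournaments awarded 2), and the simple tie-break order are assumptions, not the project's actual query.

```typescript
interface Result { home: string; away: string; homeGoals: number; awayGoals: number; }
interface Row {
  team: string; played: number; won: number; drawn: number; lost: number;
  goalsFor: number; goalsAgainst: number; points: number;
}

function computeStandings(teams: string[], results: Result[]): Row[] {
  const table = new Map<string, Row>();
  // Seed a zero-value row per team so every team appears before any match.
  for (const team of teams) {
    table.set(team, { team, played: 0, won: 0, drawn: 0, lost: 0, goalsFor: 0, goalsAgainst: 0, points: 0 });
  }
  const record = (team: string, scored: number, conceded: number): void => {
    const row = table.get(team);
    if (!row) return; // ignore teams outside this group
    row.played += 1;
    row.goalsFor += scored;
    row.goalsAgainst += conceded;
    if (scored > conceded) { row.won += 1; row.points += 3; } // assumed 3 points per win
    else if (scored === conceded) { row.drawn += 1; row.points += 1; }
    else { row.lost += 1; }
  };
  for (const r of results) {
    record(r.home, r.homeGoals, r.awayGoals);
    record(r.away, r.awayGoals, r.homeGoals);
  }
  // Assumed ordering: points, then goal difference, then goals scored, then name.
  return Array.from(table.values()).sort((a, b) =>
    b.points - a.points ||
    (b.goalsFor - b.goalsAgainst) - (a.goalsFor - a.goalsAgainst) ||
    b.goalsFor - a.goalsFor ||
    a.team.localeCompare(b.team));
}

// Placeholder team names; "Gamma" has not played yet but still gets a row.
console.log(computeStandings(
  ["Alpha", "Beta", "Gamma"],
  [{ home: "Alpha", away: "Beta", homeGoals: 2, awayGoals: 1 }],
));
```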
**Wikipedia scraper rate limits** — The MediaWiki API occasionally returns a plain-text `"You are making too many requests to the API"` response instead of JSON. The scraper detects this by reading the response as text first, then parses JSON only if the body does not start with that phrase. On a rate limit (or HTTP 429), it waits 30 seconds before retrying. Retries use a linearly increasing back-off: 15 s × attempt number, up to 6 attempts per page.
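The text-first detection and retry loop could be sketched as follows. The names (`fetchJsonWithRetry`, `Fetcher`, `Sleeper`) are hypothetical, and the fetcher and sleep function are injected so the logic can be exercised without network access. The description mentions both a 30-second wait and a 15 s × attempt delay; this sketch uses only the latter, which is an assumption about how the two combine.

```typescript
const RATE_LIMIT_PREFIX = "You are making too many requests";

interface TextResponse { status: number; body: string; }
type Fetcher = (url: string) => Promise<TextResponse>;
type Sleeper = (ms: number) => Promise<void>;

const realSleep: Sleeper = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchJsonWithRetry(
  url: string,
  fetcher: Fetcher,
  sleep: Sleeper = realSleep,
  maxAttempts = 6,
): Promise<unknown> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetcher(url);
    // Inspect the raw text before parsing: a rate-limit body is not JSON.
    const rateLimited = res.status === 429 || res.body.startsWith(RATE_LIMIT_PREFIX);
    if (!rateLimited) return JSON.parse(res.body);
    // Linear back-off: 15 s, 30 s, 45 s, ... (skipped after the last attempt).
    if (attempt < maxAttempts) await sleep(15_000 * attempt);
  }
  throw new Error(`Rate-limited after ${maxAttempts} attempts: ${url}`);
}
```

A caller would pass a fetcher that wraps the real HTTP client and returns the status code together with the body read as text.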