# World Cup

A full-stack World Cup statistics web app covering every tournament from 1930 to 2026. Built with Next.js 16, TailwindCSS 4, GraphQL, and PostgreSQL. Historical data is scraped from English Wikipedia and committed to the repo; live 2026 results are synced from Wikipedia on a schedule so scores appear within minutes of the final whistle.

## Features

- **Live 2026 matches** — detected automatically when today's date matches a scheduled fixture; Apollo polls every 60 seconds for score updates
- **All-time statistics** — goals, hat-tricks, biggest wins, highest-scoring games, penalty stats, goals-by-minute heatmap, confederation performance, title counts
- **Group standings** — computed from match results for every tournament, with 0-row entries seeded so all groups appear even before any matches are played
- **Deep-linked pages** — every tournament, team, and player has a permanent URL (`/tournaments/1966`, `/teams/brazil`, `/players/Pelé`) with server-side metadata for SEO
- **Full-text search** — across teams, tournaments, and players
- **Squad data** — 26-man rosters for 2026 with position, shirt number, and date of birth
- **Qualification playoffs** — 2026 inter-confederation playoff results stored separately
- **Country flags** — via `flag-icons` CSS classes, ~200 nations covered
- **Dark pitch aesthetic** — Bebas Neue headings, Space Grotesk body, green-on-black design

## Pages

| Route | Content |
|---|---|
| `/` | Home: live matches, stat pills, latest result, upcoming fixtures, Golden Boot race |
| `/groups` | All 12 group tables for 2026 (P/W/D/L/GD/Pts) with results and upcoming fixtures |
| `/stats` | Historical stats: goals chart, top scorers, hat-tricks, biggest wins, goals by minute, ET/shootout stats, confederation stats |
| `/history` | All 23 tournament cards (1930–2026) newest-first, each with host, winner, top scorer |
| `/search?q=…` | Full-text search across teams, players, tournaments |
| `/tournaments/[year]` | Tournament detail: group stage with standings + matches, knockout rounds, scorer sidebar |
| `/teams/[slug]` | Team profile: all-time record, top scorers, WC appearances |
| `/players/[name]` | Player profile: goals by tournament, penalties vs open play breakdown |

## Tech stack

| Layer | Technology |
|---|---|
| Framework | Next.js 16.2 (App Router, standalone output) |
| Styling | TailwindCSS 4 (CSS-first `@theme` config) |
| GraphQL server | GraphQL Yoga in `/api/graphql` Next.js route |
| GraphQL client | Apollo Client 4 with 60 s poll for live matches |
| ORM | Drizzle ORM with `postgres` driver |
| Database | PostgreSQL 16 |
| Flags | `flag-icons` npm package |
| Fonts | Bebas Neue + Space Grotesk (Google Fonts) |
| Container | Docker multi-stage build, Traefik-compatible |

## Data pipeline

Data flows through three scripts that are run at different times and for different purposes.

### 1. Scrape — one-time developer task

```bash
pnpm scrape                   # all years (1930–2022), matches + squads
pnpm scrape 2002              # single year
pnpm scrape 2002 --matches    # matches, meta, stadiums, groups only
pnpm scrape 2002 --squads     # squads only
```

Fetches structured match data from English Wikipedia using the [MediaWiki parse API](https://en.wikipedia.org/w/api.php) and writes JSON files to `data/{year}/`. These files are **committed to git** so the production build never needs to hit Wikipedia for historical data.

Each year produces up to five files:

| File | Content |
|---|---|
| `worldcup.json` | Matches with scores (FT/HT/ET/P) and goal-scorer events |
| `worldcup.meta.json` | Tournament metadata: host, winner, runner-up, team count |
| `worldcup.stadiums.json` | Stadium names and cities |
| `worldcup.groups.json` | Group compositions (teams per group) |
| `worldcup.squads.json` | Player rosters (where available on Wikipedia) |

The scraper has built-in rate-limit handling: it detects Wikipedia's plain-text `"You are making too many requests"` response, waits 30 seconds, and retries with a linearly increasing back-off (up to 6 attempts, with a delay of 15 s × attempt number between retries). Group sub-pages are fetched with a 3-second delay between requests.

### 2. Seed — initial database population

```bash
DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm seed
DATABASE_URL="..." pnpm seed --force   # drop and re-seed from scratch
```

Reads the committed `data/{year}/` JSON files and loads them into the database. Also creates all tables (if they do not exist). Intended for first-time setup and for re-seeding after schema changes. Covers **1930–2022 only** — 2026 data is handled by sync.

Seed is **idempotent** and skips silently if data is already present (unless `--force` is passed).

### 3. Sync — scheduled live updates (2026 only)

```bash
DATABASE_URL="..." pnpm sync           # normal run
DATABASE_URL="..." pnpm sync --force   # clear and re-fetch all 2026 data
```

Fetches the current state of the 2026 Wikipedia pages and upserts everything into the database. Historical years (1930–2022) are not touched — they come from the committed JSON files via seed.

What sync does on each run:

1. Fetches `2026_FIFA_World_Cup` via the MediaWiki API
2. Determines which groups are fully complete (all matches have FT scores) and skips their sub-pages to save requests
3. Upserts matches, scores, and goal events
4. Fetches `2026_FIFA_World_Cup_squads` and upserts squad rosters
5. Recomputes group standings from match results
6. Seeds 0-row standing entries for groups with no played matches yet (so all groups appear in the UI)
7. Updates tournament aggregates (total goals, matches played, avg goals/game)

Sync is designed to run on a **10-minute cron** in production. Each run is safe to repeat — all writes use `ON CONFLICT DO UPDATE`.
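
Step 2 of the list above (skipping groups that are already complete) could look roughly like the following. This is a minimal sketch, not the code in `scripts/sync.ts`: the `GroupMatch` shape and the `completedGroups` name are assumptions chosen for illustration, with `null` standing in for a missing full-time score.

```typescript
// Hypothetical row shape; field names loosely mirror the matches table.
interface GroupMatch {
  groupName: string;
  scoreFtHome: number | null; // null until a full-time score is known
  scoreFtAway: number | null;
}

// Returns the names of groups in which every listed match has a full-time score.
function completedGroups(matches: GroupMatch[]): string[] {
  const done: Record<string, boolean> = {};
  for (const m of matches) {
    const played = m.scoreFtHome !== null && m.scoreFtAway !== null;
    const soFar = m.groupName in done ? done[m.groupName] : true;
    // A single unplayed match marks the whole group as incomplete.
    done[m.groupName] = soFar && played;
  }
  return Object.keys(done).filter((name) => done[name]).sort();
}
```

A sync run could then skip the sub-page request for every group name returned here and fetch only the rest.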

## Database schema

```
tournaments       year PK, host, winner, runner_up, third_place, fourth_place,
                  teams_count, matches_count, total_goals, avg_goals_per_game

teams             id, name UNIQUE, iso2, fifa_code, continent, confederation

stadiums          id, tournament_year FK, name, city, country_code,
                  capacity, timezone, coordinates

matches           id, tournament_year FK, round, group_name, date, time_local,
                  stadium_id FK, team1_id FK, team2_id FK,
                  score_ft_home, score_ft_away,
                  score_ht_home, score_ht_away,
                  score_et_home, score_et_away,
                  score_p_home,  score_p_away,
                  is_quali_playoff

goals             id, match_id FK, team_id FK, player_name,
                  minute, minute_offset, is_penalty, is_own_goal

group_standings   tournament_year FK, group_name, team_id FK,
                  pos, played, won, drawn, lost,
                  goals_for, goals_against, goal_diff, pts

squads            id, tournament_year FK, team_id FK, player_name,
                  shirt_number, position, date_of_birth
```

## Local development

**Prerequisites:** Node.js 22+, pnpm 10+, Docker

```bash
# 1. Clone and install
git clone <repo-url> worldcup
cd worldcup
pnpm install

# 2. Start the database
docker compose -f docker-compose.dev.yml up -d

# 3. Seed historical data (1930–2022) from committed JSON files
DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm seed

# 4. Sync 2026 data from Wikipedia
DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm sync

# 5. Start the dev server
DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm dev
```

Open [http://localhost:3000](http://localhost:3000).

To stop the database: `docker compose -f docker-compose.dev.yml down`

If you need to re-scrape historical data (e.g. after a Wikipedia article correction):

```bash
pnpm scrape 2002              # re-scrape a single year
git add data/2002/ && git commit -m "chore: refresh 2002 scraped data"
```

## Environment variables

| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `NEXT_PUBLIC_SITE_URL` | Production | Public base URL, e.g. `https://worldcup.example.com` — used for sitemap and OG metadata |
| `DB_PASSWORD` | Production | Password for the `wc` DB user (used by docker-compose.yml) |
| `TRAEFIK_ENABLED` | Production | Set to `true` to activate Traefik router labels |
| `TRAEFIK_HOST` | Production | Public hostname, e.g. `worldcup.example.com` |
| `NETWORK_NAME` | Production | Name of the external Docker network Traefik is attached to |
| `UMAMI_ID` | Optional | Umami analytics site ID |
| `UMAMI_SRC` | Optional | Umami analytics script URL |

Copy `.env.example` to `.env` and fill in the values before deploying.

## Deployment (Coolify + Traefik)

The app is designed for self-hosted deployment via [Coolify](https://coolify.io) behind a [Traefik](https://traefik.io) reverse proxy.

### 1. Configure environment

In Coolify's environment variable editor set:

```
DB_PASSWORD=<strong-random-password>
DATABASE_URL=postgres://wc:<DB_PASSWORD>@db:5432/worldcup
NEXT_PUBLIC_SITE_URL=https://worldcup.yourdomain.com
TRAEFIK_ENABLED=true
TRAEFIK_HOST=worldcup.yourdomain.com
NETWORK_NAME=<your-traefik-network-name>
```

### 2. Deploy

Coolify builds the Docker image via `docker compose up` and attaches the container to the Traefik network automatically. TLS certificates are issued by the `resolver` cert resolver configured in Traefik.

### 3. Initial data load

After the first deployment, seed historical data and then sync 2026:

```bash
# In Coolify's terminal for the app container:
pnpm seed    # loads 1930–2022 from committed JSON files
pnpm sync    # fetches 2026 from Wikipedia
```

### 4. Scheduled sync (live updates)

In Coolify → your service → **Scheduled Tasks**, add:

| Field | Value |
|---|---|
| Command | `pnpm sync` |
| Schedule | `*/10 * * * *` |
| Container | `app` |

This re-syncs 2026 from Wikipedia every 10 minutes, so a new match result appears within about 10 minutes of Wikipedia being updated.

## Project structure

```
worldcup/
├── app/
│   ├── layout.tsx                      # Root layout: nav, fonts, Apollo provider, global metadata
│   ├── robots.ts                       # robots.txt (Next.js convention)
│   ├── sitemap.ts                      # sitemap.xml — dynamic, rendered at request time
│   ├── page.tsx                        # Home — server wrapper (exports metadata)
│   ├── client.tsx                      # Home — Apollo/interactive client component
│   ├── groups/
│   │   ├── page.tsx                    # Groups — server wrapper
│   │   └── client.tsx                  # Groups — client component
│   ├── stats/page.tsx + client.tsx
│   ├── history/page.tsx + client.tsx
│   ├── search/page.tsx + client.tsx
│   ├── tournaments/[year]/
│   │   ├── page.tsx                    # generateMetadata fetches tournament from DB
│   │   └── client.tsx                  # Tournament detail, group standings, bracket
│   ├── teams/[slug]/page.tsx + client.tsx
│   ├── players/[name]/page.tsx + client.tsx
│   └── api/graphql/route.ts            # GraphQL Yoga endpoint
├── components/
│   ├── apollo-provider.tsx             # Apollo Client provider wrapper
│   ├── nav.tsx                         # Top navigation bar
│   ├── team-flag.tsx                   # flag-icons wrapper component
│   ├── match-card.tsx                  # Match result / fixture card
│   └── live-badge.tsx                  # Pulsing LIVE indicator
├── lib/
│   ├── db/
│   │   ├── schema.ts                   # Drizzle table definitions
│   │   └── index.ts                    # DB connection singleton
│   ├── graphql/
│   │   ├── schema.ts                   # GraphQL SDL
│   │   ├── resolvers/index.ts          # All resolvers
│   │   ├── hooks.ts                    # Apollo v4 useQuery wrapper
│   │   └── client.ts                   # Apollo Client factory
│   ├── wiki-scraper.ts                 # Wikipedia HTML parser (cheerio), rate-limit retry
│   └── iso-codes.ts                    # Team name → ISO2 country code map
├── scripts/
│   ├── scrape-wikipedia.ts             # Developer-only: scrape Wikipedia → data/{year}/
│   ├── seed.ts                         # Initial DB load from data/{year}/ JSON files
│   └── sync.ts                         # Scheduled: sync 2026 live data from Wikipedia
├── data/
│   ├── 1930/ … 2022/                   # Committed Wikipedia scrape output (per-year JSON)
│   └── {year}/
│       ├── worldcup.json               # Matches + goals
│       ├── worldcup.meta.json          # Tournament metadata
│       ├── worldcup.stadiums.json      # Stadiums
│       ├── worldcup.groups.json        # Group compositions
│       └── worldcup.squads.json        # Squad rosters (where available)
├── docker-compose.yml                  # Production (Traefik + external network)
├── docker-compose.dev.yml              # Local dev (DB only, port 5432 exposed)
├── Dockerfile                          # Multi-stage pnpm build
├── .env.example                        # Environment variable template
├── next.config.ts                      # standalone output, serverExternalPackages
├── drizzle.config.ts                   # Drizzle Kit config
└── tsconfig.json
```

## Architecture notes

**Live match detection** — A match is considered live when its date equals today and the current time falls in the window from 5 minutes before kick-off to 125 minutes after it. Kick-off times are stored as `"HH:MM UTC±N"` strings; the resolver computes the UTC timestamp at query time using PostgreSQL interval arithmetic. Apollo's `pollInterval: 60_000` re-queries `liveMatches` and `recentMatches` every minute.
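
The live window can be illustrated in TypeScript. The resolver does this arithmetic in PostgreSQL, so the sketch below is only an equivalent under stated assumptions: `kickoffUtcMs` and `isLive` are hypothetical names, the date is assumed to be `YYYY-MM-DD`, offsets are assumed to be whole hours, and the separate "date equals today" check is left out.

```typescript
const MINUTE_MS = 60_000;

// Converts a match date and a local kick-off string such as "18:00 UTC-4"
// into a UTC timestamp in milliseconds. Returns null when the string does
// not match the assumed "HH:MM UTC±N" format.
function kickoffUtcMs(date: string, timeLocal: string): number | null {
  // Accepts "+", ASCII "-" and the typographic minus sign U+2212.
  const m = /^(\d{1,2}):(\d{2}) UTC([+\-\u2212])(\d{1,2})$/.exec(timeLocal.trim());
  if (!m) return null;
  const [y, mo, d] = date.split("-").map(Number);
  const offsetHours = (m[3] === "+" ? 1 : -1) * Number(m[4]);
  // Local time minus the UTC offset gives UTC; Date.UTC normalises overflow.
  return Date.UTC(y, mo - 1, d, Number(m[1]) - offsetHours, Number(m[2]));
}

// True from 5 minutes before kick-off until 125 minutes after it.
function isLive(date: string, timeLocal: string, nowMs: number): boolean {
  const kickoff = kickoffUtcMs(date, timeLocal);
  if (kickoff === null) return false;
  return nowMs >= kickoff - 5 * MINUTE_MS && nowMs <= kickoff + 125 * MINUTE_MS;
}
```

For example, `isLive("2026-06-14", "18:00 UTC-4", Date.now())` treats the kick-off as 22:00 UTC on that date.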

**UTC kickoff ordering** — Both `upcomingMatches` (ascending) and `recentMatches` (descending) sort by computed UTC kickoff time using a `CASE` expression that parses the `time_local` string and subtracts the UTC offset as an interval. This keeps the order correct across time zones: a match that kicks off later in a western time zone is not ranked ahead of an earlier match merely because it has a lower database ID.
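
A small worked example shows why sorting by the computed UTC time matters. The sketch below mirrors the SQL ordering in TypeScript; `Fixture`, `toUtcMs`, and `byKickoffAsc` are hypothetical names, and whole-hour offsets with an ASCII sign are assumed.

```typescript
interface Fixture {
  id: number;
  date: string;      // "YYYY-MM-DD", local calendar date of the match
  timeLocal: string; // assumed format "HH:MM UTC±N"
}

// Computes the UTC kick-off in milliseconds; unparseable times sort last.
function toUtcMs(f: Fixture): number {
  const m = /^(\d{1,2}):(\d{2}) UTC([+-])(\d{1,2})$/.exec(f.timeLocal);
  if (!m) return Number.POSITIVE_INFINITY;
  const [y, mo, d] = f.date.split("-").map(Number);
  const offset = (m[3] === "+" ? 1 : -1) * Number(m[4]);
  return Date.UTC(y, mo - 1, d, Number(m[1]) - offset, Number(m[2]));
}

// Ascending by UTC kick-off, with the ID as a tie-breaker only.
function byKickoffAsc(a: Fixture, b: Fixture): number {
  const ka = toUtcMs(a);
  const kb = toUtcMs(b);
  if (ka !== kb) return ka < kb ? -1 : 1;
  return a.id - b.id;
}

const fixtures: Fixture[] = [
  { id: 1, date: "2026-06-14", timeLocal: "18:00 UTC-7" }, // 01:00 UTC on the 15th
  { id: 2, date: "2026-06-14", timeLocal: "19:00 UTC-4" }, // 23:00 UTC on the 14th
];
const orderedIds = fixtures.slice().sort(byKickoffAsc).map((f) => f.id);
```

Here the fixture with ID 2 kicks off first in UTC even though its local clock time is later and its ID is higher, so it is placed first in `orderedIds`.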

**Server/client split** — All pages use a server wrapper `page.tsx` that exports `metadata` (or `generateMetadata`) and a `client.tsx` that contains the Apollo query and interactive rendering. This lets Next.js generate accurate `<title>`, OpenGraph, and Twitter card tags for each route without requiring server-side data fetching in client components.

**`NEXT_PUBLIC_SITE_URL`** — The public hostname is read from this environment variable in `sitemap.ts`, `robots.ts`, and `layout.tsx` (`metadataBase`). All per-page `openGraph.url` values use relative paths (`/groups`, `/tournaments/2026`, etc.) which Next.js resolves against `metadataBase` automatically. The sitemap is marked `export const dynamic = 'force-dynamic'` so it runs at request time when the database is reachable, not at build time.

**Apollo Client v4** — This project uses Apollo Client 4 which moved hooks to `@apollo/client/react` and core utilities to `@apollo/client/core`. A thin wrapper in `lib/graphql/hooks.ts` re-exports `useQuery` typed as `Record<string, any>` to avoid the v4 `TData = {}` default breaking all field accesses.

**Standalone Docker output** — `next.config.ts` sets `output: 'standalone'` which produces a self-contained `server.js`. The `scripts/`, `lib/`, and `data/` directories are copied separately into the runner stage so `pnpm seed` and `pnpm sync` work inside the container without needing a full Node/TypeScript toolchain reinstall.

**Group standings** — Standings are computed live from match results via a SQL `GROUP BY` query in the `groupStandings` resolver. After each sync, 0-row standing entries are inserted for all teams in all 2026 groups, ensuring every group appears in the UI even before its first match is played.
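
The aggregation behind the standings can be sketched in TypeScript as a stand-in for the SQL `GROUP BY`. This is an illustration only: the `GroupFixture` and `StandingRow` shapes are assumptions, a win is assumed to be worth 3 points in every tournament, and the tie-breakers (points, goal difference, goals scored, team name) are a simplification rather than the project's actual rules.

```typescript
// Hypothetical input shape; goals are null until the match has been played.
interface GroupFixture {
  group: string;
  home: string;
  away: string;
  homeGoals: number | null;
  awayGoals: number | null;
}

interface StandingRow {
  group: string; team: string;
  played: number; won: number; drawn: number; lost: number;
  goalsFor: number; goalsAgainst: number; goalDiff: number; pts: number;
}

function computeStandings(fixtures: GroupFixture[]): StandingRow[] {
  const rows: Record<string, StandingRow> = {};
  const rowFor = (group: string, team: string): StandingRow => {
    const key = group + "|" + team;
    if (!rows[key]) {
      rows[key] = { group, team, played: 0, won: 0, drawn: 0, lost: 0,
                    goalsFor: 0, goalsAgainst: 0, goalDiff: 0, pts: 0 };
    }
    return rows[key];
  };

  for (const f of fixtures) {
    // Touching both rows before the null check creates the all-zero entries.
    const home = rowFor(f.group, f.home);
    const away = rowFor(f.group, f.away);
    if (f.homeGoals === null || f.awayGoals === null) continue;

    home.played++; away.played++;
    home.goalsFor += f.homeGoals; home.goalsAgainst += f.awayGoals;
    away.goalsFor += f.awayGoals; away.goalsAgainst += f.homeGoals;
    if (f.homeGoals > f.awayGoals) { home.won++; away.lost++; home.pts += 3; }
    else if (f.homeGoals < f.awayGoals) { away.won++; home.lost++; away.pts += 3; }
    else { home.drawn++; away.drawn++; home.pts++; away.pts++; }
    home.goalDiff = home.goalsFor - home.goalsAgainst;
    away.goalDiff = away.goalsFor - away.goalsAgainst;
  }

  return Object.keys(rows).map((k) => rows[k]).sort((a, b) =>
    a.group.localeCompare(b.group) || b.pts - a.pts ||
    b.goalDiff - a.goalDiff || b.goalsFor - a.goalsFor ||
    a.team.localeCompare(b.team));
}
```

Passing scheduled but unplayed fixtures with `null` goals is what yields the all-zero rows, so a group is listed even before its first match.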

**Wikipedia scraper rate limits** — The MediaWiki API occasionally returns a plain-text `"You are making too many requests to the API"` response instead of JSON. The scraper detects this by reading the response as text first, then parses JSON only if the body does not start with that phrase. On rate-limit (or HTTP 429), it waits 30 seconds before retrying. The back-off between retries grows linearly rather than exponentially: 15 s × attempt number, up to 6 attempts per page.
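
The detect-then-retry loop might look like the sketch below. It is not the code in `lib/wiki-scraper.ts`: all names are hypothetical, the fetcher and sleep function are injected so the logic can be exercised without network access, and only the 15 s × attempt schedule is modelled, since how it combines with the 30-second wait is not specified here.

```typescript
const RATE_LIMIT_PREFIX = "You are making too many requests";
const MAX_ATTEMPTS = 6;

// Minimal response shape so a fake fetcher can be passed in tests.
interface TextResponse {
  status: number;
  text(): Promise<string>;
}

// Rate-limited when the status is 429 or the body starts with the notice.
function isRateLimited(status: number, body: string): boolean {
  return status === 429 || body.indexOf(RATE_LIMIT_PREFIX) === 0;
}

// Delay before the next try: 15 s multiplied by the attempt number.
function retryDelayMs(attempt: number): number {
  return 15_000 * attempt;
}

async function fetchJsonWithRetry(
  url: string,
  fetcher: (url: string) => Promise<TextResponse>,
  sleep: (ms: number) => Promise<void>,
): Promise<unknown> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const res = await fetcher(url);
    // Read as text first: the rate-limit notice is plain text, not JSON.
    const body = await res.text();
    if (!isRateLimited(res.status, body)) return JSON.parse(body);
    if (attempt < MAX_ATTEMPTS) await sleep(retryDelayMs(attempt));
  }
  throw new Error("Still rate-limited after " + MAX_ATTEMPTS + " attempts: " + url);
}
```

In a real run the `fetcher` argument could be the global `fetch` and `sleep` a `setTimeout`-based promise. Injecting both keeps the retry logic testable without waiting.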

## GraphQL API

The GraphQL playground is available at `/api/graphql` in development.

Key queries:

```graphql
# Live matches right now
{ liveMatches { id date time team1 { name } team2 { name } scoreFt isLive } }

# All-time top scorers
{ topScorers(limit: 10) { playerName goals penalties team { name iso2 } } }

# 2026 group standings
{ groupStandings(year: 2026) { groupName pos team { name iso2 } played won drawn lost goalsFor goalsAgainst pts } }

# Tournament detail
{ tournament(year: 2022) { year host winner totalGoals avgGoalsPerGame } }

# Team stats
{ team(slug: "brazil") { name stats { appearances wins losses titles goalsFor } } }

# Full-text search
{ search(query: "Ronaldo") { teams { name } players { playerName goals } } }

# Hat-tricks in World Cup history
{ hatTricks { playerName goals year round team { name } opponent { name } } }

# Global stats
{ tournamentStats { totalTournaments totalMatches totalGoals avgGoalsPerGame } }
```
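
The queries above can also be sent from a script. The following is a minimal sketch that assumes the endpoint accepts standard GraphQL-over-HTTP `POST` requests with a JSON body; `graphqlRequestInit`, `runQuery`, and the `baseUrl` parameter are hypothetical names, not part of the project.

```typescript
// Builds the POST options for a GraphQL-over-HTTP request.
function graphqlRequestInit(
  query: string,
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

// baseUrl is an assumption, e.g. "http://localhost:3000" for the dev server.
async function runQuery(baseUrl: string, query: string): Promise<unknown> {
  const res = await fetch(baseUrl + "/api/graphql", graphqlRequestInit(query));
  if (!res.ok) throw new Error("GraphQL request failed: HTTP " + res.status);
  const payload = (await res.json()) as {
    data?: unknown;
    errors?: { message: string }[];
  };
  // GraphQL reports query errors in the body, often with HTTP 200.
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(payload.errors[0].message);
  }
  return payload.data;
}
```

With the dev server running, `runQuery("http://localhost:3000", "{ tournamentStats { totalGoals } }")` would resolve to the `data` object of the response.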
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								# World Cup
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								A full-stack World Cup statistics web app covering every tournament from 1930 to 2026. Built with Next.js 16, TailwindCSS 4, GraphQL, and PostgreSQL. Historical data is scraped from English Wikipedia and committed to the repo; live 2026 results are synced from Wikipedia on a schedule so scores appear within minutes of the final whistle.
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
 								## Features
 								- **Live 2026 matches** — detected automatically when today's date matches a scheduled fixture; Apollo polls every 60 seconds for score updates
 								- **All-time statistics** — goals, hat-tricks, biggest wins, highest-scoring games, penalty stats, goals-by-minute heatmap, confederation performance, title counts
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								- **Group standings** — computed from match results for every tournament, with 0-row entries seeded so all groups appear even before any matches are played
 								- **Deep-linked pages** — every tournament, team, and player has a permanent URL (`/tournaments/1966`, `/teams/brazil`, `/players/Pelé`) with server-side metadata for SEO
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								- **Full-text search** — across teams, tournaments, and players
 								- **Squad data** — 26-man rosters for 2026 with position, shirt number, and date of birth
 								- **Qualification playoffs** — 2026 inter-confederation playoff results stored separately
 								- **Country flags** — via `flag-icons` CSS classes, ~200 nations covered
 								- **Dark pitch aesthetic** — Bebas Neue headings, Space Grotesk body, green-on-black design
 								## Pages
 								| Route | Content |
 								|---|---|
 								| `/` | Home: live matches, stat pills, latest result, upcoming fixtures, Golden Boot race |
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								| `/groups` | All 12 group tables for 2026 (P/W/D/L/GD/Pts) with results and upcoming fixtures |
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								| `/stats` | Historical stats: goals chart, top scorers, hat-tricks, biggest wins, goals by minute, ET/shootout stats, confederation stats |
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								| `/history` | All 24 tournament cards newest-first, each with host, winner, top scorer |
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								| `/search?q=…` | Full-text search across teams, players, tournaments |
 								| `/tournaments/[year]` | Tournament detail: group stage with standings + matches, knockout rounds, scorer sidebar |
 								| `/teams/[slug]` | Team profile: all-time record, top scorers, WC appearances |
 								| `/players/[name]` | Player profile: goals by tournament, penalties vs open play breakdown |
 								## Tech stack
 								| Layer | Technology |
 								|---|---|
 								| Framework | Next.js 16.2 (App Router, standalone output) |
 								| Styling | TailwindCSS 4 (CSS-first `@theme` config) |
 								| GraphQL server | GraphQL Yoga in `/api/graphql` Next.js route |
 								| GraphQL client | Apollo Client 4 with 60 s poll for live matches |
 								| ORM | Drizzle ORM with `postgres` driver |
 								| Database | PostgreSQL 16 |
 								| Flags | `flag-icons` npm package |
 								| Fonts | Bebas Neue + Space Grotesk (Google Fonts) |
 								| Container | Docker multi-stage build, Traefik-compatible |
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								## Data pipeline
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								Data flows through three scripts that are run at different times and for different purposes.
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								### 1. Scrape — one-time developer task
 								```bash
 								pnpm scrape                   # all years (1930–2022), matches + squads
 								pnpm scrape 2002              # single year
 								pnpm scrape 2002 --matches    # matches, meta, stadiums, groups only
 								pnpm scrape 2002 --squads     # squads only
 								```
 								Fetches structured match data from English Wikipedia using the [MediaWiki parse API](https://en.wikipedia.org/w/api.php) and writes JSON files to `data/{year}/`. These files are **committed to git** so the production build never needs to hit Wikipedia for historical data.
 								Each year produces up to five files:
 								| File | Content |
 								|---|---|
 								| `worldcup.json` | Matches with scores (FT/HT/ET/P) and goal-scorer events |
 								| `worldcup.meta.json` | Tournament metadata: host, winner, runner-up, team count |
 								| `worldcup.stadiums.json` | Stadium names and cities |
 								| `worldcup.groups.json` | Group compositions (teams per group) |
 								| `worldcup.squads.json` | Player rosters (where available on Wikipedia) |
 								The scraper has built-in rate-limit handling: it detects Wikipedia's plain-text `"You are making too many requests"` response, waits 30 seconds, and retries with exponential back-off (up to 6 attempts, 15 s × attempt delay between retries). Group sub-pages are fetched with a 3-second delay between requests.
 								### 2. Seed — initial database population
 								```bash
 								DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm seed
 								DATABASE_URL="..." pnpm seed --force   # drop and re-seed from scratch
 								```
 								Reads the committed `data/{year}/` JSON files and loads them into the database. Also creates all tables (if they do not exist). Intended for first-time setup and for re-seeding after schema changes. Covers **1930–2022 only** — 2026 data is handled by sync.
 								Seed is **idempotent** and skips silently if data is already present (unless `--force` is passed).
 								### 3. Sync — scheduled live updates (2026 only)
 								```bash
 								DATABASE_URL="..." pnpm sync           # normal run
 								DATABASE_URL="..." pnpm sync --force   # clear and re-fetch all 2026 data
 								```
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								Fetches the current state of the 2026 Wikipedia pages and upserts everything into the database. Historical years (1930–2022) are not touched — they come from the committed JSON files via seed.
 								What sync does on each run:
 . Fetches `2026_FIFA_World_Cup` via the MediaWiki API
 . Determines which groups are fully complete (all matches have FT scores) and skips their sub-pages to save requests
 . Upserts matches, scores, and goal events
 . Fetches `2026_FIFA_World_Cup_squads` and upserts squad rosters
 . Recomputes group standings from match results
 . Seeds 0-row standing entries for groups with no played matches yet (so all groups appear in the UI)
 . Updates tournament aggregates (total goals, matches played, avg goals/game)
 								Sync is designed to run on a **10-minute cron** in production. Each run is safe to repeat — all writes use `ON CONFLICT DO UPDATE`.
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
 								## Database schema
 								```
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								tournaments       year PK, host, winner, runner_up, third_place, fourth_place,
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								                  teams_count, matches_count, total_goals, avg_goals_per_game
 								teams             id, name UNIQUE, iso2, fifa_code, continent, confederation
 								stadiums          id, tournament_year FK, name, city, country_code,
 								                  capacity, timezone, coordinates
 								matches           id, tournament_year FK, round, group_name, date, time_local,
 								                  stadium_id FK, team1_id FK, team2_id FK,
 								                  score_ft_home, score_ft_away,
 								                  score_ht_home, score_ht_away,
 								                  score_et_home, score_et_away,
 								                  score_p_home,  score_p_away,
 								                  is_quali_playoff
 								goals             id, match_id FK, team_id FK, player_name,
 								                  minute, minute_offset, is_penalty, is_own_goal
 								group_standings   tournament_year FK, group_name, team_id FK,
 								                  pos, played, won, drawn, lost,
 								                  goals_for, goals_against, goal_diff, pts
 								squads            id, tournament_year FK, team_id FK, player_name,
 								                  shirt_number, position, date_of_birth
 								```
 								## Local development
 								**Prerequisites:** Node.js 22+, pnpm 10+, Docker
 								```bash
 								# 1. Clone and install
 								git clone <repo-url> worldcup
 								cd worldcup
 								pnpm install
 								# 2. Start the database
 								docker compose -f docker-compose.dev.yml up -d
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								# 3. Seed historical data (1930–2022) from committed JSON files
 								DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm seed
 								# 4. Sync 2026 data from Wikipedia
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm sync
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								# 5. Start the dev server
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								DATABASE_URL="postgres://wc:wc@localhost:5432/worldcup" pnpm dev
 								```
 								Open [http://localhost:3000](http://localhost:3000).
 								To stop the database: `docker compose -f docker-compose.dev.yml down`
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								If you need to re-scrape historical data (e.g. after a Wikipedia article correction):
 								```bash
 								pnpm scrape 2002              # re-scrape a single year
 								git add data/2002/ && git commit -m "chore: refresh 2002 scraped data"
 								```
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								## Environment variables
 								| Variable | Required | Description |
 								|---|---|---|
 								| `DATABASE_URL` | Yes | PostgreSQL connection string |
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								| `NEXT_PUBLIC_SITE_URL` | Production | Public base URL, e.g. `https://worldcup.example.com` — used for sitemap and OG metadata |
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								| `DB_PASSWORD` | Production | Password for the `wc` DB user (used by docker-compose.yml) |
 								| `TRAEFIK_ENABLED` | Production | Set to `true` to activate Traefik router labels |
 								| `TRAEFIK_HOST` | Production | Public hostname, e.g. `worldcup.example.com` |
 								| `NETWORK_NAME` | Production | Name of the external Docker network Traefik is attached to |
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								| `UMAMI_ID` | Optional | Umami analytics site ID |
 								| `UMAMI_SRC` | Optional | Umami analytics script URL |
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
 								Copy `.env.example` to `.env` and fill in the values before deploying.
 								## Deployment (Coolify + Traefik)
 								The app is designed for self-hosted deployment via [Coolify](https://coolify.io) behind a [Traefik](https://traefik.io) reverse proxy.
 								### 1. Configure environment
 								In Coolify's environment variable editor set:
 								```
 								DB_PASSWORD=<strong-random-password>
 								DATABASE_URL=postgres://wc:<DB_PASSWORD>@db:5432/worldcup
-											docs: rewrite README with accurate data pipeline documentation
										
										
											2026-06-16 07:50:12 +02:00
+								NEXT_PUBLIC_SITE_URL=https://worldcup.yourdomain.com
-											feat: initial commit — World Cup stats app with pnpm, Traefik, Docker
										
										
											2026-06-14 15:36:44 +02:00
+								TRAEFIK_ENABLED=true
 								TRAEFIK_HOST=worldcup.yourdomain.com
 								NETWORK_NAME=<your-traefik-network-name>
 								```
### 2. Deploy

Coolify builds the Docker image via `docker compose up` and attaches the container to the Traefik network automatically. TLS certificates are issued by the `resolver` cert resolver configured in Traefik.

### 3. Initial data load

After the first deployment, seed historical data and then sync 2026:

```bash
# In Coolify's terminal for the app container:
pnpm seed    # loads 1930–2022 from committed JSON files
pnpm sync    # fetches 2026 from Wikipedia
```

### 4. Scheduled sync (live updates)

In Coolify → your service → **Scheduled Tasks**, add:

| Field | Value |
|---|---|
| Command | `pnpm sync` |
| Schedule | `*/10 * * * *` |
| Container | `app` |

This re-syncs 2026 from Wikipedia every 10 minutes, so a new match result appears within about 10 minutes of being recorded on Wikipedia.

## Project structure

```
worldcup/
├── app/
│   ├── layout.tsx                      # Root layout: nav, fonts, Apollo provider, global metadata
│   ├── robots.ts                       # robots.txt (Next.js convention)
│   ├── sitemap.ts                      # sitemap.xml — dynamic, rendered at request time
│   ├── page.tsx                        # Home — server wrapper (exports metadata)
│   ├── client.tsx                      # Home — Apollo/interactive client component
│   ├── groups/
│   │   ├── page.tsx                    # Groups — server wrapper
│   │   └── client.tsx                  # Groups — client component
│   ├── stats/page.tsx + client.tsx
│   ├── history/page.tsx + client.tsx
│   ├── search/page.tsx + client.tsx
│   ├── tournaments/[year]/
│   │   ├── page.tsx                    # generateMetadata fetches tournament from DB
│   │   └── client.tsx                  # Tournament detail, group standings, bracket
│   ├── teams/[slug]/page.tsx + client.tsx
│   ├── players/[name]/page.tsx + client.tsx
│   └── api/graphql/route.ts            # GraphQL Yoga endpoint
├── components/
│   ├── apollo-provider.tsx             # Apollo Client provider wrapper
│   ├── nav.tsx                         # Top navigation bar
│   ├── team-flag.tsx                   # flag-icons wrapper component
│   ├── match-card.tsx                  # Match result / fixture card
│   └── live-badge.tsx                  # Pulsing LIVE indicator
├── lib/
│   ├── db/
│   │   ├── schema.ts                   # Drizzle table definitions
│   │   └── index.ts                    # DB connection singleton
│   ├── graphql/
│   │   ├── schema.ts                   # GraphQL SDL
│   │   ├── resolvers/index.ts          # All resolvers
│   │   ├── hooks.ts                    # Apollo v4 useQuery wrapper
│   │   └── client.ts                   # Apollo Client factory
│   ├── wiki-scraper.ts                 # Wikipedia HTML parser (cheerio), rate-limit retry
│   └── iso-codes.ts                    # Team name → ISO2 country code map
├── scripts/
│   ├── scrape-wikipedia.ts             # Developer-only: scrape Wikipedia → data/{year}/
│   ├── seed.ts                         # Initial DB load from data/{year}/ JSON files
│   └── sync.ts                         # Scheduled: sync 2026 live data from Wikipedia
├── data/
│   ├── 1930/ … 2022/                   # Committed Wikipedia scrape output (per-year JSON)
│   └── {year}/
│       ├── worldcup.json               # Matches + goals
│       ├── worldcup.meta.json          # Tournament metadata
│       ├── worldcup.stadiums.json      # Stadiums
│       ├── worldcup.groups.json        # Group compositions
│       └── worldcup.squads.json        # Squad rosters (where available)
├── docker-compose.yml                  # Production (Traefik + external network)
├── docker-compose.dev.yml              # Local dev (DB only, port 5432 exposed)
├── Dockerfile                          # Multi-stage pnpm build
├── .env.example                        # Environment variable template
├── next.config.ts                      # standalone output, serverExternalPackages
├── drizzle.config.ts                   # Drizzle Kit config
└── tsconfig.json
```

## Architecture notes

**Live match detection** — A match is considered live when its date equals today and the current time falls between 5 minutes before kick-off and 125 minutes after it. Kick-off times are stored as `"HH:MM UTC±N"` strings; the resolver computes the UTC timestamp at query time using PostgreSQL interval arithmetic. Apollo's `pollInterval: 60_000` re-queries `liveMatches` and `recentMatches` every minute.

**UTC kickoff ordering** — Both `upcomingMatches` (ascending) and `recentMatches` (descending) sort by computed UTC kickoff time using a `CASE` expression that parses the `time_local` string and subtracts the UTC offset as an interval. This keeps ordering correct across time zones: matches are ranked by their actual kick-off instant, not by local clock time or database ID, so a match that starts later in a more westerly time zone is never listed ahead of an earlier one.
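
As a rough illustration of the two notes above, the following TypeScript sketch parses a `"HH:MM UTC±N"` string, derives the UTC kick-off instant, and uses it for both the live window and the ordering. The project does this in SQL inside its resolvers; the `Fixture` shape and the helper names `kickoffUtcMs`, `isLive`, and `sortByKickoff` are assumptions made for this example, not code from the repository. The sketch checks only the time window and leaves out the separate date-equals-today test.

```typescript
// Hypothetical fixture shape; field names are assumptions for this sketch.
interface Fixture {
  date: string;      // local calendar date at the venue, "YYYY-MM-DD"
  timeLocal: string; // "HH:MM UTC±N", e.g. "15:00 UTC-4"
}

const MINUTE_MS = 60_000;

// Convert local date + "HH:MM UTC±N" into a UTC epoch timestamp in ms.
// Returns null when the time string does not match the expected format.
function kickoffUtcMs(fixture: Fixture): number | null {
  const m = /^(\d{1,2}):(\d{2})\s+UTC([+\-−])(\d{1,2})(?::(\d{2}))?$/.exec(
    fixture.timeLocal.trim(),
  );
  if (!m) return null;
  const [, hh, mm, sign, offH, offM] = m;
  const offsetMin =
    (Number(offH) * 60 + Number(offM ?? 0)) * (sign === "+" ? 1 : -1);
  const [y, mo, d] = fixture.date.split("-").map(Number);
  // Date.UTC reads the wall-clock fields as if they were UTC;
  // subtracting the offset then yields the real UTC instant.
  return Date.UTC(y, mo - 1, d, Number(hh), Number(mm)) - offsetMin * MINUTE_MS;
}

// Live window: from 5 minutes before kick-off to 125 minutes after it.
function isLive(fixture: Fixture, nowMs: number): boolean {
  const kickoff = kickoffUtcMs(fixture);
  if (kickoff === null) return false;
  return nowMs >= kickoff - 5 * MINUTE_MS && nowMs <= kickoff + 125 * MINUTE_MS;
}

// Sort by UTC kick-off instant. Fixtures with an unparseable time get a
// very large key, so they come last in ascending order.
function sortByKickoff<T extends Fixture>(
  fixtures: T[],
  direction: "asc" | "desc",
): T[] {
  const key = (f: Fixture) => kickoffUtcMs(f) ?? Number.MAX_SAFE_INTEGER;
  const sign = direction === "asc" ? 1 : -1;
  return fixtures.slice().sort((a, b) => sign * (key(a) - key(b)));
}
```

With two fixtures on the same local date, one at `"21:00 UTC-7"` and one at `"22:00 UTC-4"`, the second has the later local clock time but kicks off two hours earlier in UTC (02:00 versus 04:00 the next day), so `sortByKickoff(fixtures, "asc")` lists it first.
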

**Server/client split** — All pages use a server wrapper `page.tsx` that exports `metadata` (or `generateMetadata`) and a `client.tsx` that contains the Apollo query and interactive rendering. This lets Next.js generate accurate `<title>`, OpenGraph, and Twitter card tags for each route without requiring server-side data fetching in client components.

**`NEXT_PUBLIC_SITE_URL`** — The public hostname is read from this environment variable in `sitemap.ts`, `robots.ts`, and `layout.tsx` (`metadataBase`). All per-page `openGraph.url` values use relative paths (`/groups`, `/tournaments/2026`, etc.), which Next.js resolves against `metadataBase` automatically. The sitemap is marked `export const dynamic = 'force-dynamic'` so it runs at request time when the database is reachable, not at build time.

**Apollo Client v4** — This project uses Apollo Client 4, which moved hooks to `@apollo/client/react` and core utilities to `@apollo/client/core`. A thin wrapper in `lib/graphql/hooks.ts` re-exports `useQuery` typed as `Record<string, any>` to avoid the v4 `TData = {}` default breaking all field accesses.

**Standalone Docker output** — `next.config.ts` sets `output: 'standalone'`, which produces a self-contained `server.js`. The `scripts/`, `lib/`, and `data/` directories are copied separately into the runner stage so `pnpm seed` and `pnpm sync` work inside the container without needing a full Node/TypeScript toolchain reinstall.

**Group standings** — Standings are computed live from match results via a SQL `GROUP BY` query in the `groupStandings` resolver. After each sync, 0-row standing entries are inserted for all teams in all 2026 groups, ensuring every group appears in the UI even before its first match is played.
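
The same idea can be sketched in memory: seed a zero row for every team, then fold match results into those rows. This is a TypeScript analogue for illustration only, since the project computes standings in SQL. The `MatchResult` and `StandingRow` shapes, the `computeStandings` name, the 3-points-per-win rule, and the tie-break order (points, goal difference, goals scored, name) are all assumptions of this sketch.

```typescript
// Hypothetical shapes; the real schema lives in lib/db/schema.ts.
interface MatchResult {
  group: string;
  team1: string;
  team2: string;
  goals1: number;
  goals2: number;
}

interface StandingRow {
  group: string;
  team: string;
  played: number;
  won: number;
  drawn: number;
  lost: number;
  goalsFor: number;
  goalsAgainst: number;
  pts: number;
}

function computeStandings(
  groups: Record<string, string[]>,
  results: MatchResult[],
): StandingRow[] {
  const rows = new Map<string, StandingRow>();

  // Seed a zero row per team so every group is present before any match is played.
  for (const [group, teams] of Object.entries(groups)) {
    for (const team of teams) {
      rows.set(`${group}|${team}`, {
        group, team, played: 0, won: 0, drawn: 0, lost: 0,
        goalsFor: 0, goalsAgainst: 0, pts: 0,
      });
    }
  }

  // Fold each result into the two affected rows (assumes 3 points per win).
  for (const r of results) {
    const a = rows.get(`${r.group}|${r.team1}`);
    const b = rows.get(`${r.group}|${r.team2}`);
    if (!a || !b) continue; // ignore results for teams that were not seeded
    a.played++; b.played++;
    a.goalsFor += r.goals1; a.goalsAgainst += r.goals2;
    b.goalsFor += r.goals2; b.goalsAgainst += r.goals1;
    if (r.goals1 > r.goals2) { a.won++; a.pts += 3; b.lost++; }
    else if (r.goals1 < r.goals2) { b.won++; b.pts += 3; a.lost++; }
    else { a.drawn++; b.drawn++; a.pts++; b.pts++; }
  }

  // Simplified ordering: group, points, goal difference, goals scored, name.
  return Array.from(rows.values()).sort(
    (x, y) =>
      x.group.localeCompare(y.group) ||
      y.pts - x.pts ||
      (y.goalsFor - y.goalsAgainst) - (x.goalsFor - x.goalsAgainst) ||
      y.goalsFor - x.goalsFor ||
      x.team.localeCompare(y.team),
  );
}
```

Calling `computeStandings` with an empty results array returns one all-zero row per team, which is the state the UI shows before a group's first match.
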

**Wikipedia scraper rate limits** — The MediaWiki API occasionally returns a plain-text `"You are making too many requests to the API"` response instead of JSON. The scraper detects this by reading the response as text first and parsing JSON only if the body does not start with that phrase. On a rate-limit response (or HTTP 429), it waits 30 seconds before retrying. Retries use a linearly increasing back-off of 15 s × attempt number, up to 6 attempts per page.
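
A minimal sketch of that retry loop, assuming an injected `fetcher` so it can run without network access. The names `isRateLimited`, `backoffDelayMs`, and `fetchJsonWithRetry` are hypothetical and not taken from `lib/wiki-scraper.ts`. The sketch applies only the 15 s × attempt schedule; how that interacts with the fixed 30-second wait is not specified here, so treat the delays as assumptions.

```typescript
const RATE_LIMIT_PHRASE = "You are making too many requests to the API";
const MAX_ATTEMPTS = 6;

// A response counts as rate-limited on HTTP 429 or when the plain-text
// body starts with the MediaWiki rate-limit phrase.
function isRateLimited(status: number, body: string): boolean {
  return status === 429 || body.startsWith(RATE_LIMIT_PHRASE);
}

// Linear back-off: 15 s after the first attempt, 30 s after the second, and so on.
function backoffDelayMs(attempt: number): number {
  return 15_000 * attempt;
}

// Minimal structural type so the global fetch or a test stub can be passed in.
type Fetcher = (url: string) => Promise<{ status: number; text(): Promise<string> }>;

async function fetchJsonWithRetry(
  url: string,
  fetcher: Fetcher,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<unknown> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const res = await fetcher(url);
    // Read as text first: a rate-limit reply is not valid JSON.
    const body = await res.text();
    if (!isRateLimited(res.status, body)) return JSON.parse(body);
    if (attempt < MAX_ATTEMPTS) await sleep(backoffDelayMs(attempt));
  }
  throw new Error(`Still rate-limited after ${MAX_ATTEMPTS} attempts: ${url}`);
}
```

In a Node 18+ runtime the global `fetch` can be passed as `fetcher`; the injectable `sleep` lets tests skip the real delays.
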

## GraphQL API

The GraphQL playground is available at `/api/graphql` in development.

Key queries:

```graphql
# Live matches right now
{ liveMatches { id date time team1 { name } team2 { name } scoreFt isLive } }

# All-time top scorers
{ topScorers(limit: 10) { playerName goals penalties team { name iso2 } } }

# 2026 group standings
{ groupStandings(year: 2026) { groupName pos team { name iso2 } played won drawn lost goalsFor goalsAgainst pts } }

# Tournament detail
{ tournament(year: 2022) { year host winner totalGoals avgGoalsPerGame } }

# Team stats
{ team(slug: "brazil") { name stats { appearances wins losses titles goalsFor } } }

# Full-text search
{ search(query: "Ronaldo") { teams { name } players { playerName goals } } }

# Hat-tricks in World Cup history
{ hatTricks { playerName goals year round team { name } opponent { name } } }

# Global stats
{ tournamentStats { totalTournaments totalMatches totalGoals avgGoalsPerGame } }
```