Enforce ASCII in README.md (#513)

This all started because I was going to write a script to autogenerate the Table of Contents in the root `README.md`, but I noticed that the `href` for the "Why Codex?" heading was `#whycodex` instead of `#why-codex`. This piqued my curiosity, and it turned out that the space in "Why Codex?" was not an ASCII space but **U+00A0**, a non-breaking space, so GitHub ignored it when generating the `href` for the heading.

This also meant that when I did a text search for `why codex` in the `README.md` in VS Code, the "Why Codex" heading did not match because of the presence of **U+00A0**.

In short, these types of Unicode characters seem like a hazard, so I decided to introduce this script to flag them and, if desired, to replace them with "good enough" ASCII equivalents. For now, this only applies to the root `README.md` file, but I think we should ultimately apply this across our source code as well, as we seem to have quite a lot of non-ASCII Unicode and it's probably going to cause `rg` to miss things.

Contributions of this PR:

* `./scripts/asciicheck.py`, which takes a list of filepaths and returns non-zero if any of them contain non-ASCII characters. (Currently, there is one exception for ✨ aka **U+2728**, though I would like to default to an empty allowlist and then require all exceptions to be specified as flags.)
* A `--fix` option that will attempt to rewrite files with violations using equivalents from a hardcoded substitution list.
* An update to `ci.yml` to verify `./scripts/asciicheck.py README.md` succeeds.
* A cleanup of `README.md` using the `--fix` option as well as some editorial decisions on my part.
* I tried to update the `href`s in the Table of Contents to reflect the changes in the heading titles. (TIL that if a heading has a character like `&` surrounded by spaces, it becomes `--` in the generated `href`.)
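The search mismatch is easy to reproduce outside of VS Code. A minimal sketch, assuming a heading string with U+00A0 where the space should be; `find_non_ascii` is a hypothetical helper for illustration and is not part of this PR:

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (index, f"U+{ord(char):04X}", unicodedata.name(char, "<unnamed>"))
        for index, char in enumerate(text)
        if not char.isascii()
    ]


# Renders like "Why Codex?", but the space is a non-breaking space.
heading = "Why\u00a0Codex?"

# A plain substring search with an ASCII space does not match.
print("why codex" in heading.lower())

# Listing the non-ASCII code points makes the hidden character visible.
print(find_non_ascii(heading))
```

The first `print` shows `False`, and the second reports a single `NO-BREAK SPACE` at index 3.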
2025-04-22 07:07:40 -07:00
parent 2cb8355968
commit c00ae2dcc1
3 changed files with 211 additions and 82 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -67,3 +67,6 @@ jobs:

      - name: Build
        run: pnpm run build
+
+      - name: Ensure README.md contains only ASCII and certain Unicode code points
+        run: ./scripts/asciicheck.py README.md
--- a/README.md
+++ b/README.md
@@ -12,13 +12,13 @@

 - [Experimental Technology Disclaimer](#experimental-technology-disclaimer)
 - [Quickstart](#quickstart)
- [Why Codex?](#whycodex)
- [Security Model \& Permissions](#securitymodelpermissions)
+- [Why Codex?](#why-codex)
+- [Security Model & Permissions](#security-model--permissions)
  - [Platform sandboxing details](#platform-sandboxing-details)
- [System Requirements](#systemrequirements)
- [CLI Reference](#clireference)
- [Memory \& Project Docs](#memoryprojectdocs)
- [Non‑interactive / CI mode](#noninteractivecimode)
+- [System Requirements](#system-requirements)
+- [CLI Reference](#cli-reference)
+- [Memory & Project Docs](#memory--project-docs)
+- [Non-interactive / CI mode](#non-interactive--ci-mode)
 - [Recipes](#recipes)
 - [Installation](#installation)
 - [Configuration](#configuration)
@@ -27,7 +27,7 @@
 - [Contributing](#contributing)
  - [Development workflow](#development-workflow)
    - [Nix Flake Development](#nix-flake-development)
-  - [Writing high‑impact code changes](#writing-highimpact-code-changes)
+  - [Writing high-impact code changes](#writing-high-impact-code-changes)
  - [Opening a pull request](#opening-a-pull-request)
  - [Review process](#review-process)
  - [Community values](#community-values)
@@ -35,7 +35,7 @@
  - [Contributor License Agreement (CLA)](#contributor-license-agreement-cla)
    - [Quick fixes](#quick-fixes)
  - [Releasing `codex`](#releasing-codex)
- [Security \& Responsible AI](#securityresponsibleai)
+- [Security & Responsible AI](#security--responsible-ai)
 - [License](#license)
 - [Zero Data Retention (ZDR) Organization Limitation](#zero-data-retention-zdr-organization-limitation)

@@ -45,7 +45,7 @@

 ## Experimental Technology Disclaimer

-Codex CLI is an experimental project under active development. It is not yet stable, may contain bugs, incomplete features, or undergo breaking changes. We’re building it in the open with the community and welcome:
+Codex CLI is an experimental project under active development. It is not yet stable, may contain bugs, incomplete features, or undergo breaking changes. We're building it in the open with the community and welcome:

 - Bug reports
 - Feature requests
@@ -115,59 +115,59 @@ codex "explain this codebase to me"
 codex --approval-mode full-auto "create the fanciest todo-list app"
 ```

-That’s it – Codex will scaffold a file, run it inside a sandbox, install any
+That's it - Codex will scaffold a file, run it inside a sandbox, install any
 missing dependencies, and show you the live result. Approve the changes and
-they’ll be committed to your working directory.
+they'll be committed to your working directory.

 ---

-## Why Codex?
+## Why Codex?

 Codex CLI is built for developers who already **live in the terminal** and want
-ChatGPT‑level reasoning **plus** the power to actually run code, manipulate
-files, and iterate – all under version control. In short, it’s _chat‑driven
+ChatGPT-level reasoning **plus** the power to actually run code, manipulate
+files, and iterate - all under version control. In short, it's _chat-driven
 development_ that understands and executes your repo.

- **Zero setup** — bring your OpenAI API key and it just works!
+- **Zero setup** - bring your OpenAI API key and it just works!
 - **Full auto-approval, while safe + secure** by running network-disabled and directory-sandboxed
- **Multimodal** — pass in screenshots or diagrams to implement features ✨
+- **Multimodal** - pass in screenshots or diagrams to implement features ✨

 And it's **fully open-source** so you can see and contribute to how it develops!

 ---

-## Security Model & Permissions
+## Security Model & Permissions

 Codex lets you decide _how much autonomy_ the agent receives and auto-approval policy via the
 `--approval-mode` flag (or the interactive onboarding prompt):

-| Mode                      | What the agent may do without asking                                                               | Still requires approval                                                                         |
-| ------------------------- | -------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
-| **Suggest** <br>(default) | • Read any file in the repo                                                                        | • **All** file writes/patches <br>• **Any** arbitrary shell commands (aside from reading files) |
-| **Auto Edit**             | • Read **and** apply‑patch writes to files                                                         | • **All** shell commands                                                                        |
-| **Full Auto**             | • Read/write files <br>• Execute shell commands (network disabled, writes limited to your workdir) | –                                                                                               |
+| Mode                      | What the agent may do without asking                                                                | Still requires approval                                                                         |
+| ------------------------- | --------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
+| **Suggest** <br>(default) | <li>Read any file in the repo                                                                       | <li>**All** file writes/patches<li> **Any** arbitrary shell commands (aside from reading files) |
+| **Auto Edit**             | <li>Read **and** apply-patch writes to files                                                        | <li>**All** shell commands                                                                      |
+| **Full Auto**             | <li>Read/write files <li> Execute shell commands (network disabled, writes limited to your workdir) | -                                                                                               |

-In **Full Auto** every command is run **network‑disabled** and confined to the
-current working directory (plus temporary files) for defense‑in‑depth. Codex
-will also show a warning/confirmation if you start in **auto‑edit** or
-**full‑auto** while the directory is _not_ tracked by Git, so you always have a
+In **Full Auto** every command is run **network-disabled** and confined to the
+current working directory (plus temporary files) for defense-in-depth. Codex
+will also show a warning/confirmation if you start in **auto-edit** or
+**full-auto** while the directory is _not_ tracked by Git, so you always have a
 safety net.

-Coming soon: you’ll be able to whitelist specific commands to auto‑execute with
-the network enabled, once we’re confident in additional safeguards.
+Coming soon: you'll be able to whitelist specific commands to auto-execute with
+the network enabled, once we're confident in additional safeguards.

 ### Platform sandboxing details

 The hardening mechanism Codex uses depends on your OS:

- **macOS 12+** – commands are wrapped with **Apple Seatbelt** (`sandbox-exec`).
+- **macOS 12+** - commands are wrapped with **Apple Seatbelt** (`sandbox-exec`).

-  - Everything is placed in a read‑only jail except for a small set of
+  - Everything is placed in a read-only jail except for a small set of
    writable roots (`$PWD`, `$TMPDIR`, `~/.codex`, etc.).
-  - Outbound network is _fully blocked_ by default – even if a child process
+  - Outbound network is _fully blocked_ by default - even if a child process
    tries to `curl` somewhere it will fail.

- **Linux** – there is no sandboxing by default.
+- **Linux** - there is no sandboxing by default.
  We recommend using Docker for sandboxing, where Codex launches itself inside a **minimal
  container image** and mounts your repo _read/write_ at the same path. A
  custom `iptables`/`ipset` firewall script denies all egress except the
@@ -176,47 +176,47 @@ The hardening mechanism Codex uses depends on your OS:

 ---

-## System Requirements
+## System Requirements

 | Requirement                 | Details                                                         |
 | --------------------------- | --------------------------------------------------------------- |
-| Operating systems           | macOS 12+, Ubuntu 20.04+/Debian 10+, or Windows 11 **via WSL2** |
+| Operating systems           | macOS 12+, Ubuntu 20.04+/Debian 10+, or Windows 11 **via WSL2** |
 | Node.js                     | **22 or newer** (LTS recommended)                               |
-| Git (optional, recommended) | 2.23+ for built‑in PR helpers                                   |
-| RAM                         | 4‑GB minimum (8‑GB recommended)                                 |
+| Git (optional, recommended) | 2.23+ for built-in PR helpers                                   |
+| RAM                         | 4-GB minimum (8-GB recommended)                                 |

 > Never run `sudo npm install -g`; fix npm permissions instead.

 ---

-## CLI Reference
+## CLI Reference

 | Command                              | Purpose                             | Example                              |
 | ------------------------------------ | ----------------------------------- | ------------------------------------ |
 | `codex`                              | Interactive REPL                    | `codex`                              |
-| `codex "…"`                          | Initial prompt for interactive REPL | `codex "fix lint errors"`            |
-| `codex -q "…"`                       | Non‑interactive "quiet mode"        | `codex -q --json "explain utils.ts"` |
+| `codex "..."`                        | Initial prompt for interactive REPL | `codex "fix lint errors"`            |
+| `codex -q "..."`                     | Non-interactive "quiet mode"        | `codex -q --json "explain utils.ts"` |
 | `codex completion <bash\|zsh\|fish>` | Print shell completion script       | `codex completion bash`              |

 Key flags: `--model/-m`, `--approval-mode/-a`, `--quiet/-q`, and `--notify`.

 ---

-## Memory & Project Docs
+## Memory & Project Docs

 Codex merges Markdown instructions in this order:

-1. `~/.codex/instructions.md` – personal global guidance
-2. `codex.md` at repo root – shared project notes
-3. `codex.md` in cwd – sub‑package specifics
+1. `~/.codex/instructions.md` - personal global guidance
+2. `codex.md` at repo root - shared project notes
+3. `codex.md` in cwd - sub-package specifics

 Disable with `--no-project-doc` or `CODEX_DISABLE_PROJECT_DOC=1`.

 ---

-## Non‑interactive / CI mode
+## Non-interactive / CI mode

-Run Codex head‑less in pipelines. Example GitHub Action step:
+Run Codex head-less in pipelines. Example GitHub Action step:

 ```yaml
 - name: Update changelog via Codex
@@ -240,15 +240,15 @@ DEBUG=true codex

 ## Recipes

-Below are a few bite‑size examples you can copy‑paste. Replace the text in quotes with your own task. See the [prompting guide](https://github.com/openai/codex/blob/main/codex-cli/examples/prompting_guide.md) for more tips and usage patterns.
+Below are a few bite-size examples you can copy-paste. Replace the text in quotes with your own task. See the [prompting guide](https://github.com/openai/codex/blob/main/codex-cli/examples/prompting_guide.md) for more tips and usage patterns.

 | ✨  | What you type                                                                   | What happens                                                               |
 | --- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
-| 1   | `codex "Refactor the Dashboard component to React Hooks"`                       | Codex rewrites the class component, runs `npm test`, and shows the diff.   |
+| 1   | `codex "Refactor the Dashboard component to React Hooks"`                       | Codex rewrites the class component, runs `npm test`, and shows the diff.   |
 | 2   | `codex "Generate SQL migrations for adding a users table"`                      | Infers your ORM, creates migration files, and runs them in a sandboxed DB. |
 | 3   | `codex "Write unit tests for utils/date.ts"`                                    | Generates tests, executes them, and iterates until they pass.              |
-| 4   | `codex "Bulk‑rename *.jpeg → *.jpg with git mv"`                                | Safely renames files and updates imports/usages.                           |
-| 5   | `codex "Explain what this regex does: ^(?=.*[A-Z]).{8,}$"`                      | Outputs a step‑by‑step human explanation.                                  |
+| 4   | `codex "Bulk-rename *.jpeg -> *.jpg with git mv"`                               | Safely renames files and updates imports/usages.                           |
+| 5   | `codex "Explain what this regex does: ^(?=.*[A-Z]).{8,}$"`                      | Outputs a step-by-step human explanation.                                  |
 | 6   | `codex "Carefully review this repo, and propose 3 high impact well-scoped PRs"` | Suggests impactful PRs in the current codebase.                            |
 | 7   | `codex "Look for vulnerabilities and create a security review report"`          | Finds and explains security bugs.                                          |

@@ -257,7 +257,7 @@ Below are a few bite‑size examples you can copy‑paste. Replace the text in q
 ## Installation

 <details open>
-<summary><strong>From npm (Recommended)</strong></summary>
+<summary><strong>From npm (Recommended)</strong></summary>

 ```bash
 npm install -g @openai/codex
@@ -272,7 +272,7 @@ pnpm add -g @openai/codex
 </details>

 <details>
-<summary><strong>Build from source</strong></summary>
+<summary><strong>Build from source</strong></summary>

 ```bash
 # Clone the repository and navigate to the CLI package
@@ -289,7 +289,7 @@ pnpm build
 # Get the usage and the options
 node ./dist/cli.js --help

-# Run the locally‑built CLI directly
+# Run the locally-built CLI directly
 node ./dist/cli.js

 # Or link the command globally for convenience
@@ -363,7 +363,7 @@ Codex runs model-generated commands in a sandbox. If a proposed command or file
 <details>
 <summary>Does it work on Windows?</summary>

-Not directly. It requires [Windows Subsystem for Linux (WSL2)](https://learn.microsoft.com/en-us/windows/wsl/install) – Codex has been tested on macOS and Linux with Node ≥ 22.
+Not directly. It requires [Windows Subsystem for Linux (WSL2)](https://learn.microsoft.com/en-us/windows/wsl/install) - Codex has been tested on macOS and Linux with Node 22.

 </details>

@@ -394,12 +394,12 @@ OpenAI rejected the request. Error details: Status: 400, Code: unsupported_param

 ## Funding Opportunity

-We’re excited to launch a **$1 million initiative** supporting open source projects that use Codex CLI and other OpenAI models.
+We're excited to launch a **$1 million initiative** supporting open source projects that use Codex CLI and other OpenAI models.

 - Grants are awarded in **$25,000** API credit increments.
 - Applications are reviewed **on a rolling basis**.

-**Interested? [Apply here](https://openai.com/form/codex-open-source-fund/).**
+**Interested? [Apply here](https://openai.com/form/codex-open-source-fund/).**

 ---

@@ -407,14 +407,14 @@ We’re excited to launch a **$1 million initiative** supporting open source pr

 This project is under active development and the code will likely change pretty significantly. We'll update this message once that's complete!

-More broadly we welcome contributions – whether you are opening your very first pull request or you’re a seasoned maintainer. At the same time we care about reliability and long‑term maintainability, so the bar for merging code is intentionally **high**. The guidelines below spell out what “high‑quality” means in practice and should make the whole process transparent and friendly.
+More broadly we welcome contributions - whether you are opening your very first pull request or you're a seasoned maintainer. At the same time we care about reliability and long-term maintainability, so the bar for merging code is intentionally **high**. The guidelines below spell out what "high-quality" means in practice and should make the whole process transparent and friendly.

 ### Development workflow

- Create a _topic branch_ from `main` – e.g. `feat/interactive-prompt`.
+- Create a _topic branch_ from `main` - e.g. `feat/interactive-prompt`.
 - Keep your changes focused. Multiple unrelated fixes should be opened as separate PRs.
- Use `pnpm test:watch` during development for super‑fast feedback.
- We use **Vitest** for unit tests, **ESLint** + **Prettier** for style, and **TypeScript** for type‑checking.
+- Use `pnpm test:watch` during development for super-fast feedback.
+- We use **Vitest** for unit tests, **ESLint** + **Prettier** for style, and **TypeScript** for type-checking.
 - Before pushing, run the full test/type/lint suite:

 ### Git Hooks with Husky
@@ -436,16 +436,16 @@ npm test && npm run lint && npm run typecheck
  I have read the CLA Document and I hereby sign the CLA
  ```

-  The CLA‑Assistant bot will turn the PR status green once all authors have signed.
+  The CLA-Assistant bot will turn the PR status green once all authors have signed.

 ```bash
-# Watch mode (tests rerun on change)
+# Watch mode (tests rerun on change)
 pnpm test:watch

-# Type‑check without emitting files
+# Type-check without emitting files
 pnpm typecheck

-# Automatically fix lint + prettier issues
+# Automatically fix lint + prettier issues
 pnpm lint:fix
 pnpm format:fix
 ```
@@ -475,35 +475,35 @@ Run the CLI via the flake app:
 nix run .#codex
 ```

-### Writing high‑impact code changes
+### Writing high-impact code changes

 1. **Start with an issue.** Open a new one or comment on an existing discussion so we can agree on the solution before code is written.
-2. **Add or update tests.** Every new feature or bug‑fix should come with test coverage that fails before your change and passes afterwards. 100 % coverage is not required, but aim for meaningful assertions.
-3. **Document behaviour.** If your change affects user‑facing behaviour, update the README, inline help (`codex --help`), or relevant example projects.
+2. **Add or update tests.** Every new feature or bug-fix should come with test coverage that fails before your change and passes afterwards. 100% coverage is not required, but aim for meaningful assertions.
+3. **Document behaviour.** If your change affects user-facing behaviour, update the README, inline help (`codex --help`), or relevant example projects.
 4. **Keep commits atomic.** Each commit should compile and the tests should pass. This makes reviews and potential rollbacks easier.

 ### Opening a pull request

- Fill in the PR template (or include similar information) – **What? Why? How?**
+- Fill in the PR template (or include similar information) - **What? Why? How?**
 - Run **all** checks locally (`npm test && npm run lint && npm run typecheck`). CI failures that could have been caught locally slow down the process.
- Make sure your branch is up‑to‑date with `main` and that you have resolved merge conflicts.
- Mark the PR as **Ready for review** only when you believe it is in a merge‑able state.
+- Make sure your branch is up-to-date with `main` and that you have resolved merge conflicts.
+- Mark the PR as **Ready for review** only when you believe it is in a merge-able state.

 ### Review process

 1. One maintainer will be assigned as a primary reviewer.
-2. We may ask for changes – please do not take this personally. We value the work, we just also value consistency and long‑term maintainability.
-3. When there is consensus that the PR meets the bar, a maintainer will squash‑and‑merge.
+2. We may ask for changes - please do not take this personally. We value the work, we just also value consistency and long-term maintainability.
+3. When there is consensus that the PR meets the bar, a maintainer will squash-and-merge.

 ### Community values

 - **Be kind and inclusive.** Treat others with respect; we follow the [Contributor Covenant](https://www.contributor-covenant.org/).
- **Assume good intent.** Written communication is hard – err on the side of generosity.
+- **Assume good intent.** Written communication is hard - err on the side of generosity.
 - **Teach & learn.** If you spot something confusing, open an issue or PR with improvements.

 ### Getting help

-If you run into problems setting up the project, would like feedback on an idea, or just want to say _hi_ – please open a Discussion or jump into the relevant issue. We are happy to help.
+If you run into problems setting up the project, would like feedback on an idea, or just want to say _hi_ - please open a Discussion or jump into the relevant issue. We are happy to help.

 Together we can make Codex CLI an incredible tool. **Happy hacking!** :rocket:

@@ -512,22 +512,21 @@ Together we can make Codex CLI an incredible tool. **Happy hacking!** :rocket:
 All contributors **must** accept the CLA. The process is lightweight:

 1. Open your pull request.
-2. Paste the following comment (or reply `recheck` if you’ve signed before):
+2. Paste the following comment (or reply `recheck` if you've signed before):

   ```text
   I have read the CLA Document and I hereby sign the CLA
   ```

-3. The CLA‑Assistant bot records your signature in the repo and marks the status check as passed.
+3. The CLA-Assistant bot records your signature in the repo and marks the status check as passed.

 No special Git commands, email attachments, or commit footers required.

 #### Quick fixes

-| Scenario          | Command                                                                                   |
-| ----------------- | ----------------------------------------------------------------------------------------- |
-| Amend last commit | `git commit --amend -s --no-edit && git push -f`                                          |
-| GitHub UI only    | Edit the commit message in the PR → add<br>`Signed-off-by: Your Name <email@example.com>` |
+| Scenario          | Command                                          |
+| ----------------- | ------------------------------------------------ |
+| Amend last commit | `git commit --amend -s --no-edit && git push -f` |

 The **DCO check** blocks merges until every commit in the PR carries the footer (with squash this is just the one).

@@ -548,12 +547,12 @@ To publish a new version of the CLI, run the release scripts defined in `codex-c

 ---

-## Security &amp; Responsible AI
+## Security & Responsible AI

-Have you discovered a vulnerability or have concerns about model output? Please e‑mail **security@openai.com** and we will respond promptly.
+Have you discovered a vulnerability or have concerns about model output? Please e-mail **security@openai.com** and we will respond promptly.

 ---

 ## License

-This repository is licensed under the [Apache-2.0 License](LICENSE).
+This repository is licensed under the [Apache-2.0 License](LICENSE).
--- a/scripts/asciicheck.py
+++ b/scripts/asciicheck.py
@@ -0,0 +1,127 @@
+#!/usr/bin/env python3
+
+import argparse
+import sys
+from pathlib import Path
+
+"""
+Utility script that takes a list of files and returns non-zero if any of them
+contain non-ASCII characters other than those in the allowed list.
+
+If --fix is used, it will attempt to replace non-ASCII characters with ASCII
+equivalents.
+
+The motivation behind this script is that characters like U+00A0 (non-breaking
+space) can cause regexes not to match and can result in surprising anchor
+values for headings when GitHub renders Markdown as HTML.
+"""
+
+
+"""
+When --fix is used, perform the following substitutions.
+"""
+substitutions: dict[int, str] = {
+    0x00A0: " ",  # non-breaking space
+    0x2011: "-",  # non-breaking hyphen
+    0x2013: "-",  # en dash
+    0x2014: "-",  # em dash
+    0x2018: "'",  # left single quote
+    0x2019: "'",  # right single quote
+    0x201C: '"',  # left double quote
+    0x201D: '"',  # right double quote
+    0x2026: "...",  # ellipsis
+    0x202F: " ",  # narrow non-breaking space
+}
+
+"""
+Unicode codepoints that are allowed in addition to ASCII.
+Be conservative with this list.
+
+Note that it is always an option to use the hex HTML representation
+instead of the character itself so the source code is ASCII-only.
+For example, U+2728 (sparkles) can be written as `&#x2728;`.
+"""
+allowed_unicode_codepoints = {
+    0x2728,  # sparkles
+}
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(
+        description="Check for non-ASCII characters in files."
+    )
+    parser.add_argument(
+        "--fix",
+        action="store_true",
+        help="Rewrite files, replacing non-ASCII characters with ASCII equivalents, where possible.",
+    )
+    parser.add_argument(
+        "files",
+        nargs="+",
+        help="Files to check for non-ASCII characters.",
+    )
+    args = parser.parse_args()
+
+    has_errors = False
+    for filename in args.files:
+        path = Path(filename)
+        has_errors |= lint_utf8_ascii(path, fix=args.fix)
+    return 1 if has_errors else 0
+
+
+def lint_utf8_ascii(filename: Path, fix: bool) -> bool:
+    """Returns True if an error was printed."""
+    try:
+        with open(filename, "rb") as f:
+            raw = f.read()
+        text = raw.decode("utf-8")
+    except UnicodeDecodeError as e:
+        print("UTF-8 decoding error:")
+        print(f"  byte offset: {e.start}")
+        print(f"  reason: {e.reason}")
+        # Attempt to find line/column
+        partial = raw[: e.start]
+        line = partial.count(b"\n") + 1
+        col = e.start - (partial.rfind(b"\n") if b"\n" in partial else -1)
+        print(f"  location: line {line}, column {col}")
+        return True
+
+    errors = []
+    for lineno, line in enumerate(text.splitlines(keepends=True), 1):
+        for colno, char in enumerate(line, 1):
+            codepoint = ord(char)
+            if char == "\n":
+                continue
+            if (
+                not (0x20 <= codepoint <= 0x7E)
+                and codepoint not in allowed_unicode_codepoints
+            ):
+                errors.append((lineno, colno, char, codepoint))
+
+    if errors:
+        for lineno, colno, char, codepoint in errors:
+            safe_char = repr(char)[1:-1]  # nicely escape things like \u202f
+            print(
+                f"Invalid character at line {lineno}, column {colno}: U+{codepoint:04X} ({safe_char})"
+            )
+
+    if errors and fix:
+        print(f"Attempting to fix {filename}...")
+        num_replacements = 0
+        new_contents = ""
+        for char in text:
+            codepoint = ord(char)
+            if codepoint in substitutions:
+                num_replacements += 1
+                new_contents += substitutions[codepoint]
+            else:
+                new_contents += char
+        with open(filename, "w", encoding="utf-8") as f:
+            f.write(new_contents)
+        print(f"Fixed {num_replacements} of {len(errors)} errors in {filename}.")
+
+    return bool(errors)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
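
The `--fix` loop above builds the output one character at a time. As a rough sketch of a more compact alternative (not what this PR implements), the same table can be passed to `str.translate`, which accepts a mapping from code points to replacement strings. The names `SUBSTITUTIONS` and `to_ascii` are hypothetical; the table contents are copied from the script:

```python
# Same mapping as the script's hardcoded substitution list.
SUBSTITUTIONS: dict[int, str] = {
    0x00A0: " ",  # non-breaking space
    0x2011: "-",  # non-breaking hyphen
    0x2013: "-",  # en dash
    0x2014: "-",  # em dash
    0x2018: "'",  # left single quote
    0x2019: "'",  # right single quote
    0x201C: '"',  # left double quote
    0x201D: '"',  # right double quote
    0x2026: "...",  # ellipsis
    0x202F: " ",  # narrow non-breaking space
}


def to_ascii(text: str) -> tuple[str, int]:
    """Return the substituted text and the number of characters replaced."""
    replaced = sum(1 for char in text if ord(char) in SUBSTITUTIONS)
    # Characters missing from the table (for example U+2728) pass through unchanged.
    return text.translate(SUBSTITUTIONS), replaced


fixed, count = to_ascii("That\u2019s it \u2013 Codex\u2026")
print(fixed)  # That's it - Codex...
print(count)  # 3
```

As in the script, characters without an entry in the table are left alone, so a file can still fail the check after a fix pass.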