103
codex-cli/examples/prompt-analyzer/template/README.md
Normal file
103
codex-cli/examples/prompt-analyzer/template/README.md
Normal file
@@ -0,0 +1,103 @@
|
||||
# Prompt‑Clustering Utility
|
||||
|
||||
This repository contains a small utility (`cluster_prompts.py`) that embeds a
|
||||
list of prompts with the OpenAI Embedding API, discovers natural groupings with
|
||||
unsupervised clustering, lets ChatGPT name & describe each cluster and finally
|
||||
produces a concise Markdown report plus a couple of diagnostic plots.
|
||||
|
||||
The default input file (`prompts.csv`) ships with the repo so you can try the
|
||||
script immediately, but you can of course point it at your own file.
|
||||
|
||||
---
|
||||
|
||||
## 1. Setup
|
||||
|
||||
1. Install the Python dependencies (preferably inside a virtual env):
|
||||
|
||||
```bash
|
||||
pip install pandas numpy scikit-learn matplotlib openai
|
||||
```
|
||||
|
||||
2. Export your OpenAI API key (**required**):
|
||||
|
||||
```bash
|
||||
export OPENAI_API_KEY="sk‑..."
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Basic usage
|
||||
|
||||
```bash
|
||||
# Minimal command – runs on prompts.csv and writes analysis.md + plots/
|
||||
python cluster_prompts.py
|
||||
```
|
||||
|
||||
This will
|
||||
|
||||
* create embeddings with the `text-embedding-3-small` model,
|
||||
* pick a suitable number *k* via silhouette score (K‑Means),
|
||||
* ask `gpt‑4o‑mini` to label & describe each cluster,
|
||||
* store the results in `analysis.md`,
|
||||
* and save two plots to `plots/` (`cluster_sizes.png` and `tsne.png`).
|
||||
|
||||
The script prints a short success message once done.
|
||||
|
||||
---
|
||||
|
||||
## 3. Command‑line options
|
||||
|
||||
| flag | default | description |
|
||||
|------|---------|-------------|
|
||||
| `--csv` | `prompts.csv` | path to the input CSV (must contain a `prompt` column; an `act` column is used as context if present) |
|
||||
| `--cache` | _(none)_ | embedding cache path (JSON). Speeds up repeated runs – new texts are appended automatically. |
|
||||
| `--cluster-method` | `kmeans` | `kmeans` (with automatic *k*) or `dbscan` |
|
||||
| `--k-max` | `10` | upper bound for *k* when `kmeans` is selected |
|
||||
| `--dbscan-min-samples` | `3` | min samples parameter for DBSCAN |
|
||||
| `--embedding-model` | `text-embedding-3-small` | any OpenAI embedding model |
|
||||
| `--chat-model` | `gpt-4o-mini` | chat model used to generate cluster names / descriptions |
|
||||
| `--output-md` | `analysis.md` | where to write the Markdown report |
|
||||
| `--plots-dir` | `plots` | directory for generated PNGs |
|
||||
|
||||
Example with customised options:
|
||||
|
||||
```bash
|
||||
python cluster_prompts.py \
|
||||
--csv my_prompts.csv \
|
||||
--cache .cache/embeddings.json \
|
||||
--cluster-method dbscan \
|
||||
--embedding-model text-embedding-3-large \
|
||||
--chat-model gpt-4o \
|
||||
--output-md my_analysis.md \
|
||||
--plots-dir my_plots
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Interpreting the output
|
||||
|
||||
### analysis.md
|
||||
|
||||
* Overview table: cluster label, generated name, member count and description.
|
||||
* Detailed section for every cluster with five representative example prompts.
|
||||
* Separate lists for
|
||||
* **Noise / outliers** (label `‑1` when DBSCAN is used) and
|
||||
* **Potentially ambiguous prompts** (only with K‑Means) – these are items that
|
||||
lie almost equally close to two centroids and might belong to multiple
|
||||
groups.
|
||||
|
||||
### plots/cluster_sizes.png
|
||||
|
||||
Quick bar‑chart visualisation of how many prompts ended up in each cluster.
|
||||
|
||||
---
|
||||
|
||||
## 5. Troubleshooting
|
||||
|
||||
* **Rate‑limits / quota errors** – lower the number of prompts per run or switch
|
||||
to a larger quota account.
|
||||
* **Authentication errors** – make sure `OPENAI_API_KEY` is exported in the
|
||||
shell where you run the script.
|
||||
* **Inadequate clusters** – try the other clustering method, adjust `--k-max`
|
||||
or tune DBSCAN parameters (`eps` range is inferred, `min_samples` exposed via
|
||||
CLI).
|
||||
Reference in New Issue
Block a user