use crate::model_family::ModelFamily;

/// Metadata about a model, particularly OpenAI models.
///
/// We may want to include details such as pricing for input tokens, output
/// tokens, etc., which would help present more accurate pricing information
/// in the UI. Users would need to be able to override those values in
/// config.toml, though, as this information can get out of date.
#[derive(Debug)]
pub(crate) struct ModelInfo {
    /// Size of the context window in tokens.
    pub(crate) context_window: u64,

    /// Maximum number of output tokens that can be generated for the model.
    pub(crate) max_output_tokens: u64,
}
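
// One way a caller might combine this table with user configuration (an
// illustrative sketch; the `config` binding and its field names are assumed,
// not necessarily this crate's actual types): when the model is not
// recognized below, explicit `model_context_window` / `model_max_output_tokens`
// values from config.toml can supply the limits instead.
//
//     let context_window = get_model_info(&model_family)
//         .map(|info| info.context_window)
//         .or(config.model_context_window);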

pub(crate) fn get_model_info(model_family: &ModelFamily) -> Option<ModelInfo> {
    let slug = model_family.slug.as_str();
    match slug {
        // OSS models have a 128k shared token pool.
        // Arbitrarily splitting it: 3/4 input context, 1/4 output.
        // https://openai.com/index/gpt-oss-model-card/
        "gpt-oss-20b" => Some(ModelInfo {
            context_window: 96_000,
            max_output_tokens: 32_000,
        }),
        "gpt-oss-120b" => Some(ModelInfo {
            context_window: 96_000,
            max_output_tokens: 32_000,
        }),

        // https://platform.openai.com/docs/models/o3
        "o3" => Some(ModelInfo {
            context_window: 200_000,
            max_output_tokens: 100_000,
        }),

        // https://platform.openai.com/docs/models/o4-mini
        "o4-mini" => Some(ModelInfo {
            context_window: 200_000,
            max_output_tokens: 100_000,
        }),

        // https://platform.openai.com/docs/models/codex-mini-latest
        "codex-mini-latest" => Some(ModelInfo {
            context_window: 200_000,
            max_output_tokens: 100_000,
        }),

        // As of Jun 25, 2025, gpt-4.1 defaults to gpt-4.1-2025-04-14.
        // https://platform.openai.com/docs/models/gpt-4.1
        "gpt-4.1" | "gpt-4.1-2025-04-14" => Some(ModelInfo {
            context_window: 1_047_576,
            max_output_tokens: 32_768,
        }),

        // As of Jun 25, 2025, gpt-4o defaults to gpt-4o-2024-08-06.
        // https://platform.openai.com/docs/models/gpt-4o
        "gpt-4o" | "gpt-4o-2024-08-06" => Some(ModelInfo {
            context_window: 128_000,
            max_output_tokens: 16_384,
        }),

        // https://platform.openai.com/docs/models/gpt-4o?snapshot=gpt-4o-2024-05-13
        "gpt-4o-2024-05-13" => Some(ModelInfo {
            context_window: 128_000,
            max_output_tokens: 4_096,
        }),

        // https://platform.openai.com/docs/models/gpt-4o?snapshot=gpt-4o-2024-11-20
        "gpt-4o-2024-11-20" => Some(ModelInfo {
            context_window: 128_000,
            max_output_tokens: 16_384,
        }),

        // https://platform.openai.com/docs/models/gpt-3.5-turbo
        "gpt-3.5-turbo" => Some(ModelInfo {
            context_window: 16_385,
            max_output_tokens: 4_096,
        }),

        "gpt-5" => Some(ModelInfo {
            context_window: 272_000,
            max_output_tokens: 128_000,
        }),

        _ if slug.starts_with("codex-") => Some(ModelInfo {
            context_window: 272_000,
            max_output_tokens: 128_000,
        }),

        _ => None,
    }
}
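
// A minimal usage sketch. It assumes the companion `model_family` module
// exposes a `find_family_for_model(&str) -> Option<ModelFamily>` helper; if
// the real constructor differs, adjust accordingly. The tests only show how
// callers are expected to consume `get_model_info`, with `None` signalling
// that config.toml values should be used instead.
#[cfg(test)]
mod tests {
    use super::*;
    use crate::model_family::find_family_for_model;

    #[test]
    fn known_model_reports_static_limits() {
        // "o3" is one of the slugs in the static table above.
        let family = find_family_for_model("o3").expect("o3 should map to a model family");
        let info = get_model_info(&family).expect("o3 should have static model info");
        assert_eq!(info.context_window, 200_000);
        assert_eq!(info.max_output_tokens, 100_000);
    }

    #[test]
    fn unknown_model_returns_none() {
        // Unrecognized slugs yield `None`, leaving callers to fall back to
        // `model_context_window` / `model_max_output_tokens` in config.toml.
        if let Some(family) = find_family_for_model("not-a-real-model") {
            assert!(get_model_info(&family).is_none());
        }
    }
}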