2025-10-27 11:01:14 -07:00
mod storage;
2025-09-02 18:36:19 -07:00
use chrono::Utc;
Improved token refresh handling to address "Re-connecting" behavior (#6231)
Currently, when the access token expires, we attempt to use the refresh
token to acquire a new access token. This works most of the time.
However, there are situations where the refresh token is expired,
exhausted (already used to perform a refresh), or revoked. In those
cases, the current logic treats the error as transient and attempts to
retry it repeatedly.
This PR changes the token refresh logic to differentiate between
permanent and transient errors. It also changes callers to treat the
permanent errors as fatal rather than retrying them. And it provides
better error messages to users so they understand how to address the
problem. These error messages should also help us further understand why
we're seeing examples of refresh token exhaustion.
Here is the error message in the CLI. The same text appears within the
extension.
<img width="863" height="38" alt="image"
src="https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f"
/>
I also correct the spelling of "Re-connecting", which shouldn't have a
hyphen in it.
Testing: I manually tested these code paths by adding temporary code to
programmatically cause my refresh token to be exhausted (by calling the
token refresh endpoint in a tight loop more than 50 times). I then
simulated an access token expiration, which caused the token refresh
logic to be invoked. I confirmed that the updated logic properly handled
the error condition.
Note: We earlier discussed the idea of forcefully logging out the user
at the point where token refresh failed. I made several attempts to do
this, and all of them resulted in a bad UX. It's important to surface
this error to users in a way that explains the problem and tells them
that they need to log in again. We also previously discussed deleting
the auth.json file when this condition is detected. That also creates
problems because it effectively changes the auth status from logged in
to logged out, and this causes odd failures and inconsistent UX. I think
it's therefore better not to delete auth.json in this case. If the user
closes the CLI or VSCE and starts it again, we properly detect that the
access token is expired and the refresh token is "dead", and we force
the user to go through the login flow at that time.
This should address aspects of #6191, #5679, and #5505
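As a rough illustration of the permanent/transient split described above, here is a minimal sketch of a caller that retries only transient failures and gives up immediately on permanent ones. The names `FailedReason`, `RefreshError`, and `refresh_with_retry` are hypothetical stand-ins made up for this example; they are not the types or the retry logic used in this PR.

```rust
/// Hypothetical reasons a refresh token can be permanently unusable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailedReason {
    Expired,
    Exhausted,
    Revoked,
}

/// Hypothetical error type mirroring the permanent/transient distinction.
#[derive(Debug)]
enum RefreshError {
    /// Retrying cannot help; the user has to log in again.
    Permanent(FailedReason),
    /// Possibly temporary (for example a network timeout); retrying may help.
    Transient(String),
}

/// Calls `attempt` up to `max_attempts` times, passing the 1-based attempt number.
/// A permanent error is returned immediately; only transient errors are retried.
fn refresh_with_retry<F>(max_attempts: u32, mut attempt: F) -> Result<String, RefreshError>
where
    F: FnMut(u32) -> Result<String, RefreshError>,
{
    let mut last_err = RefreshError::Transient("no attempts were made".to_string());
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(token) => return Ok(token),
            // Fatal: surface the error to the user instead of retrying.
            Err(err @ RefreshError::Permanent(_)) => return Err(err),
            // Transient: remember the error and try again.
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```

With this shape, an exhausted refresh token costs exactly one request, while a flaky connection still gets its retries.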
2025-11-05 12:51:57 -06:00
use reqwest::StatusCode;
2025-09-02 18:36:19 -07:00
use serde::Deserialize;
use serde::Serialize;
2025-10-20 08:50:54 -07:00
#[cfg(test)]
use serial_test::serial;
2025-09-02 18:36:19 -07:00
use std::env;
2025-10-27 11:01:14 -07:00
use std::fmt::Debug;
2025-11-05 12:51:57 -06:00
use std::io::ErrorKind;
2025-09-02 18:36:19 -07:00
use std::path::Path;
use std::path::PathBuf;
use std::sync::Arc;
use std::sync::Mutex;
use std::time::Duration;
fix: remove mcp-types from app server protocol (#4537)
We continue the separation between `codex app-server` and `codex
mcp-server`.
In particular, we introduce a new crate, `codex-app-server-protocol`,
and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
`codex-rs/app-server-protocol/src/protocol.rs`.
Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
because it is referenced in a ton of places, we have to touch a lot of
files as part of this PR.
We also decide to get away from proper JSON-RPC 2.0 semantics, so we
also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
is basically the same `JSONRPCMessage` type defined in `mcp-types`
except with all of the `"jsonrpc": "2.0"` removed.
Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
considerably simpler, as we can lean heavier on serde to serialize
directly into the wire format that we use now.
2025-09-30 19:16:26 -07:00
use codex_app_server_protocol::AuthMode;
2025-10-20 08:50:54 -07:00
use codex_protocol::config_types::ForcedLoginMethod;
2025-09-02 18:36:19 -07:00
2025-10-27 11:01:14 -07:00
pub use crate::auth::storage::AuthCredentialsStoreMode;
pub use crate::auth::storage::AuthDotJson;
use crate::auth::storage::AuthStorageBackend;
use crate::auth::storage::create_auth_storage;
2025-10-20 08:50:54 -07:00
use crate::config::Config;
2025-10-24 09:47:52 -07:00
use crate::default_client::CodexHttpClient;
2025-11-05 12:51:57 -06:00
use crate::error::RefreshTokenFailedError;
use crate::error::RefreshTokenFailedReason;
2025-09-11 18:01:25 -04:00
use crate::token_data::PlanType;
2025-09-02 18:36:19 -07:00
use crate::token_data::TokenData;
use crate::token_data::parse_id_token;
2025-10-28 18:55:53 -07:00
use crate::util::try_parse_error_message;
2025-11-05 12:51:57 -06:00
use serde_json::Value;
use thiserror::Error;
2025-09-02 18:36:19 -07:00
#[derive(Debug, Clone)]
pub struct CodexAuth {
    pub mode: AuthMode,
    pub(crate) api_key: Option<String>,
    pub(crate) auth_dot_json: Arc<Mutex<Option<AuthDotJson>>>,
2025-10-27 11:01:14 -07:00
    storage: Arc<dyn AuthStorageBackend>,
2025-10-24 09:47:52 -07:00
    pub(crate) client: CodexHttpClient,
2025-09-02 18:36:19 -07:00
}
impl PartialEq for CodexAuth {
    fn eq(&self, other: &Self) -> bool {
        self.mode == other.mode
    }
}
2025-10-28 18:55:53 -07:00
// TODO(pakrym): use token exp field to check for expiration instead
const TOKEN_REFRESH_INTERVAL: i64 = 8;
2025-11-05 12:51:57 -06:00
const REFRESH_TOKEN_EXPIRED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token has expired. Please log out and sign in again.";
const REFRESH_TOKEN_REUSED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token was already used. Please log out and sign in again.";
const REFRESH_TOKEN_INVALIDATED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token was revoked. Please log out and sign in again.";
const REFRESH_TOKEN_UNKNOWN_MESSAGE: &str =
    "Your access token could not be refreshed. Please log out and sign in again.";
const REFRESH_TOKEN_URL: &str = "https://auth.openai.com/oauth/token";
pub const REFRESH_TOKEN_URL_OVERRIDE_ENV_VAR: &str = "CODEX_REFRESH_TOKEN_URL_OVERRIDE";
#[derive(Debug, Error)]
pub enum RefreshTokenError {
    #[error("{0}")]
    Permanent(#[from] RefreshTokenFailedError),
    #[error(transparent)]
    Transient(#[from] std::io::Error),
}
impl RefreshTokenError {
    pub fn failed_reason(&self) -> Option<RefreshTokenFailedReason> {
        match self {
            Self::Permanent(error) => Some(error.reason),
            Self::Transient(_) => None,
        }
    }
    fn other_with_message(message: impl Into<String>) -> Self {
        Self::Transient(std::io::Error::other(message.into()))
    }
}
impl From<RefreshTokenError> for std::io::Error {
    fn from(err: RefreshTokenError) -> Self {
        match err {
            RefreshTokenError::Permanent(failed) => std::io::Error::other(failed),
            RefreshTokenError::Transient(inner) => inner,
        }
    }
}
2025-09-02 18:36:19 -07:00
impl CodexAuth {
2025-11-05 12:51:57 -06:00
    pub async fn refresh_token(&self) -> Result<String, RefreshTokenError> {
2025-10-24 12:12:03 -07:00
        tracing::info!("Refreshing token");
2025-11-05 12:51:57 -06:00
        let token_data = self.get_current_token_data().ok_or_else(|| {
            RefreshTokenError::Transient(std::io::Error::other("Token data is not available."))
        })?;
2025-09-02 18:36:19 -07:00
        let token = token_data.refresh_token;
2025-11-05 12:51:57 -06:00
        let refresh_response = try_refresh_token(token, &self.client).await?;
2025-09-02 18:36:19 -07:00
        let updated = update_tokens(
2025-10-27 11:01:14 -07:00
            &self.storage,
2025-09-02 18:36:19 -07:00
            refresh_response.id_token,
            refresh_response.access_token,
            refresh_response.refresh_token,
        )
2025-11-05 12:51:57 -06:00
        .await
        .map_err(RefreshTokenError::from)?;
2025-09-02 18:36:19 -07:00
        if let Ok(mut auth_lock) = self.auth_dot_json.lock() {
            *auth_lock = Some(updated.clone());
        }
        let access = match updated.tokens {
            Some(t) => t.access_token,
            None => {
2025-11-05 12:51:57 -06:00
                return Err(RefreshTokenError::other_with_message(
2025-09-02 18:36:19 -07:00
                    "Token data is not available after refresh.",
                ));
            }
        };
        Ok(access)
    }
2025-10-27 11:01:14 -07:00
    /// Loads the available auth information from auth storage.
2025-10-27 19:41:49 -07:00
    pub fn from_auth_storage(
        codex_home: &Path,
        auth_credentials_store_mode: AuthCredentialsStoreMode,
    ) -> std::io::Result<Option<CodexAuth>> {
        load_auth(codex_home, false, auth_credentials_store_mode)
2025-09-02 18:36:19 -07:00
    }
    pub async fn get_token_data(&self) -> Result<TokenData, std::io::Error> {
        let auth_dot_json: Option<AuthDotJson> = self.get_current_auth_json();
        match auth_dot_json {
            Some(AuthDotJson {
                tokens: Some(mut tokens),
                last_refresh: Some(last_refresh),
                ..
            }) => {
2025-10-28 18:55:53 -07:00
                if last_refresh < Utc::now() - chrono::Duration::days(TOKEN_REFRESH_INTERVAL) {
2025-11-05 12:51:57 -06:00
                    let refresh_result = tokio::time::timeout(
2025-09-02 18:36:19 -07:00
                        Duration::from_secs(60),
2025-09-03 10:11:02 -07:00
                        try_refresh_token(tokens.refresh_token.clone(), &self.client),
2025-09-02 18:36:19 -07:00
                    )
2025-11-05 12:51:57 -06:00
                    .await;
                    let refresh_response = match refresh_result {
                        Ok(Ok(response)) => response,
                        Ok(Err(err)) => return Err(err.into()),
                        Err(_) => {
                            return Err(std::io::Error::new(
                                ErrorKind::TimedOut,
                                "timed out while refreshing OpenAI API key",
                            ));
                        }
                    };
2025-09-02 18:36:19 -07:00
                    let updated_auth_dot_json = update_tokens(
2025-10-27 11:01:14 -07:00
                        &self.storage,
2025-09-02 18:36:19 -07:00
                        refresh_response.id_token,
                        refresh_response.access_token,
                        refresh_response.refresh_token,
                    )
                    .await?;
                    tokens = updated_auth_dot_json
                        .tokens
                        .clone()
                        .ok_or(std::io::Error::other(
                            "Token data is not available after refresh.",
                        ))?;
                    #[expect(clippy::unwrap_used)]
                    let mut auth_lock = self.auth_dot_json.lock().unwrap();
                    *auth_lock = Some(updated_auth_dot_json);
                }
                Ok(tokens)
            }
            _ => Err(std::io::Error::other("Token data is not available.")),
        }
    }
    pub async fn get_token(&self) -> Result<String, std::io::Error> {
        match self.mode {
            AuthMode::ApiKey => Ok(self.api_key.clone().unwrap_or_default()),
            AuthMode::ChatGPT => {
                // ChatGPT mode returns the access token, not the ID token.
                let access_token = self.get_token_data().await?.access_token;
                Ok(access_token)
            }
        }
    }
    pub fn get_account_id(&self) -> Option<String> {
2025-09-11 11:59:37 -07:00
        self.get_current_token_data().and_then(|t| t.account_id)
2025-09-02 18:36:19 -07:00
    }
2025-10-15 17:53:33 -07:00
    pub fn get_account_email(&self) -> Option<String> {
        self.get_current_token_data().and_then(|t| t.id_token.email)
    }
2025-09-11 18:01:25 -04:00
    pub(crate) fn get_plan_type(&self) -> Option<PlanType> {
2025-09-02 18:36:19 -07:00
        self.get_current_token_data()
2025-09-11 18:01:25 -04:00
            .and_then(|t| t.id_token.chatgpt_plan_type)
2025-09-02 18:36:19 -07:00
    }
    fn get_current_auth_json(&self) -> Option<AuthDotJson> {
        #[expect(clippy::unwrap_used)]
        self.auth_dot_json.lock().unwrap().clone()
    }
    fn get_current_token_data(&self) -> Option<TokenData> {
2025-09-11 11:59:37 -07:00
        self.get_current_auth_json().and_then(|t| t.tokens)
2025-09-02 18:36:19 -07:00
    }
    /// Consider this private to integration tests.
    pub fn create_dummy_chatgpt_auth_for_testing() -> Self {
        let auth_dot_json = AuthDotJson {
            openai_api_key: None,
            tokens: Some(TokenData {
                id_token: Default::default(),
                access_token: "Access Token".to_string(),
                refresh_token: "test".to_string(),
                account_id: Some("account_id".to_string()),
            }),
            last_refresh: Some(Utc::now()),
        };
        let auth_dot_json = Arc::new(Mutex::new(Some(auth_dot_json)));
        Self {
            api_key: None,
            mode: AuthMode::ChatGPT,
2025-10-27 11:01:14 -07:00
            storage: create_auth_storage(PathBuf::new(), AuthCredentialsStoreMode::File),
2025-09-02 18:36:19 -07:00
            auth_dot_json,
2025-09-09 14:23:23 -07:00
            client: crate::default_client::create_client(),
2025-09-02 18:36:19 -07:00
        }
    }
2025-09-03 10:11:02 -07:00
2025-10-24 09:47:52 -07:00
    fn from_api_key_with_client(api_key: &str, client: CodexHttpClient) -> Self {
2025-09-03 10:11:02 -07:00
        Self {
            api_key: Some(api_key.to_owned()),
            mode: AuthMode::ApiKey,
2025-10-27 11:01:14 -07:00
            storage: create_auth_storage(PathBuf::new(), AuthCredentialsStoreMode::File),
2025-09-03 10:11:02 -07:00
            auth_dot_json: Arc::new(Mutex::new(None)),
            client,
        }
    }
    pub fn from_api_key(api_key: &str) -> Self {
2025-09-09 14:23:23 -07:00
        Self::from_api_key_with_client(api_key, crate::default_client::create_client())
2025-09-03 10:11:02 -07:00
    }
2025-09-02 18:36:19 -07:00
}
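// The staleness test in `get_token_data` above compares `last_refresh` with
// "now minus the refresh interval". A minimal sketch of that comparison using
// only the standard library follows. `REFRESH_INTERVAL_DAYS` and
// `needs_refresh` are hypothetical names, and the value 8 is an assumption:
// the real `TOKEN_REFRESH_INTERVAL` is not shown in this section.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for `TOKEN_REFRESH_INTERVAL` (value assumed).
const REFRESH_INTERVAL_DAYS: u64 = 8;

/// Returns true when `last_refresh` is older than the refresh interval.
fn needs_refresh(last_refresh: SystemTime, now: SystemTime) -> bool {
    let interval = Duration::from_secs(REFRESH_INTERVAL_DAYS * 24 * 60 * 60);
    match now.duration_since(last_refresh) {
        // Same condition as `last_refresh < now - interval`.
        Ok(age) => age > interval,
        // `last_refresh` lies in the future (clock skew): treat tokens as fresh.
        Err(_) => false,
    }
}
```

// Passing `now` as a parameter instead of reading the clock inside the
// function keeps the check deterministic in tests.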
pub const OPENAI_API_KEY_ENV_VAR: &str = "OPENAI_API_KEY";
2025-10-02 09:59:45 -07:00
pub const CODEX_API_KEY_ENV_VAR: &str = "CODEX_API_KEY";
2025-09-02 18:36:19 -07:00
2025-09-11 09:16:34 -07:00
pub fn read_openai_api_key_from_env() -> Option<String> {
2025-09-02 18:36:19 -07:00
    env::var(OPENAI_API_KEY_ENV_VAR)
        .ok()
2025-09-11 09:16:34 -07:00
        .map(|value| value.trim().to_string())
        .filter(|value| !value.is_empty())
2025-09-02 18:36:19 -07:00
}
2025-10-02 09:59:45 -07:00
pub fn read_codex_api_key_from_env() -> Option<String> {
    env::var(CODEX_API_KEY_ENV_VAR)
        .ok()
        .map(|value| value.trim().to_string())
        .filter(|value| !value.is_empty())
}
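// Both environment readers above apply the same trim-then-filter
// normalization. The sketch below isolates that step as a pure function so
// it can be exercised without touching the process environment.
// `normalize_api_key` is a hypothetical helper, not part of this module.

```rust
/// Trims an optional raw value and discards it when nothing is left.
fn normalize_api_key(raw: Option<String>) -> Option<String> {
    raw.map(|value| value.trim().to_string())
        .filter(|value| !value.is_empty())
}
```

// A whitespace-only variable is therefore treated the same as an unset one.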
2025-09-02 18:36:19 -07:00
/// Delete the auth.json file inside `codex_home` if it exists. Returns `Ok(true)`
/// if a file was removed, `Ok(false)` if no auth file was present.
2025-10-27 19:41:49 -07:00
pub fn logout(
    codex_home: &Path,
    auth_credentials_store_mode: AuthCredentialsStoreMode,
) -> std::io::Result<bool> {
    let storage = create_auth_storage(codex_home.to_path_buf(), auth_credentials_store_mode);
2025-10-27 11:01:14 -07:00
    storage.delete()
2025-09-02 18:36:19 -07:00
}
2025-09-11 09:16:34 -07:00
/// Writes an `auth.json` that contains only the API key.
2025-10-27 19:41:49 -07:00
pub fn login_with_api_key(
    codex_home: &Path,
    api_key: &str,
    auth_credentials_store_mode: AuthCredentialsStoreMode,
) -> std::io::Result<()> {
2025-09-02 18:36:19 -07:00
    let auth_dot_json = AuthDotJson {
        openai_api_key: Some(api_key.to_string()),
        tokens: None,
        last_refresh: None,
    };
2025-10-27 19:41:49 -07:00
    save_auth(codex_home, &auth_dot_json, auth_credentials_store_mode)
2025-10-27 11:01:14 -07:00
}
/// Persist the provided auth payload using the specified backend.
2025-10-27 19:41:49 -07:00
pub fn save_auth(
    codex_home: &Path,
    auth: &AuthDotJson,
    auth_credentials_store_mode: AuthCredentialsStoreMode,
) -> std::io::Result<()> {
    let storage = create_auth_storage(codex_home.to_path_buf(), auth_credentials_store_mode);
2025-10-27 11:01:14 -07:00
    storage.save(auth)
}
/// Load CLI auth data using the configured credential store backend.
/// Returns `None` when no credentials are stored.
2025-10-27 19:41:49 -07:00
pub fn load_auth_dot_json(
    codex_home: &Path,
    auth_credentials_store_mode: AuthCredentialsStoreMode,
) -> std::io::Result<Option<AuthDotJson>> {
    let storage = create_auth_storage(codex_home.to_path_buf(), auth_credentials_store_mode);
2025-10-27 11:01:14 -07:00
    storage.load()
2025-09-02 18:36:19 -07:00
}
2025-10-20 08:50:54 -07:00
pub async fn enforce_login_restrictions(config: &Config) -> std::io::Result<()> {
2025-10-27 19:41:49 -07:00
    let Some(auth) = load_auth(
        &config.codex_home,
        true,
        config.cli_auth_credentials_store_mode,
    )?
    else {
2025-10-20 08:50:54 -07:00
        return Ok(());
    };
    if let Some(required_method) = config.forced_login_method {
        let method_violation = match (required_method, auth.mode) {
            (ForcedLoginMethod::Api, AuthMode::ApiKey) => None,
            (ForcedLoginMethod::Chatgpt, AuthMode::ChatGPT) => None,
            (ForcedLoginMethod::Api, AuthMode::ChatGPT) => Some(
                "API key login is required, but ChatGPT is currently being used. Logging out."
                    .to_string(),
            ),
            (ForcedLoginMethod::Chatgpt, AuthMode::ApiKey) => Some(
                "ChatGPT login is required, but an API key is currently being used. Logging out."
                    .to_string(),
            ),
        };
        if let Some(message) = method_violation {
2025-10-27 19:41:49 -07:00
            return logout_with_message(
                &config.codex_home,
                message,
                config.cli_auth_credentials_store_mode,
            );
2025-10-20 08:50:54 -07:00
        }
    }
    if let Some(expected_account_id) = config.forced_chatgpt_workspace_id.as_deref() {
        if auth.mode != AuthMode::ChatGPT {
            return Ok(());
        }
        let token_data = match auth.get_token_data().await {
            Ok(data) => data,
            Err(err) => {
                return logout_with_message(
                    &config.codex_home,
                    format!(
                        "Failed to load ChatGPT credentials while enforcing workspace restrictions: {err}. Logging out."
                    ),
2025-10-27 19:41:49 -07:00
                    config.cli_auth_credentials_store_mode,
2025-10-20 08:50:54 -07:00
                );
            }
        };
        // workspace is the external identifier for account id.
        let chatgpt_account_id = token_data.id_token.chatgpt_account_id.as_deref();
        if chatgpt_account_id != Some(expected_account_id) {
            let message = match chatgpt_account_id {
                Some(actual) => format!(
                    "Login is restricted to workspace {expected_account_id}, but current credentials belong to {actual}. Logging out."
                ),
                None => format!(
                    "Login is restricted to workspace {expected_account_id}, but current credentials lack a workspace identifier. Logging out."
                ),
            };
2025-10-27 19:41:49 -07:00
            return logout_with_message(
                &config.codex_home,
                message,
                config.cli_auth_credentials_store_mode,
            );
2025-10-20 08:50:54 -07:00
        }
    }
    Ok(())
}
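// The login-method check in `enforce_login_restrictions` above is a match
// over the (required method, active mode) pair. The sketch below shows that
// decision table on its own. `RequiredMethod`, `ActiveMode`, and
// `method_violation` are hypothetical stand-ins for `ForcedLoginMethod`,
// `AuthMode`, and the inline match; they are not the crate's own types.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum RequiredMethod {
    Api,
    Chatgpt,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ActiveMode {
    ApiKey,
    ChatGpt,
}

/// Returns a message when the active mode violates the required method.
fn method_violation(required: RequiredMethod, active: ActiveMode) -> Option<&'static str> {
    match (required, active) {
        // Matching pairs are allowed.
        (RequiredMethod::Api, ActiveMode::ApiKey)
        | (RequiredMethod::Chatgpt, ActiveMode::ChatGpt) => None,
        (RequiredMethod::Api, ActiveMode::ChatGpt) => {
            Some("API key login is required, but ChatGPT is currently being used.")
        }
        (RequiredMethod::Chatgpt, ActiveMode::ApiKey) => {
            Some("ChatGPT login is required, but an API key is currently being used.")
        }
    }
}
```

// Listing all four pairs without a wildcard arm means the compiler flags
// the match as non-exhaustive if either enum later gains a variant.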
2025-10-27 19:41:49 -07:00
fn logout_with_message(
    codex_home: &Path,
    message: String,
    auth_credentials_store_mode: AuthCredentialsStoreMode,
) -> std::io::Result<()> {
    match logout(codex_home, auth_credentials_store_mode) {
2025-10-20 08:50:54 -07:00
        Ok(_) => Err(std::io::Error::other(message)),
        // Callers pass messages that already end with a period, so none is added here.
        Err(err) => Err(std::io::Error::other(format!(
            "{message} Failed to remove auth.json: {err}"
        ))),
    }
}
2025-10-02 09:59:45 -07:00
fn load_auth(
    codex_home: &Path,
    enable_codex_api_key_env: bool,
2025-10-27 19:41:49 -07:00
    auth_credentials_store_mode: AuthCredentialsStoreMode,
2025-10-02 09:59:45 -07:00
) -> std::io::Result<Option<CodexAuth>> {
    if enable_codex_api_key_env && let Some(api_key) = read_codex_api_key_from_env() {
        let client = crate::default_client::create_client();
        return Ok(Some(CodexAuth::from_api_key_with_client(
            api_key.as_str(),
            client,
        )));
    }
2025-10-27 19:41:49 -07:00
    let storage = create_auth_storage(codex_home.to_path_buf(), auth_credentials_store_mode);
2025-10-27 11:01:14 -07:00
2025-09-09 14:23:23 -07:00
    let client = crate::default_client::create_client();
2025-10-27 11:01:14 -07:00
    let auth_dot_json = match storage.load()? {
        Some(auth) => auth,
        None => return Ok(None),
2025-09-02 18:36:19 -07:00
    };
    let AuthDotJson {
        openai_api_key: auth_json_api_key,
        tokens,
        last_refresh,
    } = auth_dot_json;
2025-09-11 09:16:34 -07:00
    // Prefer AuthMode.ApiKey if it's set in the auth.json.
2025-09-02 18:36:19 -07:00
    if let Some(api_key) = &auth_json_api_key {
2025-09-11 09:16:34 -07:00
        return Ok(Some(CodexAuth::from_api_key_with_client(api_key, client)));
2025-09-02 18:36:19 -07:00
    }
    Ok(Some(CodexAuth {
        api_key: None,
        mode: AuthMode::ChatGPT,
2025-10-27 11:01:14 -07:00
        storage: storage.clone(),
2025-09-02 18:36:19 -07:00
        auth_dot_json: Arc::new(Mutex::new(Some(AuthDotJson {
            openai_api_key: None,
            tokens,
            last_refresh,
        }))),
2025-09-03 10:11:02 -07:00
        client,
2025-09-02 18:36:19 -07:00
    }))
}
async fn update_tokens(
2025-10-27 11:01:14 -07:00
    storage: &Arc<dyn AuthStorageBackend>,
2025-10-27 12:09:53 -05:00
    id_token: Option<String>,
2025-09-02 18:36:19 -07:00
    access_token: Option<String>,
    refresh_token: Option<String>,
) -> std::io::Result<AuthDotJson> {
2025-10-27 11:01:14 -07:00
    let mut auth_dot_json = storage
        .load()?
        .ok_or(std::io::Error::other("Token data is not available."))?;
2025-09-02 18:36:19 -07:00
    let tokens = auth_dot_json.tokens.get_or_insert_with(TokenData::default);
2025-10-27 12:09:53 -05:00
    if let Some(id_token) = id_token {
        tokens.id_token = parse_id_token(&id_token).map_err(std::io::Error::other)?;
    }
2025-09-02 18:36:19 -07:00
    if let Some(access_token) = access_token {
2025-09-11 11:59:37 -07:00
        tokens.access_token = access_token;
2025-09-02 18:36:19 -07:00
    }
    if let Some(refresh_token) = refresh_token {
2025-09-11 11:59:37 -07:00
        tokens.refresh_token = refresh_token;
2025-09-02 18:36:19 -07:00
    }
    auth_dot_json.last_refresh = Some(Utc::now());
2025-10-27 11:01:14 -07:00
    storage.save(&auth_dot_json)?;
2025-09-02 18:36:19 -07:00
    Ok(auth_dot_json)
}
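// `update_tokens` above overwrites a stored field only when the refresh
// response supplied a new value, so an omitted field keeps its previous
// value. A minimal sketch of that merge rule, assuming a simplified
// two-field token struct: `Tokens` and `apply_refresh` are hypothetical
// names, not this module's `TokenData` or its storage-backed update.

```rust
#[derive(Clone, Debug, Default, PartialEq)]
struct Tokens {
    access_token: String,
    refresh_token: String,
}

/// Applies only the fields that are present in the refresh response.
fn apply_refresh(
    tokens: &mut Tokens,
    access_token: Option<String>,
    refresh_token: Option<String>,
) {
    if let Some(access_token) = access_token {
        tokens.access_token = access_token;
    }
    if let Some(refresh_token) = refresh_token {
        tokens.refresh_token = refresh_token;
    }
}
```

// With this rule, a response that returns only a new access token leaves
// the stored refresh token in place.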
2025-09-03 10:11:02 -07:00
async fn try_refresh_token(
    refresh_token: String,
2025-10-24 09:47:52 -07:00
    client: &CodexHttpClient,
2025-11-05 12:51:57 -06:00
) -> Result<RefreshResponse, RefreshTokenError> {
2025-09-02 18:36:19 -07:00
    let refresh_request = RefreshRequest {
        client_id: CLIENT_ID,
        grant_type: "refresh_token",
        refresh_token,
        scope: "openid profile email",
    };
2025-11-05 12:51:57 -06:00
    let endpoint = refresh_token_endpoint();
2025-09-03 10:11:02 -07:00
    // Use shared client factory to include standard headers
2025-09-02 18:36:19 -07:00
    let response = client
2025-11-05 12:51:57 -06:00
        .post(endpoint.as_str())
2025-09-02 18:36:19 -07:00
        .header("Content-Type", "application/json")
        .json(&refresh_request)
        .send()
        .await
2025-11-05 12:51:57 -06:00
        .map_err(|err| RefreshTokenError::Transient(std::io::Error::other(err)))?;
2025-09-02 18:36:19 -07:00
2025-11-05 12:51:57 -06:00
    let status = response.status();
    if status.is_success() {
2025-09-02 18:36:19 -07:00
        let refresh_response = response
            .json::<RefreshResponse>()
            .await
2025-11-05 12:51:57 -06:00
            .map_err(|err| RefreshTokenError::Transient(std::io::Error::other(err)))?;
2025-09-02 18:36:19 -07:00
        Ok(refresh_response)
    } else {
2025-11-05 12:51:57 -06:00
let body = response.text().await.unwrap_or_default();
if status == StatusCode::UNAUTHORIZED {
    let failed = classify_refresh_token_failure(&body);
    Err(RefreshTokenError::Permanent(failed))
} else {
    let message = try_parse_error_message(&body);
    Err(RefreshTokenError::Transient(std::io::Error::other(
        format!("Failed to refresh token: {status}: {message}"),
    )))
}
2025-09-02 18:36:19 -07:00
}
}
2025-11-05 12:51:57 -06:00
fn classify_refresh_token_failure(body: &str) -> RefreshTokenFailedError {
    let code = extract_refresh_token_error_code(body);

    let normalized_code = code.as_deref().map(str::to_ascii_lowercase);
    let reason = match normalized_code.as_deref() {
        Some("refresh_token_expired") => RefreshTokenFailedReason::Expired,
        Some("refresh_token_reused") => RefreshTokenFailedReason::Exhausted,
        Some("refresh_token_invalidated") => RefreshTokenFailedReason::Revoked,
        _ => RefreshTokenFailedReason::Other,
    };

    if reason == RefreshTokenFailedReason::Other {
        tracing::warn!(
            backend_code = normalized_code.as_deref(),
            backend_body = body,
            "Encountered unknown 401 response while refreshing token"
        );
    }

    let message = match reason {
        RefreshTokenFailedReason::Expired => REFRESH_TOKEN_EXPIRED_MESSAGE.to_string(),
        RefreshTokenFailedReason::Exhausted => REFRESH_TOKEN_REUSED_MESSAGE.to_string(),
        RefreshTokenFailedReason::Revoked => REFRESH_TOKEN_INVALIDATED_MESSAGE.to_string(),
        RefreshTokenFailedReason::Other => REFRESH_TOKEN_UNKNOWN_MESSAGE.to_string(),
    };

    RefreshTokenFailedError::new(reason, message)
}

fn extract_refresh_token_error_code(body: &str) -> Option<String> {
    if body.trim().is_empty() {
        return None;
    }

    let Value::Object(map) = serde_json::from_str::<Value>(body).ok()? else {
        return None;
    };

    if let Some(error_value) = map.get("error") {
        match error_value {
            Value::Object(obj) => {
                if let Some(code) = obj.get("code").and_then(Value::as_str) {
                    return Some(code.to_string());
                }
            }
            Value::String(code) => {
                return Some(code.to_string());
            }
            _ => {}
        }
    }

    map.get("code").and_then(Value::as_str).map(str::to_string)
}
2025-09-02 18:36:19 -07:00
#[derive(Serialize)]
struct RefreshRequest {
    client_id: &'static str,
    grant_type: &'static str,
    refresh_token: String,
    scope: &'static str,
}

#[derive(Deserialize, Clone)]
struct RefreshResponse {
2025-10-27 12:09:53 -05:00
id_token : Option < String > ,
2025-09-02 18:36:19 -07:00
    access_token: Option<String>,
    refresh_token: Option<String>,
}

// Shared constant for token refresh (client id used for oauth token refresh flow)
pub const CLIENT_ID: &str = "app_EMoamEEZ73f0CkXaXp7hrann";
2025-11-05 12:51:57 -06:00
fn refresh_token_endpoint() -> String {
    std::env::var(REFRESH_TOKEN_URL_OVERRIDE_ENV_VAR)
        .unwrap_or_else(|_| REFRESH_TOKEN_URL.to_string())
}
2025-09-02 18:36:19 -07:00
use std ::sync ::RwLock ;
/// Internal cached auth state.
#[ derive(Clone, Debug) ]
struct CachedAuth {
auth : Option < CodexAuth > ,
}
#[ cfg(test) ]
mod tests {
use super ::* ;
2025-10-27 11:01:14 -07:00
use crate ::auth ::storage ::FileAuthStorage ;
use crate ::auth ::storage ::get_auth_file ;
2025-10-20 08:50:54 -07:00
use crate ::config ::Config ;
use crate ::config ::ConfigOverrides ;
use crate ::config ::ConfigToml ;
2025-09-02 18:36:19 -07:00
use crate ::token_data ::IdTokenInfo ;
use crate ::token_data ::KnownPlan ;
use crate ::token_data ::PlanType ;
2025-10-27 11:01:14 -07:00
2025-09-02 18:36:19 -07:00
use base64 ::Engine ;
2025-10-20 08:50:54 -07:00
use codex_protocol ::config_types ::ForcedLoginMethod ;
2025-09-02 18:36:19 -07:00
use pretty_assertions ::assert_eq ;
use serde ::Serialize ;
use serde_json ::json ;
use tempfile ::tempdir ;
2025-10-27 12:09:53 -05:00
#[ tokio::test ]
async fn refresh_without_id_token ( ) {
let codex_home = tempdir ( ) . unwrap ( ) ;
let fake_jwt = write_auth_file (
AuthFileParams {
openai_api_key : None ,
chatgpt_plan_type : " pro " . to_string ( ) ,
chatgpt_account_id : None ,
} ,
codex_home . path ( ) ,
)
. expect ( " failed to write auth file " ) ;
2025-10-27 11:01:14 -07:00
let storage = create_auth_storage (
codex_home . path ( ) . to_path_buf ( ) ,
AuthCredentialsStoreMode ::File ,
) ;
2025-10-27 12:09:53 -05:00
let updated = super ::update_tokens (
2025-10-27 11:01:14 -07:00
& storage ,
2025-10-27 12:09:53 -05:00
None ,
Some ( " new-access-token " . to_string ( ) ) ,
Some ( " new-refresh-token " . to_string ( ) ) ,
)
. await
. expect ( " update_tokens should succeed " ) ;
let tokens = updated . tokens . expect ( " tokens should exist " ) ;
assert_eq! ( tokens . id_token . raw_jwt , fake_jwt ) ;
assert_eq! ( tokens . access_token , " new-access-token " ) ;
assert_eq! ( tokens . refresh_token , " new-refresh-token " ) ;
}
2025-09-14 19:48:18 -07:00
#[ test ]
fn login_with_api_key_overwrites_existing_auth_json ( ) {
let dir = tempdir ( ) . unwrap ( ) ;
let auth_path = dir . path ( ) . join ( " auth.json " ) ;
let stale_auth = json! ( {
" OPENAI_API_KEY " : " sk-old " ,
" tokens " : {
" id_token " : " stale.header.payload " ,
" access_token " : " stale-access " ,
" refresh_token " : " stale-refresh " ,
" account_id " : " stale-acc "
}
} ) ;
std ::fs ::write (
& auth_path ,
serde_json ::to_string_pretty ( & stale_auth ) . unwrap ( ) ,
)
. unwrap ( ) ;
2025-10-27 19:41:49 -07:00
super ::login_with_api_key ( dir . path ( ) , " sk-new " , AuthCredentialsStoreMode ::File )
. expect ( " login_with_api_key should succeed " ) ;
2025-09-14 19:48:18 -07:00
2025-10-27 11:01:14 -07:00
let storage = FileAuthStorage ::new ( dir . path ( ) . to_path_buf ( ) ) ;
let auth = storage
. try_read_auth_json ( & auth_path )
. expect ( " auth.json should parse " ) ;
2025-09-14 19:48:18 -07:00
assert_eq! ( auth . openai_api_key . as_deref ( ) , Some ( " sk-new " ) ) ;
assert! ( auth . tokens . is_none ( ) , " tokens should be cleared " ) ;
}
2025-10-20 08:50:54 -07:00
#[ test ]
fn missing_auth_json_returns_none ( ) {
let dir = tempdir ( ) . unwrap ( ) ;
2025-10-27 19:41:49 -07:00
let auth = CodexAuth ::from_auth_storage ( dir . path ( ) , AuthCredentialsStoreMode ::File )
. expect ( " call should succeed " ) ;
2025-10-20 08:50:54 -07:00
assert_eq! ( auth , None ) ;
}
2025-09-02 18:36:19 -07:00
#[ tokio::test ]
2025-10-21 09:08:34 -07:00
#[ serial(codex_api_key) ]
2025-09-02 18:36:19 -07:00
async fn pro_account_with_no_api_key_uses_chatgpt_auth ( ) {
let codex_home = tempdir ( ) . unwrap ( ) ;
let fake_jwt = write_auth_file (
AuthFileParams {
openai_api_key : None ,
chatgpt_plan_type : " pro " . to_string ( ) ,
2025-10-20 08:50:54 -07:00
chatgpt_account_id : None ,
2025-09-02 18:36:19 -07:00
} ,
codex_home . path ( ) ,
)
. expect ( " failed to write auth file " ) ;
let CodexAuth {
api_key ,
mode ,
auth_dot_json ,
2025-10-27 11:01:14 -07:00
storage : _ ,
2025-09-03 10:11:02 -07:00
..
2025-10-27 19:41:49 -07:00
} = super ::load_auth ( codex_home . path ( ) , false , AuthCredentialsStoreMode ::File )
. unwrap ( )
. unwrap ( ) ;
2025-09-02 18:36:19 -07:00
assert_eq! ( None , api_key ) ;
assert_eq! ( AuthMode ::ChatGPT , mode ) ;
let guard = auth_dot_json . lock ( ) . unwrap ( ) ;
let auth_dot_json = guard . as_ref ( ) . expect ( " AuthDotJson should exist " ) ;
2025-10-20 08:50:54 -07:00
let last_refresh = auth_dot_json
. last_refresh
. expect ( " last_refresh should be recorded " ) ;
2025-09-02 18:36:19 -07:00
assert_eq! (
& AuthDotJson {
openai_api_key : None ,
tokens : Some ( TokenData {
id_token : IdTokenInfo {
email : Some ( " user@example.com " . to_string ( ) ) ,
chatgpt_plan_type : Some ( PlanType ::Known ( KnownPlan ::Pro ) ) ,
2025-10-20 08:50:54 -07:00
chatgpt_account_id : None ,
2025-09-02 18:36:19 -07:00
raw_jwt : fake_jwt ,
} ,
access_token : " test-access-token " . to_string ( ) ,
refresh_token : " test-refresh-token " . to_string ( ) ,
account_id : None ,
} ) ,
2025-10-20 08:50:54 -07:00
last_refresh : Some ( last_refresh ) ,
2025-09-02 18:36:19 -07:00
} ,
auth_dot_json
2025-10-20 08:50:54 -07:00
) ;
2025-09-02 18:36:19 -07:00
}
#[ tokio::test ]
2025-10-21 09:08:34 -07:00
#[ serial(codex_api_key) ]
2025-09-02 18:36:19 -07:00
async fn loads_api_key_from_auth_json ( ) {
let dir = tempdir ( ) . unwrap ( ) ;
let auth_file = dir . path ( ) . join ( " auth.json " ) ;
std ::fs ::write (
auth_file ,
r # "{"OPENAI_API_KEY":"sk-test-key","tokens":null,"last_refresh":null}"# ,
)
. unwrap ( ) ;
2025-10-27 19:41:49 -07:00
let auth = super ::load_auth ( dir . path ( ) , false , AuthCredentialsStoreMode ::File )
. unwrap ( )
. unwrap ( ) ;
2025-09-02 18:36:19 -07:00
assert_eq! ( auth . mode , AuthMode ::ApiKey ) ;
assert_eq! ( auth . api_key , Some ( " sk-test-key " . to_string ( ) ) ) ;
assert! ( auth . get_token_data ( ) . await . is_err ( ) ) ;
}
#[ test ]
fn logout_removes_auth_file ( ) -> Result < ( ) , std ::io ::Error > {
let dir = tempdir ( ) ? ;
let auth_dot_json = AuthDotJson {
openai_api_key : Some ( " sk-test-key " . to_string ( ) ) ,
tokens : None ,
last_refresh : None ,
} ;
2025-10-27 19:41:49 -07:00
super ::save_auth ( dir . path ( ) , & auth_dot_json , AuthCredentialsStoreMode ::File ) ? ;
2025-10-27 11:01:14 -07:00
let auth_file = get_auth_file ( dir . path ( ) ) ;
assert! ( auth_file . exists ( ) ) ;
2025-10-27 19:41:49 -07:00
assert! ( logout ( dir . path ( ) , AuthCredentialsStoreMode ::File ) ? ) ;
2025-10-27 11:01:14 -07:00
assert! ( ! auth_file . exists ( ) ) ;
2025-09-02 18:36:19 -07:00
Ok ( ( ) )
}
struct AuthFileParams {
openai_api_key : Option < String > ,
chatgpt_plan_type : String ,
2025-10-20 08:50:54 -07:00
chatgpt_account_id : Option < String > ,
2025-09-02 18:36:19 -07:00
}
fn write_auth_file ( params : AuthFileParams , codex_home : & Path ) -> std ::io ::Result < String > {
let auth_file = get_auth_file ( codex_home ) ;
// Create a minimal valid JWT for the id_token field.
#[ derive(Serialize) ]
struct Header {
alg : & 'static str ,
typ : & 'static str ,
}
let header = Header {
alg : " none " ,
typ : " JWT " ,
} ;
2025-10-20 08:50:54 -07:00
let mut auth_payload = serde_json ::json! ( {
" chatgpt_plan_type " : params . chatgpt_plan_type ,
" chatgpt_user_id " : " user-12345 " ,
" user_id " : " user-12345 " ,
} ) ;
if let Some ( chatgpt_account_id ) = params . chatgpt_account_id {
let org_value = serde_json ::Value ::String ( chatgpt_account_id ) ;
auth_payload [ " chatgpt_account_id " ] = org_value ;
}
2025-09-02 18:36:19 -07:00
let payload = serde_json ::json! ( {
" email " : " user@example.com " ,
" email_verified " : true ,
2025-10-20 08:50:54 -07:00
" https://api.openai.com/auth " : auth_payload ,
2025-09-02 18:36:19 -07:00
} ) ;
let b64 = | b : & [ u8 ] | base64 ::engine ::general_purpose ::URL_SAFE_NO_PAD . encode ( b ) ;
let header_b64 = b64 ( & serde_json ::to_vec ( & header ) ? ) ;
let payload_b64 = b64 ( & serde_json ::to_vec ( & payload ) ? ) ;
let signature_b64 = b64 ( b " sig " ) ;
let fake_jwt = format! ( " {header_b64} . {payload_b64} . {signature_b64} " ) ;
let auth_json_data = json! ( {
" OPENAI_API_KEY " : params . openai_api_key ,
" tokens " : {
" id_token " : fake_jwt ,
" access_token " : " test-access-token " ,
" refresh_token " : " test-refresh-token "
} ,
2025-10-20 08:50:54 -07:00
" last_refresh " : Utc ::now ( ) ,
2025-09-02 18:36:19 -07:00
} ) ;
let auth_json = serde_json ::to_string_pretty ( & auth_json_data ) ? ;
std ::fs ::write ( auth_file , auth_json ) ? ;
Ok ( fake_jwt )
}
2025-10-20 08:50:54 -07:00
fn build_config (
codex_home : & Path ,
forced_login_method : Option < ForcedLoginMethod > ,
forced_chatgpt_workspace_id : Option < String > ,
) -> Config {
let mut config = Config ::load_from_base_config_with_overrides (
ConfigToml ::default ( ) ,
ConfigOverrides ::default ( ) ,
codex_home . to_path_buf ( ) ,
)
. expect ( " config should load " ) ;
config . forced_login_method = forced_login_method ;
config . forced_chatgpt_workspace_id = forced_chatgpt_workspace_id ;
config
}
    /// Use sparingly.
    /// TODO (gpeal): replace this with an injectable env var provider.
    #[cfg(test)]
    struct EnvVarGuard {
        key: &'static str,
        original: Option<std::ffi::OsString>,
    }

    #[cfg(test)]
    impl EnvVarGuard {
        fn set(key: &'static str, value: &str) -> Self {
            let original = env::var_os(key);
            unsafe {
                env::set_var(key, value);
            }
            Self { key, original }
        }
    }

    #[cfg(test)]
    impl Drop for EnvVarGuard {
        fn drop(&mut self) {
            unsafe {
                match &self.original {
                    Some(value) => env::set_var(self.key, value),
                    None => env::remove_var(self.key),
                }
            }
        }
    }
#[ tokio::test ]
async fn enforce_login_restrictions_logs_out_for_method_mismatch ( ) {
let codex_home = tempdir ( ) . unwrap ( ) ;
2025-10-27 19:41:49 -07:00
login_with_api_key ( codex_home . path ( ) , " sk-test " , AuthCredentialsStoreMode ::File )
. expect ( " seed api key " ) ;
2025-10-20 08:50:54 -07:00
let config = build_config ( codex_home . path ( ) , Some ( ForcedLoginMethod ::Chatgpt ) , None ) ;
let err = super ::enforce_login_restrictions ( & config )
. await
. expect_err ( " expected method mismatch to error " ) ;
assert! ( err . to_string ( ) . contains ( " ChatGPT login is required " ) ) ;
assert! (
! codex_home . path ( ) . join ( " auth.json " ) . exists ( ) ,
" auth.json should be removed on mismatch "
) ;
}
#[ tokio::test ]
2025-10-21 09:08:34 -07:00
#[ serial(codex_api_key) ]
2025-10-20 08:50:54 -07:00
async fn enforce_login_restrictions_logs_out_for_workspace_mismatch ( ) {
let codex_home = tempdir ( ) . unwrap ( ) ;
let _jwt = write_auth_file (
AuthFileParams {
openai_api_key : None ,
chatgpt_plan_type : " pro " . to_string ( ) ,
chatgpt_account_id : Some ( " org_another_org " . to_string ( ) ) ,
} ,
codex_home . path ( ) ,
)
. expect ( " failed to write auth file " ) ;
let config = build_config ( codex_home . path ( ) , None , Some ( " org_mine " . to_string ( ) ) ) ;
let err = super ::enforce_login_restrictions ( & config )
. await
. expect_err ( " expected workspace mismatch to error " ) ;
assert! ( err . to_string ( ) . contains ( " workspace org_mine " ) ) ;
assert! (
! codex_home . path ( ) . join ( " auth.json " ) . exists ( ) ,
" auth.json should be removed on mismatch "
) ;
}
#[ tokio::test ]
2025-10-21 09:08:34 -07:00
#[ serial(codex_api_key) ]
2025-10-20 08:50:54 -07:00
async fn enforce_login_restrictions_allows_matching_workspace ( ) {
let codex_home = tempdir ( ) . unwrap ( ) ;
let _jwt = write_auth_file (
AuthFileParams {
openai_api_key : None ,
chatgpt_plan_type : " pro " . to_string ( ) ,
chatgpt_account_id : Some ( " org_mine " . to_string ( ) ) ,
} ,
codex_home . path ( ) ,
)
. expect ( " failed to write auth file " ) ;
let config = build_config ( codex_home . path ( ) , None , Some ( " org_mine " . to_string ( ) ) ) ;
super ::enforce_login_restrictions ( & config )
. await
. expect ( " matching workspace should succeed " ) ;
assert! (
codex_home . path ( ) . join ( " auth.json " ) . exists ( ) ,
" auth.json should remain when restrictions pass "
) ;
}
#[ tokio::test ]
async fn enforce_login_restrictions_allows_api_key_if_login_method_not_set_but_forced_chatgpt_workspace_id_is_set ( )
{
let codex_home = tempdir ( ) . unwrap ( ) ;
2025-10-27 19:41:49 -07:00
login_with_api_key ( codex_home . path ( ) , " sk-test " , AuthCredentialsStoreMode ::File )
. expect ( " seed api key " ) ;
2025-10-20 08:50:54 -07:00
let config = build_config ( codex_home . path ( ) , None , Some ( " org_mine " . to_string ( ) ) ) ;
super ::enforce_login_restrictions ( & config )
. await
. expect ( " matching workspace should succeed " ) ;
assert! (
codex_home . path ( ) . join ( " auth.json " ) . exists ( ) ,
" auth.json should remain when restrictions pass "
) ;
}
#[ tokio::test ]
#[ serial(codex_api_key) ]
async fn enforce_login_restrictions_blocks_env_api_key_when_chatgpt_required ( ) {
let _guard = EnvVarGuard ::set ( CODEX_API_KEY_ENV_VAR , " sk-env " ) ;
let codex_home = tempdir ( ) . unwrap ( ) ;
let config = build_config ( codex_home . path ( ) , Some ( ForcedLoginMethod ::Chatgpt ) , None ) ;
let err = super ::enforce_login_restrictions ( & config )
. await
. expect_err ( " environment API key should not satisfy forced ChatGPT login " ) ;
assert! (
err . to_string ( )
. contains ( " ChatGPT login is required, but an API key is currently being used. " )
) ;
}
2025-09-02 18:36:19 -07:00
}
/// Central manager providing a single source of truth for auth.json derived
/// authentication data. It loads once (or on preference change) and then
/// hands out cloned `CodexAuth` values so the rest of the program has a
/// consistent snapshot.
///
/// External modifications to `auth.json` will NOT be observed until
/// `reload()` is called explicitly. This matches the design goal of avoiding
/// different parts of the program seeing inconsistent auth data mid-run.
#[derive(Debug)]
pub struct AuthManager {
    codex_home: PathBuf,
    inner: RwLock<CachedAuth>,
2025-10-02 09:59:45 -07:00
enable_codex_api_key_env : bool ,
2025-10-27 19:41:49 -07:00
auth_credentials_store_mode : AuthCredentialsStoreMode ,
2025-09-02 18:36:19 -07:00
}
impl AuthManager {
/// Create a new manager loading the initial auth using the provided
/// preferred auth method. Errors loading auth are swallowed; `auth()` will
/// simply return `None` in that case so callers can treat it as an
/// unauthenticated state.
2025-10-27 19:41:49 -07:00
pub fn new (
codex_home : PathBuf ,
enable_codex_api_key_env : bool ,
auth_credentials_store_mode : AuthCredentialsStoreMode ,
) -> Self {
let auth = load_auth (
& codex_home ,
enable_codex_api_key_env ,
auth_credentials_store_mode ,
)
. ok ( )
. flatten ( ) ;
2025-09-02 18:36:19 -07:00
Self {
codex_home ,
2025-09-11 09:16:34 -07:00
inner : RwLock ::new ( CachedAuth { auth } ) ,
2025-10-02 09:59:45 -07:00
enable_codex_api_key_env ,
2025-10-27 19:41:49 -07:00
auth_credentials_store_mode ,
2025-09-02 18:36:19 -07:00
}
}
/// Create an AuthManager with a specific CodexAuth, for testing only.
pub fn from_auth_for_testing ( auth : CodexAuth ) -> Arc < Self > {
2025-09-11 09:16:34 -07:00
let cached = CachedAuth { auth : Some ( auth ) } ;
2025-09-02 18:36:19 -07:00
Arc ::new ( Self {
codex_home : PathBuf ::new ( ) ,
inner : RwLock ::new ( cached ) ,
2025-10-02 09:59:45 -07:00
enable_codex_api_key_env : false ,
2025-10-27 19:41:49 -07:00
auth_credentials_store_mode : AuthCredentialsStoreMode ::File ,
2025-09-02 18:36:19 -07:00
} )
}
    /// Current cached auth (clone). May be `None` if not logged in or load failed.
    pub fn auth(&self) -> Option<CodexAuth> {
        self.inner.read().ok().and_then(|c| c.auth.clone())
    }
2025-09-11 09:16:34 -07:00
/// Force a reload of the auth information from auth.json. Returns
2025-09-02 18:36:19 -07:00
/// whether the auth value changed.
pub fn reload ( & self ) -> bool {
2025-10-27 19:41:49 -07:00
let new_auth = load_auth (
& self . codex_home ,
self . enable_codex_api_key_env ,
self . auth_credentials_store_mode ,
)
. ok ( )
. flatten ( ) ;
2025-09-02 18:36:19 -07:00
if let Ok ( mut guard ) = self . inner . write ( ) {
let changed = ! AuthManager ::auths_equal ( & guard . auth , & new_auth ) ;
guard . auth = new_auth ;
changed
} else {
false
}
}
    fn auths_equal(a: &Option<CodexAuth>, b: &Option<CodexAuth>) -> bool {
        match (a, b) {
            (None, None) => true,
            (Some(a), Some(b)) => a == b,
            _ => false,
        }
    }
/// Convenience constructor returning an `Arc` wrapper.
2025-10-27 19:41:49 -07:00
pub fn shared (
codex_home : PathBuf ,
enable_codex_api_key_env : bool ,
auth_credentials_store_mode : AuthCredentialsStoreMode ,
) -> Arc < Self > {
Arc ::new ( Self ::new (
codex_home ,
enable_codex_api_key_env ,
auth_credentials_store_mode ,
) )
2025-09-02 18:36:19 -07:00
}
    /// Attempt to refresh the current auth token (if any). On success, reload
    /// the auth state from disk so other components observe the refreshed token.
2025-11-05 12:51:57 -06:00
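    /// If the token refresh fails, the error is logged and returned to the
    /// caller. A permanent (non-transient) failure does not log the user out or
    /// delete auth.json; callers should treat it as fatal and ask the user to
    /// log in again.
    pub async fn refresh_token(&self) -> Result<Option<String>, RefreshTokenError> {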
2025-09-02 18:36:19 -07:00
        let auth = match self.auth() {
            Some(a) => a,
            None => return Ok(None),
        };
        match auth.refresh_token().await {
            Ok(token) => {
                // Reload to pick up persisted changes.
                self.reload();
                Ok(Some(token))
            }
2025-10-24 12:12:03 -07:00
            Err(e) => {
                tracing::error!("Failed to refresh token: {}", e);
                Err(e)
            }
2025-09-02 18:36:19 -07:00
}
}
    /// Log out by deleting the on-disk auth.json (if present). Returns Ok(true)
    /// if a file was removed, Ok(false) if no auth file existed. On success,
    /// reloads the in-memory auth cache so callers immediately observe the
    /// unauthenticated state.
    pub fn logout(&self) -> std::io::Result<bool> {
2025-10-27 19:41:49 -07:00
let removed = super ::auth ::logout ( & self . codex_home , self . auth_credentials_store_mode ) ? ;
2025-09-02 18:36:19 -07:00
// Always reload to clear any cached auth (even if file absent).
self . reload ( ) ;
Ok ( removed )
}
}