[apply-patch] Clean up apply-patch tool definitions (#2539)

## Summary
We've experienced a bit of drift in system prompting for `apply_patch`:
- As pointed out in #2030 , our prettier formatting started altering
prompt.md in a few ways
- We introduced a separate markdown file for apply_patch instructions in
#993, but currently duplicate them in the prompt.md file
- We added a first-class apply_patch tool in #2303, which has yet
another definition

This PR starts to consolidate our logic in a few ways:
- We now only use
`apply_patch_tool_instructions.md](https://github.com/openai/codex/compare/dh--apply-patch-tool-definition?expand=1#diff-d4fffee5f85cb1975d3f66143a379e6c329de40c83ed5bf03ffd3829df985bea)
for system instructions
- We no longer include apply_patch system instructions if the tool is
specified

I'm leaving the definition in openai_tools.rs as duplicated text for now
because we're going to be iterated on the first-class tool soon.

## Testing
- [x] Added integration tests to verify prompt stability
- [x] Tested locally with several different models (gpt-5, gpt-oss,
o4-mini)
This commit is contained in:
Dylan
2025-08-21 20:07:41 -07:00
committed by GitHub
parent 9f71dcbf57
commit e4c275d615
6 changed files with 282 additions and 107 deletions

View File

@@ -1,3 +1,7 @@
/codex-cli/dist /codex-cli/dist
/codex-cli/node_modules /codex-cli/node_modules
pnpm-lock.yaml pnpm-lock.yaml
prompt.md
*_prompt.md
*_instructions.md

View File

@@ -1,17 +1,23 @@
To edit files, ALWAYS use the `shell` tool with `apply_patch` CLI. `apply_patch` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the `apply_patch` CLI, you should call the shell tool with the following structure: ## `apply_patch`
```bash Use the `apply_patch` shell command to edit files.
{"cmd": ["apply_patch", "<<'EOF'\\n*** Begin Patch\\n[YOUR_PATCH]\\n*** End Patch\\nEOF\\n"], "workdir": "..."} Your patch language is a strippeddown, fileoriented diff format designed to be easy to parse and safe to apply. You can think of it as a highlevel envelope:
```
Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format. *** Begin Patch
[ one or more file sections ]
*** End Patch
*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete. Within that envelope, you get a sequence of file operations.
For each snippet of code that needs to be changed, repeat the following: You MUST include a header to specify the action you are taking.
[context_before] -> See below for further instructions on context. Each operation starts with one of three headers:
- [old_code] -> Precede the old code with a minus sign.
+ [new_code] -> Precede the new, replacement code with a plus sign. *** Add File: <path> - create a new file. Every following line is a + line (the initial contents).
[context_after] -> See below for further instructions on context. *** Delete File: <path> - remove an existing file. Nothing follows.
*** Update File: <path> - patch an existing file in place (optionally with a rename).
May be immediately followed by *** Move to: <new path> if you want to rename the file.
Then one or more “hunks”, each introduced by @@ (optionally followed by a hunk header).
Within a hunk each line starts with:
For instructions on [context_before] and [context_after]: For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first changes [context_after] lines in the second changes [context_before] lines. - By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first changes [context_after] lines in the second changes [context_before] lines.
@@ -25,16 +31,45 @@ For instructions on [context_before] and [context_after]:
- If a code block is repeated so many times in a class or function such that even a single `@@` statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance: - If a code block is repeated so many times in a class or function such that even a single `@@` statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:
@@ class BaseClass @@ class BaseClass
@@ def method(): @@ def method():
[3 lines of pre-context] [3 lines of pre-context]
- [old_code] - [old_code]
+ [new_code] + [new_code]
[3 lines of post-context] [3 lines of post-context]
Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below. The full grammar definition is below:
Patch := Begin { FileOp } End
Begin := "*** Begin Patch" NEWLINE
End := "*** End Patch" NEWLINE
FileOp := AddFile | DeleteFile | UpdateFile
AddFile := "*** Add File: " path NEWLINE { "+" line NEWLINE }
DeleteFile := "*** Delete File: " path NEWLINE
UpdateFile := "*** Update File: " path NEWLINE [ MoveTo ] { Hunk }
MoveTo := "*** Move to: " newPath NEWLINE
Hunk := "@@" [ header ] NEWLINE { HunkLine } [ "*** End of File" NEWLINE ]
HunkLine := (" " | "-" | "+") text NEWLINE
A full patch can combine several operations:
*** Begin Patch
*** Add File: hello.txt
+Hello world
*** Update File: src/app.py
*** Move to: src/main.py
@@ def greet():
-print("Hi")
+print("Hello, world!")
*** Delete File: obsolete.txt
*** End Patch
It is important to remember:
- You must include a header with your intended action (Add/Delete/Update)
- You must prefix new lines with `+` even when creating a new file
- File references can only be relative, NEVER ABSOLUTE.
You can invoke apply_patch like:
```bash
{"cmd": ["apply_patch", "<<'EOF'\\n*** Begin Patch\\n*** Update File: pygorithm/searching/binary_search.py\\n@@ class BaseClass\\n@@ def search():\\n- pass\\n+ raise NotImplementedError()\\n@@ class Subclass\\n@@ def search():\\n- pass\\n+ raise NotImplementedError()\\n*** End Patch\\nEOF\\n"], "workdir": "..."}
``` ```
shell {"command":["apply_patch","*** Begin Patch\n*** Add File: hello.txt\n+Hello, world!\n*** End Patch\n"]}
File references can only be relative, NEVER ABSOLUTE. After the apply_patch command is run, it will always say "Done!", regardless of whether the patch was successfully applied or not. However, you can determine if there are issue and errors by looking at any warnings or logging lines printed BEFORE the "Done!" is output. ```

View File

@@ -270,67 +270,6 @@ When using the shell, you must adhere to the following guidelines:
- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.) - When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)
- Read files in chunks with a max chunk size of 250 lines. Do not use python scripts to attempt to output larger chunks of a file. Command line output will be truncated after 10 kilobytes or 256 lines of output, regardless of the command used. - Read files in chunks with a max chunk size of 250 lines. Do not use python scripts to attempt to output larger chunks of a file. Command line output will be truncated after 10 kilobytes or 256 lines of output, regardless of the command used.
## `apply_patch`
Your patch language is a strippeddown, fileoriented diff format designed to be easy to parse and safe to apply. You can think of it as a highlevel envelope:
**_ Begin Patch
[ one or more file sections ]
_** End Patch
Within that envelope, you get a sequence of file operations.
You MUST include a header to specify the action you are taking.
Each operation starts with one of three headers:
**_ Add File: <path> - create a new file. Every following line is a + line (the initial contents).
_** Delete File: <path> - remove an existing file. Nothing follows.
\*\*\* Update File: <path> - patch an existing file in place (optionally with a rename).
May be immediately followed by \*\*\* Move to: <new path> if you want to rename the file.
Then one or more “hunks”, each introduced by @@ (optionally followed by a hunk header).
Within a hunk each line starts with:
- for inserted text,
* for removed text, or
space ( ) for context.
At the end of a truncated hunk you can emit \*\*\* End of File.
Patch := Begin { FileOp } End
Begin := "**_ Begin Patch" NEWLINE
End := "_** End Patch" NEWLINE
FileOp := AddFile | DeleteFile | UpdateFile
AddFile := "**_ Add File: " path NEWLINE { "+" line NEWLINE }
DeleteFile := "_** Delete File: " path NEWLINE
UpdateFile := "**_ Update File: " path NEWLINE [ MoveTo ] { Hunk }
MoveTo := "_** Move to: " newPath NEWLINE
Hunk := "@@" [ header ] NEWLINE { HunkLine } [ "*** End of File" NEWLINE ]
HunkLine := (" " | "-" | "+") text NEWLINE
A full patch can combine several operations:
**_ Begin Patch
_** Add File: hello.txt
+Hello world
**_ Update File: src/app.py
_** Move to: src/main.py
@@ def greet():
-print("Hi")
+print("Hello, world!")
**_ Delete File: obsolete.txt
_** End Patch
It is important to remember:
- You must include a header with your intended action (Add/Delete/Update)
- You must prefix new lines with `+` even when creating a new file
You can invoke apply_patch like:
```
shell {"command":["apply_patch","*** Begin Patch\n*** Add File: hello.txt\n+Hello, world!\n*** End Patch\n"]}
```
## `update_plan` ## `update_plan`
A tool named `update_plan` is available to you. You can use it to keep an uptodate, stepbystep plan for the task. A tool named `update_plan` is available to you. You can use it to keep an uptodate, stepbystep plan for the task.

View File

@@ -47,7 +47,17 @@ impl Prompt {
.as_deref() .as_deref()
.unwrap_or(BASE_INSTRUCTIONS); .unwrap_or(BASE_INSTRUCTIONS);
let mut sections: Vec<&str> = vec![base]; let mut sections: Vec<&str> = vec![base];
if model.needs_special_apply_patch_instructions {
// When there are no custom instructions, add apply_patch if either:
// - the model needs special instructions, or
// - there is no apply_patch tool present
let is_apply_patch_tool_present = self
.tools
.iter()
.any(|t| matches!(t, OpenAiTool::Function(f) if f.name == "apply_patch"));
if self.base_instructions_override.is_none()
&& (model.needs_special_apply_patch_instructions || !is_apply_patch_tool_present)
{
sections.push(APPLY_PATCH_TOOL_INSTRUCTIONS); sections.push(APPLY_PATCH_TOOL_INSTRUCTIONS);
} }
Cow::Owned(sections.join("\n")) Cow::Owned(sections.join("\n"))

View File

@@ -175,7 +175,7 @@ fn create_shell_tool_for_sandbox(sandbox_policy: &SandboxPolicy) -> OpenAiTool {
properties.insert( properties.insert(
"justification".to_string(), "justification".to_string(),
JsonSchema::String { JsonSchema::String {
description: Some("Only set if ask_for_escalated_permissions is true. 1-sentence explanation of why we want to run this command.".to_string()), description: Some("Only set if with_escalated_permissions is true. 1-sentence explanation of why we want to run this command.".to_string()),
}, },
); );
} }
@@ -247,8 +247,8 @@ pub(crate) struct ApplyPatchToolArgs {
pub(crate) input: String, pub(crate) input: String,
} }
fn create_apply_patch_tool() -> OpenAiTool { /// Returns a JSON tool that can be used to edit files. Public for testing, please use `get_openai_tools`.
// Minimal schema: one required string argument containing the patch body fn create_apply_patch_json_tool() -> OpenAiTool {
let mut properties = BTreeMap::new(); let mut properties = BTreeMap::new();
properties.insert( properties.insert(
"input".to_string(), "input".to_string(),
@@ -259,59 +259,73 @@ fn create_apply_patch_tool() -> OpenAiTool {
OpenAiTool::Function(ResponsesApiTool { OpenAiTool::Function(ResponsesApiTool {
name: "apply_patch".to_string(), name: "apply_patch".to_string(),
description: r#"Use this tool to edit files. description: r#"Use the `apply_patch` tool to edit files.
Your patch language is a strippeddown, fileoriented diff format designed to be easy to parse and safe to apply. You can think of it as a highlevel envelope: Your patch language is a strippeddown, fileoriented diff format designed to be easy to parse and safe to apply. You can think of it as a highlevel envelope:
**_ Begin Patch *** Begin Patch
[ one or more file sections ] [ one or more file sections ]
_** End Patch *** End Patch
Within that envelope, you get a sequence of file operations. Within that envelope, you get a sequence of file operations.
You MUST include a header to specify the action you are taking. You MUST include a header to specify the action you are taking.
Each operation starts with one of three headers: Each operation starts with one of three headers:
**_ Add File: <path> - create a new file. Every following line is a + line (the initial contents). *** Add File: <path> - create a new file. Every following line is a + line (the initial contents).
_** Delete File: <path> - remove an existing file. Nothing follows. *** Delete File: <path> - remove an existing file. Nothing follows.
\*\*\* Update File: <path> - patch an existing file in place (optionally with a rename). *** Update File: <path> - patch an existing file in place (optionally with a rename).
May be immediately followed by \*\*\* Move to: <new path> if you want to rename the file. May be immediately followed by *** Move to: <new path> if you want to rename the file.
Then one or more “hunks”, each introduced by @@ (optionally followed by a hunk header). Then one or more “hunks”, each introduced by @@ (optionally followed by a hunk header).
Within a hunk each line starts with: Within a hunk each line starts with:
- for inserted text, For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first changes [context_after] lines in the second changes [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
* for removed text, or - If a code block is repeated so many times in a class or function such that even a single `@@` statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:
space ( ) for context.
At the end of a truncated hunk you can emit \*\*\* End of File.
@@ class BaseClass
@@ def method():
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
The full grammar definition is below:
Patch := Begin { FileOp } End Patch := Begin { FileOp } End
Begin := "**_ Begin Patch" NEWLINE Begin := "*** Begin Patch" NEWLINE
End := "_** End Patch" NEWLINE End := "*** End Patch" NEWLINE
FileOp := AddFile | DeleteFile | UpdateFile FileOp := AddFile | DeleteFile | UpdateFile
AddFile := "**_ Add File: " path NEWLINE { "+" line NEWLINE } AddFile := "*** Add File: " path NEWLINE { "+" line NEWLINE }
DeleteFile := "_** Delete File: " path NEWLINE DeleteFile := "*** Delete File: " path NEWLINE
UpdateFile := "**_ Update File: " path NEWLINE [ MoveTo ] { Hunk } UpdateFile := "*** Update File: " path NEWLINE [ MoveTo ] { Hunk }
MoveTo := "_** Move to: " newPath NEWLINE MoveTo := "*** Move to: " newPath NEWLINE
Hunk := "@@" [ header ] NEWLINE { HunkLine } [ "*** End of File" NEWLINE ] Hunk := "@@" [ header ] NEWLINE { HunkLine } [ "*** End of File" NEWLINE ]
HunkLine := (" " | "-" | "+") text NEWLINE HunkLine := (" " | "-" | "+") text NEWLINE
A full patch can combine several operations: A full patch can combine several operations:
**_ Begin Patch *** Begin Patch
_** Add File: hello.txt *** Add File: hello.txt
+Hello world +Hello world
**_ Update File: src/app.py *** Update File: src/app.py
_** Move to: src/main.py *** Move to: src/main.py
@@ def greet(): @@ def greet():
-print("Hi") -print("Hi")
+print("Hello, world!") +print("Hello, world!")
**_ Delete File: obsolete.txt *** Delete File: obsolete.txt
_** End Patch *** End Patch
It is important to remember: It is important to remember:
- You must include a header with your intended action (Add/Delete/Update) - You must include a header with your intended action (Add/Delete/Update)
- You must prefix new lines with `+` even when creating a new file - You must prefix new lines with `+` even when creating a new file
- File references can only be relative, NEVER ABSOLUTE.
"# "#
.to_string(), .to_string(),
strict: false, strict: false,
@@ -326,7 +340,7 @@ It is important to remember:
/// Returns JSON values that are compatible with Function Calling in the /// Returns JSON values that are compatible with Function Calling in the
/// Responses API: /// Responses API:
/// https://platform.openai.com/docs/guides/function-calling?api-mode=responses /// https://platform.openai.com/docs/guides/function-calling?api-mode=responses
pub(crate) fn create_tools_json_for_responses_api( pub fn create_tools_json_for_responses_api(
tools: &Vec<OpenAiTool>, tools: &Vec<OpenAiTool>,
) -> crate::error::Result<Vec<serde_json::Value>> { ) -> crate::error::Result<Vec<serde_json::Value>> {
let mut tools_json = Vec::new(); let mut tools_json = Vec::new();
@@ -544,7 +558,7 @@ pub(crate) fn get_openai_tools(
} }
if config.apply_patch_tool { if config.apply_patch_tool {
tools.push(create_apply_patch_tool()); tools.push(create_apply_patch_json_tool());
} }
if let Some(mcp_tools) = mcp_tools { if let Some(mcp_tools) = mcp_tools {

View File

@@ -25,6 +25,179 @@ fn sse_completed(id: &str) -> String {
load_sse_fixture_with_id("tests/fixtures/completed_template.json", id) load_sse_fixture_with_id("tests/fixtures/completed_template.json", id)
} }
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn default_system_instructions_contain_apply_patch() {
use pretty_assertions::assert_eq;
let server = MockServer::start().await;
let sse = sse_completed("resp");
let template = ResponseTemplate::new(200)
.insert_header("content-type", "text/event-stream")
.set_body_raw(sse, "text/event-stream");
// Expect two POSTs to /v1/responses
Mock::given(method("POST"))
.and(path("/v1/responses"))
.respond_with(template)
.expect(2)
.mount(&server)
.await;
let model_provider = ModelProviderInfo {
base_url: Some(format!("{}/v1", server.uri())),
..built_in_model_providers()["openai"].clone()
};
let cwd = TempDir::new().unwrap();
let codex_home = TempDir::new().unwrap();
let mut config = load_default_config_for_test(&codex_home);
config.cwd = cwd.path().to_path_buf();
config.model_provider = model_provider;
config.user_instructions = Some("be consistent and helpful".to_string());
let conversation_manager = ConversationManager::default();
let codex = conversation_manager
.new_conversation_with_auth(config, Some(CodexAuth::from_api_key("Test API Key")))
.await
.expect("create new conversation")
.conversation;
codex
.submit(Op::UserInput {
items: vec![InputItem::Text {
text: "hello 1".into(),
}],
})
.await
.unwrap();
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
codex
.submit(Op::UserInput {
items: vec![InputItem::Text {
text: "hello 2".into(),
}],
})
.await
.unwrap();
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
let requests = server.received_requests().await.unwrap();
assert_eq!(requests.len(), 2, "expected two POST requests");
let expected_instructions = [
include_str!("../prompt.md"),
include_str!("../../apply-patch/apply_patch_tool_instructions.md"),
]
.join("\n");
let body0 = requests[0].body_json::<serde_json::Value>().unwrap();
assert_eq!(
body0["instructions"],
serde_json::json!(expected_instructions),
);
let body1 = requests[1].body_json::<serde_json::Value>().unwrap();
assert_eq!(
body1["instructions"],
serde_json::json!(expected_instructions),
);
}
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn prompt_tools_are_consistent_across_requests() {
use pretty_assertions::assert_eq;
let server = MockServer::start().await;
let sse = sse_completed("resp");
let template = ResponseTemplate::new(200)
.insert_header("content-type", "text/event-stream")
.set_body_raw(sse, "text/event-stream");
// Expect two POSTs to /v1/responses
Mock::given(method("POST"))
.and(path("/v1/responses"))
.respond_with(template)
.expect(2)
.mount(&server)
.await;
let model_provider = ModelProviderInfo {
base_url: Some(format!("{}/v1", server.uri())),
..built_in_model_providers()["openai"].clone()
};
let cwd = TempDir::new().unwrap();
let codex_home = TempDir::new().unwrap();
let mut config = load_default_config_for_test(&codex_home);
config.cwd = cwd.path().to_path_buf();
config.model_provider = model_provider;
config.user_instructions = Some("be consistent and helpful".to_string());
config.include_apply_patch_tool = true;
config.include_plan_tool = true;
let conversation_manager = ConversationManager::default();
let codex = conversation_manager
.new_conversation_with_auth(config, Some(CodexAuth::from_api_key("Test API Key")))
.await
.expect("create new conversation")
.conversation;
codex
.submit(Op::UserInput {
items: vec![InputItem::Text {
text: "hello 1".into(),
}],
})
.await
.unwrap();
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
codex
.submit(Op::UserInput {
items: vec![InputItem::Text {
text: "hello 2".into(),
}],
})
.await
.unwrap();
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
let requests = server.received_requests().await.unwrap();
assert_eq!(requests.len(), 2, "expected two POST requests");
let expected_instructions: &str = include_str!("../prompt.md");
// our internal implementation is responsible for keeping tools in sync
// with the OpenAI schema, so we just verify the tool presence here
let expected_tools_names: &[&str] = &["shell", "update_plan", "apply_patch"];
fn assert_tool_names(body: &serde_json::Value, expected_names: &[&str]) {
assert_eq!(
body["tools"]
.as_array()
.unwrap()
.iter()
.map(|t| t["name"].as_str().unwrap().to_string())
.collect::<Vec<_>>(),
expected_names
);
}
let body0 = requests[0].body_json::<serde_json::Value>().unwrap();
assert_eq!(
body0["instructions"],
serde_json::json!(expected_instructions),
);
assert_tool_names(&body0, expected_tools_names);
let body1 = requests[1].body_json::<serde_json::Value>().unwrap();
assert_eq!(
body1["instructions"],
serde_json::json!(expected_instructions),
);
assert_tool_names(&body1, expected_tools_names);
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)] #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn prefixes_context_and_instructions_once_and_consistently_across_requests() { async fn prefixes_context_and_instructions_once_and_consistently_across_requests() {
use pretty_assertions::assert_eq; use pretty_assertions::assert_eq;