The output of an MCP server tool call can be one of several types, but to date, we treated all outputs as text by showing the serialized JSON as the "tool output" in Codex:25a9949c49/codex-rs/mcp-types/src/lib.rs (L96-L101)This PR adds support for the `ImageContent` variant so we can now display an image output from an MCP tool call. In making this change, we introduce a new `ResponseInputItem::McpToolCallOutput` variant so that we can work with the `mcp_types::CallToolResult` directly when the function call is made to an MCP server. Though arguably the more significant change is the introduction of `HistoryCell::CompletedMcpToolCallWithImageOutput`, which is a cell that uses `ratatui_image` to render an image into the terminal. To support this, we introduce `ImageRenderCache`, cache a `ratatui_image::picker::Picker`, and `ensure_image_cache()` to cache the appropriate scaled image data and dimensions based on the current terminal size. To test, I created a minimal `package.json`: ```json { "name": "kitty-mcp", "version": "1.0.0", "type": "module", "description": "MCP that returns image of kitty", "main": "index.js", "dependencies": { "@modelcontextprotocol/sdk": "^1.12.0" } } ``` with the following `index.js` to define the MCP server: ```js #!/usr/bin/env node import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; import { readFile } from "node:fs/promises"; import { join } from "node:path"; const IMAGE_URI = "image://Ada.png"; const server = new McpServer({ name: "Demo", version: "1.0.0", }); server.tool( "get-cat-image", "If you need a cat image, this tool will provide one.", async () => ({ content: [ { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" }, ], }) ); server.resource("Ada the Cat", IMAGE_URI, async (uri) => { const base64Image = await getAdaPngBase64(); return { contents: [ { uri: uri.href, mimeType: "image/png", blob: base64Image, }, ], }; }); async function getAdaPngBase64() { const __dirname = new URL(".", import.meta.url).pathname; // From9705ce2c59/assets/Ada.pngconst filePath = join(__dirname, "Ada.png"); const imageData = await readFile(filePath); const base64Image = imageData.toString("base64"); return base64Image; } const transport = new StdioServerTransport(); await server.connect(transport); ``` With the local changes from this PR, I added the following to my `config.toml`: ```toml [mcp_servers.kitty] command = "node" args = ["/Users/mbolin/code/kitty-mcp/index.js"] ``` Running the TUI from source: ``` cargo run --bin codex -- --model o3 'I need a picture of a cat' ``` I get: <img width="732" alt="image" src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869" /> Now, that said, I have only tested in iTerm and there is definitely some funny business with getting an accurate character-to-pixel ratio (sometimes the `CompletedMcpToolCallWithImageOutput` thinks it needs 10 rows to render instead of 4), so there is still work to be done here.
74 lines
2.3 KiB
Rust
74 lines
2.3 KiB
Rust
use std::time::Duration;
|
|
|
|
use tracing::error;
|
|
|
|
use crate::codex::Session;
|
|
use crate::models::FunctionCallOutputPayload;
|
|
use crate::models::ResponseInputItem;
|
|
use crate::protocol::Event;
|
|
use crate::protocol::EventMsg;
|
|
use crate::protocol::McpToolCallBeginEvent;
|
|
use crate::protocol::McpToolCallEndEvent;
|
|
|
|
/// Handles the specified tool call dispatches the appropriate
|
|
/// `McpToolCallBegin` and `McpToolCallEnd` events to the `Session`.
|
|
pub(crate) async fn handle_mcp_tool_call(
|
|
sess: &Session,
|
|
sub_id: &str,
|
|
call_id: String,
|
|
server: String,
|
|
tool_name: String,
|
|
arguments: String,
|
|
timeout: Option<Duration>,
|
|
) -> ResponseInputItem {
|
|
// Parse the `arguments` as JSON. An empty string is OK, but invalid JSON
|
|
// is not.
|
|
let arguments_value = if arguments.trim().is_empty() {
|
|
None
|
|
} else {
|
|
match serde_json::from_str::<serde_json::Value>(&arguments) {
|
|
Ok(value) => Some(value),
|
|
Err(e) => {
|
|
error!("failed to parse tool call arguments: {e}");
|
|
return ResponseInputItem::FunctionCallOutput {
|
|
call_id: call_id.clone(),
|
|
output: FunctionCallOutputPayload {
|
|
content: format!("err: {e}"),
|
|
success: Some(false),
|
|
},
|
|
};
|
|
}
|
|
}
|
|
};
|
|
|
|
let tool_call_begin_event = EventMsg::McpToolCallBegin(McpToolCallBeginEvent {
|
|
call_id: call_id.clone(),
|
|
server: server.clone(),
|
|
tool: tool_name.clone(),
|
|
arguments: arguments_value.clone(),
|
|
});
|
|
notify_mcp_tool_call_event(sess, sub_id, tool_call_begin_event).await;
|
|
|
|
// Perform the tool call.
|
|
let result = sess
|
|
.call_tool(&server, &tool_name, arguments_value, timeout)
|
|
.await
|
|
.map_err(|e| format!("tool call error: {e}"));
|
|
let tool_call_end_event = EventMsg::McpToolCallEnd(McpToolCallEndEvent {
|
|
call_id: call_id.clone(),
|
|
result: result.clone(),
|
|
});
|
|
|
|
notify_mcp_tool_call_event(sess, sub_id, tool_call_end_event.clone()).await;
|
|
|
|
ResponseInputItem::McpToolCallOutput { call_id, result }
|
|
}
|
|
|
|
async fn notify_mcp_tool_call_event(sess: &Session, sub_id: &str, event: EventMsg) {
|
|
sess.send_event(Event {
|
|
id: sub_id.to_string(),
|
|
msg: event,
|
|
})
|
|
.await;
|
|
}
|