I still see flakiness in `test_shell_command_approval_triggers_elicitation()` on occasion where `MockServer` claims it has not received all of its expected requests. I recently introduced a similar type of test in #2264, `test_codex_jsonrpc_conversation_flow()`, which I have not seen flake (yet!), so this PR pulls over two things I did in that test: - increased `worker_threads` from `2` to `4` - added an assertion to make sure the `task_complete` notification is received Honestly, I'm still not sure why `MockServer` claims it sometimes does not receive all its expected requests given that we assert that the final `JSONRPCResponse` is read on the stream, but let's give this a shot. Assuming this fixes things, my hypothesis is that the increase in `worker_threads` helps because perhaps there are async tasks in `MockServer` that do not reliably complete fully when there are not enough threads available? If that is correct, it seems like the test would still be flaky, though perhaps with lower frequency?
15 KiB
15 KiB