Using more compute at inference time (longer chain-of-thought, repeated sampling, or verification) to boost model quality without retraining; the paradigm popularized by OpenAI's o1 and Claude's extended thinking.
"We're throwing more test-time compute at the eval. 3x latency, much better answers."