Time between sending a request and receiving a response. For LLMs this is typically 500 ms–15 s, depending on model size and output length; with streaming it splits into time-to-first-token (TTFT) and total generation time. In agentic loops, where many calls are chained, latencies compound, so every millisecond matters.
"Latency is the UX story of 2026. Streaming helps; reasoning models hurt."
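The TTFT-vs-total distinction above can be sketched in a few lines. The streaming source here is a simulated generator (a stand-in, not any real LLM API), with made-up delay values chosen only for illustration:

```python
import time

def fake_stream(n_tokens=5, first_delay=0.05, per_token=0.01):
    """Simulated streaming response; NOT a real LLM client.
    First token arrives after `first_delay`, later ones every `per_token`."""
    time.sleep(first_delay)
    yield "Hello"
    for _ in range(n_tokens - 1):
        time.sleep(per_token)
        yield " token"

start = time.monotonic()
ttft = None
tokens = []
for tok in fake_stream():
    if ttft is None:
        # Time to first token: what a streaming UI can show immediately.
        ttft = time.monotonic() - start
    tokens.append(tok)
# Total latency: what a non-streaming caller would wait for.
total = time.monotonic() - start

print(f"TTFT:  {ttft * 1000:.0f} ms")
print(f"Total: {total * 1000:.0f} ms")
```

Streaming improves perceived latency because the user sees output at TTFT rather than at total time; reasoning models hurt because hidden reasoning tokens push TTFT out without producing visible output.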