Agents have prompted a lot of conversation about search implementation details. It has been a useful occasion to revisit all kinds of assumptions baked into how we think about search systems.
I keep coming back to how agents have led us to relax the constraint of low latency.
If I have more time, I'm allowed more compute and a more diverse set of algorithms at query time. Instead of pre-computing work, which today mostly means leveraging trained embedding and encoder models, I can explore new ways to solve the same problems at query time.
You might still end up with a query-time model, but I suspect all the work and resources that go into managing embeddings would start to look like a poor tradeoff. We can likely get those incremental relevance boosts in other ways while still building on the more efficient core of a lexical index.
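To make the shape of that tradeoff concrete, here's a minimal sketch, entirely illustrative: a tiny BM25-style lexical index as the cheap first pass, followed by a query-time re-scoring pass that spends the relaxed latency budget. The re-scorer here is a trivial term-overlap bonus standing in for something expensive (an LLM call, query expansion, or any algorithm you'd previously have pre-computed as embeddings); all names and parameters are assumptions, not a reference implementation.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class LexicalIndex:
    """A tiny in-memory BM25 index (illustrative, not production code)."""

    def __init__(self, docs):
        self.docs = docs
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_freqs = [Counter(t) for t in self.doc_tokens]
        self.avg_len = sum(len(t) for t in self.doc_tokens) / len(docs)
        self.n = len(docs)
        # document frequency: how many docs contain each term
        self.df = Counter()
        for freqs in self.doc_freqs:
            self.df.update(freqs.keys())

    def bm25(self, query, k1=1.5, b=0.75):
        scores = [0.0] * self.n
        for term in tokenize(query):
            if term not in self.df:
                continue
            idf = math.log(1 + (self.n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            for i, freqs in enumerate(self.doc_freqs):
                tf = freqs[term]
                if not tf:
                    continue
                denom = tf + k1 * (1 - b + b * len(self.doc_tokens[i]) / self.avg_len)
                scores[i] += idf * tf * (k1 + 1) / denom
        return scores

def overlap_bonus(query, doc):
    # Stand-in for an expensive query-time re-scorer.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / max(len(q), 1)

def search(index, query, rerank, top_k=3):
    # First pass: cheap lexical scoring over the whole corpus.
    scores = index.bm25(query)
    candidates = sorted(range(index.n), key=lambda i: -scores[i])[: top_k * 2]
    # Second pass: spend the relaxed latency budget per candidate.
    reranked = sorted(candidates, key=lambda i: -(scores[i] + rerank(query, index.docs[i])))
    return [index.docs[i] for i in reranked[:top_k]]

docs = [
    "lexical search with an inverted index",
    "embedding models for semantic search",
    "agents relax the latency constraint",
]
idx = LexicalIndex(docs)
print(search(idx, "lexical index search", overlap_bonus, top_k=1))
# → ['lexical search with an inverted index']
```

The point of the structure is that only the second pass grows with your latency budget; the lexical core stays the same efficient index it always was.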