Ephemeral indexes for agents
Not all LLM apps are the same. Anthropic did a good job defining agents vs agentic workflows. That is just the beginning of the rabbit hole.
Some agentic systems are asynchronous, with the characteristics and expectations of batch jobs. Need to find some docs? Take your time. Need to read ten docs instead of two paragraphs? That's fine, take your time. Need three variants of indexes and lots of iterative queries? OK, sure.
The past and present worlds of enterprise search, and search in general, all prioritize collecting, processing, and indexing up front. So much front-loaded compute, all to serve low-latency search on demand. What a waste.
As asynchronous agents and workflows mature, we should reconsider the architecture and trade-offs. There's no need to precompute and build a general-purpose system. Build it just in time. Tailor it to the needs of the moment.
Enter dynamically created, ephemeral indexes. When the agentic job comes online, it can go collect all the docs and information it needs. Even with high-recall retrieval, you change the scale of ingestion and indexing by multiple orders of magnitude. You can adapt the indexing to the job at hand. The index can often fit in memory. And when the job is done, just archive the index or throw it away.
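As a rough sketch of what this could look like: a tiny in-memory TF-IDF index built from whatever documents the job gathered at startup, queried a few times, then simply dropped. The class name, scoring, and sample docs below are all illustrative assumptions, not a reference implementation.

```python
import math
import re
from collections import Counter, defaultdict


class EphemeralIndex:
    """In-memory TF-IDF index built just in time for one agentic job."""

    def __init__(self, docs):
        # docs: {doc_id: text}, collected for this job only.
        self.docs = docs
        # Inverted index: term -> {doc_id: term frequency}.
        self.postings = defaultdict(dict)
        for doc_id, text in docs.items():
            for term, tf in Counter(self._tokenize(text)).items():
                self.postings[term][doc_id] = tf

    @staticmethod
    def _tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def search(self, query, k=3):
        scores = Counter()
        n = len(self.docs)
        for term in self._tokenize(query):
            hits = self.postings.get(term, {})
            if not hits:
                continue
            # Rarer terms weigh more; no tuning, just enough for one job.
            idf = math.log(1 + n / len(hits))
            for doc_id, tf in hits.items():
                scores[doc_id] += tf * idf
        return scores.most_common(k)


# Build, query, and discard -- no persistent infrastructure.
index = EphemeralIndex({
    "a": "ephemeral indexes are built just in time",
    "b": "batch jobs can take their time",
    "c": "precomputed indexes front load compute",
})
top = index.search("ephemeral just in time")
# When the job finishes, the index just goes out of scope.
```

Because the whole corpus is job-scoped, you can afford choices a shared system can't: exact scoring over every posting, per-job tokenization, or rebuilding the index mid-run with a different structure.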
It’s refreshing to revisit computational trade-offs and architecture in these new domains.
Ephemeral indexes are on the way. Just in time.
Thanks to @mdp314 and @ravo for listening to me ramble about this at various times this year.