What is happening inside agents?
It's time to learn about the internal mechanisms of agents
Where to begin?
Let’s begin in the middle. It’s the middle of 2025, and I’ve been a little out of the LLM game. For the last 9 months (2 as a volunteer and 7 as a full-time staff member), I’ve been working on transcriptomics, data analysis, and machine learning in an academic lab. I’ve been keeping up with LLMs and AI, but mostly through observation, reading, and some fun development on the side.
In the meantime, agents and agentic workflows have really taken center stage. I’m feeling a strong pull to dig back in and sort out the details of these advancements.
First there was RAG and Orange Words
Winding back to late 2023 and early 2024, I was deep into a stint at a small enterprise search startup. Retrieval Augmented Generation (RAG) was getting a lot of attention in the field, and I was learning, thinking, and developing prototypes using the OpenAI and Together.AI API services.
The RAG prototype I first built using Vespa.ai and my Hacker News data can be found here: https://orangewords.com/qna
If memory serves, this was around the time of GPT-3.5-turbo and friends. I remember learning that a first pass at RAG was as simple as:
- Use the user’s question to form a search query
- Place the search results in the prompt template
- Call the LLM to generate a response using the search results
This was (and still is) pretty straightforward, if you have a decent search platform and a good LLM. It can handle a lot of simple, data-oriented questions and even some more complex and nuanced ones.
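In code, that loop looks roughly like the sketch below. This is from memory rather than the actual Orange Words code: `search_hn` is a hypothetical stand-in for a Vespa query over my Hacker News data, and the OpenAI chat completions client is just one way to make the LLM call.

```python
# A minimal RAG sketch: search, stuff the results into a prompt, generate.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = """Answer the question using only the search results below.

Search results:
{results}

Question: {question}
"""

def search_hn(query: str) -> list[str]:
    # Hypothetical stand-in for a real search call (e.g. a Vespa query).
    return ["<snippet 1>", "<snippet 2>"]

def answer(question: str) -> str:
    # 1. Use the user's question to form a search query
    hits = search_hn(question)

    # 2. Place the search results in the prompt template
    prompt = PROMPT_TEMPLATE.format(results="\n\n".join(hits), question=question)

    # 3. Call the LLM to generate a response using the search results
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```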
However, at the time, I kept thinking about the more complex workflows between the search and LLM calls, as well as the taxonomy of questions that might arise. Key questions included things like:
- How do you handle a follow-up question?
- When should you perform another search?
- How do you tell the difference based on the user’s question alone?
- and so on…
Tool use and function calling were emerging, but it was still early days for these more complex uses of LLMs.
MCP and agents everywhere
Fast forward to now and everything’s an Agent. And if you’re not AI-native then you’re AI-belated.
With the wave of interest in the Model Context Protocol (MCP), my attention started returning to this area. I began reading, listening to podcasts, and pressing Gemini in voice mode to give me more details.
Repeatedly, explanations seemed to gloss over the internal details of the LLM orchestration. I remembered enough to know that tool use or function calling were likely at play, but MCP mostly focuses on the interface for connecting to and calling those tools. That’s adjacent to what I really want to learn.
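For concreteness, here’s the rough shape of tool use / function calling at the API level, sketched with the OpenAI chat completions API (the `search_hn` tool name and schema are my own illustration). Defining the tool and receiving a structured tool call back is the interface part; deciding when to call it, executing it, and feeding results back is the orchestration I want to understand.

```python
from openai import OpenAI

client = OpenAI()

# Describe a callable function to the model with a JSON schema.
tools = [{
    "type": "function",
    "function": {
        "name": "search_hn",  # illustrative tool name
        "description": "Search the Hacker News index and return matching snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "The search query."}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What has HN been saying about Vespa?"}],
    tools=tools,
)

# Instead of a plain text answer, the model may return a structured tool call;
# the orchestrating code decides whether and how to execute it.
message = response.choices[0].message
for call in message.tool_calls or []:
    print(call.function.name, call.function.arguments)  # e.g. search_hn {"query": "Vespa"}
```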
The drivers
My main driver here is curiosity, along with keeping up with my field.
Also, earlier this year I spent some time developing a prototype that uses Telegram and an LLM, as part of an ongoing side project with a friend. It’s more complex than a basic RAG workflow: it involves longer-running conversations across days, managing state, multiple LLM call types, and understanding user input as directives to update that state. I suspect it could be modeled as an agent, with the internal API set up as tools to call. I’ll be keeping that in mind on this journey.
Time for a journey
With all that said, I’ve decided it’s time for a small journey of learning.
My aim is to get my hands dirty and have a much better understanding of:
- What is my definition of an agent?
- How do agents and workflows relate and blend?
- How are tools defined and called?
- How does this change across major LLM APIs?
- What decisions are made about calling tools, and how are they made?
- How is all this blended together?
I’ll be attempting to document things informally, here in my logbook.