From Docs to Answers: My Smolagents + Docling + DuckDB Experiment đ€
To the surprise of no one, retrieval-augmented generation (RAG) and agents have been the biggest buzz in the AI world for over a year now. Today, you can choose from loads of libraries, vector databases, models, and frameworks to build your own applications, and more keep popping up all the time!
Among the multitude of options, three particular new contenders caught my eye:
- Hugging Face's smolagents, especially their code agent (more on that later).
- DuckDB, which (at the risk of underselling it) I will be using as an OLAP SQLite. It has recently added support for a vector data type and similarity search, making it very interesting for this kind of project.
- IBM's Docling: a simple, flexible, and powerful document processor/parser.
So there I was, curious about a lightweight, in-process vector DB, a promising document parser, and a new agents framework… what could I possibly do with them? đ€ An agent that ingests documents and RAGs for you đĄ
This article shares my first experience and impressions while building a smol prototype agent that ingests your documents, indexes them for retrieval, and answers questions about their content.
Do you want to see the code? Here's the repo!
The Stack
This project uses several well-known packages, including transformers and sentence-transformers for LLMs and embeddings, plus Gradio for the UI. I won't detail all of them here, but I want to highlight our three main players.
Smolagents
Smolagents is a minimalist AI agent framework from Hugging Face. As far as I know (and I don't follow every framework), they were the first to push the idea of a code agent in an open-source setting. Instead of interacting with tools via JSON, the code agent writes Python code that executes tools inside a sandbox. That makes it both more flexible and more efficient, since it can tackle complex workflows in fewer steps. For example, it could execute something like the following in a single step:
result_1 = tool_1(args)
if result_1:
    result_2 = tool_2(result_1)
    print(result_2)
Like most HF projects, it's model-agnostic: you can plug in your favorite LLM (open-source or API-based) and it integrates seamlessly with the transformers library.
At the time of writing, they've also added Vision-Language model (VLM) and Computer Use support in the last few weeks, opening up even more possibilities.
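To make the setup concrete, here is a minimal sketch of wiring up a code agent with a single tool. The model ID and the toy retriever below are placeholders of mine rather than the project's actual configuration, and depending on your smolagents version the model class may be named HfApiModel instead of InferenceClientModel:

from smolagents import CodeAgent, InferenceClientModel, tool

@tool
def retriever(query: str) -> str:
    """Return the most relevant indexed chunks for a query.

    Args:
        query: The natural-language question to search for.
    """
    return "placeholder: similarity search against DuckDB goes here"

# Any Hugging Face Inference endpoint (or a local transformers model) works here.
model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")
agent = CodeAgent(tools=[retriever], model=model)
agent.run("What does the indexed report say about Q3 revenue?")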
Docling
I don't think Docling has received the hype it deserves. IBM open-sourced it a while back, yet I haven't seen many people talking about it. Out of the box, it takes tons of document formats and parses them into JSON or Markdown. What used to require multiple libraries and custom parsers is now a one-stop shop. It's so straightforward that I barely had to tweak anything, which says a lot about its defaults.
You can also supercharge it with VLMs, though I found the out-of-the-box pipeline already covers the "80%" of most RAG needs.
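As a taste of how little code it takes, here is a minimal sketch (the file name is made up; Docling also accepts URLs and many other formats):

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("annual_report.pdf")   # local path or URL
markdown = result.document.export_to_markdown()   # JSON export is also available
print(markdown[:500])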
DuckDB
DuckDB delivered exactly what it promised: an in-process OLAP database (think "SQLite for analytics") that recently added experimental support for fixed-size arrays and vector similarity search. That means you can store embeddings directly in a column and run nearest-neighbor queries with plain SQL.
With the vss extension, building an HNSW index and performing similarity search takes just a few lines. No servers, no extra services, just a local file and your queries. For this prototype, that meant everything stayed self-contained: ingest, embed, store, and search. Super convenient.
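Roughly, the database side looks like the sketch below (via the Python client). The 384-dimension column assumes a MiniLM-style embedding model, and persisting HNSW indexes to a database file is still behind an experimental flag at the time of writing:

import duckdb

con = duckdb.connect("rag.duckdb")
con.execute("INSTALL vss; LOAD vss;")
con.execute("SET hnsw_enable_experimental_persistence = true;")  # required for file-backed DBs

con.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        doc_name   VARCHAR,
        chunk_text VARCHAR,
        entities   VARCHAR[],
        embedding  FLOAT[384]   -- fixed-size array column
    )
""")
con.execute("CREATE INDEX IF NOT EXISTS chunks_hnsw ON chunks USING HNSW (embedding)")

# Nearest-neighbour search: smaller array_distance means more similar.
query_vec = [0.0] * 384  # stand-in for a real query embedding
rows = con.execute(
    """
    SELECT doc_name, chunk_text
    FROM chunks
    ORDER BY array_distance(embedding, ?::FLOAT[384])
    LIMIT 5
    """,
    [query_vec],
).fetchall()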
Building the RAG Agent
The core idea is simple:
- A chat UI built on Gradio
- A code agent following the ReAct framework
- Two initial tools:
  - Indexer: ingests, parses, and indexes documents
  - Retriever: embeds queries and performs similarity search
During a conversation, the user can ask the agent to index new documents (either by upload or URL) or to answer questions about any indexed content.
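For the chat UI, smolagents ships a small Gradio wrapper, so hooking the agent into a browser chat can be as short as the sketch below (the project's actual UI may well be a custom Gradio app; this just shows the idea):

from smolagents import GradioUI

# Wrap the agent from the earlier sketch in a ready-made chat interface.
GradioUI(agent).launch()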
In practice (no surprise to anyone building agents), it didn't work perfectly at first. For this reason, I ended up adding a third tool:
- Summarizer: condenses one or more text chunks, either generally or tailored to a query.
All tools are invoked via generated Python code. The same LLM powers both the agent's reasoning and the summarization, keeping the architecture simple.
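For illustration, the summarizer can be little more than a prompt wrapper around the same model object that drives the agent. The names below are mine, and the exact calling convention for a smolagents model has changed across versions; here I assume it accepts a list of chat messages and returns a message with a .content attribute:

from smolagents import tool

@tool
def summarizer(text: str, query: str = "") -> str:
    """Condense a block of text, optionally focused on a query.

    Args:
        text: The chunk(s) of text to summarize.
        query: Optional question the summary should address.
    """
    focus = f" Focus on: {query}" if query else ""
    messages = [{"role": "user", "content": f"Summarize the following text.{focus}\n\n{text}"}]
    return model(messages).content  # `model` is the same LLM that powers the agent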
Indexing
For each document, the indexer (see the code sketch after this list):
- Parses the file with Docling
- Extracts named entities via an NER-tuned model
- Computes an embedding vector
- Inserts a row per chunk into DuckDB, storing:
  - Document name
  - Chunk text
  - Named entities
  - Embedding vector
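Put together, a stripped-down indexer might look like the sketch below. The embedding and NER checkpoints are stand-ins of mine (any sentence-transformers and token-classification models would do), and the fixed-size chunking is deliberately naive; it assumes the chunks table and vss setup from the DuckDB snippet above:

import duckdb
from docling.document_converter import DocumentConverter
from sentence_transformers import SentenceTransformer
from transformers import pipeline

converter = DocumentConverter()
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim, matches FLOAT[384] above
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
con = duckdb.connect("rag.duckdb")  # table and vss extension created as shown earlier

def index_document(source: str) -> None:
    """Parse a document (path or URL), then chunk, tag, embed, and store it."""
    text = converter.convert(source).document.export_to_markdown()
    chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]  # naive chunking
    for chunk in chunks:
        entities = sorted({ent["word"] for ent in ner(chunk)})
        embedding = embedder.encode(chunk).tolist()
        con.execute(
            "INSERT INTO chunks VALUES (?, ?, ?, ?::FLOAT[384])",
            [source, chunk, entities, embedding],
        )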
Retrieval
The retriever:
- Extracts named entities from the query
- Embeds the query
- Retrieves chunks via similarity search (optionally filtered by document name)
- Reranks based on shared named entities between chunk and query
This quick entity-based reranking boosted relevance without requiring expensive cross-encoders or reranker models. It's not perfect, but it's surprisingly effective. An obvious but much more intricate extension of this approach would be to build a knowledge graph from these named entities.
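A sketch of that retrieve-then-rerank step, reusing the con, embedder, and ner objects from the indexing sketch (the over-fetch factor and the overlap scoring are illustrative choices of mine):

def retrieve(query: str, top_k: int = 5) -> list[str]:
    """Vector search in DuckDB, then rerank by named entities shared with the query."""
    query_entities = {ent["word"] for ent in ner(query)}
    query_vec = embedder.encode(query).tolist()
    rows = con.execute(
        """
        SELECT chunk_text, entities
        FROM chunks
        ORDER BY array_distance(embedding, ?::FLOAT[384])
        LIMIT ?
        """,
        [query_vec, top_k * 3],  # over-fetch candidates, then rerank
    ).fetchall()
    # Count shared entities; Python's stable sort keeps the similarity order for ties.
    scored = [(len(query_entities & set(entities or [])), chunk) for chunk, entities in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]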
Observations & Takeaways
While I hinted at a few challenges above, the overall experience with these tools has been very positive. Here are my main takeaways:
Agent bias toward tool usage
By default, the code agent strongly prefers using tools, even inventing them, instead of solving tasks directly as an LLM. To work around this, you can tweak the system prompt or add deliberately goofy tools. I opted for the latter, introducing the summarizer. This played to the agent's tool-centric tendencies while leveraging the LLM's strengths in summarization.
Model size matters
Earlier this year, I wrote about test-time compute and how smaller LLMs can outperform expectations. In this case, though, model size really did matter. Models in the ~7B–11B range struggled with open-ended tasks, needing explicit instructions to use specific tools. Swapping up to ~30B–70B turned that around: the larger models handled ambiguous requests and self-corrected much better. The trade-off was losing local inference and moving to cloud endpoints.
Final verdict
All in all, I came away with a positive impression of the three tools that motivated this project. Docling is incredibly simple yet powerful; I'll definitely reach for it again when processing documents. DuckDB is great, but I see its real potential more as a Delta Lake alternative than for simple, local storage (see their Duck Lake post). And smolagents? I'm excited to take it beyond this PoC, especially if async support is added. It's shaping up to be a solid production contender.