Agentic RAG with Reasoning LLMs: A Powerful Workflow

2025-02-05
ℹ️Note on the source

This blog post was automatically generated (and translated). It is based on the following original, which I selected for publication on this blog:
You HAVE to Try Agentic RAG with DeepSeek R1 (Insane Results).

Reasoning large language models (LLMs) such as DeepSeek R1 offer significant power, but often at the cost of speed. Combining these models with faster, more lightweight LLMs can unlock powerful agentic workflows. This approach leverages the reasoning capabilities of models like R1 for in-depth insights while maintaining a nimble conversational flow.

Conceptual Overview

In this setup, a primary LLM (e.g., Llama or Qwen) directs the conversation. When knowledge retrieval is required, the primary LLM calls a tool that utilizes R1. This tool performs the following steps (a code sketch follows the list):

  1. A user query is received.
  2. Retrieval-augmented generation (RAG) is performed using a vector database to find relevant context.
  3. The context and query are fed into the reasoning LLM (R1).
  4. R1 extracts insights from the context.
  5. The insights are returned to the primary agent to continue the conversation.
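
Below is a minimal sketch of such a tool, assuming Ollama serves the R1 model locally and ChromaDB acts as the vector database; the model tag, collection name, and prompt wording are illustrative assumptions, not details from the original.

```python
import ollama      # local client for models served by Ollama
import chromadb    # lightweight vector database

# Illustrative vector store; assumes documents were already embedded and added.
client = chromadb.Client()
collection = client.get_or_create_collection("docs")

def rag_reasoning_tool(query: str) -> str:
    """Retrieve context for a query and let the reasoning LLM extract insights."""
    # Step 2: retrieval-augmented generation against the vector database
    results = collection.query(query_texts=[query], n_results=3)
    context = "\n\n".join(results["documents"][0])

    # Steps 3-4: feed context and query into the reasoning LLM (R1)
    response = ollama.chat(
        model="deepseek-r1:7b",  # assumed distilled tag; pick any size you run
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}\n\n"
                       "Extract the key insights from the context that answer the question.",
        }],
    )

    # Step 5: hand the insights back to the primary agent
    return response["message"]["content"]
```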

Implementing Agentic RAG

Frameworks like smolagents (from Hugging Face) simplify the creation of such workflows. With smolagents, you can bootstrap an agentic workflow around R1 very quickly.
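
A hedged sketch of such a bootstrap, assuming smolagents' @tool decorator and a Qwen model served through Ollama via LiteLLM; the model IDs and the retrieve_insights wrapper (which delegates to the rag_reasoning_tool sketched above) are illustrative:

```python
from smolagents import CodeAgent, LiteLLMModel, tool

@tool
def retrieve_insights(query: str) -> str:
    """Look up the knowledge base and extract insights with DeepSeek R1.

    Args:
        query: The question to research in the vector database.
    """
    # Delegates to the rag_reasoning_tool sketched in the previous section.
    return rag_reasoning_tool(query)

# Fast primary LLM served locally through Ollama (model ID is an assumption)
primary_model = LiteLLMModel(model_id="ollama_chat/qwen2.5:7b")

agent = CodeAgent(tools=[retrieve_insights], model=primary_model)
agent.run("What does the knowledge base say about agentic RAG?")
```

The primary agent stays fast and responsive, and only pays the reasoning model's latency when the tool is actually invoked.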

Local Setup with Ollama

Ollama provides a way to run R1 locally. You can download different versions of DeepSeek R1, including distilled versions, and run them on your machine. Distilled versions are smaller models (Llama- or Qwen-based) fine-tuned on data produced by DeepSeek R1, effectively turning them into compact reasoning LLMs.
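
For example, after pulling a distilled model on the command line, it can be queried from Python with the ollama client package; the 7B tag below is just one of several available sizes:

```python
# Assumes the model was pulled first, e.g.: ollama pull deepseek-r1:7b
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",  # distilled 7B variant; other sizes are available
    messages=[{"role": "user", "content": "Briefly explain retrieval-augmented generation."}],
)
print(response["message"]["content"])
```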

Considerations for Context Length

Ollama models typically run with a limited default context window. To address this, the context limit can be increased by creating a custom Modelfile that raises the num_ctx parameter. This allows the LLM to process longer prompts and maintain more context during conversations.
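
As a sketch, assuming the deepseek-r1:7b tag and an 8,192-token window (both illustrative choices):

```
# Modelfile: derive a longer-context variant of the distilled model
FROM deepseek-r1:7b
PARAMETER num_ctx 8192
```

Building it with `ollama create deepseek-r1-8k -f Modelfile` (the name is arbitrary) makes the longer-context variant available under that tag. Alternatively, the ollama Python client accepts a per-request override via `options={"num_ctx": 8192}` in `ollama.chat`.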

Advantages of Agentic RAG

  • In-depth Insights: Reasoning LLMs extract more profound understanding from retrieved information.
  • Nimble Conversations: Faster LLMs maintain a fluid and responsive user experience.
  • Improved Query Formulation: Reasoning LLMs can even guide the primary LLM to formulate better queries for RAG.

Is this combination of different LLMs a way to get the best of both worlds?

Further Exploration

Agentic RAG is a rapidly evolving area with many potential avenues for exploration. More robust implementations can be built using frameworks like Pydantic AI and LangGraph. Such systems could incorporate more sophisticated tool usage, planning, and memory management.

The integration of reasoning LLMs with faster models unlocks new possibilities for AI agents and conversational systems. Which path do we want to take?

