Using RAG with Langchain4j and Ollama (Llama 3)

Retrieval-Augmented Generation (RAG) is a technique that improves the answers of a generative language model by retrieving relevant passages from a collection of documents and adding them to the prompt. Grounding the model in this retrieved text makes its responses more accurate and relevant. In this article we will learn how to implement RAG in Java with Langchain4j, using a Llama 3 model served locally by Ollama.

Why RAG matters

Instead of relying solely on the knowledge a language model acquired during pre-training, RAG retrieves relevant information from an external document collection at query time and feeds it into the generation process, so that the answer is grounded in that source material.


This approach reduces, though does not eliminate, the model's tendency to hallucinate, and it lets the model use up-to-date or domain-specific knowledge that was never part of its training data. Because the document collection can be updated without retraining the model, RAG bridges the gap between the static knowledge encoded in the model's weights and information that changes over time. This makes it well suited to applications that need precise, context-aware answers.

Key Components in RAG with Langchain4j and Ollama

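To implement RAG with Langchain4j and Ollama, we'll use the following components:

  1. EmbeddingStore: stores embeddings together with the text segments they were computed from, and supports similarity search over them.
  2. EmbeddingStoreIngestor: splits documents into segments, computes an embedding for each segment and saves the results in the EmbeddingStore.
  3. OllamaEmbeddingModel: converts text into embeddings (numeric vectors) using a model served by Ollama.
  4. OllamaLanguageModel: generates the final answer from a prompt that includes the retrieved text.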
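To make the role of the EmbeddingStore more concrete, here is a dependency-free sketch of what an in-memory store does conceptually: it keeps (vector, text) pairs and returns the texts whose vectors are most similar to a query vector, measured by cosine similarity. ToyEmbeddingStore and its methods are hypothetical names used only for illustration; they are not Langchain4j classes. The three-dimensional vectors are made up, whereas real embeddings produced through Ollama have hundreds or thousands of dimensions. The sketch uses a record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy store, NOT Langchain4j's InMemoryEmbeddingStore
public class ToyEmbeddingStore {

    // One stored entry: the vector plus the text segment it was computed from
    record Entry(double[] vector, String text) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(double[] vector, String text) {
        entries.add(new Entry(vector, text));
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the texts of the maxResults entries most similar to the query, best first
    List<String> findRelevant(double[] query, int maxResults) {
        return entries.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> cosine(query, e.vector())).reversed())
                .limit(maxResults)
                .map(Entry::text)
                .toList();
    }

    public static void main(String[] args) {
        ToyEmbeddingStore store = new ToyEmbeddingStore();
        // Made-up 3-dimensional vectors for illustration only
        store.add(new double[] {1, 0, 0}, "Shadowmire: a panther-like creature of the swamps");
        store.add(new double[] {0, 1, 0}, "Unrelated text about hobbit pipe-weed");
        System.out.println(store.findRelevant(new double[] {0.9, 0.1, 0}, 1));
        // prints [Shadowmire: a panther-like creature of the swamps]
    }
}
```

The real store works with Embedding and TextSegment objects rather than raw arrays, but the principle is the same: the question is embedded with the same model as the documents, and the closest vectors point to the most relevant text.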

Step-by-step implementation

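First, make sure the Ollama server is up and running (by default it listens on http://localhost:11434) and that the llama3 model has been downloaded, for example with ollama pull llama3. The following article describes the process in detail: Getting started with langchain4j and Llama Model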
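Before running the example, you can check from Java that the Ollama server is reachable. This sketch assumes the default address http://localhost:11434 and sends a GET request to Ollama's /api/tags endpoint, which lists the locally available models. OllamaHealthCheck and isOllamaUp are hypothetical names, and the check only confirms that the server answers, not that llama3 has actually been pulled.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class for illustration
public class OllamaHealthCheck {

    // Returns true if an Ollama server answers GET /api/tags with HTTP 200
    static boolean isOllamaUp(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            // The JSON body lists the pulled models; here we only check the status code
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused, timeout, etc.: treat the server as unavailable
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Ollama reachable: " + isOllamaUp("http://localhost:11434"));
    }
}
```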

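Then, add the following dependencies to your pom.xml. The langchain4j-ollama module provides the Ollama integration, while the main langchain4j artifact contains the document loader and the document splitters used in the example:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>0.33.0</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>0.33.0</version>
</dependency>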

Next, suppose we want to give the model extra knowledge from a text file that describes an invented creature that could fit into The Lord of the Rings saga:

The Shadowmire is a mysterious and ancient creature that dwells in the darkest, most secluded swamps of Middle-earth.
It has the body of a large, sleek panther, but its fur is a deep, iridescent black that seems to absorb light.
Its eyes are a piercing emerald green, glowing with an eerie luminescence that can be seen from afar.

Save this text as dictionary.txt under src/main/resources in your Maven project (or anywhere else on the classpath).
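The example that follows locates dictionary.txt with getResource(...) and converts the URL into a file Path. That works when the application runs from target/classes, but Paths.get(fileUrl.toURI()) typically fails with a FileSystemNotFoundException once the resource is packaged inside a JAR. A more portable sketch reads the resource as a stream instead. ResourceText, readAll and readResource are hypothetical helpers; the resulting String could then be wrapped with Langchain4j's Document.from(text) in place of the FileSystemDocumentLoader call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration
public class ResourceText {

    // Reads an entire stream as UTF-8 text
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a classpath resource such as "/dictionary.txt"; works from a directory or a JAR
    static String readResource(String name) {
        InputStream in = ResourceText.class.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalStateException(name + " not found on the classpath");
        }
        try (in) {
            return readAll(in);
        } catch (IOException e) {
            // Thrown if closing the stream fails
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = readResource("/dictionary.txt");
        System.out.println(text.length() + " characters loaded");
    }
}
```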

Next, let's write the code that ingests this file into an embedding store, retrieves the passage most relevant to our question and passes it to the OllamaLanguageModel inside the prompt. The prompt is written as a Java text block, so Java 15 or later is required:

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.model.ollama.OllamaLanguageModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.List;
import java.util.Map;

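public class RAGIngestor {

    // Local generation with llama3 can be slow, so use a generous timeout
    private static final Duration TIMEOUT = Duration.ofSeconds(900);
    private static final String OLLAMA_URL = "http://localhost:11434";

    public static void main(String[] args) throws Exception {

        // 1. Embedding model served by Ollama (here llama3 itself produces the vectors)
        EmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl(OLLAMA_URL)
                .modelName("llama3")
                .timeout(TIMEOUT)
                .build();

        // 2. In-memory store holding each text segment together with its embedding
        EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

        // 3. Load dictionary.txt from the classpath (works when it is a plain file, not inside a JAR)
        URL fileUrl = RAGIngestor.class.getResource("/dictionary.txt");
        if (fileUrl == null) {
            throw new IllegalStateException("dictionary.txt not found on the classpath");
        }
        Path path = Paths.get(fileUrl.toURI());
        Document document = FileSystemDocumentLoader.loadDocument(path, new TextDocumentParser());

        // 4. Split into segments of at most 600 characters, with no overlap
        DocumentSplitter splitter = DocumentSplitters.recursive(600, 0);

        // 5. Split the document, embed every segment and store the results
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(splitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(document);

        // 6. Embed the question with the same model used for ingestion
        Embedding queryEmbedding = embeddingModel.embed("What is the Shadowmire?").content();

        // 7. Retrieve the single most similar segment
        List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.findRelevant(queryEmbedding, 1);
        if (relevant.isEmpty()) {
            throw new IllegalStateException("No relevant segment found");
        }
        String information = relevant.get(0).embedded().text();

        // 8. Inject the retrieved text into the prompt template
        Prompt prompt = PromptTemplate.from("""
                Tell me about {{name}}.

                Use the following information to answer the question:
                {{information}}
                """).apply(Map.of("name", "Shadowmire", "information", information));

        // 9. Language model that generates the final answer
        OllamaLanguageModel model = OllamaLanguageModel.builder()
                .baseUrl(OLLAMA_URL)
                .modelName("llama3")
                .timeout(TIMEOUT)
                .build();

        // 10. Generate and print the answer
        String answer = model.generate(prompt).content();
        System.out.println("Answer: " + answer);
    }
}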
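In the listing above, PromptTemplate fills each {{variable}} placeholder with the matching value from the map passed to apply(). The following self-contained sketch shows that idea with a regular expression. ToyPromptTemplate is a hypothetical class, not Langchain4j's implementation, whose template engine may differ in details such as which characters are accepted in variable names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy template engine, NOT Langchain4j's PromptTemplate
public class ToyPromptTemplate {

    // Matches {{name}} placeholders, tolerating spaces inside the braces
    private static final Pattern VARIABLE = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    // Substitutes every placeholder, failing fast when a variable has no value
    static String apply(String template, Map<String, ?> values) {
        Matcher m = VARIABLE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for variable '" + m.group(1) + "'");
            }
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String prompt = apply("Tell me about {{name}}.\nUse the following information: {{information}}",
                Map.of("name", "Shadowmire", "information", "A panther-like creature of the swamps."));
        System.out.println(prompt);
    }
}
```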

Explanation of the Code

  1. Initialize the Embedding Model: OllamaEmbeddingModel creates an embedding model backed by the llama3 model on the local Ollama server.
  2. Initialize the Embedding Store: An in-memory store holds the text segments and their embeddings.
  3. Load and Parse the Document: FileSystemDocumentLoader reads dictionary.txt, and TextDocumentParser turns its content into a Document.
  4. Configure the Splitter: A recursive splitter cuts the document into segments of at most 600 characters, with no overlap.
  5. Ingest the Document: The ingestor splits the document, generates an embedding for each segment and saves it in the embedding store.
  6. Create the Query Embedding: The user question is embedded with the same model used for ingestion.
  7. Retrieve Relevant Information: A similarity search in the embedding store returns the single most relevant text segment.
  8. Prepare the Prompt: The retrieved text is inserted into a prompt template.
  9. Initialize the Language Model: OllamaLanguageModel connects to llama3 with a long timeout, since local generation can be slow.
  10. Generate the Response: The language model answers the question based on the prepared prompt.
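Step 4 is worth a closer look. The recursive splitter keeps each segment within 600 characters while trying to break at natural boundaries such as paragraphs and sentences. The sample dictionary.txt is only about 330 characters long, so it most likely ends up as a single segment, and retrieving one match returns the whole file. The sketch below illustrates the idea with a much simpler sentence-packing strategy. ToySplitter is hypothetical and is not Langchain4j's recursive algorithm, which also falls back to lines, words and characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy splitter, NOT Langchain4j's DocumentSplitters.recursive
public class ToySplitter {

    // Greedily packs whole sentences into chunks of at most maxChars characters
    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Split after '.', '!' or '?' followed by whitespace
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            if (sentence.length() > maxChars) {
                // Sentence too long on its own: flush, then cut it into fixed-size pieces
                flush(current, chunks);
                for (int i = 0; i < sentence.length(); i += maxChars) {
                    chunks.add(sentence.substring(i, Math.min(sentence.length(), i + maxChars)));
                }
            } else if (current.length() == 0) {
                current.append(sentence);
            } else if (current.length() + 1 + sentence.length() <= maxChars) {
                // Sentence fits in the current chunk (plus one separating space)
                current.append(' ').append(sentence);
            } else {
                flush(current, chunks);
                current.append(sentence);
            }
        }
        flush(current, chunks);
        return chunks;
    }

    // Moves the current chunk (if any) into the result list
    private static void flush(StringBuilder current, List<String> chunks) {
        if (current.length() > 0) {
            chunks.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("Aaa. Bbb. Ccc.", 9)); // prints [Aaa. Bbb., Ccc.]
    }
}
```

Packing sentences rather than cutting at a fixed offset keeps each segment readable on its own, which generally makes both the embedding and the text injected into the prompt more meaningful.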

Running the code prints the model's answer after a couple of minutes. The answer should describe the Shadowmire using the details stored in dictionary.txt, although the exact wording will vary from run to run.

Conclusion

By combining retrieval with generation, RAG with Langchain4j and Ollama lets a locally hosted model answer questions using your own documents, improving the accuracy and relevance of its responses. This tutorial provides a foundation for implementing RAG that can be further customized and scaled for specific use cases and datasets.