Elasticsearch
Quick Summary
Elasticsearch is a distributed search engine that serves as a high-performance vector database for your RAG application. It is designed for scalability, high-speed queries, and relevance optimization, and it efficiently handles production-scale retrieval workloads. Learn more about Elasticsearch here.
To get started, install the Python library through the CLI using the following command:
pip install elasticsearch
DeepEval allows you to evaluate your Elasticsearch retriever and optimize retrieval hyperparameters such as top-K, embedding model, and/or similarity function. This diagram provides a great overview of how Elasticsearch functions as a vector database within your RAG pipeline:

Setting Up Elasticsearch
To get started, simply connect to your local Elastic cluster using the "elastic" username and the password set in your environment variable ELASTIC_PASSWORD.
import os

from elasticsearch import Elasticsearch

username = "elastic"
password = os.getenv("ELASTIC_PASSWORD")  # Value you set in the environment variable

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=(username, password),
)
Next, create an Elasticsearch index with the correct type mappings to store text and embedding as a dense_vector for similarity search, ensuring efficient retrieval.
index_name = "rag_documents"  # Example index name; use your own

# Create the index if it doesn't exist
if not es.indices.exists(index=index_name):
    es.indices.create(index=index_name, body={
        "mappings": {
            "properties": {
                "text": {"type": "text"},  # Stores chunk text
                "embedding": {"type": "dense_vector", "dims": 384}  # Stores embeddings (384 dims for all-MiniLM-L6-v2)
            }
        }
    })
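As an optional sanity check, you can read the mapping back from the cluster with the standard indices API to confirm it was applied as expected:

# Optional: confirm the index mapping was applied as expected
mapping = es.indices.get_mapping(index=index_name)
print(mapping[index_name]["mappings"]["properties"])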
Finally, define an embedding model to embed your document chunks before adding them to your Elasticsearch index for retrieval.
# Load an embedding model
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example document chunks
document_chunks = [
    "Elasticsearch is a distributed search engine.",
    "RAG improves AI-generated responses with retrieved context.",
    "Vector search enables high-precision semantic retrieval.",
    ...
]

# Store chunks with embeddings
for i, chunk in enumerate(document_chunks):
    embedding = model.encode(chunk).tolist()  # Convert text to a vector
    es.index(index=index_name, id=i, body={"text": chunk, "embedding": embedding})
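Indexing one chunk per request works for small examples, but for larger corpora the library's bulk helper batches documents into a single request. Here is a minimal sketch of the same ingestion using elasticsearch.helpers.bulk:

from elasticsearch.helpers import bulk

# Build one indexing action per chunk, then send them in a single bulk request
actions = [
    {
        "_index": index_name,
        "_id": i,
        "_source": {"text": chunk, "embedding": model.encode(chunk).tolist()},
    }
    for i, chunk in enumerate(document_chunks)
]
bulk(es, actions)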
Evaluating Elasticsearch Retrieval
To begin evaluating your Elasticsearch retriever, simply prepare an input query, generate a response to it using your RAG pipeline, and store the input, actual_output, and retrieval_context values in an LLMTestCase for evaluation.
An LLMTestCase allows you to create unit tests for your LLM applications, helping you identify specific weaknesses in your RAG application.
First, retrieve the retrieval_context from your Elasticsearch index for your input query.
def search(query, k=3):
    query_embedding = model.encode(query).tolist()
    res = es.search(index=index_name, body={
        "knn": {
            "field": "embedding",
            "query_vector": query_embedding,
            "k": k,  # Retrieve the top-k matches
            "num_candidates": 10  # Controls search speed vs. accuracy
        }
    })
    # Return the text of every retrieved chunk as the retrieval context
    return [hit["_source"]["text"] for hit in res["hits"]["hits"]]

query = "How does Elasticsearch work?"
retrieval_context = search(query)
Next, pass the retrieval context to your LLM to generate a response.
prompt_template = """
Answer the user question based on the supporting context.

User Question:
{input}

Supporting Context:
{retrieval_context}
"""

prompt = prompt_template.format(input=query, retrieval_context=retrieval_context)
actual_output = generate(prompt)  # Hypothetical function; replace with your own LLM
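generate here is a hypothetical function. As one possible implementation, the sketch below uses the openai client, assuming an OPENAI_API_KEY environment variable and gpt-4o-mini as an example model; swap in whichever LLM you use:

from openai import OpenAI

def generate(prompt):
    # Example generator using OpenAI's chat completions API; replace with your own LLM call
    response = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content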
Finally, run evaluate() using deepeval to assess the contextual recall, precision, and relevancy of your Elasticsearch retriever. These three contextual metrics help evaluate the performance of your retriever. For more details on retriever evaluation, check out this guide.
from deepeval.metrics import (
    ContextualRecallMetric,
    ContextualPrecisionMetric,
    ContextualRelevancyMetric,
)
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

...

test_case = LLMTestCase(
    input=query,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output="Elasticsearch uses inverted indexes for keyword searches and dense vector similarity for semantic search.",
)

evaluate(
    [test_case],
    metrics=[
        ContextualRecallMetric(),
        ContextualPrecisionMetric(),
        ContextualRelevancyMetric(),
    ],
)
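Each of these metrics also accepts a threshold parameter (0.5 by default) that determines whether the test case passes; you can raise it to enforce stricter retrieval quality as your retriever matures. For example:

# Stricter pass criteria: each metric must score at least 0.7
evaluate(
    [test_case],
    metrics=[
        ContextualRecallMetric(threshold=0.7),
        ContextualPrecisionMetric(threshold=0.7),
        ContextualRelevancyMetric(threshold=0.7),
    ],
)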
Improving Elasticsearch Retrieval
To improve your Elasticsearch retriever, experiment with various hyperparameters and prepare LLMTestCases using generations from different versions of these retrievers. Analyzing improvements and regressions in contextual metric scores (the three metrics defined above) will help you determine the optimal hyperparameter combination for your Elasticsearch retriever, as sketched below.
For a more detailed guide on iterating on your retriever's hyperparameters, you may want to check out this guide.
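As a minimal sketch of this workflow, the loop below sweeps example top-K values using the search, generate, and metric pieces defined in the earlier sections; the specific k values are illustrative, and you could sweep embedding models or num_candidates the same way:

# Hypothetical sweep over top-K: evaluate one test case per value and compare scores
for k in [1, 3, 5, 10]:
    retrieval_context = search(query, k=k)
    prompt = prompt_template.format(input=query, retrieval_context=retrieval_context)
    test_case = LLMTestCase(
        input=query,
        actual_output=generate(prompt),
        retrieval_context=retrieval_context,
        expected_output="Elasticsearch uses inverted indexes for keyword searches and dense vector similarity for semantic search.",
    )
    evaluate(
        [test_case],
        metrics=[
            ContextualRecallMetric(),
            ContextualPrecisionMetric(),
            ContextualRelevancyMetric(),
        ],
    )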