Using an LLM as a Reranker

What Is a Reranker?

A reranker is a component in LLM applications that helps improve search and retrieval quality by reordering a set of initial results based on more sophisticated relevance criteria. Let me explain how it typically works:

  1. Initial Retrieval: First, a faster but simpler retrieval system (like vector search) pulls a set of potentially relevant documents or passages from a database. This initial retrieval prioritizes speed and recall.

  2. Reranking Stage: The reranker then takes this initial set and performs a more thorough analysis to reorder them based on their true relevance to the query. The reranker uses more sophisticated matching techniques that would be too computationally expensive to run on the entire document collection.

For example, if you search for “What are the health benefits of running?”:

  • Initial retrieval might return 20-50 passages containing keywords about running and health
  • The reranker then carefully compares each passage with the query, considering factors like:
    • Semantic similarity
    • Whether the passage actually answers the question
    • The quality and authority of the content
    • Cross-passage relationships
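
To make this two-stage flow concrete, here is a minimal sketch; retriever and reranker are hypothetical placeholders for whatever vector store and reranking model you use:

def retrieve_and_rerank(query, retriever, reranker, top_k=50, final_k=5):
    # Stage 1: fast, recall-oriented retrieval (e.g. vector search)
    candidates = retriever.search(query, top_k=top_k)  # retriever is a placeholder

    # Stage 2: slower, precision-oriented scoring of only the candidates
    scores = reranker.score(query, candidates)  # reranker is a placeholder
    reranked = sorted(zip(scores, candidates), key=lambda x: x[0], reverse=True)

    # Keep just the best few results for the final answer
    return [doc for _, doc in reranked[:final_k]]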

Different types of rerankers

Common types of rerankers include:

  • Cross-encoder models: These look at the query and candidate passage together to determine relevance, which is the most accurate but also the slowest option (see the sketch after this list)
  • Bi-encoder models: These encode the query and passages separately, which is faster but generally less precise than joint scoring
  • Hybrid approaches that combine multiple scoring methods
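
Here is the cross-encoder sketch referenced above, using the sentence-transformers library and a publicly available MS MARCO checkpoint (the model name is an example, not a recommendation):

from sentence_transformers import CrossEncoder

# Load a pretrained cross-encoder reranking model
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What are the health benefits of running?"
passages = [
    "Regular running improves cardiovascular health and endurance.",
    "Running shoes come in many different colors and styles.",
]

# The cross-encoder scores each (query, passage) pair jointly
scores = model.predict([(query, p) for p in passages])

# Print passages in order of descending relevance score
for score, passage in sorted(zip(scores, passages), key=lambda x: x[0], reverse=True):
    print(f"{score:.3f}  {passage}")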

The main benefits of using a reranker are:

  • Higher precision in search results
  • Better handling of semantic nuances
  • Improved result ordering without sacrificing initial retrieval speed

Several companies provide reranking models as part of their AI offerings, with Cohere being one of the prominent providers. Let me break down how different companies approach this:

Cohere:

  • Offers rerank as a dedicated API endpoint (a usage sketch follows this list)
  • Takes a query and a batch of candidate documents/passages per request (see their documentation for current limits)
  • Returns relevance scores and reranked results
  • Notable for having multilingual support
  • Particularly good at semantic matching rather than just lexical matching
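
Here is the usage sketch referenced above, based on Cohere's Python SDK; the model name and response fields follow their public documentation at the time of writing, so double-check against the current API:

import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder API key

query = "What are the health benefits of running?"
documents = [
    "Running regularly strengthens the heart and improves endurance.",
    "A history of marathon running as a competitive sport.",
]

# The rerank endpoint returns candidates ordered by relevance score
response = co.rerank(
    model="rerank-english-v3.0",  # example model name
    query=query,
    documents=documents,
    top_n=2,
)

for result in response.results:
    print(result.index, result.relevance_score)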

Microsoft Azure:

  • Provides reranking capabilities through Azure Cognitive Search
  • Integrates with their vector search and semantic search features
  • Can be used with their pre-trained models or custom models
  • Offers integration with OpenAI models for reranking

Amazon:

  • Offers reranking through Amazon Kendra
  • Includes semantic ranking capabilities
  • Can be integrated with custom ranking expressions
  • Works well with their document retrieval system

Google Cloud:

  • Offers Vertex AI Search (formerly Enterprise Search) with built-in semantic ranking and reranking capabilities
  • Can be customized with domain-specific knowledge

LLM as a Reranker

Using an LLM as a reranker is an interesting approach that can provide sophisticated semantic matching. Here’s a breakdown:

  1. Basic Approach:
def llm_rerank(query, documents, llm):
    ranked_results = []
    for doc in documents:
        prompt = f"""
        Query: {query}
        Document: {doc}
        
        On a scale of 0-10, how relevant is this document to the query?
        Provide your score and brief reasoning.
        """
        score = llm.get_score(prompt)  # You'd implement this based on your LLM
        ranked_results.append((score, doc))
    
    return sorted(ranked_results, reverse=True)
  2. More Sophisticated Methods:
def advanced_llm_rerank(query, documents, llm):
    prompt_template = """
    Rate the relevance of this document for the given query.
    Consider these aspects:
    - Direct answer relevance (0-5)
    - Information completeness (0-3)
    - Factual accuracy (0-2)
    
    Query: {query}
    Document: {document}
    
    Provide scores for each aspect and a total score out of 10.
    """
    
    results = []
    for doc in documents:
        prompt = prompt_template.format(query=query, document=doc)
        scores = llm.analyze(prompt)  # You'd implement this to return parsed aspect scores
        results.append((scores['total'], doc))
    
    return sorted(results, reverse=True)
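
Both sketches above score documents one at a time, which costs one LLM call per candidate. A common alternative is listwise reranking, where the model sees all candidates in a single prompt and returns an ordering; here is a rough sketch in which llm.complete stands in for whatever completion call your stack provides:

def listwise_llm_rerank(query, documents, llm):
    # Present every candidate in one prompt and ask for an ordering
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents))
    prompt = f"""
    Query: {query}

    Documents:
    {numbered}

    List the document indices in order of relevance to the query,
    most relevant first, as a comma-separated list (e.g. 2,0,1).
    """
    response = llm.complete(prompt)  # placeholder for your LLM call
    order = [int(tok) for tok in response.split(",") if tok.strip().isdigit()]
    order = list(dict.fromkeys(order))  # drop any duplicate indices
    # Append any indices the model omitted, keeping their original order
    order += [i for i in range(len(documents)) if i not in order]
    return [documents[i] for i in order if i < len(documents)]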

Key Considerations:

  1. Cost and Latency:

    • LLMs are relatively slow and expensive compared to traditional rerankers
    • Best to limit to a small number of candidates (e.g., top 5-10 from initial retrieval)
    • Consider batching requests if your LLM supports it
  2. Prompt Engineering:

    • Clear scoring criteria are essential
    • Consider breaking down relevance into specific aspects
    • You might want to include examples of good and bad matches
  3. Performance Optimization:

    • Cache results for common queries
    • Use smaller LLMs specialized for ranking
    • Consider async processing for better throughput (a sketch appears after this list)
  4. Hybrid Approaches:

def hybrid_rerank(query, documents, traditional_reranker, llm):
    # First pass: a fast traditional reranker narrows the field
    initial_rerank = traditional_reranker.rank(query, documents)
    top_candidates = initial_rerank[:5]

    # Second pass: the LLM produces the final ordering of the survivors
    final_ranking = llm_rerank(query, top_candidates, llm)
    return final_ranking
  5. Output Parsing:
import re

def parse_llm_score(llm_response):
    # Example parsing logic
    try:
        # Extract the numerical score from the LLM response
        score_pattern = r"Score:\s*(\d+(?:\.\d+)?)"
        match = re.search(score_pattern, llm_response)
        if match:
            return float(match.group(1))
        # Fallback: treat an unparseable response as not relevant
        return 0
    except Exception:
        return 0
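
As noted under Performance Optimization above, scoring candidates concurrently can cut wall-clock latency considerably. Here is a minimal asyncio sketch, assuming your LLM client exposes an async get_score coroutine (an assumption, not a given):

import asyncio

async def async_llm_rerank(query, documents, llm):
    # Score all documents concurrently rather than one at a time
    async def score(doc):
        prompt = f"Query: {query}\nDocument: {doc}\n\nRate relevance from 0 to 10."
        return await llm.get_score(prompt), doc  # get_score assumed to be async

    scored = await asyncio.gather(*(score(doc) for doc in documents))
    return sorted(scored, reverse=True)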

The main advantages of using LLMs as rerankers:

  • Deep semantic understanding
  • Ability to handle complex queries
  • Flexible scoring criteria
  • Can provide explanations for rankings

The main challenges:

  • Higher latency
  • Higher cost per query
  • Need for careful prompt engineering
  • Potential inconsistency in scoring

Dec 19, 2024

AI Enthusiast and a Software Engineer