

🔧 Enhancing Hybrid Search in MongoDB: Combining RRF, Thresholds, and Weights


🔗 Source: dev.to

In my previous posts, I explored implementing basic hybrid search in MongoDB by combining vector and text search (https://dev.to/shannonlal/optimizing-mongodb-hybrid-search-with-reciprocal-rank-fusion-4p3h). While this approach worked, I struggled to consistently surface the most relevant results. This post covers three improvements I've implemented: Reciprocal Rank Fusion (RRF), similarity thresholds, and search type weighting.

The Three Pillars of Enhanced Hybrid Search

1. Reciprocal Rank Fusion (RRF)

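RRF combines results from different search methods using each document's rank position rather than its raw score. A document receives 1 / (k + rank) from every result list it appears in, where k is a smoothing constant (60 is the conventional choice and the value used here). Higher-ranked results therefore contribute more, while the constant narrows the gap between adjacent positions so that the top few hits of one list do not dominate. In the stage below, the vector search contribution is also multiplied by a weight: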

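{
  $addFields: {
    vs_rrf_score: {
      $multiply: [
        0.4, // vectorWeight
        // 'rank' is the document's 0-based position in the vector results,
        // e.g. set earlier by $unwind with includeArrayIndex: 'rank';
        // 60 is the RRF constant k
        { $divide: [1.0, { $add: ['$rank', 60] }] },
      ],
    },
  },
}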
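To make the arithmetic concrete, here is a minimal in-memory sketch of weighted RRF in plain JavaScript, outside of MongoDB. It assumes 0-based ranks (as produced by `$unwind` with `includeArrayIndex`) and k = 60; the weights 0.4 and 0.6 mirror the vector and text weights used in this post, and the ids `a` to `d` are hypothetical sample data. `fuseRankedLists` is an illustrative helper, not part of any MongoDB API.

```javascript
// Minimal in-memory sketch of weighted Reciprocal Rank Fusion (RRF).
// Assumptions: ranks are 0-based (as produced by $unwind with
// includeArrayIndex) and k = 60, matching the $addFields stage above.
const RRF_K = 60;

// Contribution of one result list: weight * 1 / (rank + k)
function rrfScore(rank, weight, k = RRF_K) {
  return weight * (1.0 / (rank + k));
}

// Fuses ranked lists of document ids into one list sorted by summed RRF score.
// `lists` is an array of { ids: [...best first...], weight: number }.
function fuseRankedLists(lists, k = RRF_K) {
  const scores = new Map();
  for (const { ids, weight } of lists) {
    ids.forEach((id, rank) => {
      // A document found by several searches accumulates score from each list.
      scores.set(id, (scores.get(id) ?? 0) + rrfScore(rank, weight, k));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: hypothetical vector results (weight 0.4) and text results (weight 0.6)
const fusedExample = fuseRankedLists([
  { ids: ['a', 'b', 'c'], weight: 0.4 },
  { ids: ['c', 'd'], weight: 0.6 },
]);
console.log(fusedExample.map((r) => r.id)); // [ 'c', 'd', 'a', 'b' ]
```

Document `c` ranks only third in the vector list but wins overall because it also appears in the text list, which is the behavior RRF is designed to reward.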

2. Similarity Thresholds

To ensure quality results, I've added minimum thresholds for both vector and text search scores:

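// Vector search threshold (stages placed after $vectorSearch)
{
  $addFields: { vectorScore: { $meta: 'vectorSearchScore' } }
},
{
  $match: {
    vectorScore: { $gte: 0.9 }
  }
}

// Text search threshold (stages placed after $search)
{
  $addFields: { textScore: { $meta: 'searchScore' } }
},
{
  $match: {
    textScore: { $gte: 0.5 }
  }
}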

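This prevents low-quality matches from appearing in the results, even if they would otherwise have received a boost from the RRF calculation. In the example above I chose 0.9 for the vector similarity score and 0.5 for the text score; adjust these based on how they behave with your own data. Keep in mind that the two scores are on different scales: for a cosine-similarity index, Atlas Vector Search reports (1 + cosine) / 2, so a threshold of 0.9 corresponds to a cosine similarity of 0.8, whereas Atlas Search text relevance scores are not bounded to the range 0 to 1 and vary with the query and the data.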
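When tuning thresholds it can help to reason about them outside the database. The sketch below assumes a cosine-similarity vector index (so the reported score is (1 + cosine) / 2) and applies the post's two thresholds to hypothetical sample results in plain JavaScript; `applyThreshold` and the sample scores are illustrative only, not real query output.

```javascript
// Sketch for reasoning about threshold values outside the database.
// Assumption: the vector index uses cosine similarity, for which Atlas Vector
// Search reports score = (1 + cosine) / 2, a value between 0 and 1.
function cosineToVectorScore(cosine) {
  return (1 + cosine) / 2;
}

function vectorScoreToCosine(score) {
  return 2 * score - 1;
}

// Keeps results whose score in `field` is at least `min` (missing counts as 0).
function applyThreshold(results, field, min) {
  return results.filter((r) => (r[field] ?? 0) >= min);
}

const VECTOR_MIN = 0.9; // vector threshold from the post
const TEXT_MIN = 0.5;   // text threshold from the post

// Hypothetical sample scores; text relevance scores are not capped at 1.
const vectorResults = [
  { _id: 1, vectorScore: 0.95 },
  { _id: 2, vectorScore: 0.85 },
];
const textResults = [
  { _id: 3, textScore: 2.7 },
  { _id: 4, textScore: 0.3 },
];

const keptVector = applyThreshold(vectorResults, 'vectorScore', VECTOR_MIN);
const keptText = applyThreshold(textResults, 'textScore', TEXT_MIN);
console.log(keptVector.map((r) => r._id), keptText.map((r) => r._id)); // [ 1 ] [ 3 ]
console.log(vectorScoreToCosine(VECTOR_MIN)); // cosine threshold, about 0.8
```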

3. Weighted Search Types

Different search types perform better for different queries. I've implemented weights to balance their contributions:

{
  $addFields: {
    combined_score: {
      $add: [
        // $ifNull treats a score missing from one search as 0
        { $multiply: [{ $ifNull: ['$vectorScore', 0] }, 0.4] }, // vector weight
        { $multiply: [{ $ifNull: ['$textScore', 0] }, 0.6] }    // text weight
      ]
    }
  }
}

In this example I give slightly more weight to text search results than to vector search results; again, tune these weights against your own search tests.
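The weighted combination can be prototyped in plain JavaScript as well. This is a sketch under one assumption not stated in the post: after `$unionWith`, a document returned by both searches appears twice, so the results are first merged by `_id`, keeping the best score of each type. A missing score counts as 0, mirroring `$ifNull`. `combineScores` and the sample documents are hypothetical.

```javascript
// Sketch of the weighted combination, done in plain JavaScript.
// Assumption: documents found by both searches appear twice after $unionWith,
// so they are merged by _id (keeping the highest score per search type).
// Scores are assumed non-negative; a missing score counts as 0, like $ifNull.
const VECTOR_WEIGHT = 0.4;
const TEXT_WEIGHT = 0.6;

function combineScores(results, vectorWeight = VECTOR_WEIGHT, textWeight = TEXT_WEIGHT) {
  const byId = new Map();
  for (const r of results) {
    const merged = byId.get(r._id) ?? { _id: r._id, vectorScore: 0, textScore: 0 };
    merged.vectorScore = Math.max(merged.vectorScore, r.vectorScore ?? 0);
    merged.textScore = Math.max(merged.textScore, r.textScore ?? 0);
    byId.set(r._id, merged);
  }
  return [...byId.values()]
    .map((d) => ({
      ...d,
      combined_score: d.vectorScore * vectorWeight + d.textScore * textWeight,
    }))
    .sort((a, b) => b.combined_score - a.combined_score);
}

// Hypothetical input: 'a' was found by both searches, 'b' only by text search.
const combinedExample = combineScores([
  { _id: 'a', vectorScore: 0.95 },
  { _id: 'b', textScore: 0.9 },
  { _id: 'a', textScore: 0.6 },
]);
console.log(combinedExample.map((d) => d._id)); // [ 'a', 'b' ]
```

Here `a` scores 0.95 × 0.4 + 0.6 × 0.6 = 0.74 and `b` scores 0.9 × 0.6 = 0.54, so being found by both searches outweighs a single strong text match.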

Putting It All Together

Here's a simplified version of the complete pipeline:

[
  // Vector Search with threshold
  {
    $vectorSearch: {
      index: 'ai_image_vector_description',
      path: 'descriptionValues',
      queryVector: embedding,
      numCandidates: 200, // required for approximate search; example value
      limit: 20,          // required; example value
      filter: {
        userId: userId,
        deleted: false,
      }
    }
  },
  // Expose the similarity score so it can be filtered on
  { $addFields: { vectorScore: { $meta: 'vectorSearchScore' } } },
  { $match: { vectorScore: { $gte: 0.9 } } },
  // RRF calculation for vector search
  {
    $group: {
      _id: null,
      docs: { $push: '$$ROOT' }
    }
  },
  // ... RRF calculation stages ...
  {
    $unionWith: {
      // Text search pipeline with similar structure
    }
  },
  // Final combination and sorting
  {
    $sort: { combined_score: -1 }
  }
]
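For reference, here is one way the elided stages could be filled in, written as a hypothetical `buildHybridPipeline` helper. It follows the RRF pattern from MongoDB's hybrid search documentation (`$group`, then `$unwind` with `includeArrayIndex`) and, unlike the previous section, sums the weighted RRF scores in the final stage instead of the weighted raw scores; swap in the raw-score `$addFields` stage if you prefer that combination. The collection name, text index name and field, and the `numCandidates`/`limit` values are placeholders, not the settings used in this post.

```javascript
// Hypothetical helper that assembles a weighted-RRF hybrid search pipeline.
// Placeholders (assumptions): collection 'images', text index 'default',
// text field 'description', and the limit/numCandidates values.
function buildHybridPipeline({
  embedding,
  query,
  userId,
  collection = 'images',   // hypothetical collection name
  textIndex = 'default',   // hypothetical Atlas Search index name
  textPath = 'description', // hypothetical text field
  vectorWeight = 0.4,
  textWeight = 0.6,
  vectorMin = 0.9,
  textMin = 0.5,
  limit = 20,
  k = 60,
}) {
  // Turns a ranked stream into { _id, <scoreField> } with weight / (rank + k).
  const rrfStages = (scoreField, weight) => [
    { $group: { _id: null, docs: { $push: '$$ROOT' } } },
    { $unwind: { path: '$docs', includeArrayIndex: 'rank' } },
    {
      $project: {
        _id: '$docs._id',
        [scoreField]: {
          $multiply: [weight, { $divide: [1.0, { $add: ['$rank', k] }] }],
        },
      },
    },
  ];

  return [
    {
      $vectorSearch: {
        index: 'ai_image_vector_description',
        path: 'descriptionValues',
        queryVector: embedding,
        numCandidates: limit * 10,
        limit,
        filter: { userId, deleted: false },
      },
    },
    { $addFields: { vectorScore: { $meta: 'vectorSearchScore' } } },
    { $match: { vectorScore: { $gte: vectorMin } } },
    ...rrfStages('vs_rrf_score', vectorWeight),
    {
      $unionWith: {
        coll: collection,
        pipeline: [
          { $search: { index: textIndex, text: { query, path: textPath } } },
          { $match: { userId, deleted: false } },
          { $limit: limit },
          { $addFields: { textScore: { $meta: 'searchScore' } } },
          { $match: { textScore: { $gte: textMin } } },
          ...rrfStages('ts_rrf_score', textWeight),
        ],
      },
    },
    // Merge documents found by both searches into one row per _id.
    {
      $group: {
        _id: '$_id',
        vs_rrf_score: { $max: '$vs_rrf_score' },
        ts_rrf_score: { $max: '$ts_rrf_score' },
      },
    },
    // Sum the weighted RRF contributions; a missing one counts as 0.
    {
      $addFields: {
        combined_score: {
          $add: [
            { $ifNull: ['$vs_rrf_score', 0] },
            { $ifNull: ['$ts_rrf_score', 0] },
          ],
        },
      },
    },
    { $sort: { combined_score: -1 } },
    { $limit: limit },
  ];
}
```

The returned array can be passed to `collection.aggregate(...)` in the Node.js driver. Building it in a function keeps the weights and thresholds in one place, which makes it easier to experiment with the values discussed above.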

Benefits and Results

This enhanced approach provides several benefits:

  1. More relevant results by considering both ranking position and raw scores
  2. Quality control through minimum thresholds
  3. Flexible weighting to optimize for different use cases

The combination of these techniques has significantly improved our search results, particularly for queries where simply adding scores didn't produce a good ordering.

Next Steps

Future improvements could include:

  • Dynamic weight adjustment based on query characteristics
  • Additional quality metrics beyond simple thresholds
  • Performance optimization for larger datasets

By implementing these enhancements, we've created a more robust and reliable hybrid search system that better serves our users' needs.


Related articles

🔧 Enhancing Search Accuracy with RRF(Reciprocal Rank Fusion) in Alibaba Cloud Elasticsearch 8.x
📈 50.21 points
🔧 Programming

📰 How The Economic Times passed Core Web Vitals thresholds and achieved an overall 43% better bounce rate
📈 30.52 points
Web Tips

🔧 Understanding Search Scores in MongoDB Hybrid Search
📈 29.49 points
🔧 Programming

🔧 AWS Budgets: Update alert thresholds unlimitedly with Lambda
📈 29.3 points
🔧 Programming

🔧 AI Limits? Compute Thresholds Aren't a Silver Bullet for Governance
📈 29.3 points
🔧 Programming

🔧 Optimizing MongoDB Hybrid Search with Reciprocal Rank Fusion
📈 23.84 points
🔧 Programming

🔧 What are LLMs? An intro into AI, models, tokens, parameters, weights, quantization and more
📈 23.44 points
🔧 Programming

📰 AI Weights: Securing the Heart and Soft Underbelly of Artificial Intelligence
📈 23.44 points
📰 IT Security News

📰 Streamlining Object Detection with Metaflow, AWS, and Weights & Biases
📈 23.44 points
🔧 AI News

📰 DeepMind Released AlphaFold 3 Inference Codebase, Model Weights and An On-Demand Server
📈 23.44 points
🔧 AI News

🔧 Unraveling the Might of "Super Weights" in Massive Language Models: Identification and Management
📈 23.44 points
🔧 Programming

🔧 Full-text search using MongoDB and Elastic Search
📈 22.47 points
🔧 Programming

🔧 Transforming MongoDB Search: Step-by-Step Guide to Using Atlas and Fuzzy Search
📈 22.47 points
🔧 Programming

🎥 Memory-efficient inference with XNNPack weights cache
📈 22.22 points
🎥 Artificial Intelligence Videos

📰 Manipulating Weights in Face-Recognition AI Systems
📈 22.22 points
📰 IT Security News
