What:

  • When doing Information Retrieval (IR), you can add new terms to the user’s original query for better results.
  • If a user searches for “car” but the documents only contain “automobile”, a traditional exact-match search will fail.

1. Thesaurus Based:

  • Inject synonyms from a pre-built dictionary.
  • The dictionary can be built manually (accurate but unscalable) or automatically from word co-occurrence statistics.
  • Loses context: synonyms are injected regardless of word sense, so “jaguar” (the car) and “jaguar” (the animal) get the same expansion. (😉)
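The idea above can be sketched in a few lines. The tiny hand-built dictionary here is purely illustrative; a real system might use WordNet or co-occurrence statistics instead.

```python
# Thesaurus-based query expansion: a minimal sketch with a hypothetical
# hand-built synonym dictionary.
THESAURUS = {
    "car": ["automobile", "auto"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query: str) -> list[str]:
    """Return the original query terms plus any synonyms from the thesaurus."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(THESAURUS.get(term, []))
    return expanded

print(expand_query("cheap car"))
# ['cheap', 'car', 'inexpensive', 'affordable', 'automobile', 'auto']
```

Note the context problem is visible here: every sense of a term gets expanded, whether or not it matches the user's intent.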

2. Relevance Feedback:

Manual:

  • The user submits a query, and we return the top 10 documents from an initial retrieval.
  • Users mark results with positive (👍) or negative (👎) feedback.
  • We use this feedback to compute a new, improved query (classically via the Rocchio algorithm: move the query vector toward the relevant documents and away from the non-relevant ones).
  • We then run this improved query.

Automatic:

  • Exactly the same, except we simply assume the top-ranked documents are relevant (pseudo-relevance feedback).
  • We automatically refine the query using the same methods as before.
  • We run the refined query and surface the improved results to the user, with no extra interaction.
  • Problem? Query drift: if the top-ranked documents were actually irrelevant, the refined query moves even further away from the user’s intent.
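The automatic loop above can be sketched as follows. Both the toy scorer and the expansion heuristic (add the most frequent terms from the top-k documents) are simplified stand-ins for a real retrieval model:

```python
# Pseudo-relevance feedback sketch: assume the top-k first-pass results
# are relevant, then add their most frequent terms to the query.
from collections import Counter

def search(query_terms, corpus, k):
    # Toy first-pass scorer: rank documents by query-term overlap.
    scored = sorted(corpus, key=lambda d: -len(set(query_terms) & set(d.split())))
    return scored[:k]

def prf_expand(query_terms, corpus, k=2, n_new_terms=2):
    top_docs = search(query_terms, corpus, k)        # assumed relevant
    counts = Counter(t for d in top_docs for t in d.split())
    for t in query_terms:                            # don't re-add query terms
        counts.pop(t, None)
    return query_terms + [t for t, _ in counts.most_common(n_new_terms)]

corpus = [
    "car automobile dealership",
    "car repair automobile",
    "banana bread recipe",
]
print(prf_expand(["car"], corpus))
```

Query drift falls out of this directly: if the top-k documents are off-topic, the terms added come from the wrong documents, and the second-pass query is worse than the first.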

Combining With LLMs:

These classical methods are fast, much faster than a neural model like BERT. So we use them to retrieve the top 1000 most promising documents, then let BERT re-rank that short list properly.

Thus a hybrid, two-stage pipeline:

  1. Stage 1 (Retrieval): Use the classic, efficient method (like BM25 with an inverted index) to scan billions of documents and retrieve the top 1000 most promising candidates in milliseconds.

  2. Stage 2 (Re-ranking): Use the powerful, context-aware, slow model (like BERT) to carefully re-rank only those 1000 candidates. This model can understand the deep semantic meaning and produce the final, high-quality top 10 list that the user sees.
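The shape of the pipeline is easy to sketch. Both functions below are hypothetical stubs: `bm25_retrieve` stands in for a real lexical retriever over an inverted index, and `bert_rerank` uses a toy proxy score where a real system would call a cross-encoder.

```python
# Two-stage retrieval pipeline sketch: cheap wide retrieval, then
# expensive narrow re-ranking.
def bm25_retrieve(query, index, k=1000):
    # Stage 1: fast lexical scoring over the whole collection
    # (here a toy overlap count stands in for BM25).
    scored = [(doc, len(set(query.split()) & set(doc.split()))) for doc in index]
    scored.sort(key=lambda pair: -pair[1])
    return [doc for doc, _ in scored[:k]]

def bert_rerank(query, candidates, k=10):
    # Stage 2: the slow model scores ONLY the candidates. A toy proxy
    # (shorter matching docs first) stands in for BERT relevance scores.
    return sorted(candidates, key=len)[:k]

index = ["car automobile sale", "car wash", "gardening tips"]
candidates = bm25_retrieve("car", index, k=2)   # wide, cheap
top = bert_rerank("car", candidates, k=1)       # narrow, expensive
print(top)
# ['car wash']
```

The key design point: the expensive model's cost scales with the candidate list (1000 documents), not the collection (billions), which is what makes the hybrid affordable.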