What:
A type of Information Retrieval (IR) that lets you combine query terms with logical operators (AND, OR, NOT). It's made possible by an inverted index, which maps each term to the list of documents that contain it.
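A minimal sketch of building an inverted index in Python. It assumes a toy collection of made-up documents and a deliberately simple normalizer (lowercase, keep runs of letters and digits). The names `tokenize` and `build_inverted_index` are hypothetical and do not come from any particular library.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Hypothetical normalizer: lowercase the text and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to its posting list: the sorted IDs of documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sort each posting list so it can later be merged in linear time.
    return {term: sorted(ids) for term, ids in postings.items()}


# Toy, made-up documents for illustration only.
docs = {
    1: "Brutus killed Caesar",
    2: "Caesar married Calpurnia",
    3: "Brutus and Cassius",
}
index = build_inverted_index(docs)
print(index["caesar"])  # [1, 2]
print(index["brutus"])  # [1, 3]
```

Posting lists are kept sorted by document ID because the merge-based AND/OR operations rely on that order.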
Example
• Collection: search Shakespeare’s Collected Works
• Boolean query: Brutus AND Caesar AND NOT Calpurnia
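To make the example query concrete, here is a sketch that treats posting lists as Python sets. The document IDs are invented for illustration and are not taken from the real Shakespeare collection. AND becomes set intersection and AND NOT becomes set difference.

```python
# Hypothetical posting lists (document IDs), invented for illustration;
# they are not real data from Shakespeare's works.
postings = {
    "brutus": {1, 2, 4, 11, 31, 45},
    "caesar": {1, 2, 4, 5, 6, 16, 57},
    "calpurnia": {2, 31, 54},
}

# Brutus AND Caesar AND NOT Calpurnia
result = (postings["brutus"] & postings["caesar"]) - postings["calpurnia"]
print(sorted(result))  # [1, 4]
```

A standalone NOT (with no preceding AND) would instead need the complement against the set of all document IDs in the collection.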
How to implement it?
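- Start from an inverted index, which stores a posting list (the sorted IDs of the documents containing the term) for each term.
- Preprocess the user's query with the same steps (tokenization, normalization) used when building the index, so that query terms match index terms.
- Break the query into terms and operators (e.g. "scotland AND england").
- Combine the posting lists of the terms into a single result list according to the operators:
  - For "scotland AND england", take the intersection of the two posting lists.
  - For "scotland OR england", take the union.
  - For "Brutus AND NOT Calpurnia", keep the documents in Brutus's list that are not in Calpurnia's.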
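The steps above can be sketched with the classic linear-time merge over sorted posting lists. This is a minimal illustration under several assumptions: posting lists are sorted lists of integer IDs, operators are written in upper case, NOT only appears as AND NOT, and the query is evaluated strictly left to right with no precedence or parentheses. The function names and the tiny `index` are hypothetical.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """AND: walk both sorted posting lists at once, keeping IDs present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def union(p1: list[int], p2: list[int]) -> list[int]:
    """OR: merge two sorted posting lists, emitting each ID once."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            out.append(p1[i])
            i += 1
        else:
            out.append(p2[j])
            j += 1
    # One list is exhausted; whatever remains of the other is already sorted.
    out.extend(p1[i:])
    out.extend(p2[j:])
    return out


def and_not(p1: list[int], p2: list[int]) -> list[int]:
    """AND NOT: keep the IDs of p1 that do not appear in p2."""
    i = j = 0
    out = []
    while i < len(p1):
        if j < len(p2) and p2[j] < p1[i]:
            j += 1
        elif j < len(p2) and p2[j] == p1[i]:
            i += 1
            j += 1
        else:
            out.append(p1[i])
            i += 1
    return out


def evaluate(query: str, index: dict[str, list[int]]) -> list[int]:
    """Evaluate a flat query such as 'scotland AND england' strictly left to right.

    Assumptions: operators are upper-case AND / OR / NOT, NOT only appears as
    'AND NOT', and there is no operator precedence or parentheses.
    """
    tokens = query.split()
    # Terms get the same normalization (lowercasing) assumed for the index.
    result = index.get(tokens[0].lower(), [])
    i = 1
    while i < len(tokens):
        op = tokens[i]
        if op == "AND" and tokens[i + 1] == "NOT":
            result = and_not(result, index.get(tokens[i + 2].lower(), []))
            i += 3
        elif op == "AND":
            result = intersect(result, index.get(tokens[i + 1].lower(), []))
            i += 2
        elif op == "OR":
            result = union(result, index.get(tokens[i + 1].lower(), []))
            i += 2
        else:
            raise ValueError(f"unexpected token: {op}")
    return result


# Hypothetical toy index.
index = {"scotland": [1, 3, 5, 8], "england": [2, 3, 5, 9], "wales": [5]}
print(evaluate("scotland AND england", index))  # [3, 5]
print(evaluate("scotland OR england", index))  # [1, 2, 3, 5, 8, 9]
print(evaluate("scotland AND england AND NOT wales", index))  # [3]
```

Walking both lists with two pointers costs O(x + y) for posting lists of lengths x and y, which is why posting lists are stored sorted by document ID.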
Limitations:
While powerful, Boolean retrieval only returns the set of documents that satisfy the query, in no particular order. Ideally, we want the results ranked by relevance. That's why ways of scoring documents were developed, such as the Jaccard coefficient.
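As a sketch of the Jaccard coefficient, J(A, B) = |A ∩ B| / |A ∪ B|, the snippet below scores each document by the overlap between its set of terms and the query's set of terms, then sorts by score. The toy documents and the choice of returning 0.0 when both sets are empty are assumptions for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; assumed to be 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy query and documents, each reduced to a set of terms.
query = set("ides of march".split())
docs = {
    "doc1": set("caesar died in march".split()),
    "doc2": set("the long march".split()),
    "doc3": set("the ides of march".split()),
}

# Rank documents from highest to lowest Jaccard score against the query.
ranked = sorted(docs, key=lambda d: jaccard(query, docs[d]), reverse=True)
print(ranked)  # ['doc3', 'doc2', 'doc1']
```

Here doc3 scores 3/4, doc2 scores 1/5 and doc1 scores 1/6, so even this simple measure produces an ordering that plain Boolean retrieval cannot.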