What:
Instead of asking “How similar are the words in the document and the query?”, we ask “What is the probability that this document is relevant to the user’s need?”. I.e. calculate P(R = 1 | d, q): the probability that relevance R holds, given a document d and a query q.
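A minimal sketch of the quantity being ranked (the notation R, d, q is assumed here, not taken from the original). Since only the ordering of documents matters, ranking by the probability is equivalent to ranking by the odds:

```latex
% Probability Ranking Principle: order documents by probability of relevance.
\mathrm{score}(d, q) = P(R = 1 \mid d, q)
% Equivalent ranking via the odds (monotone in the probability):
O(R \mid d, q) = \frac{P(R = 1 \mid d, q)}{P(R = 0 \mid d, q)}
```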
How:
Simple: treat it as an ML classification problem. Imagine you’ve got 2 classes (a sketch of how they turn into a ranking weight follows this list):
- A small set of RELEVANT documents.
- A huge set of NON-RELEVANT documents.
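As a hedged, self-contained sketch of how those two labelled classes turn into a ranking signal, here is the classic Robertson/Spärck Jones log-odds term weight (the +0.5-smoothed estimate that BM25’s IDF component descends from); the function and variable names below are illustrative, not from the original:

```python
from collections import Counter
from math import log

def rsj_weights(relevant_docs, nonrelevant_docs):
    """Per-term log-odds weights estimated from two labelled sets of
    tokenised documents (lists of token lists), with +0.5 smoothing."""
    R = len(relevant_docs)                      # number of relevant docs
    N = R + len(nonrelevant_docs)               # total number of docs
    # r[t]: relevant docs containing t; n[t]: all docs containing t
    r = Counter(t for doc in relevant_docs for t in set(doc))
    n = Counter(t for doc in relevant_docs + nonrelevant_docs for t in set(doc))
    weights = {}
    for t, n_t in n.items():
        r_t = r.get(t, 0)
        weights[t] = log(((r_t + 0.5) * (N - n_t - R + r_t + 0.5)) /
                         ((n_t - r_t + 0.5) * (R - r_t + 0.5)))
    return weights

def score(doc_tokens, query_tokens, weights):
    """Rank score: sum the weights of query terms present in the document."""
    doc_terms = set(doc_tokens)
    return sum(weights.get(t, 0.0) for t in query_tokens if t in doc_terms)
```

With no relevance information available (R = 0), this weight collapses to log((N - n_t + 0.5) / (n_t + 0.5)), i.e. the IDF-style term that reappears inside BM25.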
Okapi BM25:
As a result of working out this probabilistic model, we get the BM25 ranking formula, a far more advanced successor to TF-IDF: it adds term-frequency saturation and document-length normalization on top of an IDF-style weight.
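For reference, one standard form of the formula (minor variants exist, e.g. in how the IDF term is smoothed); k1 and b are free parameters, typically k1 ∈ [1.2, 2.0] and b ≈ 0.75:

```latex
% f(t, d): frequency of term t in document d; |d|: length of d in terms;
% avgdl: average document length; N: number of documents; n_t: documents containing t.
\mathrm{BM25}(d, q) = \sum_{t \in q} \mathrm{IDF}(t)\,
  \frac{f(t, d)\,(k_1 + 1)}
       {f(t, d) + k_1\!\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \log \frac{N - n_t + 0.5}{n_t + 0.5}
```

The k1 term caps how much repeated occurrences of a term can add (saturation), while b controls how strongly long documents are penalised.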