This article presents a solution based on relative scoring to detect Elasticsearch or Lucene search requests returning poorly matching results. It also explains why this problem is difficult to solve.
Assume that you are selling books and a user searches for “A Song of Ice and Fire”. Sadly, you don’t sell this particular book (poor you), but you do sell books containing either “song”, “ice” or “fire” in their title.
These books will be shown to the user as results for their search, but the user is likely to be disappointed.
Being able to detect search requests returning poorly matching documents is useful for the following use cases:
- For search analysts, to improve the search query when there is no good response;
- For sales, because when a query does not return good results, it may indicate a missing product in the catalog;
- For user experience, to reduce user frustration by returning a different result page when there are no good enough results.
In this article I will explain why detecting poorly matching documents is difficult and propose a solution to this problem.
Some first-hand experience with text search using Elasticsearch (or Lucene) is required to understand most of this article. Even though all the examples use Elasticsearch, you will face the same difficulties using Lucene directly, and the solution works in both cases.
Using Elasticsearch or Lucene, a score is computed for each document when a search query is performed. However, the score of a document does not indicate if it is a good match or not. It is a ranking among the documents returned by the query. If you are not convinced, try to run the following queries:
The first search query, which matches the document perfectly, has a score of almost 1. The second search query has the same score, but matches only one term out of four. More surprisingly, the third search query has a score of 1.6 even though most users would consider it a very poor match.
I will not explain in this article why these scores are returned, but you can use the `explain` parameter to get a detailed explanation.
This example does not try to demonstrate that Elasticsearch scoring is wrong. The score works perfectly to compare search results within a given search query. It’s just not useful for comparing different search requests.
The second problem is that the score is not bounded. If there were a maximum possible score for a search request, we could simply compare the returned score with that maximum, and the ratio would be a good indication of the quality of the result.
Several reasons lead to an unbounded score, but the easiest to understand is that the score of a match query is computed as the sum of the term query scores for each distinct term. Consequently, as the number of distinct terms in a match query is not bounded, the sum is not bounded either.
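As a toy illustration (the fixed term weights below are made up; real Lucene weights come from BM25 statistics such as term and document frequencies), summing per-term contributions makes the score grow with the number of distinct query terms:

```python
# Toy model: the score of a match query is the sum of the per-term scores.
# These weights are illustrative only, not actual BM25 values.
term_weights = {"song": 1.2, "ice": 0.9, "fire": 1.1, "a": 0.1, "of": 0.1, "and": 0.1}

def toy_score(query_terms):
    """Sum the weights of the query terms that match the document."""
    return sum(term_weights.get(t, 0.0) for t in query_terms)

short_query = toy_score(["song"])                                  # one term
long_query = toy_score(["a", "song", "of", "ice", "and", "fire"])  # six terms

# More distinct terms -> a larger sum: the score has no fixed upper bound.
assert long_query > short_query
```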
The solutions that don’t really work
Normalize with the score of the first document
Probably because what we would really like is a matching percentage instead of an unbounded score, some users try to normalize the scores using the score of the first document as the maximum score. This works for comparing the other documents to the first one. However, we cannot assume that the first document is itself a good match. In that case, all the documents may be considered good matches because they match as badly as the first document. Unless you always have a good first document (and you are very lucky), this solution will not work.
Use a function score query
If you don’t already use function score queries, have a look at the documentation. They are a powerful tool, but they do not solve this particular problem. If you already use function score queries but don’t see how they could solve this problem, then don’t try, and keep using them for what they are good at. Some have tried (including me) and it does not work.
If you try to replace the whole Elasticsearch score with a function score query, you are likely to get poor search results. The Elasticsearch score takes into account the frequencies of the terms, which is not possible using function score. With function score, a term occurring very frequently will count as much as a term occurring rarely. You probably don’t want that!
Set a maximum score
This solution can be very tempting if you do not understand well how the score is computed. Through experimentation, it is relatively easy to find a score that is hardly ever reached by a search query. So instead of normalizing with the score of the first document, you normalize with a constant top score. Good idea? No.
The major problem with this solution is that the score value depends on the number of matching terms in the user query. If a user searches for “elasticsearch in action”, the score will be the sum of the scores for each term. This score will always be higher than for a search for “elasticsearch” alone. Consequently, if you use a constant maximum score, you will consider a result a poor match just because the user searched for only a few terms.
There are other problems with this approach, because the maximum score depends on many varying factors such as the number of documents, the average length of the fields, etc. But I believe the first reason is already good enough to avoid using a maximum score.
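A quick sketch of the failure mode, with a made-up fixed threshold and made-up scores: a perfectly matching single-term query is flagged as poor simply because one term cannot reach a bound calibrated on multi-term queries.

```python
# Hypothetical constant "maximum score" (an assumption for illustration only).
MAX_SCORE = 3.0

def looks_poor(score, ratio_threshold=0.5):
    # Flag a result as a poor match when it scores below half the constant max.
    return score / MAX_SCORE < ratio_threshold

# A single-term query that matches its one term perfectly...
single_term_score = 1.2
# ...is flagged as poor, only because the user typed fewer terms.
assert looks_poor(single_term_score)

# A three-term query matching all three terms clears the threshold easily.
three_term_score = 1.2 + 0.9 + 1.1
assert not looks_poor(three_term_score)
```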
The working solution: relative scoring
As mentioned before, the problem is that there is no maximum score for a given search query. But instead of a maximum score, we can compute an optimal score. The optimal score is the score that an optimal document would get for the search query. The optimal document is a virtual document that matches the user request very well.
I will explain in the next section what an optimal document is and how to build it, but I would like to emphasize two points first.
Relative scoring is not normalization. Using relative scoring, you will not get a ratio between 0 and 1, with 1 being a perfect match. Because the score is not bounded, this is not possible. However, you will be able to compare your search results with the score of a document that is expected to match the user request very well. You may even get a score higher than the optimal score, in which case you definitely have a good match :-).
Elasticsearch has no idea what a good match would be. All the other solutions tried to provide this information one way or another, but they do not work. The optimal document is the missing piece of information required to solve the problem.
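A minimal sketch of the comparison (the function names and the 0.5 threshold are my own choices, not an Elasticsearch API): divide each result’s score by the optimal document’s score, and flag the query as having no good match when every ratio falls below the threshold. Note that ratios above 1 are possible.

```python
def relative_scores(result_scores, optimal_score):
    """Scale each result's score by the optimal document's score."""
    return [s / optimal_score for s in result_scores]

def has_good_match(result_scores, optimal_score, threshold=0.5):
    # A result close to (or above) the optimal score counts as a good match.
    # The 0.5 threshold is arbitrary; tune it on your own queries.
    return any(r >= threshold for r in relative_scores(result_scores, optimal_score))

# Example: the best hit scores far below the optimal document -> poor results.
assert not has_good_match([1.0, 0.8, 0.3], optimal_score=4.2)

# A result may even exceed the optimal score: definitely a good match.
assert has_good_match([4.5, 2.0], optimal_score=4.2)
```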
Building the optimal document
There is no universal method to build the optimal document, but in most cases it is straightforward. Basically, you use the same mechanism as when you build the Elasticsearch query: instead of mapping the user input to fields in the query, you map it to fields in the document.
When several user inputs are mapped to a single search field, you can join the values. That would be the case when the user can input a first name and a last name but your document contains only a single name field.
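A sketch of that mapping, assuming a search form with `first_name`, `last_name` and `title` inputs, where the two name inputs target a single `name` document field (all field names here are illustrative):

```python
def build_optimal_document(user_input):
    """Map the user's search inputs onto document fields, joining the
    inputs that target the same field, exactly as the query builder does."""
    doc = {}
    # Several inputs map to the single 'name' field: join their values.
    name_parts = [user_input.get("first_name"), user_input.get("last_name")]
    doc["name"] = " ".join(p for p in name_parts if p)
    # A one-to-one mapping is a straight copy.
    if "title" in user_input:
        doc["title"] = user_input["title"]
    return doc

optimal = build_optimal_document(
    {"first_name": "George", "last_name": "Martin", "title": "A Song of Ice and Fire"}
)
assert optimal == {"name": "George Martin", "title": "A Song of Ice and Fire"}
```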
There are also a few things to keep in mind:
- If you have `filter` statements in the search query, you must ensure that the optimal document satisfies the required constraints. The `filter` statements themselves can be safely discarded since they do not contribute to the score.
- If you apply transformations to the user input, e.g., automatic language translation, you must apply the same transformations to the optimal document.
Please note that the optimal document will not have the maximum possible score, hence the term relative scoring instead of normalized scoring. You can always get a higher score by repeating the same terms an arbitrary number of times.
Computing the score of the optimal document
The bad news is that there is no simple way to get the score of a document without indexing it. And obviously, the optimal document should not be returned in search results.
Indexing the optimal document
You can index the optimal document in your index and add a boolean field such as `is_searchable` to exclude this document from standard searches. Even though this is simple, the approach suffers from several drawbacks.
The first problem is that each time you index an optimal document, you will have to force a refresh of the index before getting its score. You will also have to run the search query twice, even though the second run should be much faster because most statistics should be cached and you can add an `ids` query to get only the score of the optimal document. You will also have to be very careful that all your search queries correctly exclude the optimal documents. Eventually, you will also have to clean up the optimal documents generated for each query.
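Sketched as Elasticsearch request bodies, built here as plain Python dicts (the `is_searchable` field name follows the text; everything else is illustrative): every standard search filters out the fake documents, and the second run restricts the same query to the optimal document’s id.

```python
# Standard searches must exclude the optimal documents via the boolean flag.
def user_search(match_clause):
    return {
        "query": {
            "bool": {
                "must": [match_clause],
                "filter": [{"term": {"is_searchable": True}}],
            }
        }
    }

# Second run: same query, restricted with an ids query so that only the
# optimal document's score is computed and returned.
def optimal_score_query(match_clause, optimal_doc_id):
    return {
        "query": {
            "bool": {
                "must": [match_clause],
                "filter": [{"ids": {"values": [optimal_doc_id]}}],
            }
        }
    }

match = {"match": {"title": "a song of ice and fire"}}
assert user_search(match)["query"]["bool"]["filter"] == [{"term": {"is_searchable": True}}]
```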
The second problem with this approach is that it will strongly increase the number of deletions in your index. Even though deleted documents are no longer searchable once a refresh has been performed, they still consume memory and may impact field statistics. You can read this article to learn more about the impact of deleted documents. In short, deleting many documents can decrease the performance of search queries by up to 46%.
To avoid indexing a fake document, I wrote a plugin to get the score of a document by indexing it in a temporary in-memory index. It adds a `/_docscore` endpoint returning the score of a document for a given query. The plugin is available for Elasticsearch versions 6.7.x up to 7.2.x. Let me know in the comments if you are interested in a release for older versions.
The `_docscore` endpoint accepts five parameters:
- `index`: Name of the index whose mapping is used to index the virtual document and whose stats are used to compute the score
- `type`: Optional, default to
- `query`: The search request
- `document`: The optimal document
- `explain`: Allows retrieving the score explanation
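Assembled from the parameter list above, a request body for the endpoint might look like the following (the exact payload shape and the field values are assumptions; check the plugin documentation for your version):

```python
# Hypothetical /_docscore request body built from the documented parameters.
docscore_request = {
    "index": "books",                                         # mapping + stats source
    "query": {"match": {"title": "a song of ice and fire"}},  # the search request
    "document": {"title": "A Song of Ice and Fire"},          # the optimal document
    "explain": True,                                          # include score explanation
}

# Only documented parameter names are used.
assert set(docscore_request) <= {"index", "type", "query", "document", "explain"}
```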
I really hope you found this article useful. Let me know in the comments which parts are not clear enough. I am planning to write a series of articles about text search with Elasticsearch, so don’t hesitate to add suggestions!
I am also an Elasticsearch consultant. If your company needs support with Elasticsearch, contact me! I can also provide Elasticsearch training for your team.