I'm working on a news search, and I find that sorting by the number of keyword matches tends to produce stale results, but sorting by date tends to produce irrelevant results.
How do I strike a balance? I'd rather have a good default so users don't have to muck about with the settings themselves.
Answer
TL;DR: Use a multi-factor ranking system.
A good example to follow is the way that Google rank search results. We of course don't know the precise details of their ranking algorithm, but they have arguably done the most research on this and had the most success. What we do know is that Google include a large number of factors and apply a weighting to each to give a final results ranking.
I'll try to give you a crude example of how it could be done:
Assign a value to each result based on its age.
the last 10 minutes = 100
the last hour = 80
the last 6 hours = 70
the last day = 60
the last week = 40
etc.
Assign a value to the keyword density
5 or more matches = 100
4 matches = 80
3 matches = 60
2 matches = 30
1 match = 10
0 matches = 0
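The two lookup tables above could be sketched in Python roughly like this. The names `age_score` and `keyword_score` are my own, and the score for anything older than a week is a made-up placeholder, since the age list stops at "etc."

```python
from datetime import timedelta

# (upper age limit, score) pairs from the age list, newest first.
AGE_BUCKETS = [
    (timedelta(minutes=10), 100),
    (timedelta(hours=1), 80),
    (timedelta(hours=6), 70),
    (timedelta(days=1), 60),
    (timedelta(weeks=1), 40),
]
# Assumption: placeholder for articles older than a week ("etc." in the list).
OLDER_THAN_A_WEEK_SCORE = 20

# Score by number of keyword matches; 5 or more matches all score 100.
KEYWORD_SCORES = {0: 0, 1: 10, 2: 30, 3: 60, 4: 80, 5: 100}


def age_score(age: timedelta) -> int:
    """Return the score of the first bucket the article's age fits into."""
    for limit, score in AGE_BUCKETS:
        if age <= limit:
            return score
    return OLDER_THAN_A_WEEK_SCORE


def keyword_score(matches: int) -> int:
    """Clamp the match count to the range 0..5 and look up its score."""
    return KEYWORD_SCORES[min(max(matches, 0), 5)]
```

For example, `age_score(timedelta(hours=3))` falls in the "last 6 hours" bucket and `keyword_score(7)` is treated the same as 5 matches.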
Create a weighting matrix
Date value = 8
Keyword density value = 4
Work out the rank value for each article
For each article, multiply the value for each factor by its weight in the weighting matrix, then add the results together
An article from 1 day ago with 4 keywords would have a rank value of:
60 * 8 + 80 * 4 = 800
An article from 10 minutes ago with 3 keywords would have a rank value of:
100 * 8 + 60 * 4 = 1040
You would then sort the results by their computed rank value, highest first.
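As a rough sketch, the two worked examples above could be computed and sorted like this in Python. The function name `rank_value` and the article labels are only for illustration; the scores and weights are the ones from the example.

```python
DATE_WEIGHT = 8
KEYWORD_WEIGHT = 4


def rank_value(age_score: int, keyword_score: int) -> int:
    """Weighted sum of the two factor scores."""
    return age_score * DATE_WEIGHT + keyword_score * KEYWORD_WEIGHT


# (label, age score, keyword score); the labels are hypothetical.
articles = [
    ("one day old, 4 matches", 60, 80),
    ("ten minutes old, 3 matches", 100, 60),
]

# Highest rank value first.
ranked = sorted(articles, key=lambda a: rank_value(a[1], a[2]), reverse=True)

for label, age, keywords in ranked:
    print(rank_value(age, keywords), label)
# prints:
# 1040 ten minutes old, 3 matches
# 800 one day old, 4 matches
```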
Some things to note here. Strictly speaking you don't need the weighting matrix, but it makes it easier to tune the results, which you should do. Also, to get good results you would probably need to include more than two factors. You could, for example, assign a weighting to the length of the article or to its publisher. All these choices are really more of an art than a science, so you will need to play around with it a bit.
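One way to leave room for extra factors is to keep the weights in a dictionary, so adding or re-tuning a factor is a one-line change. This is only a sketch: the `publisher` factor, its weight of 2 and its score of 50 are invented for illustration, and `weighted_rank` is my own name.

```python
# "date" and "keywords" use the weights from the example above.
# Assumption: "publisher" and its weight are hypothetical.
WEIGHTS = {"date": 8, "keywords": 4, "publisher": 2}


def weighted_rank(scores: dict, weights: dict = WEIGHTS) -> float:
    """Sum each factor's score times its weight.

    A factor missing from `scores` contributes nothing.
    """
    return sum(weight * scores.get(name, 0) for name, weight in weights.items())


# The one-day-old article with 4 keyword matches, without a publisher score.
print(weighted_rank({"date": 60, "keywords": 80}))  # prints 800

# The same article with a hypothetical publisher score of 50.
print(weighted_rank({"date": 60, "keywords": 80, "publisher": 50}))  # prints 900
```

Tuning then comes down to adjusting the numbers in `WEIGHTS` and checking whether the resulting order looks right.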