Sunday, December 16, 2018

How to interpret vague date input in a search query


This is a follow-up to this question: Would an "around" search token make any sense?


If someone is performing a search by date, would an "around this date" feature be useful, and if so, how would you interpret their input? For example, what date range would you include in the results if someone specified "around March" as their date query?


EDIT:



The answers so far from @sacohe and @PatomaS have been perfectly sensible, but not quite what I'm looking for, so let me constrain the problem a bit more.


Imagine the dates are mentioned in a witness report for some criminal incident. The report says "It happened around March". The witness the statement was taken from is no longer available, for reasons which I'll let people's fertile imaginations come up with!


Now, we want an algorithm to correlate that "date phrase" with other data in a database of incidents which have known dates. How do we convert that search date into something we can use in a concrete query? Of course, you may find no matches and have to widen the search, but what would be your best interpretation for your first attempt?


Would you handle the phrases "around 12th March", "around March", "before Christmas" (or other religious/cultural holiday of your choosing) differently? Would the width of the date range be different for each?



Answer



One thing to consider is how far in the past the target date or period is. For instance, "around last Tuesday" would have a tighter focus than "around Christmas of 2010". Maybe call this report latency.


I'm an OO programmer, so I tend to think in terms of objects. I'd define a class (or data structure) that represents a range of dates, each date with a likelihood value. Let's call this class VaguePeriod; its values could be stored in a DB. For simplicity, say a likelihood can range from 1 to 100. So the phrase "before last Tuesday" would translate into a VaguePeriod where



  • last Tuesday - 1 -> (likelihood=50)

  • last Tuesday - 2 -> (likelihood=45)


  • last Tuesday - 3 -> (likelihood=40)


etc.
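
As a rough illustration, the list above could be generated as follows. This is only a sketch: the `before` constructor, the `likelihoods` field, and the choice to stop once the likelihood would fall below 1 are my own assumptions, and the starting value of 50 with a step of 5 is simply read off the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict


@dataclass
class VaguePeriod:
    """Maps candidate dates to a likelihood between 1 and 100."""
    likelihoods: Dict[date, int] = field(default_factory=dict)

    @classmethod
    def before(cls, anchor: date, start: int = 50, step: int = 5) -> "VaguePeriod":
        """Build the period for "before <anchor>".

        The day before the anchor gets `start`; each earlier day loses
        `step` points. Assumption: stop once the value would drop below 1.
        """
        likelihoods = {}
        days_back = 1
        value = start
        while value >= 1:
            likelihoods[anchor - timedelta(days=days_back)] = value
            days_back += 1
            value -= step
        return cls(likelihoods)


# Hypothetical anchor: a Tuesday chosen purely for illustration.
last_tuesday = date(2018, 12, 11)
period = VaguePeriod.before(last_tuesday)
```

A plain date-to-likelihood mapping also translates directly into a two-column DB table, which fits the idea of storing the values.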


The report latency would determine how this VaguePeriod is constructed. If the latency is short (a recent event), the VaguePeriod would be tighter; if the event was a long time ago, it would be looser: a wider range of dates with lower likelihoods.
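
One possible way to make latency shape the period is sketched below. The widening rule (one extra day on each side per year of latency) and the peak rule (100 divided by the window's half-width plus one) are invented for illustration; real values would have to be tuned against actual reports.

```python
from datetime import date, timedelta
from typing import Dict


def around(anchor: date, reported_on: date) -> Dict[date, int]:
    """Likelihoods for "around <anchor>", shaped by report latency.

    Assumed rules (not from the answer): the window spans one day either
    side of the anchor plus one more day per 365 days of latency, and the
    peak likelihood shrinks as the window grows.
    """
    latency_days = max((reported_on - anchor).days, 0)
    half_width = 1 + latency_days // 365
    peak = max(100 // (half_width + 1), 1)
    period = {}
    for offset in range(-half_width, half_width + 1):
        # Linear falloff from the anchor, floored at 1 so every day in
        # the window remains a candidate.
        value = max(round(peak * (1 - abs(offset) / (half_width + 1))), 1)
        period[anchor + timedelta(days=offset)] = value
    return period


# A statement taken a few days after the event vs. one taken years later.
recent = around(date(2018, 12, 11), reported_on=date(2018, 12, 16))
old = around(date(2010, 12, 25), reported_on=date(2018, 12, 16))
```

With these rules, the recent report yields a three-day window, while the old one yields a wider window whose likelihoods are all lower.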


So now the hard part is translating natural language phrases into a VaguePeriod structure and writing the search algorithm that uses VaguePeriods. Of course, the search results wouldn't be boolean; they too would carry likelihood values.
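
The search half might look something like this minimal sketch. The `search` function, the incident IDs, and the in-memory dictionary standing in for the incident database are all hypothetical; the period reuses the three likelihoods from the "before last Tuesday" example.

```python
from datetime import date
from typing import Dict, List, Tuple


def search(period: Dict[date, int], incidents: Dict[str, date]) -> List[Tuple[str, int]]:
    """Rank incidents by the likelihood the period assigns to their date.

    Incidents dated outside the period are dropped, which corresponds to
    a first attempt before the search is widened.
    """
    scored = [
        (incident_id, period[when])
        for incident_id, when in incidents.items()
        if when in period
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# "before last Tuesday", with a hypothetical Tuesday of 2018-12-11.
period = {date(2018, 12, 10): 50, date(2018, 12, 9): 45, date(2018, 12, 8): 40}
incidents = {
    "A": date(2018, 12, 8),
    "B": date(2018, 12, 10),
    "C": date(2018, 11, 1),
}
results = search(period, incidents)
```

Each result pairs an incident with a likelihood rather than a yes/no match, so widening the search amounts to building a looser period and running the same query again.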

