What is the best data structure/implementation for representing a time series?

Sunday, December 27, 2015

What is the best practice for representing the elements of a time series, especially with large amounts of data? The context is a backtesting engine that compares multiple series.

It seems there are two options:

1) Using an integer index, or
2) Using a date-based index

At the moment I am using dates, but this hurts performance and memory usage because I am using a hash table rather than an array. It also adds overhead to iteration (forward or backward), since I have to determine the next or previous valid date before I can access it.

However, it does let me aggregate data on the fly (e.g. building the OHLC bar for the previous week when looking at daily bars) and, most importantly for me, it lets me compare different series with certainty that I am looking at the same date/time. Suppose I am looking at an equity issue relative to a broader index, and the broader index is missing a few bars for whatever reason. With an integer-indexed array, every position after the gap would pair the security's present bar with a future bar of the index. I don't see how you could handle these situations without using date/times.
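To make the look-ahead problem concrete, here is a minimal sketch in Python. The prices, dates, and variable names are made up for illustration; it only assumes that the index series is missing one daily bar.

```python
from datetime import date

# Hypothetical daily closes; the index is missing its 22 Dec bar.
stock = {
    date(2015, 12, 21): 10.0,
    date(2015, 12, 22): 10.5,
    date(2015, 12, 23): 11.0,
    date(2015, 12, 24): 10.8,
}
index = {
    date(2015, 12, 21): 2000.0,
    date(2015, 12, 23): 2020.0,
    date(2015, 12, 24): 2010.0,
}

# Integer indexing: position 1 is 22 Dec for the stock but 23 Dec for the index,
# so the stock's bar gets paired with a future index bar.
stock_values = list(stock.values())
index_values = list(index.values())
by_position = (stock_values[1], index_values[1])

# Date indexing: only dates present in both series are compared.
common_dates = sorted(stock.keys() & index.keys())
aligned = [(d, stock[d], index[d]) for d in common_dates]
```

Here `by_position` pairs the stock's 22 Dec close with the index's 23 Dec close, while `aligned` simply skips 22 Dec because the index has no bar for it.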

Using integer indexes would make the code a lot simpler, so I was wondering what others are doing and whether there is a best practice for this.

Answer

Representing time series (especially tick data) with elaborate data structures may not be the best idea.

You may want to try using two arrays of the same length to store your time series. The first array stores the values (e.g. prices) and the second array stores the times. Note that the second array is monotonically increasing (or at least non-decreasing), i.e. it is sorted. This property lets you search it with the binary search algorithm, which takes O(log n) comparisons. Once you have the index of a time of interest in the second array, you also have the index of the corresponding entry in the first array. If you wrap the two arrays and the search algorithm in, e.g., a class, the whole implementation complexity is hidden behind a simple interface.
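A minimal sketch of such a wrapper in Python, using the standard `bisect` module for the binary search. The class name `TimeSeries` and the methods `index_at` and `asof` are hypothetical names chosen for illustration, and the sample data is made up.

```python
from bisect import bisect_left, bisect_right
from datetime import date


class TimeSeries:
    """Two parallel arrays: sorted timestamps and the values observed at them."""

    def __init__(self, times, values):
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        if any(a > b for a, b in zip(times, times[1:])):
            raise ValueError("times must be non-decreasing")
        self.times = list(times)
        self.values = list(values)

    def index_at(self, t):
        """Index of the first entry stamped exactly t, or None if there is none."""
        i = bisect_left(self.times, t)
        if i < len(self.times) and self.times[i] == t:
            return i
        return None

    def asof(self, t):
        """Value of the latest entry at or before t, or None if t precedes all data."""
        i = bisect_right(self.times, t) - 1
        return self.values[i] if i >= 0 else None


# Hypothetical index series with the 22 Dec bar missing.
broad_index = TimeSeries(
    [date(2015, 12, 21), date(2015, 12, 23), date(2015, 12, 24)],
    [2000.0, 2020.0, 2010.0],
)
missing = broad_index.index_at(date(2015, 12, 22))  # None: no bar on that date
last_known = broad_index.asof(date(2015, 12, 22))   # falls back to the 21 Dec value
```

Iteration stays as cheap as with a plain integer index, since both arrays are walked by position. A missing bar shows up as `None` from `index_at`, and `asof` returns the most recent earlier value instead of a future one.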
