Monday, February 9, 2015

time series - How much data is needed to validate a short-horizon trading strategy?


Suppose one has an idea for a short-horizon trading strategy, which we will define as having an average holding period of under 1 week and a required latency between signal calculation and execution of under 1 minute. This category includes much more than just high-frequency market-making strategies. It also includes statistical arbitrage, news-based trading, trading earnings or economic releases, cross-market arbitrage, short-term reversal/momentum, etc. Before even thinking about trading such a strategy, one would obviously want to backtest it on a sufficiently long data sample.


How much data does one need to acquire in order to be confident that the strategy "works" and is not a statistical fluke? I don't mean confident enough to bet the ranch, but confident enough to assign significant additional resources to forward testing or trading a relatively small amount of capital.


Acquiring data (and not just market price data) can be very expensive or even impossible for some signals, such as those based on newer economic or financial time series. As such, this question matters both for deciding which strategies to investigate and for estimating how much to invest in data acquisition.


A complete answer should depend on the expected Information Ratio (IR) of the strategy, since a low-IR strategy would require a much longer sample to distinguish from noise.



Answer




Consider the standard error of the mean return, and in particular the width of the confidence interval, i.e. the distance between its upper and lower limits:


\begin{equation} \Delta = (\bar{x} + SE \cdot \alpha) - (\bar{x} - SE \cdot \alpha) = 2 \cdot SE \cdot \alpha \end{equation}


Substituting the formula for the standard error, $SE = s/\sqrt{n}$, and solving for the sample size $n$:


\begin{equation} n = \left(\frac{2 \cdot s \cdot \alpha}{\Delta}\right)^{2} \end{equation}


where $s$ is the measured standard deviation of the returns, which you already have from your IR calculation, and $\alpha$ is the critical value (1.96 for a 95% confidence interval).
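
To make the arithmetic concrete, here is a minimal Python sketch of that sample-size formula; the function name and the 95% default critical value are my own choices, not part of the original answer.

import math

def required_sample_size(s, delta, alpha=1.96):
    """Number of observations needed so that a confidence interval of
    total width `delta` is achievable, given a measured standard
    deviation `s` and critical value `alpha` (1.96 for ~95%)."""
    return math.ceil((2.0 * s * alpha / delta) ** 2)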




High-frequency Example


I was testing a market-making model recently that was expected to return a couple of basis points per trade, and I wanted to be confident that my returns were really positive (i.e., not a fluke). So I chose a distance of 3 bps $(\Delta = .0003)$. My sample's measured standard deviation was 45 bps $(s = .0045)$. For a confidence interval of 95% $(\alpha = 1.96)$, my sample size needs to be $n = 3458$ trades. I would have picked a tighter distance if I had been simulating this model, but I was trading live and couldn't be too choosy with money on the line.
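
Using the hypothetical required_sample_size helper sketched above, the same numbers can be reproduced directly:

# delta = 3 bps, s = 45 bps, alpha = 1.96 (95% confidence)
n_hf = required_sample_size(s=0.0045, delta=0.0003, alpha=1.96)
print(n_hf)  # 3458 trades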




Low-frequency Example



I imagine that for a low-frequency model expected to return 1.5% per month, I'd want maybe 1% as the distance $(\Delta = .01)$. If the hoped-for Sharpe ratio were 3, the monthly standard deviation would be about 1.7% $(s = .017)$, which I backed out from the monthly return and the Sharpe ratio. So for a confidence interval of 95% $(\alpha = 1.96)$, I'd need 45 months of data.
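
The same helper reproduces this estimate, assuming (my assumption, not stated in the post) that the Sharpe ratio of 3 is annualized, so the monthly standard deviation backs out to roughly 1.7%:

s_monthly = 0.015 / (3 / math.sqrt(12))          # ~= 0.0173, rounded to 1.7% in the text
n_lf = required_sample_size(s=0.017, delta=0.01)  # alpha defaults to 1.96 (95%)
print(n_lf)  # 45 months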

