Thursday, January 8, 2015

backtesting - What are the popular methodologies to minimize data snooping?


Are there common procedures, prior to or after backtesting, to ensure that a quantitative trading strategy has real predictive power and is not simply something that worked in the past by pure luck? Surely, if we search long enough for working strategies, we will end up finding one. Even a walk-forward approach doesn't tell us anything about the strategy itself.


Some people mention White's Reality Check, but there is no consensus on the matter.



Answer



Strictly speaking, data snooping is not the same as in-sample vs. out-of-sample model selection and testing; it has to do with sequential or multiple hypothesis tests based on the same data set. To quote Halbert White:



Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results.




Let me provide an example. Suppose you have a time series of returns for a single asset and a large number of candidate model families. You fit each model on a training data set and then check the performance of its predictions on a hold-out sample. If the number of models is high enough, there is a non-negligible probability that the predictions of at least one model will look good purely by chance. This has nothing to do with the bias-variance trade-off: each model may well have been fitted using cross-validation on the training set, or other in-sample criteria such as AIC, BIC, or Mallows' Cp. For a typical protocol and criteria, see Ch. 7 of Hastie, Tibshirani, and Friedman's "The Elements of Statistical Learning".

Rather, the problem is that multiple hypothesis tests are implicitly being run at the same time. Intuitively, the criterion used to evaluate multiple models should be more stringent, and a naive approach would be to apply a Bonferroni correction. That criterion, however, turns out to be too stringent. This is where Benjamini-Hochberg, White, and Romano-Wolf come in: they provide efficient criteria for model selection under multiple testing. The papers are too involved to describe here, but to get a sense of the problem, I recommend starting with Benjamini-Hochberg, which is both easier to read and truly seminal.
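To make the effect concrete, here is a minimal sketch (plain NumPy; the strategy count, sample length, and volatility are made-up illustrative parameters). It simulates many pure-noise "strategies", so every apparent edge is data snooping by construction. A naive per-strategy 5% test still "discovers" a batch of profitable strategies, while the Benjamini-Hochberg step-up procedure, applied to the same p-values, typically rejects none of them:

```python
import numpy as np
from math import erfc, sqrt

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean array: True where the null is rejected,
    i.e. the strategy survives the false-discovery-rate control.
    """
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k / m) * alpha ...
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        # ... and reject all hypotheses with p-values up to p_(k).
        rejected[order[: k + 1]] = True
    return rejected

rng = np.random.default_rng(0)
n_strategies, n_days = 1000, 250

# Pure noise: by construction, no strategy has any real edge.
rets = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

# One-sided test of mean daily return > 0 (normal approximation).
t = rets.mean(axis=1) / (rets.std(axis=1, ddof=1) / sqrt(n_days))
pvals = np.array([0.5 * erfc(ti / sqrt(2.0)) for ti in t])

naive_hits = int((pvals < 0.05).sum())        # roughly 5% false discoveries
bh_hits = int(benjamini_hochberg(pvals).sum())
print(f"naive 5% test: {naive_hits} 'profitable' strategies")
print(f"Benjamini-Hochberg: {bh_hits} survive")
```

The point of the sketch is the gap between the two counts: the naive test flags dozens of strategies that are noise by construction, whereas the FDR correction almost always discards all of them.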

