Suppose you are running a portfolio of quantitative strategies and you develop a new potential strategy to add to the mix. Assume for simplicity that the new strategy is independent of the existing strategies. The new strategy relies on data that is available going back X years. You proceed by backtesting and optimizing the parameters of the new strategy on an "in-sample" portion of your dataset, while reserving an "out-of-sample" portion for validation. The new strategy's weight in your portfolio will be determined by its out-of-sample performance. Your goal is to maximize your overall Sharpe ratio. What is the ideal ratio of in-sample length to out-of-sample length?
Answer
Interestingly enough, there is no scientific theory that prescribes what fraction of the data should be assigned to training versus testing, and results can be very sensitive to this choice.
From Quantitative Trading by Ernest Chan (pp. 53-54):
Out-of-Sample Testing: Divide your historical data into two parts. Save the second (more recent) part of the data for out-of-sample testing. When you build the model, optimize the parameters as well as other qualitative decisions on the first portion (called the training set), but test the resulting model on the second portion (called the test set). (The two portions should be roughly equal in size, but if there is insufficient training data, we should at least have one-third as much test data as training data.) [...]
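As a rough illustration of Chan's rule of thumb, here is a minimal Python sketch (mine, not from the book) that splits a series of daily strategy returns chronologically and computes an annualized Sharpe ratio on each portion. The simulated return series, the 50/50 split, and the zero risk-free rate are all assumptions made for the example.

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=252):
    # Annualized Sharpe ratio of periodic returns; risk-free rate assumed zero.
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def split_in_out_of_sample(returns, test_fraction=0.5):
    # Chronological split: the earlier part is in-sample (training),
    # the later, more recent part is reserved for out-of-sample testing.
    split_idx = int(len(returns) * (1 - test_fraction))
    return returns[:split_idx], returns[split_idx:]

# Hypothetical example: 8 years of simulated daily strategy returns, split 50/50.
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.01, size=8 * 252)

in_sample, out_of_sample = split_in_out_of_sample(daily_returns, test_fraction=0.5)
print("in-sample Sharpe:    ", round(annualized_sharpe(in_sample), 2))
print("out-of-sample Sharpe:", round(annualized_sharpe(out_of_sample), 2))
```

To follow the one-third guideline instead of an equal split, you would simply pass test_fraction=0.25, which leaves the test set one third the size of the training set.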
For more sophisticated methods, see Evidence-Based Technical Analysis by David Aronson, pp. 321-323.
I would add that once the strategy has been revised to reflect the out-of-sample results, that data is no longer "out-of-sample." In other words: if you optimize your strategy on the "out-of-sample" portion, you will nevertheless incur data-snooping bias via curve fitting. Or, as Aronson puts it:
the virginal status of the data reserved for out-of-sample testing has a short life span. It is lost as soon as it is used one time.
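To make the data-snooping point concrete, here is a small sketch of my own (not Aronson's): if you try many zero-edge strategy variants and keep the one that looks best on the reserved data, the resulting "out-of-sample" Sharpe is badly inflated. The number of variants, the data length, and the noise parameters are all illustrative assumptions.

```python
import numpy as np

# Simulate many candidate strategy variants that are pure noise (true Sharpe = 0)
# and pick the one that looks best on the reserved data. The selected Sharpe is
# inflated: the reserved data stopped being "out-of-sample" the moment it was
# used to make a choice.
rng = np.random.default_rng(1)
n_variants, n_days = 50, 2 * 252                 # 50 variants, 2 years of daily data
noise = rng.normal(0.0, 0.01, size=(n_variants, n_days))

sharpes = np.sqrt(252) * noise.mean(axis=1) / noise.std(axis=1, ddof=1)
print("best 'out-of-sample' Sharpe among 50 zero-edge variants:", round(sharpes.max(), 2))
# Typically prints a value well above 1.0 even though no variant has any real edge.
```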