I highly recommend backtesting any portfolio strategy. I do my backtesting using Portfolio123, but there may be other options. I’ve backtested a lot of strategies over the years, and have run a number of correlation studies as well, comparing backtested results during one period to “out of sample” results for a later period in order to figure out which backtesting methods correlate best with later performance. Now I want to offer thirteen tips for better backtesting.
- Double or triple your portfolio size. If you normally hold twenty stocks, run a backtest on a portfolio of fifty. If you normally hold a hundred stocks, run a backtest on two or three hundred.
- Never run a backtest over a period less than six years long. And studiously ignore all backtest results that are shorter than six years. Jim O’Shaughnessy gives a great example of misleading backtests: over a five-year period, he found that holding the top fifty stocks in terms of annual sales gains produced terrific returns. Those years just happened to be 1964 through 1968; over the long term, it’s a terrible strategy. Six years is my absolute minimum; ten to twenty years is far better.
- Never backtest using the entire universe of stocks that you’re going to be choosing from. Find some way of splitting it into pieces (half is good) and test each universe separately. Do this several times, and then average the results. Similarly, you can bootstrap your time periods by choosing just half of your weekly or monthly returns, and then average the results. Once again, Jim O’Shaughnessy spells this out well: see his article The Power of Back Testing Investment Strategies. In designing The Stock Evaluator, my Seeking Alpha Marketplace subscription service, I created thirteen different stock universes delimited by size and sector (for example, one universe excludes energy, materials, and industrials, and includes only stocks in the bottom 40% size-wise; another includes only stocks between 40% and 80% by size and excludes discretionary, health care, financials, utilities, and real estate; and so on), then once I had a good strategy for each one, I averaged them all.
- Ignore the total returns of your backtest. Instead look at more robust portfolio performance measures. I favor three: the omega ratio, median excess returns, and Jensen’s alpha (measured weekly). I’ll explain briefly how to calculate each of these. The omega ratio is also called the gain-to-pain ratio, and there are a few ways to calculate it. I figure out the median return of the benchmark (weekly, monthly, yearly or whatever interval you want to use). I then take the backtested return of my strategy and for each period subtract that benchmark median return. I then add up all the net positive returns and divide by the negative of the sum of the net negative returns. To calculate median excess returns, I take the monthly returns of my backtested strategy and subtract the returns of the benchmark over the same period. I then get the median of those numbers. To calculate Jensen’s alpha, take the weekly returns of your strategy and the weekly returns of the benchmark and find, using regression analysis, the y-intercept, i.e. the expected return of your strategy when the benchmark return is zero. Using total returns instead of these portfolio measures is often very misleading since your strategy could have worked extremely well (or extremely poorly) during a short period and affected your compounded earnings considerably. If you use alternative (and better) portfolio performance measures, you can more easily discern which is the best of your tested strategies. I would not, however, recommend using the Sharpe ratio, which has a multitude of problems (I’ve spelled these out in two of my previous posts, The Ultimate Measure of Portfolio Performance and William F. Sharpe, Beta, and the Paradox of Risk-Adjusted Returns).
- If you’re combining various factors into a workable system, use increments of at least two or three percent. The difference between a factor weight of 6% and 7% is not worth backtesting. Too much time is wasted trying to decide exactly how much weight to allocate to various factors. Getting those factors right is far more important than their weight.
- Do different kinds of backtests. I combine the results from a screen that rebalances weekly with those of a rolling backtest that holds for eight weeks. For a system with less turnover, concentrate more on rolling backtests. A straight simulation of a strategy that rebalances quarterly will have only one-thirteenth as much data as a rolling backtest performed weekly, since there are thirteen weeks in a quarter. So don’t rely only on precise simulations, but take advantage of other backtesting possibilities.
- Employ recency bias. The backtest of a strategy over the last twelve years is far more likely to correspond to near-future returns than one performed over the twelve years before that. However, it’s important to remember that while bull markets are often quite similar, bear markets are usually extremely different from each other (see my post on Bear Market Investing).
- Never rely only on a backtest of a factor in isolation. Instead, incorporate it into your factor set and see if it improves your results or not.
- Never expect your out-of-sample results to have a compound annual growth rate greater than one-half of that of your best backtested results. In other words, if your backtests are showing a CAGR of 30% per annum, expect no more than 15% going forward. Expecting out-of-sample returns that match in-sample returns is not just unrealistic, it’s a major tool of snake-oil salesmen. The system I use for my everyday investments gives me a backtested nineteen-year CAGR of well over 100%. I know that’s completely unrealistic; I only expect 50% at best, and I know that will vary greatly from year to year. (I got 45% in 2016, 58% in 2017, and my annualized return in 2018 so far is 36%, but I’m pretty sure that one of these years I’m going to be in the red.)
- Backtest lots of systems and variations and adopt the average or a combination of those that give you the best performance. Don’t just backtest one system and tweak it; and don’t invest in a lot of weak systems hoping that one will outperform.
- Never derive your rule or factor from the data. Factors have to make intuitive and financial sense first and foremost. Then you can see if the data backs them up.
- Use reliable data and avoid look-ahead bias. The data on Portfolio123 is reliable, but that on other websites may not be. Make sure there’s no survivorship bias in your backtests, and make extra sure that share count—invaluable for per-share measures—is accurate (it often isn’t, especially for ADRs and stocks that have undergone splits). Make sure that your database avoids look-ahead bias: if an earnings announcement comes out on the evening of the 7th, your backtest should not recommend that you buy that stock on the 8th, since the data from the announcement would probably have taken at least a day or two to be incorporated into the database that you use. You need real point-in-time data.
- Never include stocks that you wouldn’t have been able to buy. A lot of misleading backtesting is done on stocks with liquidity so low that you wouldn’t actually have been able to buy them in the quantity that your investment strategy requires.
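To make the tip about splitting your universe more concrete, here is a minimal Python sketch. It rests on assumptions that are not from Portfolio123 or any real tool: `run_backtest` and `score` are hypothetical functions that you would supply yourself, each returning a single performance number, and the halves are drawn at random (splitting by size and sector, as described above, is another way to do it).

```python
import random
from statistics import mean


def split_in_half(items, rng):
    """Shuffle a copy of items and return two halves (hypothetical helper)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def averaged_half_universe_score(universe, run_backtest, trials=10, seed=0):
    """Backtest each random half of the universe separately, several times,
    and average the results. run_backtest is a placeholder you supply: it
    takes a list of tickers and returns one performance number."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        for half in split_in_half(universe, rng):
            scores.append(run_backtest(half))
    return mean(scores)


def averaged_half_period_score(period_returns, score, trials=10, seed=0):
    """Score a random half of the weekly or monthly returns, several times,
    and average the results. score is a placeholder you supply: it takes a
    list of period returns and returns one performance number."""
    rng = random.Random(seed)
    n = len(period_returns)
    results = []
    for _ in range(trials):
        # Keep the sampled periods in chronological order.
        keep = sorted(rng.sample(range(n), n // 2))
        results.append(score([period_returns[i] for i in keep]))
    return mean(results)
```

A strategy that only looks good on one lucky half of the stocks, or one lucky half of the periods, will show a much weaker average here than it does on the full data set.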
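The three performance measures from the tip on ignoring total returns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are my own, the inputs are plain lists of per-period returns (as decimals) for the strategy and the benchmark over the same periods, and Jensen’s alpha is computed exactly as described above, as the intercept of a simple regression on raw returns with no risk-free rate subtracted.

```python
from statistics import mean, median


def omega_ratio(strategy, benchmark):
    """Sum of returns above the benchmark's median period return, divided by
    the (positive) sum of shortfalls below it."""
    threshold = median(benchmark)
    net = [r - threshold for r in strategy]
    gains = sum(x for x in net if x > 0)
    losses = -sum(x for x in net if x < 0)
    if losses == 0:
        return float("inf")  # no period fell short of the threshold
    return gains / losses


def median_excess_return(strategy, benchmark):
    """Median of (strategy return - benchmark return), period by period."""
    return median([s - b for s, b in zip(strategy, benchmark)])


def jensens_alpha(strategy, benchmark):
    """Y-intercept of an ordinary least-squares fit of strategy returns on
    benchmark returns: the expected strategy return when the benchmark
    return is zero."""
    mx, my = mean(benchmark), mean(strategy)
    sxx = sum((x - mx) ** 2 for x in benchmark)
    if sxx == 0:
        raise ValueError("benchmark returns must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(benchmark, strategy))
    beta = sxy / sxx
    return my - beta * mx
```

For example, with weekly returns you might call `jensens_alpha(strategy_weekly, benchmark_weekly)`, and with monthly returns `median_excess_return(strategy_monthly, benchmark_monthly)`; those variable names are placeholders for your own data.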
If you have any questions about any of these principles, I’d be glad to answer them in the comments below.
CAGR since 1/1/16: 49%.
My top ten holdings right now: RCKY, IRMD, AUDC, ZYXI, TZOO, TRIB, XOXO, ORN, MCHX, EIGI.