Choosing Performance Measures
Performance measures have to be suited to the use you’re making of your portfolio and to its important features. In this article I’ll first outline in broad terms the appropriate measures for different portfolios, and will then take a more detailed look at each of the most prominent and important performance measures: how to calculate it, and its advantages and disadvantages.
I’ve long been a strong proponent of using alpha as one’s primary performance measure. But if your portfolio includes a hedge, alpha is not an appropriate measure at all. If you have an unhedged benchmark (and I don’t really know of any hedged benchmarks), the strategy with the highest alpha will usually be the one that maximizes the hedging, since alpha will rise as beta goes down (due to the mathematics behind alpha and beta). In other words, if you want to maximize the alpha of your hedged portfolio, you’ll end up putting almost all your money into short positions.
Total return, measured by CAGR (compound annual growth rate), is an excellent measure for a buy-and-hold portfolio, hedged or not, because it takes into account compounding. But if you’re going to be withdrawing 50% of your portfolio every year, it may not be the best measure, because it doesn’t take into account sequence of returns. In that case, median annual return might be better. (Avoid using average returns, as the Sharpe ratio does, because they really emphasize outliers, unless you’re going to do some trimming.) Similarly, if you’re facing large withdrawals and deposits on a quarterly or monthly basis, using median quarterly or monthly returns along with CAGR will be important. Lastly, CAGR is extremely dependent on your starting and ending dates. It might be good to measure it several times with different dates.
If drawdowns give you heartburn, then you should definitely take a simplified value at risk (VaR) or conditional value at risk (CVaR) into account. I like to look at these measures at 2%, 5%, 10%, and 15%. The period over which you measure VaR is extremely important. Whether you look at weekly, monthly, quarterly, or annual VaR depends on what causes you (or your investors) the most stress, on your reporting requirements, and on your own capacity for pain. If you’re comparing your VaR to that of a benchmark and the benchmark is far more diversified than your fund (e.g. you have a 10- or 20-position portfolio and you’re comparing your VaR to that of the S&P 500), you should probably avoid weekly or monthly comparisons, because your comparative risk is more concentration-based than performance-based.
The omega and gain-to-pain ratios are appropriate if you don’t really care about compounding and just want to maximize your winning periods and minimize your losing ones. These might be good measures for a portfolio with some fixed-income or futures components. They’re also excellent measures for trades rather than a portfolio.
The Sharpe ratio and its cousins (Sortino, information, and Treynor ratios) are extremely flawed measures that I do not recommend using. You can get all the information about portfolio return and risk from other measures. I will, however, suggest below an alternative method of calculating the Sharpe ratio if you’re really into it.
I have not discussed factor risk measures here (e.g. portfolio exposure to the value factor) because I simply don’t believe in them. Rather than avoiding exposure to factors and trying for idiosyncratic exposure, I believe that the secret to investment success is maximizing exposure to as diverse a collection of effective factors as practicable. That approach is simultaneously low in risk and high in rewards. Minimizing your exposure to these factors is not only counterintuitive but, I believe, harmful to your portfolio, despite the recommendations of Giuseppe Paleologo’s widely heralded Advanced Portfolio Management, which I hope to write a more detailed review of later this year.
A Word About Trimming
Before I get into the descriptions of the measures, I want to talk a bit about outliers and trimming.
Outliers are short periods in which your performance is extremely high or extremely low. Measures that emphasize outliers will therefore distort your results; measures that ignore outliers altogether will not. Any measure that relies only on medians or percentiles is insensitive to outliers (e.g. median excess returns, VaR). Any measure that relies on averages is quite sensitive to outliers. The measures most sensitive to outliers are those that rely on the square of periodic returns, or on least-squares methods, such as standard deviation and linear regression. These are at the heart of the most common risk-adjusted performance measures, the Sharpe ratio and alpha. So to improve those measures, I advise you to trim your returns.
The easiest way to do so is to take any months in which either the benchmark or your portfolio returns were far outside the norm and disregard them when making your calculations. Some people trim at the first and 99th percentile; others take a more extreme approach and trim at the fifth and 95th. A more complicated but better way to do this is elliptical trimming. You use a scatter plot to graph your periodic portfolio returns against the benchmark’s and construct an ellipse around the data points; then you trim all points that fall outside this ellipse. The formula is rather elegant: you can read about it in section #11 of an article I wrote a few years ago.
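As a rough illustration of the simpler, percentile-based approach only (not the elliptical method), the trimming could be sketched in Python along these lines. The function name, the sample data, and the choice of the "inclusive" percentile convention are all assumptions made for the example; other percentile conventions will give slightly different cutoffs on short series.

```python
from statistics import quantiles

def percentile_trim(portfolio, benchmark, lower_pct=1, upper_pct=99):
    """Drop every period in which either series falls outside its own
    percentile band. lower_pct and upper_pct are integers from 1 to 99."""
    if len(portfolio) != len(benchmark):
        raise ValueError("both series must cover the same periods")
    # quantiles(..., n=100) returns the 1st through 99th percentile cut points.
    p_cuts = quantiles(portfolio, n=100, method="inclusive")
    b_cuts = quantiles(benchmark, n=100, method="inclusive")
    p_lo, p_hi = p_cuts[lower_pct - 1], p_cuts[upper_pct - 1]
    b_lo, b_hi = b_cuts[lower_pct - 1], b_cuts[upper_pct - 1]
    return [
        (p, b)
        for p, b in zip(portfolio, benchmark)
        if p_lo <= p <= p_hi and b_lo <= b <= b_hi
    ]

# Hypothetical data: 20 ordinary months plus one month with a 50% gain.
sample_portfolio = [i / 1000 for i in range(-10, 10)] + [0.50]
sample_benchmark = [0.0] * 21
trimmed_pairs = percentile_trim(sample_portfolio, sample_benchmark, 5, 95)
print(len(sample_portfolio), "months before,", len(trimmed_pairs), "after")
```

Trimming the pair rather than each series separately keeps the portfolio and benchmark returns aligned by period, which matters for anything computed from both, such as alpha.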
Total Return (CAGR)
Compound annual growth rate (CAGR) is perhaps the most straightforward and often-used measure of portfolio performance. Its calculation is relatively simple. Let’s say you start with X dollars and end with Y dollars over a period of Z years. Then the CAGR is (Y/X)^(1/Z) – 1.
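In Python, the calculation might look like the sketch below. The function name and the dollar figures are illustrative assumptions.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (Y / X) ** (1 / Z) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Hypothetical example: $10,000 grows to $20,000 over five years.
print(f"{cagr(10_000, 20_000, 5):.2%}")  # 14.87%
```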
What if you’ve been withdrawing money from and/or adding more money to your portfolio periodically? The simplest way to deal with that is to use time-weighted returns. Divide your returns into discrete periods immediately before and after your withdrawals and deposits, then multiply those returns without regard to how long the periods are. For example, let’s say you add $2,000 to your portfolio on one date and withdraw $5,000 on a second date. And let’s say you had P dollars at the beginning of your period, Q dollars right before your deposit, R dollars right before your withdrawal, and S dollars at the end of your period, and your period is Z years. Your CAGR would then be (Q/P × R/(Q + 2000) × S/(R – 5000))^(1/Z) – 1.
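The same chain of sub-period returns can be written as a short Python function that handles any number of deposits and withdrawals. This is a minimal sketch; the function name, the argument layout, and the portfolio values are assumptions chosen for illustration.

```python
def time_weighted_cagr(start_value, flows, end_value, years):
    """flows is a date-ordered list of (value_just_before_flow, flow_amount);
    deposits are positive and withdrawals are negative."""
    growth = 1.0
    base = start_value
    for value_before, amount in flows:
        growth *= value_before / base   # growth factor since the previous flow
        base = value_before + amount    # starting value for the next sub-period
    growth *= end_value / base          # final sub-period
    return growth ** (1.0 / years) - 1.0

# Hypothetical values: P = 100,000; Q = 110,000 before a $2,000 deposit;
# R = 120,000 before a $5,000 withdrawal; S = 130,000; Z = 2 years.
twr_example = time_weighted_cagr(
    100_000, [(110_000, 2_000), (120_000, -5_000)], 130_000, 2
)
print(f"{twr_example:.2%}")
```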
There’s a more complicated calculation called money-weighted returns, also called the internal rate of return. In the above scenario, you would solve for r in the equation P + 2000/(1 + r)^X – 5000/(1 + r)^Y – S/(1 + r)^Z = 0, where X is the number of years between inception and the deposit and Y is the number of years between inception and the withdrawal. The solution of this equation is not at all straightforward. This method may better reflect your actual experience because it is heavily influenced by the size and timing of cash inflows and outflows. But due to that heavy influence, it’s relatively useless as a measure of actual investment performance.
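Because the equation has no closed-form solution, it is usually solved numerically. The sketch below uses plain bisection, which is one of several workable methods; the function name, the search interval of –99% to +1,000%, and the sample figures are assumptions for the example.

```python
def money_weighted_return(start_value, flows, end_value, years, lo=-0.99, hi=10.0):
    """Solve P + sum(flow / (1 + r)**t) - S / (1 + r)**Z = 0 for r by bisection.
    flows is a list of (years_from_inception, amount); deposits are positive
    and withdrawals are negative."""
    def f(r):
        total = start_value - end_value / (1 + r) ** years
        for t, amount in flows:
            total += amount / (1 + r) ** t
        return total

    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change inside the search interval")
    for _ in range(200):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2

# Hypothetical: $2,000 deposited after 0.5 years, $5,000 withdrawn after 1.5 years.
mwr_example = money_weighted_return(
    100_000, [(0.5, 2_000), (1.5, -5_000)], 130_000, 2
)
print(f"{mwr_example:.2%}")
```

With no deposits or withdrawals the equation collapses to P = S/(1 + r)^Z, so the result matches the plain CAGR.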
There are two main advantages of using CAGR as a portfolio performance measure.
- It’s absolutely real. There’s no messing around with benchmarks or taking risk into account. It’s simply your growth rate.
- It’s standard. Everybody uses it, and everybody knows what it means.
There are also several disadvantages of using CAGR as a portfolio performance measure.
- It’s entirely dependent on the starting and ending dates. A measure of CAGR that starts or ends a month or two earlier or later might be entirely different. This makes it very easy to cherry-pick your starting or ending dates to make you look good.
- It takes no account of what happened to your portfolio between the starting and ending dates. You could have lost 70% of your portfolio at one point and then recovered, and nobody would ever be able to tell. As such, it’s not the most appropriate measure if you’re going to frequently withdraw money from your portfolio.
- It’s quite susceptible to outliers. One absolutely horrible or terrific month will make your CAGR extremely low or high compared to the rest of the period.
- It’s divorced from any benchmark, and therefore from what’s happening in the market. A 15% CAGR in 2008 is very different from a 15% CAGR in 2019.
The Sharpe Ratio, Information Ratio, Sortino Ratio, and Treynor Ratio
The Sharpe ratio is the standard formula for risk-adjusted returns. You take the average monthly (or weekly or annual) return of the portfolio, subtract the risk-free rate (this is often but not always the ten-year Treasury rate), and divide by the standard deviation of the monthly/weekly/annual returns. You then annualize the result by multiplying by the square root of twelve if you’re using monthly returns or the square root of 52 if you’re using weekly returns.
The information ratio is calculated the same way except that you use excess returns over a benchmark instead of portfolio returns both in the numerator and the denominator. For the Sortino ratio, you divide by downside standard deviation. And for the Treynor ratio, you divide by the portfolio’s beta rather than its standard deviation.
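For reference, the four conventional ratios could be sketched as follows. Several details vary from source to source and are assumptions here: the risk-free rate is a constant per-period figure, downside deviation is measured against the risk-free rate using all periods, and the Treynor ratio annualizes the mean excess return by simple multiplication. All function names and the benchmark series are hypothetical; the portfolio series is the set of twelve 2022 monthly returns used in the worked example later in this article.

```python
from math import sqrt
from statistics import mean, stdev

def sharpe(returns, rf, periods=12):
    excess = [r - rf for r in returns]
    return mean(excess) / stdev(returns) * sqrt(periods)

def information_ratio(returns, benchmark, periods=12):
    active = [r - b for r, b in zip(returns, benchmark)]
    return mean(active) / stdev(active) * sqrt(periods)

def sortino(returns, rf, periods=12):
    excess = [r - rf for r in returns]
    # Downside deviation: root mean square of the below-target shortfalls.
    downside = sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    return mean(excess) / downside * sqrt(periods)

def beta(returns, benchmark, rf):
    x = [b - rf for b in benchmark]
    y = [r - rf for r in returns]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def treynor(returns, benchmark, rf, periods=12):
    excess = [r - rf for r in returns]
    return mean(excess) * periods / beta(returns, benchmark, rf)

monthly = [0.035, 0.054, 0.077, 0.015, 0.063, -0.071,
           0.042, -0.017, -0.102, 0.12, 0.034, -0.03]
# Hypothetical benchmark returns, made up purely for illustration.
bench = [0.01, 0.02, 0.03, -0.01, 0.02, -0.05,
         0.03, -0.02, -0.06, 0.05, 0.02, -0.02]
rf = 0.002

print(f"Sharpe      {sharpe(monthly, rf):.2f}")
print(f"Information {information_ratio(monthly, bench):.2f}")
print(f"Sortino     {sortino(monthly, rf):.2f}")
print(f"Treynor     {treynor(monthly, bench, rf):.2f}")
```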
The main effect of these ratios is to favor steady, monotonic returns, even if they’re small, over returns with a great deal of variety. For example, an ETF that consists of short-duration investment-grade debt securities may have a much higher Sharpe ratio than an ETF that tracks the S&P 500, even though the return of the first ETF will be comparatively tiny. However, the Sharpe ratio was created to measure equity portfolio returns, not fixed-income portfolio returns, so that example is a bit specious.
The problems with these ratios are numerous, and that’s why I don’t use them.
- They’re all calculated on the basis of average rather than compounded returns. This makes no financial sense. A stock that goes up 20% and then down 20% has an average return of 0%, but it has actually lost 4%, which is a compounded return of –2.02% per period. The larger and more numerous the returns, the worse this problem gets.
- They make no sense for negative returns. Let’s say you have two funds with average returns of –2%. The one with the higher standard deviation (and therefore the higher risk according to Sharpe) is going to have a higher Sharpe ratio. The same goes for all the other ratios. The problem is worst for the information ratio, as that one is the most likely to exhibit negative returns in the numerator.
- They ignore the sequence of returns. A portfolio that alternates 5% gains with 5% losses over a year has the same risk (and will get the same ratio) as a portfolio that has 5% losses for six months straight followed by 5% gains for six months straight, even though the experience of holding the second portfolio is far, far worse.
- They operate on the assumption that returns are normally distributed. In fact, portfolio returns tend to have a great deal of skew and fat tails. Non-normal distribution distorts standard deviations terribly.
- They’re extremely sensitive to outliers. If you have one month in which you made or lost 18% because you happened to be heavily invested in a stock that went zoom, that’s going to have an outsized impact on these ratios.
- The Sharpe, Sortino, and Treynor ratios are unrelated to any benchmark.
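The first two problems in the list above are easy to check with a few lines of arithmetic. The snippet below is only an illustration; the 3% and 6% standard deviations are made-up figures, and annualization is left out because it scales both funds equally.

```python
# Averages versus compounding: up 20%, then down 20%.
two_periods = [0.20, -0.20]
average = sum(two_periods) / len(two_periods)
total = (1 + two_periods[0]) * (1 + two_periods[1]) - 1
per_period = (1 + total) ** (1 / len(two_periods)) - 1
print(f"average {average:.2%}, total {total:.2%}, per period {per_period:.2%}")

# Negative returns: two funds with the same -2% average excess return.
def ratio(mean_excess, std_dev):
    return mean_excess / std_dev

calm_fund = ratio(-0.02, 0.03)  # hypothetical 3% standard deviation
wild_fund = ratio(-0.02, 0.06)  # hypothetical 6% standard deviation
print(f"calm fund {calm_fund:.2f}, wild fund {wild_fund:.2f}")
```

The more volatile fund ends up with the higher (less negative) ratio, which is the opposite of what a risk-adjusted measure is supposed to say.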
In the case of the Sharpe ratio, I suggest the following steps to fix some of these flaws.
- First, trim your returns using one of the methods I outlined above.
- In the numerator, use the geometric mean of 1 + the trimmed returns, then subtract 1.
- In the denominator, use the numerator as your mean, take the absolute deviation from that mean for each return, get the geometric mean of 1 + the deviations, then subtract 1.
- If the numerator is negative, multiply by the above mean deviation rather than dividing by it, then multiply that by 400 to get a comparable figure.
The result will be a number that, if positive, will be significantly lower than the one the conventional method produces, so take that into account when comparing the two.
Here’s an example. Let’s say you have twelve monthly returns (I’ll use my own returns from 2022 as an example): 3.5%, 5.4%, 7.7%, 1.5%, 6.3%, –7.1%, 4.2%, –1.7%, –10.2%, 12%, 3.4%, and –3%. And let’s say the risk-free rate is 0.2% per month.
To calculate the Sharpe ratio in the conventional way, you’d subtract 0.2% from each of the returns, then average the results and divide that average by the standard deviation. You multiply the result by the square root of 12 in order to annualize it. The end result is 1.63% / 6.34% × √12 = 0.89.
To calculate the Sharpe ratio using my method, you’d subtract 0.2% from each of the returns, then trim them by eliminating the bottom and top 5% of returns. Since we only have 12 returns, that would mean no trimming at all, so we’ll take each return twice and then eliminate one of the –10.2% returns and one of the 12% returns, for 22 returns in all. We’ll then add 1 to each return, get the geometric mean, and subtract 1. The result is 1.57%. In the denominator, we take the absolute difference between each of our returns and 1.57%. We then add 1 to each, take the geometric mean, and subtract 1. We get a mean of 4.35%. Our Sharpe ratio is now 1.57% / 4.35%, or 0.36.
Let’s compare those numbers to those of the S&P 500. Monthly returns during 2022, net of the risk-free rate, averaged –1.7%. The standard deviation of those returns was 6.53%. So the conventional Sharpe ratio was –0.90. Notice that if the standard deviation had been much lower, say 4%, the Sharpe ratio would be worse: –1.47. Now, using my alternative measure, the numerator would be –2.04% and the denominator would be 6.24%, just slightly lower than the denominator in my own returns. Because the numerator is negative, one would multiply the two and scale by 400 for a Sharpe ratio of –0.51.
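The alternative calculation for the twelve monthly returns in the worked example can be reproduced with a short Python sketch. The function names are hypothetical, and the trimming step follows the small-sample shortcut described in the example (use each return twice, then drop one copy of the lowest and one copy of the highest) rather than a general 5% trim.

```python
from math import exp, log

def geometric_mean_rate(rates):
    """Geometric mean of (1 + rate), minus 1."""
    return exp(sum(log(1.0 + r) for r in rates) / len(rates)) - 1.0

def doubled_trim(returns):
    """Use each return twice, then drop one lowest and one highest value."""
    return sorted(list(returns) * 2)[1:-1]

def alternative_sharpe(excess_returns):
    trimmed = doubled_trim(excess_returns)
    numerator = geometric_mean_rate(trimmed)
    deviations = [abs(r - numerator) for r in trimmed]
    denominator = geometric_mean_rate(deviations)
    if numerator < 0:
        # Negative numerator: multiply by the deviation and scale by 400.
        ratio = numerator * denominator * 400
    else:
        ratio = numerator / denominator
    return numerator, denominator, ratio

returns_2022 = [0.035, 0.054, 0.077, 0.015, 0.063, -0.071,
                0.042, -0.017, -0.102, 0.12, 0.034, -0.03]
excess_2022 = [r - 0.002 for r in returns_2022]
alt_num, alt_den, alt_ratio = alternative_sharpe(excess_2022)
print(f"numerator {alt_num:.2%}, denominator {alt_den:.2%}, ratio {alt_ratio:.2f}")
```

Sorting the doubled list and slicing off the two ends is just a compact way of removing one copy each of the minimum and maximum.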
Alpha and Beta
If you create a scatter plot with the benchmark’s monthly excess returns (over the risk-free rate) on the x axis and your portfolio’s excess returns on the y axis and then fit a trend line through the points using conventional regression formulas, the slope of the line will be your beta and the y-intercept will be your alpha.
Beta measures the responsiveness of your portfolio to market movements. If the slope is low—relatively flat—then what the market does makes very little difference to your returns. If the slope is steep—higher than one—then your portfolio returns are magnified by market movements.
Alpha is a risk-adjusted measure of excess returns. Technically, it is the portfolio’s excess return when the excess market return is zero.
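The trend line described above is an ordinary least-squares fit, which can be computed directly without drawing the plot. The sketch below is untrimmed and returns alpha per period (monthly, if you feed it monthly returns); the function name and the sample series are assumptions for illustration.

```python
from statistics import mean

def alpha_beta(portfolio_excess, benchmark_excess):
    """Least-squares fit of portfolio excess returns (y) on benchmark
    excess returns (x). Returns (alpha, beta): intercept and slope."""
    mx, my = mean(benchmark_excess), mean(portfolio_excess)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(benchmark_excess, portfolio_excess))
    sxx = sum((x - mx) ** 2 for x in benchmark_excess)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical benchmark excess returns, and a portfolio built to have
# an alpha of 0.2% per month and a beta of 1.5.
bench_excess = [0.01, -0.02, 0.03, 0.00, -0.04, 0.05]
port_excess = [0.002 + 1.5 * x for x in bench_excess]
fitted_alpha, fitted_beta = alpha_beta(port_excess, bench_excess)
print(f"alpha {fitted_alpha:.4f}, beta {fitted_beta:.2f}")
```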
Many people have puzzled over the fact that alpha and beta are generally inversely correlated. As a general rule, the higher the beta, the lower the alpha (though certainly not in every case). As I have shown, this is due to mathematics, not to anything particularly salient about the market or portfolio management or investor behavior. As long as the market generally trends upwards, alpha and beta will be inversely correlated; however, during bear markets, alpha and beta are positively correlated, so a high-beta portfolio will have a higher alpha.
Here are the advantages of using alpha and beta as portfolio return and risk measures:
- They are persistent. Beta is the most persistent of all portfolio risk measurements, and, when trimmed of outliers, alpha is the most persistent of all portfolio return measurements (unless you have a long-short portfolio, which exception is discussed below). I have performed numerous backtests to determine the most predictive portfolio measure, and trimmed alpha usually takes the cake.
- Unlike most of the previously discussed measures, they take into account what the market is doing.
- When properly trimmed, they are relatively impervious to outliers.
- Because beta is related to variability, a high alpha portfolio is likely to be less volatile than a portfolio with the same total return but a lower alpha.
Here are the disadvantages:
- If your portfolio is hedged or long-short, alpha and beta are inappropriate portfolio measures. Maximizing your alpha will almost always mean maximizing your hedge, even to the detriment of performance. This is due to the inverse correlation between alpha and beta. Hedging lowers beta, and thus raises alpha.
- Like total return, alpha and beta may not be the most appropriate performance measures if you’re frequently withdrawing money from your portfolio.
Median Returns and Median Excess Returns
These performance measures are very rarely used, but they have their advantages.
- They’re intuitive and very easy to grasp.
- They’re completely insensitive to outliers.
- Median return is the perfect measure for a portfolio subject to frequent withdrawals. This kind of portfolio should always have a good level of returns and shouldn’t be subject to huge drawdowns. If your withdrawals are monthly, then use median monthly return; if quarterly, use median quarterly return; if yearly, use median yearly return.
Disadvantages:
- They have very little to do with the actual returns you’re likely to have. They don’t take compounding into account at all.
- Investing can often be a bet that over the long run you’ll make more money due to a few unforeseen and outsize returns than from your median returns. In that case, median returns are irrelevant to investment success.
- They have nothing to do with the performance of the market.
The Gain-Pain Ratio and the Omega Ratio
The gain-pain ratio is the sum of all positive returns divided by the absolute value of the sum of all negative returns. This is also the omega ratio with a hurdle rate of 0%. The omega ratio expands the gain-pain ratio concept to set various hurdle rates. So, for example, let’s say your hurdle rate was 0.5%. You would subtract 0.5% from all of your returns and then apply the formula of the gain-pain ratio.
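Both ratios reduce to a few lines of Python. This is a minimal sketch with hypothetical function names; the input is the set of twelve 2022 monthly returns from the Sharpe ratio example earlier in the article.

```python
def gain_pain(returns):
    """Sum of positive returns divided by the absolute sum of negative returns."""
    gains = sum(r for r in returns if r > 0)
    pains = abs(sum(r for r in returns if r < 0))
    return float("inf") if pains == 0 else gains / pains

def omega(returns, hurdle=0.0):
    """Gain-pain ratio of the returns measured against a hurdle rate."""
    return gain_pain([r - hurdle for r in returns])

monthly_2022 = [0.035, 0.054, 0.077, 0.015, 0.063, -0.071,
                0.042, -0.017, -0.102, 0.12, 0.034, -0.03]
print(f"gain-pain {gain_pain(monthly_2022):.2f}")         # 2.00
print(f"omega at 0.5% {omega(monthly_2022, 0.005):.2f}")  # 1.67
```

Raising the hurdle moves part of every positive return into the "pain" column, so omega falls as the hurdle rises.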
Advantages:
- This measure is perfect for traders, who often don’t really have a portfolio. If you place various short-term bets on various securities using margin and don’t have a fixed portfolio (for example, perhaps one day you’ll make seventeen bets and another you’ll only make two), then this measure is terrific.
- The omega ratio, especially if you use a number of different hurdle rates, is a great way to take into account the entire distribution curve of your results.
Disadvantages:
- This measure is extremely subject to outliers.
- It has nothing to do with a benchmark.
Drawdown
This is the difference between the current value of your portfolio and the value when your portfolio last hit its peak. Maximum drawdown looks at the various drawdowns throughout the portfolio’s history and focuses on the one or two largest. This is not a return measure at all, but only a measure of risk.
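A running-peak calculation is enough to track both figures. The sketch below expresses drawdown as a percentage of the peak, which is one common convention; the function name and the portfolio values are made up for the example.

```python
def drawdown_series(values):
    """Drawdown at each date, as a fraction of the running peak (0 or negative)."""
    peak = values[0]
    series = []
    for value in values:
        peak = max(peak, value)
        series.append(value / peak - 1.0)
    return series

# Hypothetical month-end portfolio values.
portfolio_values = [100, 120, 90, 110, 130, 104]
dd = drawdown_series(portfolio_values)
current_drawdown = dd[-1]
maximum_drawdown = min(dd)
print(f"current {current_drawdown:.1%}, maximum {maximum_drawdown:.1%}")
# current -20.0%, maximum -25.0%
```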
Advantages:
- It plays well to investor sentiment and behavior, and is intuitive to grasp.
Disadvantages:
- It’s perhaps the least persistent of all return/risk measures. A strategy’s historical drawdown has very little correlation with its future drawdown.
- Portfolios with large gains will have larger drawdowns than portfolios with minuscule gains. A risk measure shouldn’t necessarily punish success; ideally, it would be agnostic about success.
- It is unrelated to any benchmark.
- It is, almost by definition, extremely subject to outliers.
Value at Risk and Conditional Value at Risk
Value at risk is a complex measure; many people simplify it by using the Xth percentile of all periodic returns, where X is usually below 20. Conditional value at risk is the average of all returns below the Xth percentile, again where X is usually below 20. These are not performance measures but measures of risk.
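The simplified versions described above could be sketched like this. The percentile convention (Python’s "inclusive" method, with linear interpolation between data points) and the treatment of "below" as "at or below" are assumptions; the function names and return series are hypothetical.

```python
from statistics import mean, quantiles

def simplified_var(returns, pct):
    """The pct-th percentile of periodic returns (pct is an integer, 1 to 99)."""
    return quantiles(returns, n=100, method="inclusive")[pct - 1]

def simplified_cvar(returns, pct):
    """Average of all returns at or below the pct-th percentile."""
    cutoff = simplified_var(returns, pct)
    tail = [r for r in returns if r <= cutoff]
    if not tail:  # guard against floating-point rounding at the minimum
        tail = [min(returns)]
    return mean(tail)

# Hypothetical quarterly returns.
quarterly = [-0.10, -0.08, -0.06, -0.04, -0.02, 0.0,
             0.02, 0.04, 0.06, 0.08, 0.10]
for pct in (2, 5, 10, 15):
    print(pct, f"{simplified_var(quarterly, pct):.2%}",
          f"{simplified_cvar(quarterly, pct):.2%}")
```

With a series this short, the low percentiles all sit between the two worst returns, so CVaR is simply the worst return; longer histories give more meaningful tails.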
Advantages:
- VaR is impervious to outliers.
- Neither measure punishes a strategy for its success.
- They literally present you with a worst-case scenario, and give you a very good idea of what that might be.
- A strategy with a consistently high (that is, less negative) VaR or CVaR will truly be less likely to lose your money than one with a low VaR or CVaR. In other words, unlike beta or the denominator in the Sharpe ratio, VaR and CVaR don’t measure market risk or variability but rather loss, pure and simple.
Disadvantages:
- These have no relationship with a benchmark.
What I Use
For all my long-only personal measurements, I use elliptically trimmed alpha. This is what I use for backtests as well as for measuring my own performance. I also use the same measure to backtest strategies for my put-based hedge.
When I discuss my performance with others, I use CAGR based on time-weighted returns.
I also run a hedge fund. When establishing the strategy for that fund (i.e. how much leverage to use and how much of my portfolio I should devote to my hedge), I consider CAGR, median quarterly returns, and simplified VaR at 2%, 5%, 10%, and 15%, measured quarterly.
For setting the same guidelines for my private foundation, which donates money once a year, I use CAGR and median annual returns and ignore VaR.
I hope the above discussion will prompt readers to explore alternative performance measures. Personally, I have found them very illuminating.
My CAGR since 1/1/2016: 40%. My trimmed alpha since 10/29/2015: 31%.
My top ten holdings right now: PSIX, MNDJF (MND:CAN), DSP, RDWR, ACT:DEU, DNGDF (DNG:CAN), DTLIF (DTOL:CAN), BKTI, MSA:CAN, MFCSF (DR:CAN).