The Global Industry Classification Standard (GICS) classifies industries into eleven sectors, and does so on a purely intuitive rather than a data-driven basis. But once you look past that intuitive level, the sectors make no sense. What do Cigna and Amgen have in common, really? Yes, they’re both in the business of “health care.” But as businesses go, they couldn’t be more dissimilar. What does a semiconductor manufacturer have in common with an IT consulting service? They’re both in “technology,” but I can’t think of anything else. What do an airline, a defense contractor, a manufacturer of electrical equipment, and a human resources firm have in common? They’re all “industrials.” But they operate in completely different ways, and the market knows that very well.
So I started wondering if there were a better way to classify industries than by using GICS sectors. There are several ideas out there.
One is to use “themes.” Portfolio123 classifies all industries into five themes: macro economic (most industrials, transport, discretionary, most materials), population growth (staples, some health care, communications, utilities), special (energy, precious metals, airlines), financial (banks, insurance, real estate), and innovative (biotech, pharma, tech). This makes intuitive sense, but almost all of the GICS sector classifications (with the exceptions of materials, industrials, and health care) remain intact, so it’s more like a grouping of sectors than an attempt to rethink them altogether.
Some folks at Open Matters, a machine learning company, suggest that we classify companies according to whether they’re asset builders (make and sell physical things), service providers (use people to offer services), technology creators (generate and deliver intellectual property), or network orchestrators (facilitate transactions and interactions). I like this idea, but I haven’t seen anyone follow up with it and actually classify companies in this way. A similar idea, suggested to me by an RIA on a Portfolio123 forum, is to classify industries in five groups: input resources, business-to-business, business-to-consumer, infrastructure and government, and facilitators.
Another way to approach this problem is to look at industry factors such as gross margin, growth rate, regulatory barriers, and so on, and classify industries accordingly. This would be a massive project, and completely dependent on which factors were chosen.
So I’ve come to the conclusion that the least biased method of dividing industries into groups is to simply let the market do it for me. As one friend told me, “The market is the ultimate crowdsource.” One can gather the returns for each industry and then group them by how those returns are correlated.
So that’s what I did. I used GICS to assign each stock to one of sixty industries. I then generated an equity curve for each industry from January 1999 to January 2019, weighting companies equally (excluding stocks with extremely low liquidity) and rebalancing annually. I then created a correlation matrix of industry returns, using Kendall’s tau as my measure of correlation. If two industries had very similar market returns, their correlation was close to 1; if their returns tended to move in opposite directions, their correlation was below 0.
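For anyone who wants to try this at home, a correlation matrix of this kind can be sketched in a few lines. The block below is only an illustration, not the code behind this post: it assumes the tau-b variant of Kendall's tau (I haven't said above which variant I used), the function names are made up, and the three short return series are invented sample data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Kendall's tau-b between two equal-length return series (O(n^2) pairs)."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both series: ignored by tau-b
        elif dx == 0:
            ties_x += 1           # tied only in x
        elif dy == 0:
            ties_y += 1           # tied only in y
        elif (dx > 0) == (dy > 0):
            concordant += 1       # both series moved the same way
        else:
            discordant += 1       # the series moved in opposite ways
    pairs = concordant + discordant
    denom = sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def correlation_matrix(returns_by_industry):
    """Nested dict: matrix[a][b] is tau between industries a and b."""
    names = list(returns_by_industry)
    return {
        a: {b: kendall_tau(returns_by_industry[a], returns_by_industry[b])
            for b in names}
        for a in names
    }


# Invented periodic returns, purely for illustration.
sample_returns = {
    "banks":     [0.02, -0.01, 0.03, -0.04, 0.01],
    "insurance": [0.01, -0.02, 0.04, -0.03, 0.00],
    "gold":      [-0.01, 0.03, -0.02, 0.02, 0.00],
}
matrix = correlation_matrix(sample_returns)
```

Because tau looks only at whether pairs of periods rank the same way, it is less sensitive to a few extreme returns than the usual Pearson correlation, which is one reason to prefer it for noisy market data.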
Once that was done, the problem was how to group the industries based on their correlations. After a lot of research and experimentation, I ended up relying on the k-means++ algorithm (k-means clustering with a more careful choice of starting centers), which is a bit complicated to explain. However, let me try via an analogy.
Imagine a swarm of flies hovering various distances from a central object. That central object is the return of the market, and all the flies are the returns of the various industries. The returns of some industries—metals, for instance, or oil and gas companies—are going to be very far from the middle, because their correlation to the market is low. The returns of others—health care providers, insurance, machinery—are going to be quite close to the middle, because their correlation to the market is high.
What a good clustering algorithm will do with a swarm like this is to take the fly that is farthest from the middle and see what flies are closest to it, within reason. Those flies then get placed within a cluster. We then see what fly is next farthest from both the middle and that first cluster, and form a cluster around that fly. In the end, with a lot of iterations and difficult choices, you can create coherent clusters.
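For readers who prefer code to flies, here is a rough sketch of k-means++ as it is usually described: the first center is picked at random, each later center is picked with probability proportional to its squared distance from the nearest center already chosen, and then ordinary k-means iterations take over. This is a textbook version, not my actual setup. It assumes each industry has already been turned into a point (one hypothetical choice would be that industry's row of the correlation matrix), and all the function names are my own.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding. Assumes at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by squared distance to its nearest chosen center,
        # so far-away points are more likely to seed the next cluster.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, seed=0, max_iter=100):
    """Standard k-means iterations starting from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Two obvious groups of made-up points, just to show the call.
demo_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
               (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
demo_labels, demo_centers = kmeans(demo_points, k=2)
```

The number of clusters `k` has to be chosen by hand, and different seeds can give different groupings on real data, which is part of why the process involves so much trial and error.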
Now these algorithms tend to create clusters that are somewhat larger or smaller than I wanted. I wanted clusters that represented between 2.5% and 25% of the market as a whole when looked at historically. I had to continually adjust my parameters in order to get clusters of that size. I also had to take some of the outer clusters and remove them from my data in order to get any sort of differentiation for the inner clusters.
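The size check itself is easy to sketch. The snippet below assumes you have each industry's share of total market value (however you choose to measure it historically) and a cluster label for each industry; the function names, labels, and numbers are all hypothetical.

```python
def cluster_shares(cluster_of, market_weight):
    """Fraction of total market weight that falls in each cluster."""
    total = sum(market_weight.values())
    shares = {}
    for industry, cluster in cluster_of.items():
        shares[cluster] = shares.get(cluster, 0.0) + market_weight[industry] / total
    return shares


def out_of_range(shares, low=0.025, high=0.25):
    """Clusters whose share falls outside the 2.5% to 25% band."""
    return {c: s for c, s in shares.items() if not low <= s <= high}


# Made-up weights (they need not sum to 1; they are normalized above).
weights = {"banks": 12, "insurance": 8, "software": 30,
           "gold": 2, "staples": 24, "utilities": 24}
labels = {"banks": "A", "insurance": "A", "software": "B",
          "gold": "C", "staples": "D", "utilities": "D"}

shares = cluster_shares(labels, weights)
too_big_or_small = out_of_range(shares)
```

Any cluster this flags is a signal to go back, change the parameters, and cluster again.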
Here are the industry clusters I found, in rough order from least differentiated (closest to the market as a whole) to most (on the edge of the swarm). (I excluded REITs, MLPs, royalty trusts, and BDCs from consideration here, because those operate on such different principles from other stocks.) Please keep in mind that I clustered these industries based solely on their historical returns, not on any preconceived idea of how well they fit into distinct categories; in fact, I clustered them more or less blindfolded, without paying attention to what industries they were, just so that I could be objective. Also please keep in mind that industry clustering is never going to be very precise.
- PRIMARY. Staples, insurance, electricity, gas, water, road & rail, health care providers and technology, packaging. These are core stocks, commonly known as “defensive” stocks. They’re the products that you simply can’t do without, no matter how bad the economy is.
- SECONDARY. Chemicals, aerospace and defense, machinery, conglomerates, trading companies, professional services, health care equipment & tools, internet retail. These are complicated businesses that mostly depend on other businesses. Few of them are direct-to-consumer operations.
- BASIC. Paper, wood, clothes, stores, drugs, infrastructure, auto parts. This category is very similar to the “Primary” category above: these are also defensive stocks.
- COMPLEX. IT services, software, wireless, non-traditional utilities, construction. These are complicated goods that fall in between the “Secondary” category above and the “Edge” category below.
- SERVICE. Commercial and consumer services, finance, air freight, building and construction supplies, leisure. This is similar to the “Secondary” category except that it includes more direct-to-consumer businesses and, with an exception or two, is centered around the idea of providing services rather than goods.
- EDGE. Biotech, semiconductors, computers, electronics, media. These are cutting-edge businesses that depend upon innovation and creativity.
- CREDIT. Banks, real estate, distributors, cars, furniture. These are businesses that respond sharply to the credit cycle.
- SPEED. Internet, social media, telecom, communications equipment, airlines. These are businesses that depend upon and benefit from long-distance communication.
- EARTH. Energy, metals, electrical equipment, marine shipping. Stuff you get out of the earth, along with the means of getting it from one place to another.
(These clusters—their names and the industries included—are the expression of my idea about sectors and clusters, and are therefore protected by US copyright.)
Here’s how the clusters performed over the last twenty years:
Here are some possible uses for this new classification.
- Cluster rotation. Keep track of the returns of each cluster and buy the ones with the strongest momentum or the most undervalued ones.
- Cluster concentration. Invest in the clusters with the least volatile returns, or the clusters with the lowest beta, or the clusters with the highest alpha. (The least volatile clusters are, in order, secondary, basic, and primary, and the most volatile are earth, credit, and service. The clusters with the lowest beta are primary, credit, and secondary, and with the highest beta are edge, speed, and earth. The clusters with the highest alpha are primary, secondary, and complex, and the lowest are speed, earth, and credit.)
- Recognize the differences between clusters in terms of investment strategy. The primary, basic, and credit clusters will respond best to a value strategy, while the secondary, edge, and service clusters will respond best to a growth strategy. Sentiment indicators are very important for the service and speed clusters and relatively unimportant for the credit cluster. You really want to favor tiny companies in the secondary cluster, while in the primary, basic, and credit clusters, size doesn’t matter as much.
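The measures mentioned in the first two uses above (momentum, volatility, beta, and alpha) can be computed from a cluster's periodic returns roughly as follows. This is a simplified sketch: beta and alpha come from a plain least-squares fit against market returns with no risk-free rate subtracted, momentum is just the compounded return over a trailing window, and every name and number here is an illustrative assumption rather than the exact calculation behind the rankings quoted above.

```python
from math import prod
from statistics import mean, stdev


def volatility(returns):
    """Sample standard deviation of periodic returns."""
    return stdev(returns)


def beta_alpha(cluster, market):
    """Least-squares slope (beta) and intercept (alpha, per period)."""
    mc, mm = mean(cluster), mean(market)
    cov = sum((c - mc) * (m - mm) for c, m in zip(cluster, market))
    var = sum((m - mm) ** 2 for m in market)
    beta = cov / var
    return beta, mc - beta * mm


def trailing_return(returns, lookback):
    """Compounded return over the last `lookback` periods."""
    return prod(1 + r for r in returns[-lookback:]) - 1


def rank_by_momentum(returns_by_cluster, lookback):
    """Cluster names sorted from strongest to weakest trailing return."""
    return sorted(
        returns_by_cluster,
        key=lambda name: trailing_return(returns_by_cluster[name], lookback),
        reverse=True,
    )


# Invented returns, only to show the calls.
market = [0.01, -0.02, 0.03, 0.00]
cluster = [2 * m + 0.001 for m in market]  # built to have beta 2, alpha 0.001
beta, alpha = beta_alpha(cluster, market)

ranking = rank_by_momentum(
    {"edge": [0.05, 0.04], "earth": [-0.02, 0.01], "primary": [0.01, 0.01]},
    lookback=2,
)
```

A rotation strategy would recompute the ranking each period and hold the top few clusters; a concentration strategy would instead sort on volatility, beta, or alpha.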
So I propose that we replace the eleven GICS sectors with these nine clusters. I think they more accurately reflect the real differences between different industries.
My top ten holdings right now: ARC, PCMI, GSB, RLGT, CTEK, PERI, PFSW, NTWK, CLCT, PDEX.
CAGR since 1/1/16: 41%.