Alternative Data: Updating Social Media and NLP Signals

Key Takeaways
  • We update our social media-based stock lists, starting with the Reddit Alert list. This list identifies stocks that have seen a recent increase in popularity on WallStreetBets – stocks added to this list tend to underperform the index following inclusion.
  • Our Reddit-based sentiment indicator continues to be an effective contra-indicator for the market. We also focus on extreme observations in the indicator, which have identified numerous turning points this year.
  • For the just-concluded earnings season, investors are rewarding stocks that beat earnings to a degree not seen in at least three years. While macro concerns continue to drive stock returns, idiosyncratic risk has been on the rise, which bodes well for stock-pickers.
  • We also update our signal based on management sentiment from earnings calls. This signal, which is derived using natural language processing, has generated consistent returns throughout its history as well as in 2022. We provide lists of favored and unfavored stocks generated from this signal.

Review and Update of Social Media-Based Sentiment Signals

In this research note, we update our social media and alternative data-based signals. We have built a proprietary database of comments from the WallStreetBets subreddit extending back over two years. We have two products built on this database: (1) the Reddit Alert list[1], which includes stocks that have seen an increase in their popularity, and (2) a short-term market indicator[2] based on aggregate sentiment. We update the members and performance of the Reddit Alert list and provide an update on the sentiment indicator.

We also previously introduced a signal based on the application of natural language processing (NLP) to earnings call transcripts[3]. We found that this signal produces excess return relative to the S&P 500, and that this excess return is uncorrelated with the returns to standard risk factors. We update this signal through the recently concluded earnings season and provide a list of favored and disfavored stocks.

Reddit-Based Indicators

We start with the Reddit Alert list. Recall that the Reddit Alert list includes stocks that have seen a recent pickup in their mentions on WallStreetBets, along with an increase in trading activity for either the shares or associated options. Fig. 1 below shows the median return (relative to the S&P 500) for stocks after they have been added to the list. The typical stock underperforms the S&P 500 by over 4% in the 21 days following inclusion.

Fig. 1 – Performance of Stocks on Reddit Alert List

Source: Reddit, FactSet, S&P, Fundstrat analysis.
Note: Shows the median return (relative to the S&P 500) for the typical stock in the 3, 5, 10 and 21 days following addition to the Reddit Alert list. Period of analysis is from August 30, 2020 through November 18, 2022. Transaction costs are not considered.
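The calculation described in the note to Fig. 1 can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the implementation behind the figure: the event layout (a price list plus the position of the inclusion date), the function names, and the choice to measure relative return as the simple difference between the stock return and the index return are all hypothetical. It uses only the Python standard library.

```python
from statistics import median


def relative_return(stock_prices, index_prices, start, horizon):
    """Stock return minus index return over `horizon` trading days from `start`.

    Assumption: "relative to the S&P 500" is taken as an arithmetic difference.
    """
    stock_ret = stock_prices[start + horizon] / stock_prices[start] - 1.0
    index_ret = index_prices[start + horizon] / index_prices[start] - 1.0
    return stock_ret - index_ret


def median_relative_returns(events, index_prices, horizons=(3, 5, 10, 21)):
    """Median relative return across inclusion events for each horizon.

    events: list of (stock_prices, inclusion_position) pairs, where both price
    lists are aligned on the same trading days (hypothetical layout).
    Events without enough subsequent data for a horizon are skipped.
    """
    result = {}
    for h in horizons:
        rels = [
            relative_return(prices, index_prices, start, h)
            for prices, start in events
            if start + h < len(prices) and start + h < len(index_prices)
        ]
        result[h] = median(rels) if rels else None
    return result
```

Using the median rather than the mean keeps a handful of very large moves from dominating the result, which matters for the kind of volatile names that tend to attract attention on social media.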

Fig. 2 lists the current members of the Reddit Alert list, along with their inclusion dates. Not all stocks underperform during their time on the list, but the typical stock added to the Reddit Alert list underperforms. The list includes a variety of companies, with some large names like Amazon, Meta, UPS and Disney as well as a group of smaller companies.

Fig. 2 – Reddit Alert List as of November 18, 2022

Source: Reddit, FactSet, Fundstrat analysis.
Note: Shows current members of the Reddit Alert stock list, their respective inclusion dates, and relative performance (vs. the S&P 500) since inclusion. Data is as of November 15, 2022. Transaction costs are not considered.

We also built a short-term market indicator based on the overall sentiment of Reddit users toward the S&P 500. This indicator proxies for retail sentiment in the market and has proven to be an effective contra-signal over short-term holding periods. We construct the indicator by assigning a sentiment score to each comment about the S&P 500, then aggregating those sentiment scores at a daily frequency. Fig. 3 shows how the sentiment proxy evolves over time.

Fig. 3 – Reddit-based Sentiment Proxy

Source: Reddit, FactSet, Fundstrat analysis.
Note: Shows overall sentiment for comments on the SPY. Sentiment is computed using a rolling 63-day z-score. Higher values indicate more positive sentiment. Period of analysis is from August 30, 2020 through November 17, 2022.
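The two steps described above (daily aggregation of comment-level scores, then a rolling 63-day z-score) can be sketched as follows. This is only an illustration under assumptions the note does not specify: the daily aggregate is taken to be a simple mean, the rolling window includes the current day, and the sample standard deviation is used. The function names and the (date, score) input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def daily_sentiment(comments):
    """Aggregate comment-level scores into one value per day.

    comments: iterable of (date, score) pairs.
    Assumption: the daily aggregate is the mean score.
    Returns a date-sorted list of (date, mean_score).
    """
    by_day = defaultdict(list)
    for day, score in comments:
        by_day[day].append(score)
    return [(day, mean(scores)) for day, scores in sorted(by_day.items())]


def rolling_zscore(values, window=63):
    """z-score of each value against the trailing `window` values.

    Assumption: the window includes the current observation.
    Yields None until a full window exists or when the window has no dispersion.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        sample = values[i + 1 - window : i + 1]
        sd = stdev(sample)
        out.append((values[i] - mean(sample)) / sd if sd > 0 else None)
    return out
```

Standardizing against a trailing window means a reading of +1 or -1 is always judged relative to recent sentiment, so the indicator adapts if the baseline tone of the forum drifts over time.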

To evaluate the effectiveness of the sentiment indicator at predicting future market moves, we segment its history into periods when readings are low, medium, and high. We then evaluate the subsequent market performance following each group of observations. Fig. 4 shows the subsequent 5-day return of the SPDR S&P 500 ETF Trust (ticker SPY) as a function of the sentiment indicator. Note that the market tends to perform best following periods of negative sentiment. That pattern has continued during the past few months.

Fig. 4 – Subsequent 5-Day S&P 500 Return Conditioned on Sentiment Score

Source: Reddit, FactSet, Fundstrat analysis.
Note: Shows the average subsequent 5-day return for the SPY after periods when the sentiment is negative, neutral, and positive. Sentiment is classified as positive (negative) when the current sentiment reading is greater (less) than +1 (-1). Sentiment is computed using a rolling 63-day z-score. Number of observations is given below each category. Returns are measured using a one-day delay following the sentiment observation. Period of analysis is from August 30, 2020 through November 17, 2022. Transaction costs are not considered.
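The conditioning exercise in Fig. 4 can be sketched as below. This is a hedged illustration rather than the code behind the figure: it assumes the z-scores and SPY prices are aligned day by day in two lists, and it interprets the one-day delay as measuring the 5-day return from the close one day after the sentiment observation. The function names are hypothetical.

```python
from statistics import mean


def classify(z, threshold=1.0):
    """Label a z-score as negative, neutral, or positive sentiment."""
    if z > threshold:
        return "positive"
    if z < -threshold:
        return "negative"
    return "neutral"


def conditional_forward_returns(zscores, prices, horizon=5, delay=1, threshold=1.0):
    """Average forward return and observation count per sentiment bucket.

    zscores[i] and prices[i] refer to the same day i; None z-scores are skipped.
    The return for day i runs from prices[i + delay] to
    prices[i + delay + horizon] (assumed reading of the one-day delay).
    """
    buckets = {"negative": [], "neutral": [], "positive": []}
    for i, z in enumerate(zscores):
        end = i + delay + horizon
        if z is None or end >= len(prices):
            continue
        forward = prices[end] / prices[i + delay] - 1.0
        buckets[classify(z, threshold)].append(forward)
    return {k: (mean(v) if v else None, len(v)) for k, v in buckets.items()}
```

Passing `threshold=1.5` gives the extreme-reading variant of the same calculation.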

In addition to its use as a contra-indicator, we can also examine periods when the sentiment indicator reaches extreme levels. Focusing on extreme values reduces the sample size, but the trends based on these extreme values are notable. Fig. 5 shows the evolution of the sentiment indicator since April 2022, with the red arrows indicating dates when it reached extreme low values (on May 19th, June 21st, July 13th and September 29th) and the one instance (August 12th) where it reached an extreme high value.

Fig. 5 – Reddit-Based Sentiment Proxy – Performance Around Extreme Observations

Source: Reddit, FactSet, Fundstrat analysis.
Note: Shows overall sentiment for comments on the SPY. Sentiment is computed using a rolling 63-day z-score. Higher values indicate more positive sentiment. Arrows indicate extreme measures for sentiment. Sentiment is classified as in extreme positive (negative) territory when the reading is greater (less) than 1.5 (-1.5). Period of analysis is from April 15 through November 17, 2022.

Following the extreme negative instances listed above, the S&P 500 returned 4.1%, 1.6%, 5.5% and 1.6%, respectively, while following the August 12th extreme positive reading, the market returned -3.6%. Based on this small set of observations, extreme values in the sentiment indicator seem to be a strong contra-indicator for the market.

We again compute forward returns conditioned on the level of the sentiment indicator but focus on performance following extreme readings in the sentiment indicator. The results are shown in Fig. 6.

Fig. 6 – Subsequent 5-Day S&P 500 Performance Conditioned on Extreme Sentiment Readings

Source: Reddit, FactSet, Fundstrat analysis.
Note: Shows the average subsequent 5-day return for the SPY after periods when the sentiment is extreme negative, neutral, and extreme positive. Sentiment is classified as extreme positive (negative) when the current sentiment reading is greater (less) than +1.5 (-1.5). Sentiment is computed using a rolling 63-day z-score. Number of observations is given below each category. Returns are measured using a one-day delay following the sentiment observation. Period of analysis is from August 30, 2020 through November 17, 2022. Transaction costs are not considered.

From Fig. 6, when we limit ourselves to extreme observations, the sentiment indicator shows higher efficacy, as the market returns following an extreme observation are larger in magnitude than for the standard breakpoints we used in the initial analysis above (in Fig. 4). Further, the likelihood (indicated by the orange bars) of the market generating positive (negative) returns following an extreme negative (positive) observation is higher than when the standard breakpoints from Fig. 4 are used. While the number of actionable observations falls, limiting ourselves to periods when the sentiment indicator reaches extreme values increases its efficacy as a contra-indicator.
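The likelihood measure discussed above can be illustrated as a contrarian hit rate: the share of extreme readings that were followed by a market move in the opposite direction. The sketch below is an assumption about how such a statistic might be computed, not the calculation used for Fig. 6; the function name and the paired-list input are hypothetical.

```python
def contrarian_hit_rate(zscores, forward_returns, threshold=1.5):
    """Share of extreme sentiment readings followed by an opposite-signed return.

    zscores and forward_returns are aligned lists; pairs with a missing value
    or a z-score inside [-threshold, +threshold] are ignored.
    Returns (hit_rate, number_of_extreme_observations).
    """
    hits = 0
    total = 0
    for z, ret in zip(zscores, forward_returns):
        if z is None or ret is None or abs(z) <= threshold:
            continue
        total += 1
        # A hit is a positive return after extreme negative sentiment,
        # or a negative return after extreme positive sentiment.
        if (z < 0 and ret > 0) or (z > 0 and ret < 0):
            hits += 1
    return (hits / total if total else None), total
```

Reporting the observation count alongside the hit rate makes the trade-off explicit: a tighter threshold may raise the hit rate while leaving fewer readings to act on.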

From Fig. 5, we note that the sentiment indicator has recently reached extreme positivity. Based on the relationship discussed herein, this would portend a market downturn over the next week or two.

NLP-Based Sentiment Signal and Earnings Season Wrap-Up

In addition to the work we have done based on Reddit, we have also partnered with ProntoNLP, an expert in the area of natural language processing (NLP). Using ProntoNLP’s customizable system, we can evaluate the sentiment embedded in earnings call transcripts. In our initial effort, we found that companies for which management sentiment was unusually positive generated uncorrelated excess return. Likewise, companies with negative sentiment tended to underperform.

Because this signal relies on the analysis of earnings call transcripts, it updates most frequently during earnings season. Now that the Q3 earnings season is behind us, we update the NLP-based signal and include the latest list of favored and disfavored stocks.

First, however, we highlight a relevant trend we have noticed during the past few earnings seasons. Generally, during earnings seasons, companies that beat earnings estimates are rewarded (they tend to outperform the index going forward) while those that miss estimates tend to be punished. Fig. 7 shows the trend for the past 12 earnings seasons for the average reward and punishment to companies that beat/miss earnings.

Fig. 7 – Companies Beating Earnings are Being Handily Rewarded

Source: Bloomberg, FactSet, S&P, Fundstrat analysis.
Note: Shows the 3-day relative return for stocks beating (dark blue bars), in line with (gray bars) and missing (light blue bars) earnings estimates. An earnings beat (miss) is defined as the stock reporting earnings at least 2% greater (less) than consensus estimates. Period of analysis is from December 16, 2019 through November 17, 2022. Performance is relative to the S&P 500. Transaction costs are not considered.
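The beat/miss definition in the note to Fig. 7 can be sketched as follows. This is a minimal illustration under assumptions: the surprise is scaled by the absolute value of the consensus estimate (so that negative estimates are handled sensibly), relative return is the simple difference between the 3-day stock and index returns, and the function names and tuple layout are hypothetical.

```python
from statistics import mean


def classify_surprise(reported_eps, consensus_eps, band=0.02):
    """Return 'beat', 'miss', or 'in line' using a +/- `band` threshold.

    Assumption: surprise = (reported - consensus) / |consensus|.
    Returns None when the consensus is zero and the ratio is undefined.
    """
    if consensus_eps == 0:
        return None
    surprise = (reported_eps - consensus_eps) / abs(consensus_eps)
    if surprise >= band:
        return "beat"
    if surprise <= -band:
        return "miss"
    return "in line"


def average_reaction(events, band=0.02):
    """Average 3-day relative return per surprise category.

    events: iterable of (reported_eps, consensus_eps, stock_ret_3d, index_ret_3d).
    """
    groups = {"beat": [], "in line": [], "miss": []}
    for reported, consensus, stock_ret, index_ret in events:
        label = classify_surprise(reported, consensus, band)
        if label is not None:
            groups[label].append(stock_ret - index_ret)
    return {k: (mean(v) if v else None) for k, v in groups.items()}
```

Running `average_reaction` once per earnings season would give one set of three bars per quarter, as in the figure.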

While the gap between the reward for beating and the penalty for missing is persistent across history, companies that beat earnings during the recently concluded Q3 earnings season saw a larger boost to performance than at any time over the past 12 quarters. More generally, the results in Fig. 7 point to an increase in the degree to which stocks are moving on idiosyncratic (i.e., stock-specific) news.

In Fig. 8, we show a measure of the share of stock returns that is not explained by the performance of the overall market and the associated sector. This chart proxies for the degree of differentiation between stocks: when this value is high, stock-pickers have a larger opportunity set to which they can apply skill in picking companies. As a result, when differentiation between stocks is high, active managers and stock-pickers should theoretically be able to generate more alpha.

Fig. 8 – Degree of Importance for Stock-Specific Factors

Source: S&P, FactSet, Fundstrat analysis.
Note: Shows the median idiosyncratic component of return for S&P 500 constituents. Idiosyncratic component of return is computed as 1 minus the share of variance explained by a factor model using the market and sector returns as factors. Factor model is estimated using rolling 13-week periods. Period of analysis is from January 2014 through November 18, 2022.
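The idiosyncratic share described in the note to Fig. 8 (1 minus the variance explained by a market-and-sector factor model) can be sketched as below. This is an illustration under assumptions, not the model behind the figure: the factor model is taken to be an ordinary least squares regression with an intercept, the return frequency inside each 13-week window is left to the caller through the `window` argument, and the function names are hypothetical.

```python
def idiosyncratic_share(stock, market, sector):
    """1 - R^2 from an OLS regression of stock returns on market and sector returns.

    All three inputs are equal-length lists of returns. Returns None when the
    two factors are perfectly collinear or the stock return is constant.
    """
    n = len(stock)

    def demean(xs):
        m = sum(xs) / n
        return [x - m for x in xs]

    y, x1, x2 = demean(stock), demean(market), demean(sector)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    syy = sum(c * c for c in y)
    det = s11 * s22 - s12 * s12
    if det == 0 or syy == 0:
        return None
    # Solve the 2x2 normal equations for the two factor loadings.
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    explained = b1 * s1y + b2 * s2y  # explained sum of squares
    return 1.0 - explained / syy


def rolling_idiosyncratic_share(stock, market, sector, window):
    """Idiosyncratic share over each trailing block of `window` observations."""
    return [
        idiosyncratic_share(stock[i - window : i], market[i - window : i], sector[i - window : i])
        for i in range(window, len(stock) + 1)
    ]
```

Taking the median of this value across index constituents on each date would give a single series comparable in spirit to the one plotted.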

While the importance of idiosyncratic risk fell for much of the summer, it has recently picked up, so stock-picking has had more potential to generate alpha in recent months.

We now return to the ProntoNLP-based signal. Recall that the signal identifies companies for which management sentiment has increased (or decreased) in the most recent earnings call. In the original publication, we noted that stocks whose management showed the largest increase in relative sentiment tended to subsequently outperform the index; likewise, companies where management sentiment soured tended to underperform. The performance of the signal was consistent across years and quarters. We update the performance through Q3 (see Fig. 9) and find that the ProntoNLP-generated signal has continued to generate outperformance through 2022.

Fig. 9 – Quarterly Performance of ProntoNLP-based favored and disfavored baskets

Source: ProntoNLP, S&P, Fundstrat analysis.
Note: Shows quarterly aggregated return of ProntoNLP sentiment-based long (orange bars) and short (gray bars) baskets relative to the S&P 500 index. Long (short) basket consists of stocks in the top (bottom) decile of adjusted sentiment score. Long and short baskets are equally weighted and rebalanced monthly. Sentiment is based on ProntoNLP scores for stocks within the S&P 500. Period of analysis is from January 2017 through October 2022. Transaction costs are not considered.
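The basket construction in the note to Fig. 9 (top and bottom deciles of the adjusted sentiment score, equally weighted) can be sketched as follows. This is a hedged illustration: how the adjusted score itself is computed is not described here and is treated as a given input, and the function names and dictionary layout are hypothetical.

```python
from statistics import mean


def decile_baskets(scores):
    """Split tickers into long (top decile) and short (bottom decile) baskets.

    scores: {ticker: adjusted sentiment score}, treated as a given input.
    Each basket holds at least one ticker.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, len(ranked) // 10)
    return ranked[:k], ranked[-k:]


def equal_weighted_relative_return(basket, stock_returns, index_return):
    """Equal-weighted basket return for one period, minus the index return."""
    return mean(stock_returns[ticker] for ticker in basket) - index_return
```

Rebuilding the baskets from the latest scores at each month-end and chaining the monthly relative returns within a quarter would approximate the monthly rebalancing and quarterly aggregation described in the note.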

In Fig. 10, we show the latest sector weights for the favored and disfavored baskets from the NLP signal. Both baskets include stocks from a variety of sectors, with the favored basket heavy on stocks from the industrials and health care sectors, while the disfavored basket has its largest concentration in the technology and financials sectors. The constituents for each basket are listed in the appendix.

Fig. 10 – Sector Weights in Current Baskets from ProntoNLP-generated signal

Source: ProntoNLP, Fundstrat analysis.
Note: Shows sector weights of current ProntoNLP sentiment-based long (left-hand chart) and short (right-hand chart) baskets. Long (short) basket consists of stocks in the top (bottom) decile of adjusted sentiment score. Sentiment is based on ProntoNLP scores for stocks within the S&P 500. Baskets are as of October 31, 2022.

Conclusion

In this research note, we update our previously published signals based on alternative data. The Reddit Alert list, which includes stocks that have seen a recent pickup in mentions, along with an increase in trading activity, continues to identify underperforming companies. Our Reddit-based sentiment indicator for the market has also continued to be an effective contra-indicator, particularly when it reaches extreme values.

We also discuss the just-completed earnings season, and how investors are now rewarding earnings beats to a larger degree than at any point over the past 12 quarters. Also, through our partnership with ProntoNLP, we apply natural language processing to extract management sentiment from earnings calls and use that sentiment to identify potential outperformers (and underperformers). Historically, companies that see large improvements in management sentiment tend to outperform going forward. This trend has continued through 2022.

Appendix

Below are the current constituents of the favored and disfavored baskets as determined by the ProntoNLP-based signal.

Favored Basket:


Disfavored Basket:


[1] Alternative Data: Reddit Alerts

[2] Alternative Data: Sentiment from Social Media

[3] Alternative Data: NLP-Based Sentiment
