The Law of Small Numbers: Why Statistical Inference Errors Destroy Trading Accounts

In the excerpt below, Matthew Rabin’s commentary from his paper responds to the news briefs, each of which is marked FROM NEWS REPORTS:

Source: Inference by Believers in the Law of Small Numbers, Matthew Rabin, Department of Economics, University of California, Berkeley

FROM NEWS REPORTS: Sudden Reversal for Coles: Warriors guard is now hitting jumpers with confidence. The body filling out that No. 12 in a Warriors uniform is the same one as in the last three years, the one prone to clanking jump shots and running the offense at a glacial pace…Ever since Bimbo Coles got a release from the deepest part of the bench a week ago…the nine-year point guard has been a changed player…Considering Coles’ shooting percentage has declined in each of the last five years, to a career-low of 37.9 in 1997-98, the turnaround is stunning. In the last four games, Coles has shot 15-for-26 (57.7 percent)…“The mind is a crazy thing,” Coles said. “When you totally lose your confidence, you’re not going to play well. I’m starting to regain my confidence and going out there and having fun”…The confidence was especially evident on the late jumpers, the last of which tied the game 87-87 with 1:33 to play.

In the four games following the stunning four-game turnaround, Coles went 7 for 21 (33%). After regained confidence led to his stunning turnaround, Coles apparently either re-lost his confidence or became overconfident. It is very plausible that a player can have significantly different success from season to season or team to team (Coles had switched teams the previous season), or even game to game because of changed team compositions, a changed position, or different opponents (this headline was written after the Warriors defeated the L.A. Clippers, who were 0-17 at the time and finished the season with the league’s worst record).

Indeed, for the remainder of the season Coles did improve over the previous year, and finished the season by shooting 137 for 303 (45.2%) following the turnaround, for a year-end total of 156 for 348 (44.8%). But the article made an inference that was statistically unwarranted, yet typical of such articles. The chance of a 37.9% shooter making 15 out of a given 26 shots is about 2%. If (say) 200 NBA players a week have enough shots to warrant such a headline were they to perform comparably stunning four-game turnarounds, about one player per day would warrant this headline. It could also be noted that had Coles remained a 37.9% shooter whose shots are independent and identically distributed (i.i.d.), the chance of him having had a 15-for-26 streak at some point in his 348-shot season would be about 76%. Roughly speaking, the standards by which a four-game performance gets labeled a stunning turnaround are such that even if most players don’t experience turnarounds in either direction throughout their careers, they will be deemed to have had a stunning turnaround at least once a season.
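Both of Rabin’s figures can be checked with a short calculation. The Python sketch below (standard library only) computes the binomial probability for a single 26-shot stretch, then estimates by Monte Carlo simulation how often an i.i.d. 37.9% shooter has at least 15 makes in some run of 26 consecutive shots across a 348-shot season. Reading “a 15-for-26 streak” as any 26 consecutive shots with 15 or more makes is our interpretation, the simulation is an illustrative method rather than necessarily the one Rabin used, and the constant and helper names are hypothetical.

```python
import math
import random

P_MAKE = 0.379   # Coles's 1997-98 shooting percentage, treated as a fixed i.i.d. rate
WINDOW = 26      # shots in the four-game stretch
HOT = 15         # makes in that stretch
SEASON = 348     # Coles's total shots for the season


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k makes in n independent shots."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def has_hot_window(shots: list[int], window: int, hot: int) -> bool:
    """True if any run of `window` consecutive shots contains at least `hot` makes."""
    made = sum(shots[:window])
    if made >= hot:
        return True
    for i in range(window, len(shots)):
        # Slide the window one shot forward: add the newest shot, drop the oldest.
        made += shots[i] - shots[i - window]
        if made >= hot:
            return True
    return False


def streak_probability(trials: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(some 26-shot run has >= 15 makes in a season)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shots = [1 if rng.random() < P_MAKE else 0 for _ in range(SEASON)]
        hits += has_hot_window(shots, WINDOW, HOT)
    return hits / trials


p_exact = binom_pmf(HOT, WINDOW, P_MAKE)
p_at_least = sum(binom_pmf(k, WINDOW, P_MAKE) for k in range(HOT, WINDOW + 1))
p_season = streak_probability()

print(f"exactly 15 of 26:   {p_exact:.3f}")
print(f"15 or more of 26:   {p_at_least:.3f}")
print(f"some 15-for-26 run: {p_season:.2f}")
```

Computed this way, exactly 15 makes in a given 26 shots comes out near 2%, in line with Rabin’s figure, while 15 or more is closer to 3%. The simulated season probability shifts slightly with the seed and trial count and can be compared with Rabin’s “about 76%”.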

While this example of over-inference is typical of sports and financial media analysis, a more gruesome and more worrying example of the gambler’s fallacy comes from an article in the New York Times Magazine titled “How Not to Get Killed on Deadline.” [It is reported] that:

FROM NEWS REPORTS: In hostile-environment school, foreign correspondents learn how to improve their chances of surviving kidnappings, cross-fires and other perils of the workplace. The article is about advice given to journalists by a company called Centurion Risk Assessment Services Ltd. Centurion gives the following advice to journalists paying for war-survival training: In mortar attacks…lie down. If you can, crawl into one of the holes made by a previous shell because lightning rarely strikes twice in the same place.

I do not know if, within a journalist’s crawling range, the pattern of mortar attacks exhibits positive or negative correlation. But the metaphor chosen to be persuasive is a commonplace metaphor used to convey intuition for a commonplace misguided belief in a law of averages that says that once a rare event occurs, it becomes less likely to recur, because such recurrence would throw averages out of whack. The lightning metaphor is striking: in actuality, heading for the same spot where lightning struck earlier is a bad idea in a thunderstorm, since lightning is more likely to hit a spot it has hit before than to hit a spot for the first time. Lightning rarely strikes twice, but only because it rarely strikes once.
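The last sentence separates two different probabilities: the chance that a spot is struck twice, and the chance that it is struck again given that it has already been struck. Writing p for the chance that a given spot is struck during a storm (a symbol introduced here for illustration), the multiplication rule for probabilities gives:

```latex
\begin{align*}
\Pr(\text{struck twice}) &= \Pr(\text{first}) \cdot \Pr(\text{second} \mid \text{first})
                          = p \cdot \Pr(\text{second} \mid \text{first}), \\
\text{independent strikes:}\quad \Pr(\text{second} \mid \text{first}) &= p
  \;\Rightarrow\; \Pr(\text{struck twice}) = p^2, \\
\text{positively correlated strikes:}\quad \Pr(\text{second} \mid \text{first}) &> p .
\end{align*}
```

In both cases a double strike is rare because p is small, not because the first strike lowered the odds of the second. The crater advice would only make sense if strikes were negatively correlated, which is exactly the point Rabin says he cannot establish for mortar fire.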

What is the point from a trend following trading perspective (or trading in general)? When someone speaks of a great 3-month trading run and that’s all there is…be very wary.

What This Means for Evaluating Trading Performance

Rabin’s two examples show the same underlying error from opposite directions. In the Coles case, a short hot streak was read as evidence of genuine improvement rather than as normal statistical variation in an otherwise mediocre shooter’s performance. In the mortar shell case, a low-probability event was read as making a repeat less likely, when nothing about the first strike makes the same spot safer: if strikes are independent the odds are unchanged, and, as the lightning example shows, a spot that has already been hit can be more likely to be hit again.

The direct trading application is the excerpt’s closing line: be very wary when someone speaks of a great three-month trading run and that’s all there is. Three months is the Bimbo Coles four-game streak applied to trading performance. The probability that a trader with no real edge has a strong three-month stretch somewhere in a year is high enough that three months of good performance carries very little evidence of genuine edge. By Rabin’s estimate, a 37.9% shooter will produce a 15-for-26 stretch at some point in a 348-shot season about 76% of the time. A mediocre trader can likewise produce a strong three-month period in a given year with comparable regularity.
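The claim about mediocre traders can be explored the same way. The sketch below assumes a trader with zero edge whose daily log returns are i.i.d. normal with mean 0 and a hypothetical 1% daily volatility, treats a quarter as 63 trading days out of 252, and calls any rolling quarter with a gain of 10% or more “strong”. All of these numbers, and the helper names, are illustrative assumptions rather than figures from Rabin or from any real track record.

```python
import math
import random

DAYS_PER_YEAR = 252        # trading days in a year (assumption)
QUARTER = 63               # roughly three months of trading days
DAILY_VOL = 0.01           # hypothetical 1% daily volatility (about 16% annualized)
STRONG = math.log(1.10)    # hypothetical "strong quarter": +10% or better


def best_quarter(returns: list[float], window: int) -> float:
    """Largest cumulative log return over any `window` consecutive days."""
    total = sum(returns[:window])
    best = total
    for i in range(window, len(returns)):
        # Roll the window forward one day.
        total += returns[i] - returns[i - window]
        best = max(best, total)
    return best


def chance_of_strong_quarter(trials: int = 5_000, seed: int = 7) -> float:
    """Share of simulated zero-edge years containing at least one strong rolling quarter."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        # Zero edge: daily log returns average 0, so any strong quarter is pure luck.
        year = [rng.gauss(0.0, DAILY_VOL) for _ in range(DAYS_PER_YEAR)]
        if best_quarter(year, QUARTER) >= STRONG:
            count += 1
    return count / trials


p_strong = chance_of_strong_quarter()
print(f"zero-edge years with a +10% rolling quarter: {p_strong:.0%}")
```

A year contains many overlapping three-month windows, and the best of them, which is the one that gets quoted, is much better than a typical one. Lowering the volatility or raising the threshold shrinks the share, but the selection effect of picking the best window remains.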

The implication for evaluating trading systems and managers is direct. Three months of performance is insufficient. One year is barely sufficient. Three to five years across different market environments, covering both trending and choppy conditions, rising and falling markets, and different volatility regimes, is the minimum sample needed to assess whether performance reflects genuine systematic edge or normal statistical variation. A system or manager with only three months of good results is offering exactly the kind of insufficient sample that Rabin demonstrates leads to over-inference. The question is always: what is the chance that a random walk of this length would produce this result?
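The closing question, what is the chance that a random walk of this length would produce this result, has a simple closed-form answer under strong simplifying assumptions. The sketch below assumes i.i.d. normal returns, a zero-drift null hypothesis, and no compounding, fat tails, or serial correlation; the function name and the 20%-return, 16%-volatility record are hypothetical. The key step is that a random walk’s cumulative spread grows with the square root of time while a genuine edge accumulates linearly, so the same annualized result becomes harder to produce by luck as the record lengthens.

```python
from statistics import NormalDist


def random_walk_p_value(annual_return: float, annual_vol: float, years: float) -> float:
    """One-sided chance that a zero-drift random walk with the given volatility
    earns at least `annual_return` per year over `years` years.

    Assumes i.i.d. normal returns and ignores compounding, fat tails, and serial
    correlation, all of which matter for real trading records.
    """
    observed_total = annual_return * years      # cumulative return actually observed
    noise = annual_vol * years ** 0.5           # random-walk spread grows with sqrt(time)
    z = observed_total / noise
    return 1.0 - NormalDist().cdf(z)


# Hypothetical record: 20% a year at 16% volatility, observed over different lengths.
for years in (0.25, 1, 3, 5):
    p = random_walk_p_value(0.20, 0.16, years)
    print(f"{years:>4} years: chance a zero-edge random walk does this well = {p:.1%}")
```

Under these assumptions, the hypothetical record is something a zero-edge random walk matches roughly a quarter of the time over three months, but well under 1% of the time over five years. Real return series have fat tails, volatility regimes, and serial correlation, so these figures are a rough guide rather than a formal test.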

This is also the argument against chasing recent performance when allocating to trend following managers. The manager who had the best returns last year may simply have been in the right markets at the right time. The manager with a consistent 15-year audited track record has survived enough different market environments to provide statistically meaningful evidence of a genuine systematic edge. Sample size matters. Small samples mislead. This is the law of small numbers applied directly to one of the most common errors in investment decision-making.
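Chasing last year’s winner can be sketched the same way. The simulation below assumes 100 hypothetical managers, all with zero edge and the same 16% annual volatility, picks the one with the best return in year one, and records that manager’s return in year two. Real managers differ in skill, so this shows only how large the best of many zero-edge records can look and how little that record says about the following year; the names and parameters are illustrative.

```python
import random
import statistics

MANAGERS = 100        # hypothetical number of managers under consideration
ANNUAL_VOL = 0.16     # hypothetical annual volatility, identical for every manager


def winner_then_next_year(rng: random.Random) -> tuple[float, float]:
    """Pick last year's top zero-edge manager; return (last year, next year) returns."""
    # Every manager has zero edge: yearly returns are pure noise around 0.
    last_year = [rng.gauss(0.0, ANNUAL_VOL) for _ in range(MANAGERS)]
    next_year = [rng.gauss(0.0, ANNUAL_VOL) for _ in range(MANAGERS)]
    best = max(range(MANAGERS), key=lambda i: last_year[i])
    return last_year[best], next_year[best]


rng = random.Random(42)
results = [winner_then_next_year(rng) for _ in range(2_000)]
avg_winner_last = statistics.mean(r[0] for r in results)
avg_winner_next = statistics.mean(r[1] for r in results)
print(f"average winning return last year: {avg_winner_last:.1%}")
print(f"same manager's average next year: {avg_winner_next:.1%}")
```

The winner’s first-year return reflects pure selection (the expected maximum of 100 standard normal draws is about 2.5 standard deviations), while its second-year return averages close to zero, because in this model there was no edge to persist.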

Frequently Asked Questions

What is the law of small numbers?

The law of small numbers is the mistaken belief that small samples should reflect the properties of the larger population they are drawn from. People expect short sequences to be representative and draw confident conclusions from insufficient data. A four-game hot streak is not evidence of a changed player. A three-month trading run is not evidence of genuine edge. Both require much larger samples to support the conclusions observers typically draw from them.

What is the gambler’s fallacy and how does the mortar shell example illustrate it?

The gambler’s fallacy is the belief that once a low-probability event occurs, it becomes less likely to occur again because the average must balance out. The mortar shell advice illustrates it well: a shell crater is not a safer place to lie just because a shell already landed there. If shell landings are independent, the crater is no safer than anywhere else; if fire is concentrated on a target, it may be more dangerous. No law of averages corrects individual sequences of independent events.

How long must a trading track record be to be statistically meaningful?

Long enough to cover multiple market environments, including trending and range-bound periods, rising and falling markets, and different volatility regimes. Three to five years is typically the minimum, and even that can be misleading if the period happened to favor the particular approach. A mediocre system can produce impressive performance in the specific environment it was designed for. The question is whether it performs consistently across all the environments the market presents.

Why is a three-month trading run insufficient evidence of edge?

Because the probability of any trader having a strong three-month period at some point in a year is high enough that it provides almost no information about the quality of the underlying approach. Just as a 37.9% shooter will produce a 15-for-26 stretch in a full season about 76% of the time, a mediocre trading system can produce a strong three-month period with comparable regularity. The short sample is consistent with both genuine edge and random variation, which makes it uninformative.
