Part 3

Interpreting Correlation: Proceed with Caution

One trap we have all fallen into lies within us: our own bias. Bias is a disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. In statistical analysis, bias occurs when an estimate systematically overestimates or underestimates the population parameter of interest. Bias is difficult to manage, but there are ways of controlling the monster it can be.

Once you have overcome the obstacle of picking the right sample size for your analysis, the next step is collecting an accurate sample of the population of interest. A sample is a group of individuals chosen to represent a population. A simple random sample (SRS) is a sample in which every individual in the population has an equal chance of being chosen. To take one, you might use a lottery method, as in “drawing from a hat” that holds your entire population. Or you can assign each individual in your population a number and use a random number generator to select the numbers, and therefore the individuals, to include in your sample. While it may be a tedious process, random selection guards against selection bias and gives your sample the best chance of being representative of the population of interest.
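The random number generator approach can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `simple_random_sample` helper, the made-up population of 100 labeled individuals, and the fixed seed (used only so the draw is repeatable) are all assumptions for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct individuals, each with an equal chance of selection."""
    rng = random.Random(seed)
    # Number every individual by position, then draw k distinct numbers.
    chosen_numbers = rng.sample(range(len(population)), k)
    return [population[i] for i in chosen_numbers]

# Hypothetical population: 100 individuals labeled person_1 .. person_100.
population = [f"person_{i}" for i in range(1, 101)]

sample = simple_random_sample(population, k=10, seed=42)
print(sample)
```

Because the numbers are drawn without replacement, no individual can appear in the sample twice, which matches pulling names out of a hat.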

While a simple random sample guards against selection bias and is the preferred method of obtaining a sample, there are other methods of generating your sample. Stratified samples, cluster samples, systematic random samples, and others can all represent the population of interest, but each imposes some structure on the selection, so they can introduce bias that a sample based solely on chance would not. For example, a stratified random sample breaks your population into subgroups. Within each subgroup, a random sample is drawn to obtain a set number of individuals. The benefit of this method is that it ensures all “groups” within the population are represented, whereas in an SRS one group may be over-represented or under-represented, since individuals are chosen purely by chance.
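Stratified sampling can be sketched the same way. The example below is only an illustration under stated assumptions: the `stratified_sample` helper, the `age_band` field, and the 90/10 split between subgroups are all invented for the demonstration.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, per_group, seed=None):
    """Draw up to per_group individuals at random from each subgroup."""
    rng = random.Random(seed)
    # Split the population into subgroups (strata) using the key function.
    groups = defaultdict(list)
    for person in population:
        groups[key(person)].append(person)
    # Take a simple random sample inside each subgroup.
    sample = []
    for name in sorted(groups):
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

# Hypothetical population: 90 people under 40 and 10 people aged 40 or over.
population = [
    {"id": i, "age_band": "40_plus" if i % 10 == 0 else "under_40"}
    for i in range(100)
]

sample = stratified_sample(
    population, key=lambda p: p["age_band"], per_group=5, seed=7
)
counts = {
    band: sum(1 for p in sample if p["age_band"] == band)
    for band in ("under_40", "40_plus")
}
print(counts)
```

Here the smaller subgroup is guaranteed five places in the sample. An SRS of ten people from the same population would include only about one person aged 40 or over on average, and sometimes none.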

Bias isn’t just found in sampling; bias within results is one of the largest beasts to conquer. Confounding variables are external factors that influence the variables you are studying and so may distort your results. For example, if you are studying weight gain based on activity level, you have to consider that many other factors play a role, such as age, gender, genetics, type of workout, and so on. Just because something seems to be the case doesn’t mean it is the only factor causing it to happen. Simply put: correlation ≠ causation. Confusing the two is one of the most common mistakes in analysis. As a rule of thumb, no correlation suggests no causation (though a nonlinear or masked effect can hide a real one), but the opposite does not hold. Just because your analysis shows a statistical association (correlation) between variables A and B, it does not mean that variable A causes variable B to occur. Frequently, we fall into the trap of jumping to conclusions based on what the data seems to be telling us, when in reality the data alone cannot tell us whether that conclusion is true.
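A small simulation shows how a confounder can produce correlation without causation. Everything in this sketch is an assumption made for illustration: a hidden factor `z` drives both `a` and `b`, neither of which affects the other, and the noise levels and seed are arbitrary. The `pearson` helper computes the ordinary Pearson correlation coefficient by hand.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(0)
n = 2000

# Hidden confounder z influences both a and b; a never influences b.
z = [rng.gauss(0, 1) for _ in range(n)]
a = [zi + rng.gauss(0, 0.5) for zi in z]
b = [zi + rng.gauss(0, 0.5) for zi in z]

r_ab = pearson(a, b)

# Remove the confounder's contribution. This is only possible here because
# we built the data ourselves and know exactly how z enters a and b.
a_without_z = [ai - zi for ai, zi in zip(a, z)]
b_without_z = [bi - zi for bi, zi in zip(b, z)]
r_without_z = pearson(a_without_z, b_without_z)

print(f"correlation of a and b:        {r_ab:.2f}")
print(f"after removing confounder z:   {r_without_z:.2f}")
```

The first correlation is strong even though `a` has no effect on `b`. Once the shared factor is taken out, the association all but disappears. In real data the confounder is rarely known this cleanly, which is why the causal explanation has to come from outside the numbers.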

American author and economist Thomas Sowell once said, “One of the first things taught in introductory statistics textbooks is that correlation is not causation. It is also one of the first things forgotten.”

In investing, this idea couldn’t be more relevant. When analyzing performance and stocks, look for reasoning, not just statistical links. Any causal explanation must rest on reasoning beyond the statistics themselves. Then we can avoid making statistics the third kind of lie, after lies and damned lies.
