Statistical Sins: Seven Ways To Mislead With Numbers

Statistical analysis is a powerful tool, but it has its pitfalls. There are seven deadly sins of statistical innumeracy that can lead even well-intentioned analysts astray: non-response bias, mistaking statistical association for causality, poisoned control, data enhancement, absoluteness, partiality, and a bad measuring stick. These errors can be just as misleading as outright lies, if not more so, because they are often committed by people who genuinely believe their data approximates the truth.

shunspirit

Non-response bias

Non-response bias arises when the people or cases that end up in a sample differ systematically from those left out, so the sample no longer represents the population. For example, consider a survey that claimed gay men have an average life expectancy of 43 years. This figure was calculated by checking urban gay newspapers for obituaries and news stories about deaths. That method produces an unrepresentative sample, because it includes only those who have already died; gay men of the same generation who live longer never enter it. The sample is also biased towards urban gay men with AIDS who have come out.
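The effect of counting only those who have already died can be illustrated with a small simulation. This is a sketch under invented assumptions: the cohort, its lifespan distribution (mean 75, standard deviation 12) and the observation age of 60 are hypothetical and are not taken from the study described above.

```python
import random
import statistics


def simulate_obituary_sample(n=10_000, observation_age=60, seed=0):
    """Return (true mean lifespan, mean age at death among those already dead).

    Hypothetical cohort: everyone is born at the same time and lifespans are
    drawn from a normal distribution. The numbers are illustrative only.
    """
    rng = random.Random(seed)
    lifespans = [max(0.0, rng.gauss(75, 12)) for _ in range(n)]
    # An obituary-based sample can only contain people who died before
    # the date on which the cohort is observed.
    already_dead = [age for age in lifespans if age <= observation_age]
    return statistics.mean(lifespans), statistics.mean(already_dead)


true_mean, obituary_mean = simulate_obituary_sample()
print(f"true mean lifespan:         {true_mean:.1f}")
print(f"mean age among obituaries:  {obituary_mean:.1f}")
```

Because nobody in the obituary sample can be older than the observation age, its average is bound to fall below the cohort's true mean, whatever the true mean happens to be.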

The same problem arises in surveys. In psychiatric research that aims to analyse and diagnose major depression in a population, for instance, the people who respond may behave very differently from those who do not. In such cases it becomes difficult to generalise from the responses to the population as a whole.

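To mitigate non-response bias, researchers can draw random samples, follow up with people who did not respond, and weight the responses so that they match the known make-up of the population. Increasing the sample size on its own does not help if the same kinds of people keep being left out.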
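A minimal simulation can show how differential response distorts an estimate and how that distortion behaves as more people are invited. The prevalence (20%) and the response rates (30% for depressed people, 60% for everyone else) are assumptions chosen for illustration, not figures from any study.

```python
import random

# Hypothetical parameters, chosen only for illustration.
TRUE_PREVALENCE = 0.20
RESPONSE_RATE = {True: 0.30, False: 0.60}  # keyed by "is depressed"


def surveyed_prevalence(n_invited, seed=0):
    """Share of respondents who are depressed when response depends on status."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_invited):
        depressed = rng.random() < TRUE_PREVALENCE
        # Depressed people are assumed to be less likely to answer the survey.
        if rng.random() < RESPONSE_RATE[depressed]:
            responses.append(depressed)
    return sum(responses) / len(responses)


for n in (1_000, 10_000, 100_000):
    print(f"invited {n:>7}: estimated prevalence {surveyed_prevalence(n):.3f}")
```

Under these assumptions the expected share among respondents is 0.06 / 0.54, or about 11%, rather than the true 20%. Inviting more people makes the estimate more stable, but it stabilises around the biased value.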

Mistaking statistical association for causality

Two variables can rise and fall together without one causing the other. For example, a journalism class at Columbia University plotted a scattergram of census data on single-parent households in the Bronx. They found that as the percentage of minority members in a ZIP code area increased, so did the percentage of single-parent households. However, this does not mean that having more minorities in an area causes there to be more single-parent households. In the same dataset, the students found that household income was a better predictor of the variance in single-parent households. Income is the more likely causal factor here, as lower-income areas may be less able to support two-parent households.

Another example of this sin in action can be found in New Jersey, just across the river from the Bronx. The town of Edgewater, which is well-off and largely white, has the highest incidence of single-parent households in Bergen County. In this case, the relatively inexpensive two-bedroom rental apartments, which are ideal for women and children from newly separated families, are likely responsible for the high number of single-parent households.

Drawing a clear distinction between association and causation is crucial for avoiding stereotypes and making accurate predictions. It is important to remember that correlation does not equal causation, and that other factors may be at play.
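The role of a third factor can be sketched with synthetic data. In the simulation below, a hypothetical `income` variable drives two other variables, `x` and `y`, which have no direct effect on each other. None of this is the Bronx census data; the coefficients are invented, and the residual-based check is only one simple way of adjusting for a confounder.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(1)
n_areas = 5_000
income = [rng.gauss(0, 1) for _ in range(n_areas)]
# x and y each depend on income plus independent noise, never on each other.
x = [-0.8 * z + rng.gauss(0, 0.6) for z in income]
y = [-0.8 * z + rng.gauss(0, 0.6) for z in income]

raw = pearson(x, y)
adjusted = pearson(residuals(x, income), residuals(y, income))
print(f"raw correlation of x and y:       {raw:.2f}")
print(f"correlation after removing income: {adjusted:.2f}")
```

The raw correlation is strong even though neither variable influences the other; once the shared dependence on income is removed, the association largely disappears.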

Poisoned control

A control is poisoned when the comparison group has itself been exposed to the factor under study, so that it no longer provides a clean baseline. For example, studies of the effects of Agent Orange on Vietnam War veterans are susceptible to this sin, because the chemical was so widespread that it is difficult to find a control group of veterans who were not exposed to it. Furthermore, the level and exact types of dioxins varied depending on the vendor, and the army lost track of which soldiers were exposed to which vendor's product.
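The consequence of a contaminated control group can be sketched numerically. The simulation below assumes a hypothetical illness with a 5% baseline risk and a 10% risk among the exposed, and varies the share of the "control" group that was in fact exposed. These risks are invented and say nothing about the actual effects of Agent Orange.

```python
import random

# Hypothetical risks, chosen only for illustration.
BASE_RISK = 0.05
EXPOSED_RISK = 0.10


def illness_rate(p_exposed, n, rng):
    """Observed illness rate in a group where each member is exposed with p_exposed."""
    ill = 0
    for _ in range(n):
        exposed = rng.random() < p_exposed
        risk = EXPOSED_RISK if exposed else BASE_RISK
        ill += rng.random() < risk
    return ill / n


def observed_risk_difference(contamination, n=50_000, seed=2):
    """Exposed-group rate minus control-group rate for a given control contamination."""
    rng = random.Random(seed)
    return illness_rate(1.0, n, rng) - illness_rate(contamination, n, rng)


for share in (0.0, 0.4, 0.8):
    diff = observed_risk_difference(share)
    print(f"control group {share:.0%} exposed: observed difference {diff:.3f}")
```

With a clean control group the observed difference is close to the true gap of five percentage points. As more of the control group turns out to have been exposed, the measured difference shrinks towards zero, and a real effect can look negligible.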

Data enhancement

Data enhancement means reading more into the data than they can support. A familiar example is the commonly quoted statistic that most auto accidents happen within 10 miles of home, on familiar roads. This is true, but most driving is also done within 10 miles of home. Taken at face value, the statistic would suggest that the farther away from home you drive, the safer you will be.
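What the accident statistic leaves out is a denominator: accidents per mile driven. The arithmetic below uses invented figures (70% of accidents and 80% of miles within 10 miles of home) purely to show how the comparison changes once exposure is taken into account.

```python
def accidents_per_million_miles(accidents, miles):
    """Accident rate normalised by distance driven."""
    return accidents / miles * 1_000_000


# Hypothetical figures for one group of drivers over one year.
near_home = {"accidents": 700, "miles": 8_000_000}  # within 10 miles of home
far_away = {"accidents": 300, "miles": 2_000_000}   # beyond 10 miles

near_rate = accidents_per_million_miles(**near_home)
far_rate = accidents_per_million_miles(**far_away)

print(f"share of accidents near home: {700 / (700 + 300):.0%}")
print(f"rate near home: {near_rate:.1f} per million miles")
print(f"rate far away:  {far_rate:.1f} per million miles")
```

In this made-up case most accidents do happen near home, yet each mile driven far from home is riskier than a mile driven close to it. The raw count alone cannot distinguish between the two readings.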

> "In 1960, each car entering a central city had 1.7 people in it. By 1970, this had dropped to less than 1.2. If present trends continue, by 1980 more than one out of every 10 cars entering a city center will have no driver!"

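The quote above extends a ten-year trend straight past the point where it stops making sense. It was delivered at a press conference as a joke, and not all the journalists present understood that.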
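The arithmetic behind the joke is a straight-line extrapolation from two data points. The sketch below takes the quoted figures at face value, using 1.2 as the 1970 value even though the quote says "less than 1.2", and the helper name `linear_extrapolate` is just an illustrative choice.

```python
def linear_extrapolate(x0, y0, x1, y1, x):
    """Extend the straight line through (x0, y0) and (x1, y1) to the point x."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# Average occupants per car entering a central city, as quoted.
for year in (1980, 1990, 2000):
    occupants = linear_extrapolate(1960, 1.7, 1970, 1.2, year)
    print(f"{year}: {occupants:.2f} people per car")
```

The line predicts an average of about 0.7 people per car in 1980 and a negative number by 2000. Nothing is wrong with the two original measurements; the error lies in assuming the trend continues unchanged outside the range that was observed.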

Absoluteness

Absoluteness is the habit of treating an estimate as if it were an exact and complete measure. A good example is the Consumer Price Index for All Urban Consumers (CPI-U). This index acts as a surrogate for a comprehensive cost-of-living index, which the federal government does not publish. However, the CPI-U does not include all consumers, especially retirees, nor does it account for all consumer spending. A panel of economists suggested that the CPI-U overstated the cost of living by 0.6% to 1.6% annually. Media headlines then reduced this range to a simple "1%", ignoring the uncertainty in the estimate.
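The difference between a range and a single headline figure grows once the annual rate is compounded. The sketch below applies the quoted bounds and the headline "1%" over a ten-year horizon; the horizon is an arbitrary assumption made for illustration.

```python
def cumulative_overstatement(annual_rate, years):
    """Total overstatement after compounding an annual rate over several years."""
    return (1 + annual_rate) ** years - 1


YEARS = 10  # arbitrary horizon chosen for illustration

for label, rate in (("low end", 0.006), ("headline", 0.010), ("high end", 0.016)):
    total = cumulative_overstatement(rate, YEARS)
    print(f"{label:>8}: {rate:.1%} per year -> {total:.1%} over {YEARS} years")
```

Over ten years the headline figure compounds to roughly 10%, while the bounds of the range compound to roughly 6% and 17%. Reporting only the midpoint hides a spread of more than ten percentage points.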
