On Twitter today, I came across this post from the statist blog Think Progress, the "journalistic" arm of the Democratic "think" tank Center for American Progress. It's a perfect encapsulation of how a media narrative gets constructed using data distortion, correlation instead of causation, and confirmation bias.
HIV Infection Is Most Concentrated In The South, Where Students Don’t Learn About It In School: The CDC’s most recent HIV Surveillance Report contains the first-ever comprehensive data set allowing researchers to map HIV infections across the entire country.
They include the following CDC map of the data, which makes their headline and "conclusion" seem self-evident:
WOW! Look at all that dark shading covering the whole South (except Arkansas, Tennessee, Virginia...)! Even a moron can see that the South (and by extension conservatism) has an HIV problem. And that's exactly the problem. They want you to use the critical thinking skills of a moron, see the obvious pattern, add the obvious conclusion to your mental catalog of ideological reference material, share it with a few friends, and move on. But let's look further.
Notice the ranges of infection used in the shading. The lightest shade covers roughly 0-5 infections per 100,000 people, the next covers 5-10, the next 10-20, and the next...20-180??? So two categories span 5 infections per 100,000, one spans 10, and the darkest shade spans 160? That's a perfect example of skewed data. Why would you group 20-180 all together? Why not 20-30, 30-40, and so on?
Let's look at the data itself to find out why. The rate of infection for all states ranges from 1.9 (Utah) to 30.6 (Maryland). The outlier is the District of Columbia at 155.6. DC, being entirely urban, naturally has a high rate of infection, as any urban area would. So let's toss that out. We'll then also adjust the shades so that they represent the data more evenly. Voila:
Not a huge change, but it certainly dilutes the premise of the article. You could go even further by splitting it into all groups of 5 instead of two groups of 5 and two of 10. Regardless, it's not really "the South" that has an HIV problem, it's Louisiana and Maryland. "The South" has roughly the same HIV problem as New York, Illinois, Massachusetts, and New Jersey. In fact, only three states in the South have higher infection rates than New York - Louisiana, Georgia, and Florida.
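To make the binning complaint concrete, here is a minimal sketch that uses only the three rates quoted in this post (Utah, Maryland, DC) and the map's bin edges as described above. The equal-width edges are a hypothetical alternative picked for illustration, not the exact ranges used in my adjusted map.

```python
from bisect import bisect_right

# Rates per 100,000 quoted in this post; every other state is omitted.
rates = {"Utah": 1.9, "Maryland": 30.6, "District of Columbia": 155.6}

# Bin edges of the original map, as described above.
cdc_edges = [0, 5, 10, 20, 180]
# Hypothetical alternative: equal bins of 10, with DC dropped as an outlier.
equal_edges = [0, 10, 20, 30, 40]


def bin_widths(edges):
    """Width of each bin defined by consecutive edges."""
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def bin_label(rate, edges):
    """Label ('lo-hi') of the bin that contains rate."""
    i = bisect_right(edges, rate) - 1
    i = min(max(i, 0), len(edges) - 2)  # clamp values outside the edges
    return f"{edges[i]}-{edges[i + 1]}"


print(bin_widths(cdc_edges))    # the last bin is far wider than the rest
print(bin_widths(equal_edges))  # every bin has the same width

for state, rate in rates.items():
    if state == "District of Columbia":
        # DC is excluded from the equal-width scheme, so only one label.
        print(state, bin_label(rate, cdc_edges))
    else:
        print(state, bin_label(rate, cdc_edges), bin_label(rate, equal_edges))
```

Under the original edges, Maryland and DC share a shade even though their rates differ by a factor of about five. With equal bins and DC set aside, Maryland gets a bucket that reflects where it actually sits.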
While the map has problems, the real logical leaps here come with the connection of HIV infection rates to HIV education policies. I don't have access to statistical analysis software, but it doesn't seem the Think Progress author did either, as no actual correlation is drawn, only enthusiastically assumed. TP uses data from the Guttmacher Institute showing which states mandate sex ed in their curricula, and specifically HIV education. They then make a broad statement about how few states mandate sex/HIV ed without even attempting to correlate the data. It turns out that's because they couldn't if they tried.
Nationally, 33 states require HIV education in their curricula. Only three of the thirteen states with infection rates of 20+ aren't on that list. If there is a correlation between a lack of HIV education and HIV infection rates, this data doesn't show it, much less the implied causation. In fact, of the five states with the highest infection rates (Maryland, Louisiana, New York, Florida, Georgia), three have mandated HIV ed, and one (Georgia...in the South!) has mandated sex and HIV ed. Another talking point bites the dust.
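For anyone who wants to check that arithmetic without statistics software, here is a rough sketch using only the counts above. It assumes 50 states in total and that the thirteen high-rate states are all states rather than DC, neither of which the figures above spell out. The one-sided hypergeometric tail is my own choice of a simple test, not something Think Progress or the CDC ran.

```python
from math import comb

total_states = 50            # ASSUMPTION: 50 states, DC not counted
mandate_total = 33           # states requiring HIV education
high_rate = 13               # states with 20+ infections per 100,000
high_rate_mandate = 13 - 3   # all but three of them require HIV education

low_rate = total_states - high_rate
low_rate_mandate = mandate_total - high_rate_mandate

# Share of each group that mandates HIV education.
share_high = high_rate_mandate / high_rate
share_low = low_rate_mandate / low_rate


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K are successes."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Chance of seeing at least this many mandate states among the high-rate
# group if mandates were unrelated to infection rates.
p_value = hypergeom_tail(high_rate_mandate, total_states, mandate_total, high_rate)

print(f"high-rate states with a mandate: {share_high:.0%}")
print(f"other states with a mandate:     {share_low:.0%}")
print(f"one-sided tail probability:      {p_value:.3f}")
```

The comparison that matters is between the two shares: the high-rate states are no less likely to mandate HIV education than everyone else, which is the opposite of what the headline implies.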
There's actually a much more obvious correlation to be drawn from the CDC data and map. What else correlates strongly with most of the darker shaded states on the above map? The politically correct Think Progress wouldn't dare say it - the share of the population that is African-American.
It has long been an unfortunate scientific fact that HIV infection rates are much higher in the black community than in any other ethnic group.
According to the CDC, there were 12,500 new HIV infections in the white population in 2010. There were 20,550 in the black population, which makes up only 13% of the total population! In terms of the infection map, of the 13 states with more than 15% African-American population, only 4 (Virginia, Tennessee, Delaware, Arkansas) aren't in the darkest shaded areas, though they're all in the next group down. This isn't an exact correlation either, of course. The state with the highest proportion of black residents, Mississippi, has "only" the 6th highest HIV infection rate.
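The raw counts understate the gap, because the two groups are very different sizes. A quick per-capita sketch follows. The infection counts and the 13% figure are the ones quoted above, while the white population share of 0.64 is an assumption added purely for illustration, so swap in whatever figure you trust.

```python
# New infections in 2010, as quoted from the CDC above.
new_infections = {"white": 12_500, "black": 20_550}

# Share of the total population. The 13% figure is from this post;
# the white share is an ASSUMPTION added for illustration.
population_share = {"white": 0.64, "black": 0.13}


def per_capita(group):
    """Infections divided by population share (total population cancels later)."""
    return new_infections[group] / population_share[group]


def relative_rate(group, baseline):
    """How many times higher the per-capita rate of group is than baseline's."""
    return per_capita(group) / per_capita(baseline)


ratio = relative_rate("black", "white")
print(f"black vs. white per-capita infection rate: {ratio:.1f}x")
```

Because both rates are divided by the same total population, that total drops out of the ratio, so only the two shares are needed.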
That's the trouble with drawing fast conclusions from data like this; it's impossible. Even if we know the main factors that lead to these things, they're difficult to quantify, and even more difficult to analyze. But instead of looking at the data and trying to transmit important information clearly to their readers, Think Progress looked at a map that had a lot of Southern (read: conservative) states highlighted and shrieked, "OOH! Bias confirmed!"