Data mining coincidences: Bellwether electoral districts
Lurking in the story below are general points about data mining and the constant need for genuinely predictive replication. Data mining carries big incentives to produce (false) alarms, for what data-mining consultant is going to say “We found nothing at all in your data”? Also, in many domains (such as medical testing and security systems), the money is in the false alarms. The current notorious example is prostate cancer testing, which requires mutilating surgery on 47 men to extend the life of one man. That is, there are 46 false alarms, 46 unnecessary surgeries, and $1,000,000 in medical fees. Our paper suggests methods for validating (or not) the results of data mining.
For years journalists have data-mined election returns to find “bellwethers,” geographic units whose overall vote division mimicked the national result in election after election. I published an article (along with my Princeton undergraduate student Richard Sun) showing that bellwether electoral districts have no predictive value (at least winner-take-all bellwethers; we did find some evidence for barometric and swingometric districts, which we constructed for fun). Journalists haven’t always gotten this news, and so 30 years later I still get emails from reporters asking about some area that has voted for the electoral winner in the last 00 elections and whether that miracle will continue in the upcoming election.
Edward R. Tufte and Richard A. Sun, “Are There Bellwether Electoral Districts?”
Public Opinion Quarterly, 39 (1975), 1-18.
The article shows some ways to test data-mined results and avoid being caught by coincidences.
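One way to see why after-the-fact selection proves so little is a small simulation; this is only a sketch under stated assumptions, not the method of the paper. Every district here sides with the national winner independently with the same probability, and the 3,000 districts, the 0.7 match probability, and the 12-past/10-future split are all hypothetical values chosen for illustration. The code picks out the districts with a perfect past record, then checks how those "bellwethers" do on elections they were not selected on.

```python
import random


def simulate(n_districts=3000, n_past=12, n_future=10, p_match=0.7, seed=0):
    """Data-mine 'bellwethers' from past elections, then test them on later ones.

    All parameter values are hypothetical. Returns a tuple:
    (number of districts with a perfect past record,
     their hit rate on the future elections,
     the hit rate of all districts on the future elections).
    """
    rng = random.Random(seed)
    total = n_past + n_future
    # records[d][e] is True when district d sided with the national winner
    # in election e. Assumption: every district is the same biased coin.
    records = [[rng.random() < p_match for _ in range(total)]
               for _ in range(n_districts)]

    # Selection after the fact: keep only districts that never missed.
    bellwethers = [r for r in records if all(r[:n_past])]

    def future_rate(rows):
        # Share of correct picks in the held-out (future) elections.
        if not rows:
            return float("nan")
        hits = sum(sum(r[n_past:]) for r in rows)
        return hits / (len(rows) * n_future)

    return len(bellwethers), future_rate(bellwethers), future_rate(records)


if __name__ == "__main__":
    count, bell_rate, all_rate = simulate()
    print("districts with a perfect past record:", count)
    print("their hit rate on later elections:", round(bell_rate, 3))
    print("hit rate of all districts:", round(all_rate, 3))
```

In this toy setup the perfect past record is produced entirely by selection, so the selected districts should do about as well on the held-out elections as districts in general. Real districts are neither independent nor identical, so the sketch illustrates the logic of a holdout test rather than any result from the paper.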
Here’s the summary at the end of the paper, concluding with a wonderful quote from Somerset Maugham:
“Are there bellwether electoral districts? No, at least not if they are chosen before the fact. Some counties are more barometric than others, both in retrospect and in prospect. While spectacular in their postdictions, these counties are not sufficiently barometric or swingometric in their predictions to provide a precise or reliable guide to upcoming elections. Several alternative methods of prediction are also preferred because their underlying inferential logic is more secure than the unknown mechanisms producing the highly variable barometric and swingometric behavior observed in our data.
The all-or-nothing counties are only a curiosity and probably should be forgotten. It is a waste of time to send reporters out to interview non-randomly selected citizens of Crook County a week or two before the election–at least from any sort of scientific point of view.
There perhaps remains a magical air about the bellwethers of the past. Some of these districts, considered individually, seem to have such phenomenal records, and while we know better than to take them seriously, still. . . . It may be best to look not to the election returns for the source of the mystery, but rather to ourselves. Somerset Maugham once wrote:
‘The faculty for myth is innate in the human race. It seizes with avidity upon any incidents, surprising or mysterious, in the career of those who have distinguished themselves from their fellows, and invents a legend to which it then attaches a fanatical belief. It is the protest of romance against the commonplace of life.'”
“County’s election streak over: Ferry presidential pick wrong, a first since 1960
Jim Camden, November 23, 2008, spokesmanreview.com
Ferry County’s status as a bellwether in presidential elections is gone.
No longer will the nation’s political prognosticators look to the north-central Washington county as an indicator of the way America will vote in White House races.
Its string of picking correctly in the presidential race since 1960 snapped last week, when Ferry County went for John McCain and the nation chose Barack Obama. In neither case was the race really close.
The outcome of the national election came as a bit of a shock to Ferry County Republicans, who were gathered at the Prospector’s Inn in Republic on election night, county GOP Chairman Sam Jenkins said.
Almost everyone they knew was voting for “Sarah Palin and that white-haired dude,” he said. So when the crowd, many of whom knew of the county’s string of picking presidents, saw Obama declared the winner a few seconds after polls closed on the West Coast, “they were stunned,” Jenkins said.
Sarah Spark, the Ferry County Democratic Party chairwoman, said local Democrats were aware of the streak, too. But “no one could really put their finger on what they were doing to keep the streak going.”
The streak was mentioned in weekly news magazines, on National Public Radio and various political Web sites in the weeks leading up to Nov. 4. . . .
“We never had a clue,” Galvin said.
The county may have owed its notice this year to the fact that Missouri, which was a battleground state for the presidential election, was considered a bellwether because it had given its electors to the national winner in every race since 1960.
Ferry County was among just a handful of counties across the country that hadn’t missed a pick since 1960. In fact, its voters went with the eventual winner of the presidential race in all but three races for 100 years. They went for Democrats Alfred E. Smith in 1928 and Adlai Stevenson in 1952 and 1956.
By comparison, Washington state voters “picked wrong” in seven elections since 1908, and six since 1960. . . .
Ferry County’s streak was probably just waiting for the law of averages to catch up to it.
Edward Tufte, a Yale University professor emeritus of political science and statistics, wrote what many consider the defining statistical analysis of bellwether counties in 1972. He cautioned in that study that a string of past picks doesn’t mean a county can predict elections. Bellwether counties are a myth that have no scientific basis, and shouldn’t be taken seriously, he wrote.
In an e-mail this week, Tufte was even more emphatic about the uselessness of bellwether counties: “They are the B.S. of coincidence.”
Jenkins noted that Missouri, like Ferry County, lost its status of correct picks this year, so maybe there’s a new way to look at things.
“I’m wondering if how goes Missouri, so goes Ferry County,” Jenkins said.”
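The “law of averages” remark in the article can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not an analysis from the paper: it assumes districts match the national winner independently with a common probability, uses a round 3,000 districts as a hypothetical stand-in for the number of U.S. counties, and tries three illustrative match probabilities. The 12 elections are the presidential races from 1960 through 2004.

```python
def expected_perfect_streaks(n_districts, n_elections, p_match):
    """Expected number of districts that match the winner every single time,
    assuming independent districts with a common match probability."""
    return n_districts * p_match ** n_elections


def prob_at_least_one(n_districts, n_elections, p_match):
    """Chance that at least one district has a perfect streak by luck alone,
    under the same independence assumption."""
    return 1 - (1 - p_match ** n_elections) ** n_districts


if __name__ == "__main__":
    n_districts = 3000   # hypothetical round number of counties
    n_elections = 12     # presidential elections from 1960 through 2004
    for p in (0.5, 0.6, 0.7):   # illustrative match probabilities
        print(p,
              round(expected_perfect_streaks(n_districts, n_elections, p), 2),
              round(prob_at_least_one(n_districts, n_elections, p), 3))
```

Even with pure coin-flipping (a match probability of 0.5), the chance that some district somewhere shows a perfect 12-election record is better than even under these assumptions, and a handful of unbroken streaks is what one would expect once districts lean even modestly toward the national result.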
http://www.npr.org/templates/story/story.php?storyId=96116110
Howard Berkes, “What is a Bellwether Electoral District?” NPR News, October 24, 2008
I missed this excellent NPR news account of bellwether electoral districts last October. It provides a smart, accurate summary of the Tufte-Sun paper and makes the practical point that journalists need not memorialize coincidences that result from data mining of election results.