Accuracy of US election polls

Nate Silver does a nice job this morning of summarizing the accuracy of, and bias in, the 2012 results of the 23 most prolific polling firms. I've copied his table below. Before we look at it, we need to remember that there is more involved in these numbers than different sampling methods. The target population for most of these polls is likely voters, and every polling firm has a secret sauce for filtering those folks into its surveys. Some of the error probably can be sourced to that step.


But to get back to the table, the first thing that struck me was the consistent Republican bias.  The second was the especially poor performance by two of the most respected electoral polling brands, Mason-Dixon and Gallup.  But my guess is that readers of this blog are going to look first at how the polls did by methodology.  In that regard there is some good news for Internet methodologies, although we probably should not make too much of it.

As far back as the US elections of 2000, Harris Interactive showed that with the right adjustments online panels could perform as well as RDD. When the AAPOR Task Force on Online Panels (which I chaired) reviewed the broader literature on online panels, we concluded this about their performance in electoral polling:

A number of publications have compared the accuracy of final pre-election polls forecasting election outcomes (Abate, 1998; Snell et al., 1999; Harris Interactive, 2004, 2008; Stirton and Robertson, 2005; Taylor, Bremer, Overmeyer, Siegel, and Terhanian, 2001; Twyman, 2008; Vavreck and Rivers, 2008). In general, these publications document excellent accuracy of online nonprobability sample polls (with some notable exceptions), some instances of better accuracy in probability sample polls, and some instances of lower accuracy than probability sample polls. (POQ 74:4, p. 743)

So there is an old-news aspect to Nate's analysis, and one would hope that by 2012 the debate has moved on from the research parlor trick of predicting election outcomes to the broader and more complicated problem of accurately measuring a larger set of attributes than the relatively straightforward question of whether people are going to vote for Candidate A or Candidate B. In Nate's table there are nine firms with an average error of 2 points or less, and four of the nine use an Internet methodology of some sort. I say "of some sort" because, as best I can determine, there are three methodologies at play. Two of the four (Google and Angus Reid) draw their samples to match population demographics (primarily age and gender). IPSOS, on the other hand, tries to calibrate its samples using a combination of demographic, behavioral, and attitudinal measures drawn from a variety of what it believes to be "high quality sources." (YouGov, which is further down the list, does something similar.) RAND uses a probability-based method to recruit its panel. So there is a variety of methodologies at work in these numbers.

Back in 2007, Humphrey Taylor argued that the key to generating accurate estimates from online panels is understanding their biases and how to correct for them. I tried to echo that point in a post about #twittersurvey a few weeks back. Ray Poynter commented on that post:

My feeling is that the breakthrough we need is more insight into when the reactions to a message or question are broadly homogeneous, and when it is heterogeneous… When most people think the same thing, the sample structure tends not to matter very much… However, when views, attitudes, beliefs differ we need to balance the sample, which means knowing something about the population. This is where Twitter and even online access panels create dangers.

I think Ray has said it pretty well.

Yin versus yang

NCHS just released the latest data on US wireless-only households. The relentless march continues: as of December 2011, a whopping 34% of US households have only a wireless telephone. To put it another way, you can hope to reach only about two-thirds of US households when calling landline telephones alone. Clear evidence, I'm sure, that telephone surveys are dead.

Pew also just released a report saying that 49% of US adults use their cell phones to go online. At this rate, about two-thirds of US adults will be using their mobiles to go online in 2014. More proof, I'm sure, that mobile's time has come.

The (wireless substitution) beat goes on

To no one's surprise, the gold standard on wireless substitution in the US, the NHIS, reports that the proportion of wireless-only households keeps growing. As of July 2011 it was 31.6%, a 1.9 percentage-point increase since December 2010, which translates into 30.2% of all US adults. Just eyeballing the graph below, the slope of the line that matters most to us--the blue line--does not appear to be changing much. Hard to believe there are still people out there doing telephone research without calling cell phones.
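To make the arithmetic behind "the slope isn't changing much" explicit, here is a back-of-the-envelope sketch. It is a naive straight-line extrapolation of my own, not NHIS methodology; the 29.7% December 2010 figure is simply 31.6 minus the 1.9-point increase.

```python
# Naive straight-line extrapolation of the wireless-only household share.
# Two points from the post: Dec 2010 = 29.7% (31.6 - 1.9), Jul 2011 = 31.6%.

def project_share(start: float, end: float, months_between: float,
                  months_ahead: float) -> float:
    """Extend the straight-line trend months_ahead beyond the last data point."""
    rate = (end - start) / months_between  # percentage points per month
    return end + rate * months_ahead

# Dec 2010 -> Jul 2011 is 7 months; projecting 5 more months lands on Dec 2011.
print(round(project_share(29.7, 31.6, 7, 5), 1))  # prints 33.0
```

That crude projection of roughly 33% for December 2011 is in the same ballpark as the 34% NCHS actually reported for that month, which is exactly what an unchanging slope would suggest.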




Cell phone data quality

My first taste of a methodological imbroglio came 25 years ago, with the introduction of CAPI (computer-assisted personal interviewing). There was widespread speculation that having interviewers use laptops for in-person interviewing might affect data quality in unforeseen ways. Empirical research taught us that we needn't worry, and so CAPI became the standard.

More recently, as the growth in wireless-only households has made it necessary to include cell phones in our telephone samples, there has been a lot of worry about the quality of data collected by cell. Poor audio quality, an increased likelihood of multitasking while being interviewed, and the possibility of environmental distractions are some of the causes people cite for reduced data quality. Now research by Courtney Kennedy and Stephen Everett reported in the current issue of POQ has turned up little empirical evidence that such effects exist. They randomized respondents to be interviewed either by cell or landline and then looked at six data quality indicators: attention to question wording, straightlining, order effects, internal validity, length of open-end responses, and item nonresponse. They found no significant differences in five of the six. The outlier was attention to question wording, where they found some evidence that cell phone respondents may listen less carefully to complex questions than those responding by landline.

It's gratifying to know that there are researchers out there who approach new methods with skepticism and take a guilty-until-proven-innocent position. More gratifying still that other researchers do the hard work of carefully vetting those concerns with well-designed empirical studies.

Can we really do two things at once?

Like most research companies, mine now routinely includes cell phones in our telephone samples. Best practice requires that before we interview someone on a cell phone we determine whether it's safe to do the interview. If, for example, the respondent is driving a car, we don't do the interview. Yesterday someone asked me if it was OK to do the interview if the respondent is using a hands-free device. The research on this is pretty clear: the problem with cell phones and driving is the distraction, not the dexterity required to hold a phone in one hand and drive with the other. There is no basis for making an exception for hands-free.

This reminded me that responding to survey questions is not easy; it takes some serious cognitive energy. Most researchers accept the four-step response process described a decade ago by Tourangeau, Rips and Rasinski:

  1. Comprehension—understand the question and how to answer it (instructions)
  2. Retrieval—search memory to form an answer
  3. Judgment—assess completeness and relevance of the answer
  4. Response—map the answer onto the right response category

When respondents execute this process faithfully we say they are engaged. When they short-circuit it we talk about lack of engagement. A person talking on a cell phone while driving can either drive or engage with the survey. It's a rare person who can do both well simultaneously.

Which brings us to one of my favorite subjects: respondent engagement and rich media (aka Flash) in Web surveys. What is the rationale for arguing that dressing up a Web survey with more color, pimped-up radio buttons, a slider bar, or a slick drag-and-drop answering device will encourage respondents to execute the four-step response process with greater care than if we just showed them the same kind of standard screen they use to enter their credit card details on Amazon? Or are unfamiliar interfaces just a distraction that makes careful responding even less likely? They might get someone to the next question, but is that enough?