Pleeezz!

Today’s update from Research-live.com has this headline: Online trackers not optimised for mobile could 'compromise data quality.' It goes on to explain:

GMI, which manages more than 1,000 tracking studies, claims that online trackers that haven’t been optimised for mobile platforms may exclude this growing audience, which could lead to a drop in data quality, reduced feasibility and the possibility of missing whole sections of the required population from research.

Let me be clear. I don’t disagree that online surveys need to be optimized for mobile or that the number of unintentional mobile respondents (aka UMRs) is large and growing. But a warning from an online panel company that scaring away UMRs may be leading to a drop in data quality because of “the possibility of missing whole sections of the required population from research” just drips with irony.

Let’s start with the fact that online research, at least in the US, by definition excludes the roughly 20% of the population that is not online. Research using an online panel of, say, two million active members excludes about 99% of the adult population. As the industry has moved more and more to dynamic sourcing it’s hard now to know how big the pool of prospective online respondents is, but it’s a safe bet that the vast majority of US adults are missing, and not at random.
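The arithmetic behind that 99% is worth spelling out. Here is a back-of-the-envelope sketch; the population figures are rough assumptions on my part, not official counts.

```python
# Back-of-the-envelope coverage arithmetic; all figures are rough assumptions.
us_adults = 235_000_000     # approximate US adult population, early 2010s
online_share = 0.80         # roughly 20% of adults are not online
panel_size = 2_000_000      # active members of a large online panel

panel_coverage = panel_size / us_adults

print(f"Adults excluded from online research entirely: {1 - online_share:.0%}")
print(f"Share of all US adults in a two-million-member panel: {panel_coverage:.1%}")
print(f"Share of US adults not in that panel: {1 - panel_coverage:.1%}")
```

Run it and you get a panel covering a bit under 1% of adults, which is where the "about 99% excluded" figure comes from.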

Surely, if we have figured out a way to deal with the massive coverage error inherent in the online panel model, we can handle the mobile problem.

I suspect that the real issue here is feasibility, not data quality, just as the now near-universal use of routers is about inventory management rather than improved representativeness. I wish that online panel companies would spend more time trying to deal with real data quality issues like poor coverage and inadequate sampling methods, but that’s only going to happen if their customers start demanding it.


Pew takes a serious look at Google Consumer Surveys

The room is full here at AAPOR, mostly, I suspect, to hear a presentation of Pew's comparison of the results from a dual frame (landline plus cell) telephone survey and Google Consumer Surveys. There is no shortage of people I've talked to, here and elsewhere, who think that Pew was overly kind in characterizing the differences. So it will be interesting to see how this plays out. Granted, it's back to keeping score, but I can't resist watching.

Scott Keeter is doing the presentation, and already I feel better. (I'm sure he didn't mean it as a joke, but Scott started by describing Google's quota sampling strategy as based on Google knowing "something about users.") More seriously, he is positioning this in a fit-for-purpose framework.

Scott has shown a chart that puts the mean difference across 52 comparisons at 6.5 points. Not awful, and not great. Some topics seem to work well, but others do not. The problem, of course, is that there is no way at the moment to figure out when it will work and when it will not.
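For context, the scoring behind a comparison like this is straightforward: for each item, take the absolute difference between the benchmark telephone estimate and the Google Consumer Surveys estimate, then average across items. A minimal sketch with made-up numbers, not Pew's data:

```python
# Illustrative only: invented estimates, not Pew's actual comparisons.
# Each pair is (telephone benchmark %, Google Consumer Surveys %) for one item.
comparisons = [(45.0, 39.5), (62.0, 66.0), (23.0, 31.0), (51.0, 52.5)]

mean_abs_diff = sum(abs(phone - gcs) for phone, gcs in comparisons) / len(comparisons)
print(f"Mean absolute difference: {mean_abs_diff:.1f} points")
```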

He says that Pew will continue to use them, but not for point estimates. It seems useful for some question testing and quick feedback to help with survey design. Hence the link to fit-for-purpose. But hardly a game changer.


AAPOR gets serious about online sampling

I am at the AAPOR annual conference in Boston. My first observation: it is huge. For example, at 8:00 this morning there are no fewer than eight separate sessions, each with five to six presenters. There is no way you can come close to covering the whole thing. So I have tentatively chosen to focus on two consecutive sessions about online sampling, most of it done without online panels.

I'm having two reactions to this. First, I'm feeling like I'm watching the wheel being reinvented. Okay, so maybe a reboot of online sampling is a better description. But there is a certain naiveté in sampling from Facebook, Google, or an email blast to a list of unclear origin and expecting much chance of getting a sample that matches a high-quality probability sample like that used by the GSS. Second, and probably of greater importance, the level of transparency and analysis is refreshing, especially given the lack of transparency we have seen over the years on the part of online sample companies.

For years I have been frustrated by this industry sector's out-of-hand rejection of online and a research agenda that seems directed at demonstrating only that online does not work. But the people who do this kind of work have a lot they can contribute to the debate about the quality of online samples and how to improve it. To paraphrase a statement made by Doug Rivers at an AAPOR conference two years ago, it's time to move beyond keeping score. It's nice to see that finally happening.


Accuracy of US election polls

Nate Silver does a nice job this morning of summarizing the accuracy of and bias in the 2012 results of the 23 most prolific polling firms. I’ve copied his table below. Before we look at it we need to remember that there is more involved in these numbers than different sampling methods. The target population for most of these polls is likely voters, and polling firms all have a secret sauce for filtering those folks into their surveys. Some of the error probably can be sourced to that step.

[Nate Silver’s table: average error and bias for the 23 most prolific polling firms in 2012.]

But to get back to the table, the first thing that struck me was the consistent Republican bias.  The second was the especially poor performance by two of the most respected electoral polling brands, Mason-Dixon and Gallup.  But my guess is that readers of this blog are going to look first at how the polls did by methodology.  In that regard there is some good news for Internet methodologies, although we probably should not make too much of it.
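To make “average error” and “bias” concrete, here is a rough sketch of how such figures can be computed for a single firm from its final poll margins and the actual outcomes. The numbers and the scoring conventions are my own illustration, not Nate’s:

```python
# Hypothetical final polls for one firm -- not Nate Silver's data.
# Each entry is (poll margin, actual margin), where margin = Dem % - Rep %.
polls = [(+1.0, +3.9), (-2.0, +0.5), (+4.0, +5.4)]

errors = [poll - actual for poll, actual in polls]     # signed miss on the margin
avg_error = sum(abs(e) for e in errors) / len(errors)  # "accuracy": average absolute miss
bias = sum(errors) / len(errors)                       # "bias": average signed miss
lean = "Republican" if bias < 0 else "Democratic"

print(f"Average error: {avg_error:.1f} points; bias: {abs(bias):.1f} points toward the {lean} candidate")
```

Under this convention an understated Democratic margin shows up as a Republican lean, which is the pattern across most of the 2012 table.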

As far back as the US elections of 2000, Harris Interactive showed that, with the right adjustments, online panels could perform as well as RDD. When the AAPOR Task Force on Online Panels (which I chaired) reviewed the broader literature on online panels we concluded this about their performance in electoral polling:

A number of publications have compared the accuracy of final pre-election polls forecasting election outcomes (Abate, 1998; Snell et al., 1999; Harris Interactive, 2004, 2008; Stirton and Robertson, 2005; Taylor, Bremer, Overmeyer, Siegel, and Terhanian, 2001; Twyman, 2008; Vavreck and Rivers, 2008). In general, these publications document excellent accuracy of online nonprobability sample polls (with some notable exceptions), some instances of better accuracy in probability sample polls, and some instances of lower accuracy than probability sample polls. (POQ 74:4, p. 743)

So there is an old news aspect to Nate’s analysis, and one would hope that by 2012 the debate has moved on from the research parlor trick of predicting election outcomes to addressing the broader and more complicated problem of accurately measuring a larger set of attributes than the relatively straightforward question of whether people are going to vote for Candidate A or Candidate B. In Nate’s table there are nine firms with an average error of 2 points or less, and four of the nine use an Internet methodology of some sort. I say “of some sort” because, as best I can determine, there are three methodologies at play. Two of the four (Google and Angus Reid) draw their samples to match population demographics (primarily age and gender). IPSOS, on the other hand, tries to calibrate its samples using a combination of demographic, behavioral and attitudinal measures drawn from a variety of what it believes to be “high quality sources.” (YouGov, which is further down the list, does something similar.) RAND uses a probability-based method to recruit its panel. So there is a real variety of methodologies behind these numbers.
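For readers who have not watched calibration up close, here is a minimal sketch of raking (iterative proportional fitting), the workhorse behind much of this kind of weighting. The respondents, categories, and target shares are invented for illustration; none of this is meant to represent what IPSOS, YouGov, or anyone else actually does.

```python
# Minimal raking (iterative proportional fitting) sketch.
# Respondents and target shares are invented for illustration only.
respondents = [
    {"age": "18-34", "gender": "F"},
    {"age": "18-34", "gender": "M"},
    {"age": "35-54", "gender": "F"},
    {"age": "55+",   "gender": "M"},
]
targets = {  # desired share of the weighted sample in each category
    "age":    {"18-34": 0.30, "35-54": 0.35, "55+": 0.35},
    "gender": {"F": 0.52, "M": 0.48},
}

weights = [1.0] * len(respondents)
for _ in range(50):                      # iterate until the margins settle down
    for var, shares in targets.items():
        grand = sum(weights)
        totals = {cat: 0.0 for cat in shares}
        for w, r in zip(weights, respondents):
            totals[r[var]] += w          # current weighted margin for each category
        for i, r in enumerate(respondents):
            cat = r[var]
            if totals[cat] > 0:
                weights[i] *= shares[cat] * grand / totals[cat]

for r, w in zip(respondents, weights):
    print(r, round(w, 2))
```

In practice you would reach for an established weighting tool rather than a hand-rolled loop, but the loop shows the idea: keep adjusting until the sample’s margins line up with the population targets. It only helps, of course, to the extent that you are balancing on the variables that actually drive the answers.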

Back in 2007, Humphrey Taylor argued that the key to generating accurate estimates from online panels is understanding their biases and how to correct them. I tried to echo that point in a post about #twittersurvey a few weeks back. Ray Poynter commented on that post:

My feeling is that the breakthrough we need is more insight into when the reactions to a message or question are broadly homogeneous, and when it is heterogeneous… When most people think the same thing, the sample structure tends not to matter very much… However, when views, attitudes, beliefs differ we need to balance the sample, which means knowing something about the population. This is where Twitter and even online access panels create dangers.

 I think Ray has said it pretty well.
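Ray's point is easy to see with a toy calculation. In the sketch below the groups, shares, and support numbers are all invented; it simply shows that when two groups answer alike the sample mix barely matters, and when they differ an unbalanced sample moves the estimate.

```python
# Toy illustration of the homogeneity point; all numbers are invented.
def estimate(share_group_a, support_a, support_b):
    """Overall support implied by a sample with the given mix of groups A and B."""
    return share_group_a * support_a + (1 - share_group_a) * support_b

population_mix = 0.50   # groups A and B are each half the population
skewed_sample  = 0.80   # but the convenience sample is 80% group A

# Homogeneous views: both groups at 60% support -- the skew is harmless.
print(estimate(population_mix, 0.60, 0.60), estimate(skewed_sample, 0.60, 0.60))

# Heterogeneous views: A at 70%, B at 40% -- the skew biases the estimate upward.
print(estimate(population_mix, 0.70, 0.40), estimate(skewed_sample, 0.70, 0.40))
```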


The latest online panel dust-up

Gregg Peterson's post earlier this week on this blog about the Panel of Panelists at the CASRO Online Conference created quite a stir. I saw an unusual number of pageviews, a fair amount of retweeting of the link, and other industry commentators working a similar theme. It came on the heels of Ron Sellers's More Dirty Little Secrets of Online Panels, which also got plenty of attention.

What I find surprising about all of this is, well, that people are surprised. For anyone paying half attention to the quality issues of online panels, the core of what Gregg reported ought to be old news. I thought that we had already worked our way through the first four stages of the Kübler-Ross model and arrived at Acceptance. Recognizing that some people are doing an awfully large number of surveys and doing them for money is the least of it. This is convenience sampling run amok.

We've seen this movie before. A former colleague once spent much of his time on the road doing focus groups with IT managers. Time after time, in every city, it was always the same faces. He liked to call them "the Band of Brothers." He claimed to be on a first-name basis with many of them. But the widely acknowledged professional focus group participant did not completely undermine the usefulness of that methodology.

I've long been very critical of online and quick to point out its numerous shortcomings, but I also think that we are better at it today than we have ever been. To the industry's credit, we have finally come to face its flaws and are evolving a set of standards and practices to deal with them. We have a pretty good idea of how to do a better job of managing panels and are extending that to other non-panel sample sources. We have finally admitted that our sampling strategies need to evolve beyond age, gender and regional quotas. There are some very smart people helping us think our way through this. Granted, these developments are not yet industry wide. There are still way too many practitioners making outrageous claims, too many panel companies that have not embraced the kind of transparency that's needed for buyers to make sound judgments about quality, and too many clients buying by the pound. Too much DADT.

We all know the old cliché about admitting you have a problem being the first step. I thought we were there.