Online samples: Paying attention to the important stuff

Those of you who routinely prowl the MRX blogosphere may have noticed a recent uptick in worries about speeders, fraudulent respondents, and other undesirables in online surveys. None of this is new. These concerns first surfaced over a decade ago, and I admit to being among those working the worry beads. An awful lot has changed over the last 10 years, but it seems that not everyone has been paying attention.

Yesterday, my buddy Melanie Courtright at Research Now reached her I’m-not-going-to-take-it-any-more moment and posted an overview of what are now widely accepted practices for building and maintaining online sample quality. Most of this is not new, nor is it unique to Research Now. If you are really worried about this stuff, choose your online sample supplier carefully and sleep at night. ESOMAR has for many years provided advice on how to do this (select a supplier, not sleep at night).

Of course, none of this guarantees that you are not going to have some speeders sneak into your survey who will skip questions, answer randomly, choose non-substantive answers (DK or NA), etc. Your questionnaire could be encouraging that behavior, but let’s assume you have a great, respondent-friendly questionnaire. Then the question is, “Does speeding with its attendant data problems matter?” The answer is pretty much, “No.” It may offend our sensibilities, but the likely impact on findings is negligible. Partly that’s because we seldom get a large enough proportion of these “bad respondents” to significantly affect our results, but also because their response patterns generally are random rather than biased. See Robert Greszki’s recent article in Public Opinion Quarterly for a good discussion and example.
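For a rough sense of scale, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made up for illustration: the function name `observed_share`, the 60% "true" share, a two-option question, and the idea that inattentive respondents pick uniformly at random. It is not taken from Greszki's article or any other study.

```python
def observed_share(true_share, random_fraction, n_options):
    """Expected share picking a given answer when some respondents answer at random.

    Assumption (illustrative only): attentive respondents pick the answer with
    probability `true_share`; random responders pick each of the `n_options`
    answers with equal probability.
    """
    if not 0.0 <= random_fraction <= 1.0:
        raise ValueError("random_fraction must be between 0 and 1")
    attentive_part = (1.0 - random_fraction) * true_share
    random_part = random_fraction * (1.0 / n_options)
    return attentive_part + random_part


true_share = 0.60  # hypothetical "real" share among attentive respondents
for frac in (0.02, 0.05, 0.10):
    obs = observed_share(true_share, frac, n_options=2)
    print(f"{frac:.0%} random responders: observed {obs:.3f}, shift {obs - true_share:+.3f}")
```

Under these made-up numbers, 5% random responders move a 60% estimate to 59.5%, and even 10% only moves it to 59%. Random answering does pull estimates toward the middle, but with small proportions the pull is tiny.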

The second iteration of the ARF's Foundations of Quality initiative also looked at this issue in considerable detail and offered these three conclusions:

  • For all the energy expended on identifying those with low quality responses, they may make less of a difference in results than focusing more clearly on what makes for a good sample provider.
  • Further, when sub-optimal behaviors occur at higher rates, they generally indicate a poorly designed survey – some combination of too long, too boring, or too difficult for the intended respondents. Most respondents do not enter a survey with the intention of not paying attention or answering questions in sub-optimal ways, but start to act that way as a result of the situation they find themselves in.
  • Deselecting more respondents who exhibit sub-optimal behaviors may increase bias in our samples by reducing diversity, making the sample less like the intended population.
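The third conclusion can be illustrated with a small Python sketch. The group labels, the 50/50 starting split, and the different flag rates per group are all hypothetical numbers chosen for illustration, not figures from the ARF work; the point is only that if flagged behavior is not spread evenly across groups, removing flagged respondents changes the makeup of the sample.

```python
def share_after_removal(group_shares, removal_rates):
    """Return each group's share of the sample after flagged respondents are removed.

    group_shares:  dict of group -> share of the original sample (sums to 1)
    removal_rates: dict of group -> fraction of that group flagged and removed
    Both inputs are hypothetical illustration values.
    """
    kept = {g: group_shares[g] * (1.0 - removal_rates[g]) for g in group_shares}
    total_kept = sum(kept.values())
    return {g: k / total_kept for g, k in kept.items()}


# Assumed starting sample: evenly split between two age groups.
before = {"under_35": 0.50, "35_plus": 0.50}
# Assumed flag rates: younger respondents are flagged more often.
flagged = {"under_35": 0.20, "35_plus": 0.05}

after = share_after_removal(before, flagged)
for group, share in after.items():
    print(f"{group}: {before[group]:.1%} before, {share:.1%} after")
```

With these assumed rates, the under-35 group drops from half the sample to a little under 46%, so the cleaned sample looks less like the population it was drawn to represent.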

The irony in all of this is that the potential harm caused by a few poorly performing respondents pales in comparison to the risk of using samples of people who have volunteered to do surveys online, especially in countries with low Internet penetration. There is a widely accepted belief in the magical properties of demographic quotas to create representative samples of almost any target population. No doubt that works sometimes, but we also know that, depending on the survey topic, other characteristics are needed to select a proper sample. Which characteristics to use, and when, remain open questions. Few online sample suppliers have proven solutions, and outside of academia little effort is being put into developing one.