There you go again!

One hears lots of silly things said at MR conferences, and one of the silliest and most oft-repeated refrains is that you can't do surveys with probability samples any more. There are even those who say you never could. As often as I get the chance, I point out that that's total nonsense. Lots of very serious organizations draw high-quality probability samples all the time and get very good results. The prime example here in the US is the Current Population Survey, the government survey used as the basis for calculating the unemployment rate each month. Pretty much everything that comes from Pew is based on probability sample surveys, as are many of the political polls that we follow so breathlessly every four years.

The concept of a probability sample is very straightforward. The standard definition is a sample in which all members of the frame population have a known, nonzero chance of selection. Unless you have a complex stratified or multi-stage design, it's a pretty simple concept. As long as you have a full list of the population of interest to draw on and everyone on that list has a chance to be selected, the resulting sample can be said to represent the population of interest. But there are some serious challenges in current practice.
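To make the definition concrete, here is a minimal sketch in Python; the frame and sample size are invented for illustration, and real designs are of course often stratified or multi-stage.

    import random

    # Invented frame: a complete list of the population of interest (N = 10,000).
    frame = [f"household_{i}" for i in range(10_000)]
    n = 500  # desired sample size

    # Simple random sampling without replacement: every member of the frame
    # has the same known, nonzero chance of selection, n / N.
    sample = random.sample(frame, n)
    inclusion_probability = n / len(frame)  # 0.05, known before any data are collected

    print(len(sample), inclusion_probability)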

The first is assembling a frame that includes the entire population you want to study. For example, because of the rise of cell-phone-only households, the landline frame that used to contain the phone numbers of well over 90% of US households no longer does. So it has become standard practice to augment the landline frame with a cell phone frame to ensure full coverage of the population. Clients often can supply customer lists that do a good job of covering their full customer base, and we can draw good samples from them as well. Online panels are problematic because they use the panel itself as the frame, and it contains only a very small fraction of the total population.

The second major challenge is declining cooperation. While there are studies showing that even surveys with alarmingly low response rates can produce accurate estimates, low response rates make everyone nervous, raise doubts about representivity and call results into question. The Current Population Survey gets a response rate of 90% plus, and so we trust the unemployment rate, but that kind of response is very unusual.

There are other challenges as well, but I think it's the deterioration of the landline frame and very low response rates that cause some people to think probability sampling is no longer possible. Anyone willing to spend the time and the money can still get very accurate estimates from a probability sample, better than anything they'll get with an online panel or other convenience sample.

As I have written numerous times on this blog, the lure of online has always been that it's fast and cheap, not that it's better. And depending on how the results are to be used, the method can be just fine, fit for purpose. But sometimes the problem requires representivity, and when it does probability sampling is still the best way to get it.


Those pesky robo-polls

A new issue of Survey Practice is out, and among the short articles is one by Jan van Lohuizen and Robert Wayne Samohyl titled "Method Effects and Robo-calls." (Some colleagues and I also have a short piece on the placement of navigation buttons in Web surveys.) Like most people I know, I have little regard for the accuracy of robo-calling as a competitor to dual-frame landline/cell phone RDD surveys using live interviewers, and this article provides some grist for that mill. The paper looks at 624 national polls and the specific issue of Presidential approval. I'll just quote their conclusion:

" . . . while live operator surveys and internet surveys produce quite similar results, robo-polls produce a significantly higher estimate of the disapproval rate of the President and a significantly lower estimate for 'no opinion', attributing the difference in results to non-response bias resulting from low participation rates in robo-polls."

So far so good. But it reminded me of a report I'd recently seen (via Mark Blumenthal) about the latest NCPP report on pollster accuracy. In that study of 295 statewide polls in the 2010 cycle, the average error in the final outcome was 2.4 percentage points for polls with live interviewers, versus 2.6 for robo-polls and 1.7 for Internet polls. Of course, accuracy on Election Day is not the same as accuracy during the course of the campaign. As even casual observers have noticed, there is a tendency for all polls to converge as the election draws near. As this excellent post by Mark Mellman spells out, robo-polls may do well on Election Day but not so well in the weeks prior. I won't speculate as to the reasons.

But I take comfort in all of this. It's always nice to have one's prejudices confirmed.


Let’s get on with it

I spent some time over the weekend putting the finishing touches on a presentation for later this week in Washington at a workshop being put on by the Committee on National Statistics of the National Research Council. The workshop is part of a larger effort to develop a new agenda for research into social science data collections. My topic is "Nonresponse in Online Panel Surveys." Others will talk about nonresponse in telephone surveys and in self-administered surveys generally (presumably mail). The impetus is the increasing realization on the scientific side of the industry that, as response rates continue to fall, a key requirement of the probability sampling paradigm is violated. And so the question becomes: what are we going to do about it?

My message for this group is that online panels as we have used them in MR so far are not the answer. As I've noted in previous posts, response rates for online panels typically are an order of magnitude worse than for telephone. At least with the telephone you start out with a good sample. (Wireless substitution is a bit of a red herring and completely manageable in the US.) With online panels you start out with something best described as a dog's breakfast. While it has become standard practice to do simple purposive sampling to create a demographically balanced sample, that's generally not enough. To their credit, Gordon Black and George Terhanian recognized that fact over a decade ago when they argued for "sophisticated weighting processes" that essentially came down to attitudinal weighting to supplement demographic weighting to correct for biases in online samples. But understanding those biases, and which ones are important given a specific topic and target population, is not easy, and it doesn't always work. So a dozen years and $14 billion of online research later, the industry seems to be just weighting online samples by the demographics and stamping them "REPRESENTATIVE."
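For readers who haven't seen it done, here is a minimal sketch of what demographic-only weighting looks like in Python: raking (iterative proportional fitting) to gender and age margins. The data and targets are invented for illustration, and this is exactly the kind of adjustment that, on its own, is generally not enough.

    import pandas as pd

    # Invented panel sample; a real panel would have many more cases and variables.
    df = pd.DataFrame({
        "gender": ["F", "F", "M", "M", "F", "M", "F", "M", "F", "F"],
        "age":    ["18-34", "35-54", "18-34", "55+", "55+", "35-54",
                   "18-34", "18-34", "35-54", "55+"],
    })
    df["weight"] = 1.0

    # Invented population margins (proportions) to rake toward.
    targets = {
        "gender": {"F": 0.51, "M": 0.49},
        "age":    {"18-34": 0.30, "35-54": 0.35, "55+": 0.35},
    }

    # Iterative proportional fitting: adjust the weights to match each margin
    # in turn, cycling until the weighted sample reproduces the targets.
    for _ in range(50):
        for var, margin in targets.items():
            total = df["weight"].sum()
            counts = df.groupby(var)["weight"].sum()
            for category, share in margin.items():
                df.loc[df[var] == category, "weight"] *= (share * total) / counts[category]

    print(df.groupby("gender")["weight"].sum() / df["weight"].sum())
    print(df.groupby("age")["weight"].sum() / df["weight"].sum())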

The facts would seem to be these. On the one hand, you can still draw a terrific probability sample, but the vast majority of people in that sample will not cooperate unless you make the extraordinary and expensive efforts that only governments have the resources to make. On the other hand, online panels have demonstrated that there are millions of people who are not only willing but sometimes eager to do surveys, yet we've not developed a good science-based approach for taking advantage of them. I take hope in the fact that some people are at least working on the problem. Doug Rivers regularly shares his research on sample matching, which is interesting, although I've not seen applications outside of electoral polling. GMI's new Pinnacle product also is interesting, but so far I've only seen a brochure. And statisticians tell me that there is work on nonprobability sampling in other fields that might be adapted to the panel problem.

My message to the workshop group this week is simple: "Let's get on with it."


Getting straight on response rates

AAPOR's Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys has long been the bible for survey researchers interested in systematically tracking response and nonresponse in surveys and summarizing those outcomes in standardized ways that help us judge the strengths and weaknesses of survey results. The first edition, published in 1998, built on earlier work by CASRO (1982), a document that seems to have disappeared into the mists of time. Since then the SD Committee within AAPOR, currently chaired by Tom Smith, has issued a continuing series of updates as new practices emerge and methods change.

A 2011 revision has just been released, and a significant part of it is focused on Internet surveys. In the process it makes an important point that is often overlooked in practice: if we want to calculate a response rate for an Internet survey that uses an online panel, it's not enough to track the response of the sample drawn from that panel; we also must factor in the response to the panel's recruitment effort(s). This is relatively straightforward for the small number of panels that rely exclusively on probability-based recruitment (e.g., the Knowledge Panel or the LISS Panel). But the vast majority of research done in the US and worldwide uses panels that are not recruited with probability methods. The recruitment methods for these panels vary widely, but in almost all cases it's impossible to know with certainty how many people received or saw an invitation. And so the denominator in the response rate calculation is unknown, and therefore no response rate can be computed. (The probability of selection is also unknown, which makes computation of weights a problem as well.)

For these reasons a "response rate" applied to nonprobability panels is incalculable and inappropriate, unless the term is carefully redefined to mean something very different from its meaning in traditional methodologies. These also are the reasons why the two ISO standards covering market, opinion and social research (20252 and 26362) reserve the term "response rate" for probability-based methods and promote the use of the term "participation rate" for access panels, defined as "the number of respondents who have provided a usable response divided by the total number of initial personal invitations requesting participation." And, of course, all of this is getting much more complicated as we increasingly move away from "classic" panels toward designs that look a lot like expanded versions of river sampling, with complex routing and even repurposing of cooperative respondents.
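To make the arithmetic concrete, here is a minimal sketch in Python with invented counts. It assumes the usual multiplicative logic for a probability-recruited panel, where the overall response rate cascades through every stage of recruitment and fielding, and contrasts it with the ISO-style participation rate, which divides usable responses by the invitations actually sent.

    # Invented counts, for illustration only.

    # Probability-recruited panel: the response rate must reflect every stage,
    # starting from the original recruitment sample.
    recruitment_rate = 6_000 / 40_000   # recruits / probability recruitment sample
    profile_rate     = 4_500 / 6_000    # recruits who completed the profiling survey
    completion_rate  = 1_800 / 4_500    # profiled members who completed this survey
    cumulative_response_rate = recruitment_rate * profile_rate * completion_rate
    print(f"Cumulative response rate: {cumulative_response_rate:.1%}")  # 4.5%

    # Volunteer (nonprobability) panel: the recruitment denominator is unknown,
    # so no response rate can be computed. The participation rate uses only the
    # invitations actually sent for this particular survey.
    usable_responses = 1_800
    invitations_sent = 12_000
    participation_rate = usable_responses / invitations_sent
    print(f"Participation rate: {participation_rate:.1%}")  # 15.0%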

To my mind all of this is symptomatic of a much larger problem, namely, the use of a set of tools and techniques developed for one sampling paradigm (probability) to evaluate results produced under a very different sampling paradigm (nonprobability). This should not surprise us. We understand the former pretty well, the latter hardly at all. But therein lies an opportunity, if we can leverage it.


Balancing risk and reward in survey incentives

The current issue of Survey Practice has an interesting little piece on the use of lottery incentives in online surveys. (Here I quickly point out that the correct terminology is "sweepstakes," since there are legal issues around anyone but governmental entities running lotteries, but let's not get distracted by that.) In self-administered surveys such as online, the right incentive can have a significant impact on response rate. We all would like to pay an attractive incentive contingent on completion, but money always is an issue. Sweepstakes have long been a favorite of clients looking to boost response without spending a lot of money. My recollection of the literature on this topic is that sweepstakes are better than no incentive at all, but nowhere near as effective as paying everyone who completes.

The article describes an experiment to answer a question I get asked all the time: is it better to offer one big prize or several smaller prizes? If I have $1,000 to spend, will I get more bang from it as a single prize, as four $250 prizes, or even as ten $100 prizes? The answer from this particular research is the standard answer to virtually all methodological questions: it depends. The authors argue that the key is the economic circumstances of the target respondents. Professionals, who presumably are reasonably well off, respond at a higher rate when a single large prize is offered. Students, on the other hand, are more persuaded by the greater odds of winning a smaller amount of money.
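A quick back-of-the-envelope illustration of the trade-off, with an invented number of completes: the expected value per respondent is identical however the budget is split; what changes is the chance of winning anything at all.

    # Invented example: a $1,000 sweepstakes budget and 2,000 expected completes.
    budget = 1_000
    completes = 2_000

    for prizes, amount in [(1, 1_000), (4, 250), (10, 100)]:
        assert prizes * amount == budget
        print(f"{prizes:>2} prize(s) of ${amount:,}: "
              f"chance of winning = 1 in {completes // prizes}, "
              f"expected value per respondent = ${budget / completes:.2f}")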

This makes a lot of sense to me. I am embarrassed that I never figured it out on my own.