
Posts from May 2011


The final session of this conference offered five papers under the general heading of "Potential for Innovations with New Technology and Communication Tools." (Here I disclose that I presented in this session with help from two colleagues.) The papers ran the gamut from better tools for interviewers to do what they always have done, to a variety of Web 2.0 data collection methods, to crowdsourcing questionnaire and database design. The precision and rigor that characterized most of the research presented over the previous two and a half days gave way to an emphasis on general possibilities and "good idea" experiments. Much of what we presented would be very familiar to most readers of this blog. None of what we offered in this session was presented as a replacement for what is already being done or a radical rethinking of current methods used to collect and disseminate health data. I think the presenters shared the common goal of offering some new ways to approach data collection and possibly enhance it, but firebrands we were not. Playing to the incremental and evolutionary instincts of the group, we simply offered food for thought as thoughtfully as we could. That is, until the discussant, Michael Link from Nielsen, got up and took the room to task for being too timid, too afraid of experimentation, and too closed to new ways of doing things. He is uniquely qualified on this score, having been an established player in the federal health research complex before going to Nielsen, where he has learned firsthand how new technologies and the different kind of social interaction they engender can be the basis for a radical rethinking of how we design and conduct survey research. Having worked on that side of the industry much longer ago than Michael, I certainly felt his frustration. But I don't know whether the right way to move forward on a new agenda with this group is with a soft sell or a hard sell.

As I think I said in the first of these posts, I've now been to the last three of these conferences, stretching back to 2004. I've enjoyed them immensely. These are smart people who know their survey P's and Q's and practice them faithfully every day. In their own way they are patriots who believe that they are performing not just an important but a critical service for our government and country. Good for them.

Over the eight years since my first conference, the practice of survey research in my part of this industry has been in near constant turmoil and still has an uncertain future. In theirs, it seems little has changed. They still worry about declining cooperation, better measures, and funding, but progress on each seems painfully slow. I suppose you could argue that they need to go slowly because their numbers are more important than mine, but that doesn't keep me from sharing Michael's frustration. Nonetheless, I can't wait for the next round.

Zero defects: An admirable but elusive goal

Several years ago I was asked to write a chapter for a book called Methods for Testing and Evaluating Survey Questionnaires. So a couple of colleagues and I wrote something on testing online questionnaires. It led me to scratch the surface of the contemporary software testing literature, where I learned that the industry had more or less run up the white flag on zero defects: software has become so complicated, and the competitive pressures to get releases out quickly so intense, that most people had quietly given up on the idea of a first release being bug-free. This struck me as somewhat analogous to what's happened in MR over the last decade: research designs have become more complex, the questionnaires to support them have followed suit, but the timelines clients insist on continue to get shorter and shorter. So questionnaires are more convoluted, there are more lines of code and more numbers to check, but less time to do it.

Of course, we all insist to our clients that we check it all, and that remains the goal. But even if we had all the time we needed, there are two lessons I took from the software QA literature, both having to do with the priorities that should guide our approach to QA:

  1. Focus first on the most important stuff, that is, the section of the questionnaire, the lines of code and the analytic outputs that will create the biggest problem if they are wrong.
  2. Focus next on those places where there is most likely to be an error, that is, where the questionnaire or code is most complex and the numbers hardest to compute.

Then check everything else. Clients rightfully expect that every deliverable we give them be 100 percent correct. Getting there is not easy.
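To make those two priorities concrete, here is a minimal sketch of how a QA review queue might be ordered. The section names and the "severity" (business impact if wrong) and "complexity" (routing and computation, a proxy for defect risk) scores are entirely invented for illustration; the only point is the sort order: impact first, likely defect density second.

```python
# Hypothetical sketch: ordering questionnaire sections for QA review.
# "severity" = how big a problem an error would create (priority 1).
# "complexity" = how convoluted the logic is, a proxy for where bugs
# are most likely to hide (priority 2). Scores and names are invented.

def qa_review_order(sections):
    """Sort sections by impact first, then by likely defect density."""
    return sorted(sections, key=lambda s: (-s["severity"], -s["complexity"]))

sections = [
    {"name": "screener",        "severity": 3, "complexity": 1},
    {"name": "brand_awareness", "severity": 2, "complexity": 2},
    {"name": "conjoint_block",  "severity": 3, "complexity": 3},
    {"name": "demographics",    "severity": 1, "complexity": 1},
]

for s in qa_review_order(sections):
    print(s["name"])
```

Everything still gets checked; the scores only decide what gets the most careful eyes first.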

Getting to the bottom of the respondent engagement problem

I've been working along with some colleagues on the lit review section of a paper for the ESOMAR Congress. The topic is "gamification" as the next experiment designed to increase respondent engagement in online surveys. As anyone who has done their homework knows, the issue of survey respondent engagement did not arise with the growth of online panels and online surveys. Over 40 years ago two of the legends, Charlie Cannell and Robert Kahn, were arguing that there is an optimal length for a survey and that once that length is exceeded respondents become less motivated to respond and put forth less cognitive effort, causing survey data quality to suffer. In 1981 Regula Herzog and Jerald Bachman identified the tendency for respondents to "straight-line" through large numbers of consecutive items that shared the same scale, especially as they progressed through the questionnaire. Ten years later Jon Krosnick introduced the term "satisficing" to describe the tendency for survey respondents to lose interest and become distracted or impatient as they progress through a survey, putting less and less effort into answering questions.
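The straight-lining behavior Herzog and Bachman describe is also the easiest of these patterns to quantify. A toy sketch, assuming nothing more than a list of answers to a same-scale grid; the 90 percent threshold is an invented illustration, not a standard:

```python
# Hypothetical sketch: flagging "straight-lining" in a same-scale grid.
# A respondent whose answers to a long battery are (nearly) all
# identical is a candidate satisficer. The threshold is invented.

def straightline_share(answers):
    """Fraction of grid items that match the modal answer."""
    modal = max(set(answers), key=answers.count)
    return answers.count(modal) / len(answers)

def flags_straightliner(answers, threshold=0.9):
    return straightline_share(answers) >= threshold

print(flags_straightliner([3, 3, 3, 3, 3, 3, 3, 3, 3, 3]))  # identical answers
print(flags_straightliner([1, 4, 2, 5, 3, 2, 4, 1, 5, 3]))  # varied answers
```

Of course, detecting the behavior is the easy part; as the rest of this post argues, what we do about it is where the field has gone astray.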

What I find especially striking about this "early" work is its tone. It's not accusatory. No fingers are pointed and there is no implication that these are "bad respondents." Reflecting its mostly psychological roots, the arguments say that when we create certain kinds of conditions with surveys, this is how people will react. No one suggests that people who exhibit these kinds of behaviors don't deserve to be interviewed, that we need to get them out of our datasets, or that they don't deserve to be heard. This stands in stark contrast to how the same problem has been discussed over about the last five years in the context of online surveys. Name calling has been popular—inattentives, mental cheaters, speedsters, or simply, bad respondents. (Here I admit that I am as guilty as anyone of some of this.) And in most circles, current rhetoric to the contrary, the emphasis still is almost totally on getting rid of people who exhibit these behaviors rather than seriously attacking what pretty much everyone who has studied these problems over the last half century agrees is the root cause: long surveys on not very interesting topics. And now, on top of that double whammy, the online paradigm places almost no limits on how often you might be interviewed.

Another legend and former boss, Norman Bradburn, proposed a simple solution way back in 1977: convince respondents that the data are important. Unfortunately, we seem to have taken a different path.

New challenges to online panel data quality

Among the many criticisms we hear of online panels is the charge that we have no idea whether these respondents are who they say they are and that the incentive-driven nature of panels encourages people to pretend to be someone they are not. A number of companies have introduced products to clean up online samples so that we can know with some confidence that our online respondents are real. Not surprisingly, each of these companies uses a different approach and, as research presented at the CASRO Online Conference back in March by Melanie Courtright and Chuck Miller of DMS demonstrated, those different approaches can produce different results.

Melanie and Chuck started with an online sample of 7,200 people with a roughly 60/40 split between their own panel and other third-party sources. They balanced the sample on the front end on age, gender, income, and ethnicity. They asked each respondent at the outset for his or her name, mailing address, and date of birth. They submitted what they got to four companies offering validation services. Then they administered a questionnaire containing a number of demographic, lifestyle, attitudinal, and behavioral questions. All respondents were administered the survey, even those whom the validation services could not validate as real. The results should trouble all of us:

  • The percent of respondents validated varied by almost 10 points across the four providers (87.4% vs. 78.4%).
  • Barely half of 18-to-24-year-olds were validated, and the figure was closer to a third for two of the providers.
  • Hispanics and Asians validated at significantly lower rates than whites and African-Americans.
  • While there were no significant differences by gender across providers, males were almost twice as likely to fail the validation check as females.

These results are similar to results in some proprietary research that we did for a client in 2010. We also found that respondents with lower incomes and less education were more likely to be flagged as invalid.
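The core exercise behind findings like these is a simple cross-tabulation: validation rate by provider and by demographic subgroup. A minimal sketch, with records, providers, and rates all invented for illustration (they are not the DMS figures):

```python
# Hypothetical sketch: tabulating validation rates by provider and
# subgroup. All records, provider labels, and rates are invented;
# they only illustrate the provider-by-demographic comparison.
from collections import defaultdict

def validation_rates(records, group_key):
    """Share of respondents validated in each (provider, group) cell."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in records:
        cell = (r["provider"], r[group_key])
        totals[cell] += 1
        passes[cell] += r["validated"]
    return {cell: passes[cell] / totals[cell] for cell in totals}

records = [
    {"provider": "A", "age_group": "18-24", "validated": 1},
    {"provider": "A", "age_group": "18-24", "validated": 0},
    {"provider": "A", "age_group": "25+",   "validated": 1},
    {"provider": "B", "age_group": "18-24", "validated": 0},
    {"provider": "B", "age_group": "25+",   "validated": 1},
]

rates = validation_rates(records, "age_group")
```

When the same respondents produce very different cells depending on which provider scored them, as the DMS study found, the choice of validation service itself becomes a design decision.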

Melanie and Chuck also compared the substantive survey findings across validated and non-validated respondents where they found other important differences. For example:

  • Validated respondents reported lower rates of iPhone and smartphone ownership than respondents who did not validate.
  • Validated respondents also reported being more careful and thoughtful shoppers than the non-validated group.

I find all of this worrisome. One of the findings from the ARF ORQC's study of 17 online panels was that online panels are not interchangeable, that the answer you get for any given research question might well depend on the panel you choose to work with. This research from DMS suggests that validation rates will also vary depending on the service your panel provider chooses to work with, potentially adding still another layer of bias. The apparently strong relationship between age, ethnicity, education, and income on the one hand and likelihood to validate on the other adds still another layer of concern. Are we running the risk of excluding the very people who are toughest to reach—young people, ethnic minorities, the less well-educated—simply because they don't show up in "the system," don't own credit cards, or don't have mortgages? In the name of making things better, are we actually making them worse?


I’ve just returned from the AAPOR Annual Conference where I was reminded by one of the HSRM Conference organizers that I never finished my HSRM posts.  Shame on me.  So picking up where I left off, the afternoon session consisted of six papers on the general topic, “Building the Health Data Sets of Tomorrow.” 

The title itself is interesting because it talks about "datasets" rather than surveys. It raises the possibility that major health datasets that today are built primarily through surveys might in the future be constructed from a combination of surveys and behavioral data, or, in government speak, "administrative records." This is hardly a new concept. I can recall experiments back in the '70s and '80s that looked at the feasibility of matching data from different federal record systems and, on some large government contracts, the feasibility of substituting administrative record data for survey data collection, or at least for development of sampling frames. These experiments generally faltered on the inherent difficulty of matching, especially with no single identifier (such as a Social Security number) across systems, and on the generally poor state of administrative record-keeping at the time.
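To see why matching without a shared identifier is so hard, consider a toy sketch of the kind of fuzzy linkage those early experiments had to rely on: comparing names and dates of birth across two record systems. The fields, weights, and threshold below are all invented for illustration; real record-linkage systems are far more elaborate.

```python
# Hypothetical sketch: linking two record systems with no shared ID,
# the matching problem the '70s and '80s experiments kept running into.
# Field weights and the match threshold are invented for illustration.

def normalize(s):
    """Strip punctuation and case so "O'Brien" and "OBrien" compare equal."""
    return "".join(ch for ch in s.lower() if ch.isalnum())

def match_score(a, b):
    """Crude agreement score over last name, first initial, and DOB."""
    score = 0
    if normalize(a["last"]) == normalize(b["last"]):
        score += 2
    if normalize(a["first"])[:1] == normalize(b["first"])[:1]:
        score += 1
    if a["dob"] == b["dob"]:
        score += 3
    return score

def is_match(a, b, threshold=5):
    return match_score(a, b) >= threshold

survey = {"first": "Margaret", "last": "O'Brien", "dob": "1952-07-04"}
admin  = {"first": "M.",       "last": "OBrien",  "dob": "1952-07-04"}
print(is_match(survey, admin))
```

Every normalization rule and weight here is a judgment call, which is exactly why matching quality was, and is, so sensitive to the condition of the underlying records.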

Whether this renewed interest in the use of these behavioral data sources is a sign that these problems are being solved or simply a case of “hope springs eternal” I can’t say.  This session, while offering up some intriguing possibilities, experiments, and lines of inquiry, didn’t really answer the question.   These folks live in a world where precise measurement is not just a good idea, it’s the law.  Their eyes light up about the possibilities, but they are especially good at seeing the gaps and the flaws.   

Lots of people in MR talk about the possibilities of "behavioral data" as a potential revolution in how market research is done, but the potential impact on government health research is even more far-reaching. Most of what we do in MR is make inferences about future behavior based on measuring attitudes and intentions in surveys, focus groups, and, more recently, social media. Government health surveys, on the other hand, are almost completely about measuring and documenting current and past behavior. In some instances the only way in which respondents can accurately answer the survey's questions is by consulting their own records. So the potential advantage of effectively leveraging the vast amounts of data now being collected about individual health-related behaviors and use of the healthcare system is huge. Unfortunately, I did not come away convinced that we will get there any time soon. But at least they are working on it.