Where is our Copernicus?

I was at the AAPOR Conference in Chicago most of last week, and while I had planned to do some blogging, it was hard given the sheer volume of information, opinions, and data being shared. (And besides, Jeffry Henning was there pounding out posts on his shiny new iPad, so I am confident the important ground was or shortly will be covered.) By my count there were 66 sessions (not counting the WAPOR overlap) with upwards of 300 papers, and while they may not all have been publication-ready, the vast majority (and I mean that) were by people who know their stuff. I heard no paper that made me shake my head and mutter to myself.

Overall there seemed to be two major themes, and both had to do with getting good samples in challenging times. One group is all about dual-frame telephone sampling, meaning samples that include cell phones as well as landlines; a second group is focused on address-based sampling (ABS) with interviewing by some combination of mail and Web. There were multiple sessions on both topics, sometimes two at the same time. Next year AAPOR ought to make an effort to get the two groups in the same room to argue, because there are arguments to be had.

But what really struck me is the comment this seems to make on the crisis in the scientific side of the research industry. We have hit a wall. Our mainstay of the last 30 years—telephone research—is not working anymore, and we are doing our best to keep propping it up. One implication of that propping up is that costs are rising even faster, because calling cell phones is more expensive than calling landlines. Some are abandoning phone altogether and going back to paper-and-pencil mail surveys. (If I had predicted that five years ago, my blogging credentials would have been revoked!)

We are in desperate need of a breakthrough. And for the kind of work that these people do and the precision requirements they have to meet it's clear that online as currently conceived and practiced is not it. I don't pretend to have the answer, but there were an awful lot of people at the conference last week who have the smarts and experience to do the hard work of figuring it out. Let's hope they do so soon.

I don’t know

I have this assignment of sorts to read an often-cited article by Jon Krosnick and some colleagues titled "The Impact of 'No Opinion' Response Options on Data Quality," Public Opinion Quarterly, 66:371-403. This is quite timely, as I have just finished a bit of empirical research with some colleagues that cites this article, although I confess I have not read it in several years. The research was a Web-based experiment in which we tried to assess the impact of offering a DK ("don't know") option in a Web survey. This is of special interest because so much of Web research is about transitioning phone studies to Web and understanding the differences one sees. Interviewers seldom read a DK option to respondents. Rather, they hold it in their pocket and use it as a sort of last resort. Online you need to decide whether to display the DK option or not, and when you do, it's not unusual to get a significant increase in its use. Hardly surprising.

The key research question is whether you get an overall different distribution of substantive responses depending on whether you offer a DK option. In our study (presented by Mick Couper at the General Online Research Conference in Vienna), we got significantly higher rates of item nonresponse (i.e., more frequent selection of DK rather than simply skipping the question) when we presented it on the screen. Differences in the distribution of substantive responses, question by question, were difficult to detect. So the takeaway here is that not presenting the DK reduces nonresponse and does not seem to lead to a lot of guessing and nonsensical answers that destabilize the distributions.
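For readers who want to see what "difficult to detect" means in practice, here is a minimal sketch, with entirely invented counts (not data from our study), of how one might test whether the substantive response distributions differ between the two conditions, DK selections excluded, using a Pearson chi-square test:

```python
def chi_square_stat(observed_a, observed_b):
    """Pearson chi-square statistic comparing two response distributions.

    observed_a / observed_b: counts per substantive response category
    (DK selections excluded), one list per experimental condition.
    """
    total_a, total_b = sum(observed_a), sum(observed_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(observed_a, observed_b):
        col = a + b
        exp_a = total_a * col / grand  # expected count if no difference
        exp_b = total_b * col / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat

# Hypothetical counts on a 4-point agree/disagree item.
dk_offered  = [120, 180, 160, 90]   # DK shown on screen
dk_withheld = [130, 190, 170, 100]  # DK not shown; skipping allowed
stat = chi_square_stat(dk_offered, dk_withheld)
# df = (2 - 1) * (4 - 1) = 3; the 0.05 critical value is 7.815
print(f"chi-square = {stat:.3f}, significant at 0.05: {stat > 7.815}")
```

With these invented counts the statistic is tiny, which is the pattern we saw question by question: more DK selections when the option is shown, but little movement in the substantive distribution.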

Krosnick and his colleagues come to essentially the same conclusion. There is a body of research (see, for example, Philip Converse, "Attitudes and Non-Attitudes: Continuation of a Dialogue," in The Quantitative Analysis of Social Problems, ed. Edward Tufte, 1970) that argues sometimes people really don't have a "preconsolidated opinion" and you can't force them to come up with one. So to keep people from guessing or answering randomly, it's best to offer a DK. Krosnick and his colleagues argue that selecting the DK is a form of satisficing that is most prevalent among respondents with less formal education, in self-administered situations, and toward the end of the questionnaire. In other words, it's a sign that respondents are not willing to put forth the cognitive energy to give a thoughtful answer, so they jump at the chance the DK offers. Take away the option and these people mostly will give valid answers, and you will reduce your nonresponse.

Of course, taking away the DK option will not drive nonresponse to zero. Survey research, like life, is full of compromise. Unfortunately, due to a design flaw introduced by yours truly, our research could shed no light on what happens if you don't offer a DK and don't let respondents just skip the question. On many commercial Web surveys there is no DK and an answer is required, which may produce a different result. That's a variation we need to test, but at least in this experiment our results were not terribly different from what Krosnick describes.

More comparisons of Web to other methods

I am finally getting around to wading through the mother lode of academic research noted in an earlier post way back at the beginning of March.  The special POQ issue has two articles, one looking at Web versus face-to-face and the other comparing CATI, Web and IVR.  The results are not particularly surprising, but it's nice to see one's suspicions confirmed with well-designed and executed research.

Dirk Heerwegh and Geert Loosveldt report on results from a survey in Belgium designed to assess attitudes toward immigrants and asylum seekers. They put considerable effort into designing both the Web and face-to-face surveys based on Dillman's unimode construction principles.  In other words, they worked hard at making the two surveys as comparable as possible rather than optimizing each to its own mode.  Their results are pretty convincing.  The Web survey produced a higher rate of "don't know" responses, more missing data, and less differentiation in scales.

Frauke Kreuter, Stanley Presser, and Roger Tourangeau looked at social desirability bias across three methods--one with an interviewer (CATI), one without an interviewer (Web), and one with sort of an interviewer (IVR).  They drew a sample of University of Maryland alumni and asked a variety of questions about academic performance and post-graduation giving.  They were able to verify respondents' answers against university records.  In essence, they were able to tell who was telling the truth and who was not.  As with Heerwegh and Loosveldt, the results are pretty much what we would expect.  Web reporting was the most accurate and CATI the least, with IVR generally somewhere in the middle.
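The beauty of this design is that accuracy becomes directly measurable. A minimal sketch, with invented answer/record pairs rather than the authors' data, of how one computes a misreport rate per mode once answers can be checked against administrative records:

```python
def misreport_rate(responses):
    """Share of respondents whose answer disagrees with the record.

    responses: list of (reported_value, true_value) pairs for one mode.
    """
    wrong = sum(1 for reported, actual in responses if reported != actual)
    return wrong / len(responses)

# Invented example: did the respondent donate after graduating?
# (reported answer, value in the university records)
web  = [(True, True), (False, False), (True, True), (False, False)]
cati = [(True, False), (True, False), (True, True), (False, False)]
ivr  = [(True, False), (False, False), (True, True), (False, False)]

for mode, data in [("Web", web), ("CATI", cati), ("IVR", ivr)]:
    print(f"{mode}: misreport rate = {misreport_rate(data):.2f}")
```

With these made-up pairs the rates fall in the order the study found: Web most accurate, CATI least, IVR in between.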

So there you have it.  We used to like to say that "the Internet changes everything."  Well, it does not appear to have changed some basic principles of survey research.

From the frying pan to the fire

The last issue of the International Journal of Market Research has an article by Mike Cooke and some colleagues at GfK describing their attempt to migrate Great Britain's Financial Research Survey, which has been running for 20 years, from face-to-face to online. Despite hard work by some of the smartest people I know in this business, and after spending around £500,000, they ultimately concluded that full migration to online was not possible unless they were prepared to risk the credibility of the data. There simply were too many differences online that would disrupt their 20-year time series. They ultimately settled on a mixed-mode approach that has an online component but with the majority of interviews still done face-to-face.

Seeing the piece in print (I had heard the research presented previously at a conference a year or so ago) reminded me that much of the research one sees on this general topic of conversion of tracking studies from offline to online doesn't have a happy ending. Earlier this year in Toronto at an MRIA conference I heard a paper by Ann Crassweller from the Newspaper Audience Databank in Canada describing her test of the feasibility of converting a tracking study on newspaper readership from telephone to online. She compared results from her telephone survey to those from four different online panels. She was unable to get an online sample from any of the four panels that matched up with her telephone sample on key behavioral items, and the variation among the four panels was substantial. She concluded that at least for now, she needed to stay with telephone.

Fredrik Nauckhoff and his colleagues from Cint had a better story to tell at the ESOMAR Panel Research Conference in 2007. They compared telephone and online results for a Swedish tracker focused on automobile brand and ad awareness. Results mostly were comparable and where they were not the authors felt they were manageable. They did, however, sound a note of caution about the applicability of their results to countries with lower internet penetration than Sweden (81 percent).

I've personally been involved in a number of similar studies over the last few years, most of which I can't describe in any detail because they are proprietary to clients. One exception is work we did back in 2001 on the American Customer Satisfaction Index. We administered the same interview about satisfaction with a leading online book seller by telephone and to an online panel. To qualify for the survey the respondent had to have purchased from the online merchant in the last six months. We found few significant differences in our results. Additional experiments on this study with offline merchants have been less encouraging.

In 2003 we conducted a series of experiments aimed at the possible transition of at least some of Medstat's PULSE survey from telephone to Web. Despite using two different panels and a variety of weighting techniques we were unable to produce online data that did not seriously disrupt the PULSE time series on key measures. This study continues to be done by telephone.
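For readers unfamiliar with what "weighting techniques" covers here, the simplest version is post-stratification, or cell weighting: weight each stratum of the online sample so the weighted totals match known population shares. A minimal sketch with invented numbers (the strata and targets are hypothetical, not from the PULSE work):

```python
def poststratification_weights(sample_counts, population_shares):
    """Cell weighting: one weight per stratum so that the weighted
    sample matches known population shares.

    sample_counts: {stratum: number of respondents in the sample}
    population_shares: {stratum: share of the target population}
    Returns {stratum: per-respondent weight}.
    """
    n = sum(sample_counts.values())
    return {s: population_shares[s] * n / sample_counts[s]
            for s in sample_counts}

# Hypothetical: an online panel that over-represents the young.
sample  = {"18-34": 500, "35-54": 300, "55+": 200}
targets = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
weights = poststratification_weights(sample, targets)
# 18-34 respondents get weight 0.30 * 1000 / 500 = 0.6 (down-weighted)
```

Of course, this only corrects for the variables you weight on; as the PULSE experience suggests, online respondents can still differ on exactly the attitudes and behaviors the survey is trying to track.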

In some cases, despite significant differences between telephone and online, clients still have elected to transition to online, either in whole or in part. In at least one instance, a client with a customer satisfaction study felt that the online results were a more accurate reflection of their experience with their customer base. In another, the cost savings were so significant that the client elected to accept the disruption in the time series and to establish new baseline measurements with online.

What all of this suggests to me is that it is impossible to know in advance whether a given study is a good candidate or a poor candidate for transition to online. There is little question that online respondents are different in a whole host of ways—demographic, attitudinal, and behavioral—from the rest of the population and from the survey respondents we typically interview in offline modes. The key is to understand whether those differences matter in the context of whatever we are trying to measure in our survey. We can only learn this through empirical investigation, and even then, explaining the differences in our results can be frustratingly difficult.

Lying about satisfaction?

Back in September I described a WSJ piece that reported on a set of findings from Harris Interactive suggesting that social desirability operates more widely than perhaps I had thought.  Nonetheless, I was not convinced that it was an especially significant concern for customer satisfaction surveys.  Turns out, I might be wrong about that.

We are working on a proposal in which we are looking at the possible impacts of transitioning a customer sat study from telephone to IVR.  While doing my due diligence on this I found a 2002 POQ article (Roger Tourangeau, Darby Miller Steiger, and David Wilson, "Evaluating IVR," Public Opinion Quarterly, 66:265-278).  In a set of well-designed experiments they found that telephone interviewing consistently produced higher sat scores than IVR.  While the differences were not major (less than a point on mean scores for a 10-point scale) and not always significant, they were very stable across questions and across scales of different lengths.
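To give a feel for what "less than a point and not always significant" looks like, here is a minimal sketch, with invented 10-point scores rather than the article's data, of testing a mode difference in mean satisfaction with a Welch t statistic:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for the difference in mean scores between
    two independent samples (e.g. phone vs. IVR respondents)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Invented 10-point satisfaction scores; phone runs slightly higher.
phone = [8, 9, 7, 8, 9, 8, 7, 9, 8, 8]
ivr   = [7, 8, 7, 7, 8, 8, 6, 8, 7, 7]
t = welch_t(phone, ivr)
# With large samples, |t| > 1.96 is roughly significant at 0.05.
print(f"mean diff = {mean(phone) - mean(ivr):.2f}, t = {t:.2f}")
```

The point of the exercise: a sub-one-point gap can hover right around the significance threshold in any one question, which is why the stability of the pattern across questions and scale lengths is the more telling finding.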

The obvious question (at least for me) is how this might translate to Web where for years we have seen major differences in sat scores when compared to phone.  Of course, with Web we have a second variable, namely seeing the scale displayed rather than having it read.  There is some interesting research there as well, but I'll save that for another post.