
Scales Can Be Problematic in Mixed Mode Studies

This interesting problem showed up in my office this morning.

A while back we did a large Web study with physicians that used an extensive battery of seven point scales.  More recently, we completed a telephone study using the same battery with a similar population, but we are seeing somewhat different results.  Mean scores on the scale questions are sometimes higher than they were on the Web.  What's going on?

There are a few potential explanations, but the one that probably explains most of what we are seeing comes down to how respondents process a scale when they hear it read to them as opposed to seeing it displayed, in this case on a Web page.

Tarnai and Dillman (1992) were probably the first to report a tendency for telephone respondents to choose extreme values in scales more often than mail survey respondents, who tend to use more of the entire scale.  This may result in different means (higher or lower depending on the question and scale direction) across modes.  More recently, Steiger, Keil and Gaertner (2005) saw evidence of this in a Telephone/Web comparison of satisfaction scores.  They did not see the effect on all of the items they considered, but they saw enough to suggest that Web respondents distributed themselves more evenly across the entire scale than telephone respondents did.  When you stop to think about it, this is not all that surprising.  Visualizing the scale in your head versus seeing it displayed on paper or on a computer monitor might produce subtle differences in the value you select.
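To see why a pull toward the extremes can move a mean in either direction, here is a toy sketch. Everything in it is an assumption made for illustration: the response counts are invented, and the rule that a fixed share of respondents at each non-middle point jumps to the nearest endpoint is a simplification, not a model taken from any of the studies cited here.

```python
def scale_mean(counts):
    """Mean score for counts of responses at points 1..k of a scale."""
    total = sum(counts)
    return sum(point * n for point, n in enumerate(counts, start=1)) / total


def push_to_extremes(counts, share):
    """Move `share` of the responses at each interior point to the nearest
    endpoint. The exact midpoint (if any) is left alone. Toy assumption only."""
    k = len(counts)
    mid = (k + 1) / 2
    shifted = [float(c) for c in counts]
    for i in range(1, k - 1):          # interior points only
        point = i + 1
        if point == mid:
            continue
        moved = counts[i] * share
        shifted[i] -= moved
        if point < mid:
            shifted[0] += moved        # lower half moves to the bottom box
        else:
            shifted[-1] += moved       # upper half moves to the top box
    return shifted


# Invented counts on a seven point scale (100 respondents each).
mostly_high = [2, 3, 5, 10, 20, 35, 25]
mostly_low = list(reversed(mostly_high))

for label, counts in [("mostly high", mostly_high), ("mostly low", mostly_low)]:
    before = scale_mean(counts)
    after = scale_mean(push_to_extremes(counts, share=0.2))
    print(f"{label}: mean {before:.2f} -> {after:.2f}")
```

With answers that already lean high, the same pull toward the endpoints raises the mean; with answers that lean low, it lowers it. That is all "depending on the question and scale direction" amounts to in this simplified picture.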

Unfortunately, the issue gets a bit more complex if you introduce variation in how scales are displayed on the Web.  Tourangeau, Couper, and Conrad (2004) first reported that including non-substantive answer categories (such as Don't Know, Refused, Not Applicable, etc.) on the far right of a horizontal scale display or the bottom of a vertical display can cause the center of the distribution to shift visually.  For example, in a seven point scale where the respondent sees seven radio buttons across the screen, the visual center is the fourth radio button from the left.  Adding, for example, two non-substantive codes on the far right means that there are now nine radio buttons displayed and the visual center is the fifth button from the left.  So choosing from the visual middle of the scale can produce a slight elevation in the overall mean.  Baker, Conrad, Couper, and Tourangeau (2004) replicated this result and showed that it can be mitigated by such things as not displaying the non-substantive answer categories, displaying them but separating them from the substantive codes with a vertical line, or labeling all points in the scale.  Simply labeling the midpoint of the scale may also help.
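The arithmetic behind the shifting visual center is simple enough to sketch. The function names below are mine, and the sketch assumes evenly spaced radio buttons in a single row with the non-substantive options placed after the substantive ones, as in the example above.

```python
def visual_center(substantive_points, nonsubstantive_options=0):
    """Position (counting from the left, starting at 1) of the middle of a row
    of evenly spaced radio buttons. A value ending in .5 means the middle falls
    between two buttons."""
    total_buttons = substantive_points + nonsubstantive_options
    return (total_buttons + 1) / 2


def conceptual_midpoint(substantive_points):
    """Midpoint of the substantive scale itself, ignoring extra options."""
    return (substantive_points + 1) / 2


seven_only = visual_center(7)            # 4.0: the fourth button
with_extras = visual_center(7, 2)        # 5.0: the fifth button
shift = with_extras - conceptual_midpoint(7)

print(seven_only, with_extras, shift)    # prints "4.0 5.0 1.0"
```

A respondent who anchors on the middle of what is displayed, rather than on the middle of the scale, ends up one point to the right in this layout, which is the direction of the slight elevation in the mean described above.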

More unfortunately still, when we did some experimental comparisons between phone and Web (Speizer, Schneider, Wiitala, and Baker, 2004), the effect reversed: when there were significant differences on satisfaction items across modes, the tendency was for Web respondents to use the top boxes more than telephone respondents did.  My untested hypothesis there is that the display (non-substantive answer categories on the right) mitigated the effect.  But I have yet to prove that.

To sum up, it should not surprise us that hearing the scale read and then interpreting it in one's head can sometimes lead to subtle differences compared to seeing the scale displayed on paper or on the Web.  The research record leads us to expect that we may see differences in means, and those differences could be in either direction, depending on the question and how it is displayed on the Web.