
Posts from March 2008

Cool Tool for Setting Up Meetings

Systems like Lotus Notes and Outlook have freed us from much of the drudgery of organizing meetings, but only as long as everyone you want to meet with is part of the same organization.  Having to coordinate meeting dates and times with people outside the organization is a vivid reminder of the bad old days, when you could spend as much time suggesting and exchanging possible dates and times as in the meeting itself.

From time to time I meet with a group of people from around the world as part of a little bit of advising I do for ESOMAR.  The ESOMAR folks use this very cool little tool called Doodle to organize these meetings.  It's extremely easy to use, free, and works like a charm.  I strongly suggest that you take a look.


The Trade-off on Trade-offs

It's just dawned on me that while I posted a number of updates from GOR08 I have not reported on the interesting research that Bob Rayner, Mick Couper, Dan Hartman, and I presented.  The issue at hand was the "best" way to ask feature trade-off questions online.  For some time we have been presenting pairs of features and asking respondents to allocate 10 points between the two features, giving more points to the one they prefer as a rough approximation of how much they prefer it.  This has always seemed like a tough exercise and we were interested to see whether other approaches might work better.  So we tested it along with:

  • Simple radio buttons (RBS)
  • MaxDiff (best/worst)
  • Something we called "Two by Four," where you present two features and ask the respondent to choose the one he/she likes best, both the same, or neither
  • A technique called "Q-Sort" in which respondents pick the features they like best and those they like least over a series of three screens
  • VAS or slider bar where the respondent moves the bar to reflect his/her preference for one feature compared to another

Respondents were randomly assigned to one of these conditions and then asked to complete three sets of exercises, two having to do with healthcare policy and one on banking service features.  Our definition of "best" had two components: (1) respondent preferences and (2) discriminating power in the final measures.
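To make the 10 Point Allocation mechanics concrete, here is a minimal sketch in Python.  The feature names are made up and none of this is the actual study instrument; it simply shows how the pairwise exercise can be built and how the allocations roll up into feature scores.

    from itertools import combinations

    # Hypothetical feature list -- the actual healthcare and banking features
    # from the study are not reproduced here.
    FEATURES = ["24-hour phone support", "online statements", "no ATM fees", "branch access"]

    def build_pairs(features):
        """Every feature meets every other feature once."""
        return list(combinations(features, 2))

    def score_allocation(responses):
        """Roll pairwise 10-point allocations up into a total score per feature.

        `responses` maps (feature_a, feature_b) -> points given to feature_a;
        feature_b implicitly receives the remaining points.
        """
        totals = {}
        for (a, b), points_a in responses.items():
            if not 0 <= points_a <= 10:
                raise ValueError(f"Allocation for {a!r} vs {b!r} must be between 0 and 10")
            totals[a] = totals.get(a, 0) + points_a
            totals[b] = totals.get(b, 0) + (10 - points_a)
        return totals

    pairs = build_pairs(FEATURES)
    # A made-up respondent who always gives 7 points to the first feature shown.
    fake_responses = {pair: 7 for pair in pairs}
    print(score_allocation(fake_responses))

Note that this sketch pairs every feature with every other feature, which is only one way to build the exercise; the actual study may have paired features more economically.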

Respondent preferences are summarized in the table below.  In general, respondents did not care much for MaxDiff.

                                                       
Method                 Termination Rate    Exercise Completion Time (Seconds)    Debrief Score
10 Point Allocation    8.10%               294                                   6.5
Radio Buttons          5.40%               222                                   6.7
Two by Four            4.20%               228                                   6.3
MaxDiff                15.30%              429                                   5.4
Q-Sort                 6.30%               181                                   6.6
VAS                    4.90%               218                                   6.5

The 10 Point Allocation also was rather long when compared to the other methods.  Q-Sort was by far the shortest because it takes just four screens to execute.  The other methods required as many screens as there were features.

As for discriminating power, we expected going in that MaxDiff, Two by Four, and Q-Sort would show more dramatic differences among feature preferences than the other three methods, and this was borne out in all cases.  We also looked at correlations among the methods to get a sense of whether they all were measuring the same thing.  In general correlations were high, although Q-Sort correlated relatively weakly with all of the other methods.  Two by Four was occasionally problematic in this regard as well.
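For readers who want the two yardsticks spelled out, here is a toy illustration in Python.  The numbers are invented, not the study's data; "discriminating power" is read simply as the spread of the feature scores a method produces, and the cross-method check is just the correlation between the scores two methods assign to the same features.

    import numpy as np

    # Invented mean preference scores per feature (rows) for three of the methods
    # (columns) -- purely illustrative, not the study's results.
    methods = ["10 Point Allocation", "Radio Buttons", "Two by Four"]
    scores = np.array([
        # 10-pt  RBS   Two by Four
        [0.62,  0.58, 0.74],
        [0.55,  0.54, 0.61],
        [0.48,  0.50, 0.39],
        [0.35,  0.38, 0.26],
    ])

    # Spread of the scores each method produces: a rough stand-in for discriminating power.
    for name, col in zip(methods, scores.T):
        print(f"{name:20s} range = {col.max() - col.min():.2f}")

    # Whether the methods rank the features the same way: correlations between columns.
    print(np.corrcoef(scores, rowvar=False).round(2))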

So what method should we prefer?  As always, it depends.  If you want to give respondents an answering technique they will like and not run away from, then the answer seems to be anything but MaxDiff.  If you are willing to settle for less dramatic differences among feature preferences, then RBS, VAS, and even the 10 Point Allocation all work reasonably well.  But if you want to see lots of discriminating power in your measures, Two by Four appears to be best.

Those would seem to be the trade-offs.


Offshoring

I recently saw an interesting report on a survey of "MR Professionals" by MarketResearchCareers.  I can't judge the validity of the survey because it costs money to see it, but for what it's worth here are its lead findings:

  • About one-third of US MR suppliers are offshoring work.
  • Of these, 55 percent were satisfied with the experience.
  • Another 27 percent were not satisfied.
  • Just 54 percent believe that offshoring helps to control costs.
  • 44 percent believe offshoring erodes quality.

I find the last two bullets to be especially interesting.  It's clear that there are still a substantial number of people in the business who have not yet figured this out.  But they had better figure it out soon because whether we like it or not our future will have more offshoring, not less.


How Much Does It Hurt?

Talk of visual analog scales (VAS) was everywhere at GOR08.  These are scales that measure a characteristic or attitude across a continuum without the use of labels, numbers, or other markers, except for endpoints.  Their classic use has been in pain measurement as in the example below:

[Image: a classic pain-measurement VAS, anchored only at its endpoints]

In online research they typically are presented as slider bars that allow the respondent to move the marker continuously across the scale as in the figure below:

[Image: a VAS rendered online as a slider bar]

We have done a number of experiments comparing them to more traditional answering methods (such as radio button scales) and in general have found that they yield essentially the same results as those conventional methods. (See, for example, Couper, M.P., Singer, E., Tourangeau, R., and Conrad, F.G. (2006), "Evaluating the Effectiveness of Visual Analog Scales: A Web Experiment." Social Science Computer Review, 24 (2): 227-245.) But we also have found some downsides, those being that they take longer for respondents to answer, cause more respondents to terminate because of technical problems, and produce more missing data.  Other studies, such as some done at Harris Interactive, have produced similar results.  So we have generally advised against their use.
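The comparisons behind "essentially the same results" mostly come down to putting the 0-100 slider readings onto the same metric as the radio-button scale and checking agreement.  A toy sketch in Python, with invented answers (the None stands for the kind of missing data sliders tend to produce):

    # Invented responses: one respondent's answers per method, same items in the same order.
    vas = [12, 35, 35, 50, 71, 88, None, 64]   # 0-100 slider readings; None = missing
    rbs = [1, 3, 3, 4, 5, 7, 4, 5]             # 7-point radio-button answers

    def vas_to_points(value, points=7):
        """Rescale a 0-100 slider reading onto a 1..points scale for comparison."""
        if value is None:
            return None
        return round(value / 100 * (points - 1)) + 1

    recoded = [vas_to_points(v) for v in vas]
    answered = [(a, b) for a, b in zip(recoded, rbs) if a is not None]
    agreement = sum(a == b for a, b in answered) / len(answered)
    print(recoded)
    print(f"exact agreement on answered items: {agreement:.0%}")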

Despite this previous research, they continue to be very popular.  See this post, for example.  Researchers are attracted to them because they are cool, because they believe that respondents like them, and because they think they can help make an otherwise tedious survey more interesting.  Some, but not many, argue that they are simply a better way to measure certain kinds of things like attitudes.

By my count there were at least six papers at GOR08, plus one by Mick Couper, Bob Rayner, Dan Hartman, and yours truly, that touched on VAS in one way or another.  The newest research seems to be saying four things:

  1. The technical problems causing higher termination rates may not be as severe as they once were, and so we are losing fewer respondents than in the past.
  2. The gap in response time also is narrowing.  Perhaps respondents are becoming more familiar with them or the newer implementations simply work better.  Some researchers also are arguing that longer response time may not be a bad thing if that time is being spent on cognitive processing rather than trying to figure out how to use the thing.
  3. The data continue to be comparable to other methods such as radio button scales.
  4. The evidence on respondent preferences for VAS or RBS is still not convincing.

I suspect we are going to see more and more use of VAS in online surveys for the simple reason that people think they are cool.  Fortunately, while there is not much in the methods literature to suggest that they improve measurement or the survey experience, it at least seems that we are getting to the point where they do no serious harm.  That may be a low bar, but it's better than some of what we have seen online.

 


Trying Hard to Engage

I find it strangely heartening that researchers are finally beginning to tune in to the reality that many of the newly recognized problems of online research are rooted in bad questionnaires.  High termination rates, high levels of satisficing, and falling response rates are at least partially blamed on questionnaires that are too long, too complex, too poorly implemented, or just plain boring.  The heart of the problem, it seems to me, is length and complexity, but those are client issues and difficult to affect in the short term.  Implementation is easier to deal with.  One popular approach is the movement toward different kinds of answering devices, visual analog scales (a.k.a. slider bars) being a primary example.  I've ranted against VAS before, and will return to that topic in a later post.

For now I want to describe a paper I heard at GOR08, presented by someone from an interesting German company called Psychonomics.  They reasoned that because face-to-face interviewing gets such good data, we should try to emulate in our Web surveys what we can of the face-to-face experience.  They had two ideas: (1) use questionnaire transitions to express a variety of different sentiments to respondents (e.g., gratitude, understanding, familiarity, etc.) and (2) use an avatar to "conduct" the interview.  I had seen the avatar idea tested before and in general it has always been a dud.  Same here.  Too many technical problems and too slow.  People are not fooled into thinking they are being interviewed by a real person.

But I had hopes that the messages might have some helpful impact.  No such luck.  Respondents ran away from the survey with the reinforcing messages at essentially the same rate as from the survey without them.

Later I heard a paper about using stories to induce mood changes in online surveys.  Apparently there are well-established procedures in psychology for doing this; they are called Mood Induction Procedures, or MIPs.  And it actually works!  I won't go into the details, but it turns out that if you have people read a very sad story or a very happy story, it carries over to how they answer survey questions.  Maybe we should think about that.