The Trade-off on Trade-offs
March 22, 2008
It's just dawned on me that, while I posted a number of updates from GOR08, I have not reported on the interesting research that Bob Rayner, Mick Couper, Dan Hartman, and I presented. The issue at hand was the "best" way to ask feature trade-off questions online. For some time we have been presenting pairs of features and asking respondents to allocate 10 points between the two features, giving more points to the one they prefer as a rough approximation of how much they prefer it. This has always seemed like a tough exercise, and we were interested to see whether other approaches might work better. So we tested it along with the following (there is a rough scoring sketch in code just after the list):
- Simple radio buttons (RBS)
- MaxDif (best/worst)
- Something we called "Two by Four" where you present two features and ask the respondent to choose the one he/she likes best, both the same, or neither
- A technique called "Q-Sort" in which respondents pick the features they like best and those they like least over a series of three screens
- A VAS (visual analogue scale) or slider bar, where the respondent moves the bar to reflect his/her preference for one feature compared to another
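For readers who have not fielded these formats, here is a minimal sketch, in Python, of how raw answers from the 10 Point Allocation and a best/worst exercise might be turned into simple per-feature scores. The feature names, response data, and scoring rules are illustrative assumptions for this post, not the instruments or scoring we actually used.

```python
from collections import defaultdict

# 10 Point Allocation: each screen shows a pair of features and the respondent
# splits 10 points between them. Keyed by (feature_a, feature_b). Hypothetical
# banking-service features and answers, for illustration only.
allocation_responses = {
    ("online bill pay", "mobile alerts"): (7, 3),
    ("online bill pay", "fee-free ATMs"): (4, 6),
    ("mobile alerts", "fee-free ATMs"): (2, 8),
}

def score_allocation(responses):
    """Average the points a feature receives across all of its pairings."""
    totals, counts = defaultdict(int), defaultdict(int)
    for (a, b), (pts_a, pts_b) in responses.items():
        totals[a] += pts_a
        totals[b] += pts_b
        counts[a] += 1
        counts[b] += 1
    return {f: totals[f] / counts[f] for f in totals}

# Best/worst (MaxDif-style): each screen shows a subset of features and the
# respondent marks the one they like best and the one they like least.
best_worst_responses = [
    {"best": "fee-free ATMs", "worst": "mobile alerts"},
    {"best": "online bill pay", "worst": "mobile alerts"},
]

def score_best_worst(responses):
    """Simple count scoring: +1 for every 'best' pick, -1 for every 'worst' pick."""
    scores = defaultdict(int)
    for screen in responses:
        scores[screen["best"]] += 1
        scores[screen["worst"]] -= 1
    return dict(scores)

print(score_allocation(allocation_responses))   # e.g. {'online bill pay': 5.5, ...}
print(score_best_worst(best_worst_responses))
```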
Respondents were randomly assigned to one of these conditions and then asked to complete three sets of exercises, two having to do with healthcare policy and one on banking service features. Our definition of "best" had two components: (1) respondent preferences and (2) discriminating power in the final measures.
Respondent preferences are summarized in the table below. In general, respondents did not care much for MaxDif.
| Method | Termination Rate | Exercise Completion Time (seconds) | Debrief Score |
| --- | --- | --- | --- |
| 10 Point Allocation | 8.10% | 294 | 6.5 |
| Radio Buttons | 5.40% | 222 | 6.7 |
| Two by Four | 4.20% | 228 | 6.3 |
| MaxDif | 15.30% | 429 | 5.4 |
| Q-Sort | 6.30% | 181 | 6.6 |
| VAS | 4.90% | 218 | 6.5 |
The 10 Point Allocation was also rather long compared to the other methods. Q-Sort was by far the shortest because it takes just four screens to execute; the other methods required as many screens as there were features.
As for discriminating power, we expected going in that MaxDif, Two by Four, and Q-Sort would show more dramatic differences among feature preferences than the other three methods, and this was borne out in all cases. We also looked at correlations among the methods to get a sense of whether they all were measuring the same thing. In general the correlations were high, although Q-Sort's correlations with all of the other methods were relatively weak. Two by Four was occasionally problematic in this regard as well.
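To make that correlation check concrete, here is a small sketch, again in Python, of the kind of comparison involved. The numbers are hypothetical and this is not our actual analysis; it simply correlates per-feature scores across two formats and uses the spread of scores as a rough proxy for discriminating power.

```python
import numpy as np

# Hypothetical per-feature mean preference scores from two of the formats,
# rescaled to a common 0-100 range (all values are made up for illustration).
radio_button_scores = np.array([62.0, 48.0, 71.0, 55.0, 40.0])
two_by_four_scores = np.array([78.0, 35.0, 85.0, 52.0, 22.0])

# Pearson correlation between the two score vectors: a high value suggests the
# formats order and space the features similarly, i.e. they are plausibly
# measuring the same underlying preferences.
r = np.corrcoef(radio_button_scores, two_by_four_scores)[0, 1]
print(f"correlation between methods: {r:.2f}")

# The spread of the scores is one crude way to compare discriminating power:
# forced-choice formats (best/worst, Two by Four, Q-Sort) tend to spread the
# features out more than rating-style formats do.
print(f"spread, radio buttons: {radio_button_scores.std():.1f}")
print(f"spread, two by four:   {two_by_four_scores.std():.1f}")
```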
So what method should we prefer? As always, it depends. If you want to give respondents an answering technique they will like and not run away from, then the answer seems to be anything but MaxDif. If you are willing to settle for less dramatic differences among feature preferences, then RBS, VAS, and even the 10 Point Allocation all work reasonably well. But if you want to see lots of discriminating power in your measures, Two by Four appears to be best.
Those would seem to be the trade-offs.