Two of the more interesting sessions at last week's AAPOR conference featured the US Census Bureau. The shared theme was the Bureau's initiative to reengineer its data collection process in an era of declining cooperation and ever-tightening budgets. The two underpinnings of their strategy are (1) a new data collection approach called adaptive design and (2) big data.
Adaptive design is an enhancement to an earlier strategy called responsive design, and it replaces the traditional approach of pursuing the highest possible response rate until either the money or the time runs out. Adaptive design essentially says that the quality of the estimates is a better indicator of overall data quality than the response rate. To simplify it for the blogosphere, responsive design says that it makes no sense to continue to pursue interviews with certain types of people (say, a specific demographic group) if getting those data is not going to improve the survey's estimates, or at least the most important estimates. Adaptive design takes things a couple of steps further by saying that I can decide which lines in my sample to pursue by using what I already know about them. Some of that information might come from close monitoring of the field effort on the survey I'm running, and some might come from other sources. That's where big data comes in.
The Census Bureau executes over 100 different surveys of households and businesses every year. Throw in the Decennial Census, and they have tens of thousands of field representatives visiting or calling millions of homes and businesses, learning at least a little something about each of them, whether they complete an interview or not. Putting all of this together in a systematic way will make it possible to separate the easier-to-get respondents from the really hard ones. Bringing in data from the administrative records of other government agencies can enrich the database even further, sharpening the Bureau's ability to prioritize the data collection effort. (I'm one of those people who believe that much of the Decennial Census might be done from these administrative records, but that's another post.) In theory, the survey effort becomes more efficient, can be completed more quickly, and will cost less.
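To make the idea concrete, here is a toy sketch of the kind of prioritization the last two paragraphs describe: score each open sample line by how likely it is to respond (which might be modeled from paradata or administrative records) and how much its group would improve the estimates, then spend the remaining field effort on the highest-scoring cases. This is purely illustrative; the field names, the propensities, and the payoff rule are my own assumptions, not anything the Bureau actually uses.

```python
# Illustrative only: a toy version of adaptive-design case prioritization.
# All fields and the scoring rule are hypothetical.

def prioritize_cases(cases, effort_budget):
    """Rank open sample cases by expected payoff: the chance the case
    responds times how underrepresented its group currently is."""
    def expected_payoff(case):
        # Groups still short of their target share contribute more
        # to estimate quality than groups already well represented.
        shortfall = max(0.0, case["target_share"] - case["current_share"])
        return case["response_propensity"] * shortfall

    ranked = sorted(cases, key=expected_payoff, reverse=True)
    return ranked[:effort_budget]

# Toy sample lines. Case 1's group is already overrepresented, so even a
# high response propensity gives it zero expected payoff.
open_cases = [
    {"id": 1, "response_propensity": 0.9, "target_share": 0.20, "current_share": 0.25},
    {"id": 2, "response_propensity": 0.4, "target_share": 0.30, "current_share": 0.10},
    {"id": 3, "response_propensity": 0.7, "target_share": 0.30, "current_share": 0.10},
]

for case in prioritize_cases(open_cases, effort_budget=2):
    print(case["id"])  # → 3, then 2
```

The point of the sketch is the shape of the decision, not the formula: effort goes where it is most likely to improve the estimates, not where a completed interview is merely easiest to get.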
But the Bureau faces the same challenge all users of big data must face: potential limits due to privacy protections. In their case it may come down to their ability to use data collected for another purpose. But unlike many of those other users, the Bureau approaches these issues with the utmost seriousness. Confidentiality protection is an obsession. The bar is significantly higher than simply what is legal. And so, they have an aggressive survey program designed to measure public attitudes toward an approach like what I've just described.
The jury is still out on all of this, but here's hoping they can make this work.
The room here at AAPOR is full, mostly, I suspect, to hear a presentation of Pew's comparison of results from a dual-frame (landline plus cell) telephone survey and Google Consumer Surveys. There is no shortage of people I've talked to, here and elsewhere, who think Pew was overly kind in characterizing the differences. So it will be interesting to see how this plays out. Granted, it's back to keeping score, but I can't resist watching.
Scott Keeter is doing the presentation, and already I feel better. (I'm sure he didn't mean it as a joke, but Scott started by describing Google's quota sampling strategy as based on Google knowing "something about users.") More seriously, he is positioning this in a fit-for-purpose framework.
Scott has shown a chart that puts the mean difference across 52 comparisons at 6.5 points. Not awful, but not great. Some topics seem to work well; others do not. The problem, of course, is that there is currently no way to know in advance when it will work and when it will not.
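For readers unfamiliar with this kind of scorecard, a figure like "6.5 points across 52 comparisons" is typically a mean absolute difference between the two surveys' percentage estimates on matched questions. A sketch, with made-up numbers rather than Pew's actual data:

```python
# Hypothetical percentage estimates for four matched questions.
# These are invented numbers, for illustration only.
telephone = [52, 34, 61, 48]
google    = [45, 40, 59, 55]

# Mean absolute difference, in percentage points.
diffs = [abs(t - g) for t, g in zip(telephone, google)]
mean_diff = sum(diffs) / len(diffs)
print(round(mean_diff, 1))  # → 5.5
```

Averaging absolute differences like this hides which topics diverge badly and which track closely, which is exactly the problem Scott's chart illustrates.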
He says that Pew will continue to use them, but not for point estimates. It seems useful for some question testing and quick feedback to help with survey design. Hence the link to fit-for-purpose. But hardly a game changer.
I am at the AAPOR annual conference in Boston. My first observation: it is huge. For example, at 8:00 this morning there are no fewer than eight separate sessions, each with five to six presenters. There is no way you can come close to covering the whole thing. So I have tentatively chosen to focus on two consecutive sessions about online sampling, mostly without using online panels.
I'm having two reactions to this. First, I feel like I'm watching the wheel being reinvented. OK, so maybe a reboot of online sampling is a better description. But there is a certain naiveté in sampling from Facebook, Google, or an email blast to a list of unclear origin and expecting any chance of getting a sample that matches a high-quality probability sample like the one used by the GSS. Second, and probably more important, the level of transparency and analysis is refreshing, especially given the lack of transparency we have seen over the years from online sample companies.
For years I have been frustrated by this industry sector's out-of-hand rejection of online and a research agenda that seems directed at demonstrating only that online does not work. But the people who do this kind of work have a lot they can contribute to the debate about the quality of online samples and how to improve it. To paraphrase a statement made by Doug Rivers at an AAPOR conference two years ago, it's time to move beyond keeping score. It's nice to see that finally happening.
As for this post, see #4 above.