Back again at CASRO Panels. The afternoon is dedicated to survey routers. As the conference has unfolded, it's become clear to me that this is the perfect way to end it; routers are the logical endpoint. So let's go back to the beginning and note that there seem to be four overriding themes to this conference.
The first was implicit in Kim Dedeker's opening remarks about the focus on reliability rather than validity. It's hard anymore to find anyone whose head has been in the game who will argue that panel research as we practice it today (with a few notable exceptions) produces representative samples or results that are projectable to target populations with any specific accuracy. That is at long last a settled issue. As Kim suggested, clients can deal with results that they know have some bias as long as those results are not jumping all over the place from survey to survey. For many of their purposes directional results are just fine.
The second is that the panel data quality crisis (which Kim sort of launched) is no longer the focus. Panel companies and research suppliers have developed a set of solutions to deal with the biggest issues and these are being implemented all over the industry. It may be too soon to pronounce the problem solved, but I think it's clear that we are out of the woods on this one. There still is good and important research on this issue, some of it presented at this conference, but we seem to have figured this one out.
The third is the clear realization that there is wide variability in panels and it's unwise to expect to get consistent results from panel to panel. One of the themes of the ARF research is to protect against variability by making sure that the panel you choose to work with has the depth to support the full run of your research. If the panel can't sustain it and you are forced to change, you could be in for a rocky time.
Finally, the era of the panel as we have known it over roughly the last 15 years is rapidly closing. The old model of sending a bunch of invitations to the panel and directing panelists to a single survey is increasingly untenable. As people have been pointing out for years, the panel model is not sustainable. The pool of willing respondents is limited and we need strategies that tap multiple sources to draw in the number of respondents we need to meet the demand. And so we need to create a nearly constant stream of willing respondents from panels, from river, from social networks, from IM, from SMS messaging, etc.
Which brings us to routers. We need effective ways to allocate those respondents. Routers have been around for over a decade, mostly used with river sampling, but they have been black boxes that most of us know very little about. Now getting routers right seems like a critical issue. We heard nice presentations from Western Wats and OTX on various approaches to routing. The main takeaway seemed to be that random assignment of Rs to waiting surveys is the most efficient in terms of sample utilization as well as the best way to moderate the diversity/bias across these multiple sample sources. But it's not that simple. There are other issues as well. For example, we need routers that minimize respondent burden. Some routers keep sending Rs through multiple surveys until they find one that the R qualifies for. People can get trapped in these and are unlikely to come back a second time. Equally problematic is the difficulty of computing some standard metrics we are used to, like response rate or contact rate. Or should we route willing Rs to more than one survey out of the same recruitment? What happens when the input stream varies by source? Are the types of screening questions that Jackie Lorch described in her paper a good idea? The more people talk about it, the more difficult the problem becomes.
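To make the random-assignment idea concrete, here is a minimal sketch of a router in Python. To be clear, everything in it (the `Survey` class, the `route` function, the `MAX_ATTEMPTS` cap) is my own invention for illustration, not any vendor's actual implementation:

```python
import random

MAX_ATTEMPTS = 3  # cap screener attempts to limit respondent burden

class Survey:
    """A waiting survey with a remaining quota and a screener function."""
    def __init__(self, name, quota, qualifies):
        self.name = name
        self.quota = quota          # completes still needed
        self.qualifies = qualifies  # screener: respondent dict -> bool

def route(respondent, surveys):
    """Randomly try open surveys until one accepts, up to MAX_ATTEMPTS."""
    open_surveys = [s for s in surveys if s.quota > 0]
    random.shuffle(open_surveys)   # random assignment, not priority order
    for survey in open_surveys[:MAX_ATTEMPTS]:
        if survey.qualifies(respondent):
            survey.quota -= 1
            return survey.name
    return None  # thank and release rather than screening endlessly

# Hypothetical surveys for the example
surveys = [
    Survey("auto", 100, lambda r: r["age"] >= 18),
    Survey("teen_snacks", 50, lambda r: r["age"] < 18),
]
print(route({"age": 35}, surveys))  # -> auto
```

The attempt cap is one simple answer to the burden problem described above: rather than sending a respondent through survey after survey, the router releases them after a few failed screeners, at some cost in sample utilization.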
So my main takeaway for the last two days is that a dramatic change is upon us and it's not clear just how ready we are for that change. Most of the buzz in the MR world over the last year or so has been about social media, MROCs, Twitter, etc. Panels were passé.
For better or for worse, panels are becoming a lot more interesting.
Back at CASRO Panels for another day. The first speaker was to be Joel Rubinson from ARF, but he sent his number two, Ray Pettit. He began by reviewing the major findings from the Foundations of Quality project:
He also put up a graphic showing all of the things that impact "Panel Data Quality." When I saw it my thought was that they are rebuilding the Total Survey Error model. That would have been a better starting point than slowly reinventing it piece by piece in a somewhat unsystematic way: it's an existing framework with a rich literature to back it up, and drawing on and contributing to it might have served the project better.
The rest of the presentation was about the QeP process that involves a formalized set of forms and procedures to document things at the panel stage, the individual survey stage, and the research agency editing stage. It's been tested with some big suppliers and met with enthusiasm. They are going to run training programs for it as they roll it out more broadly.
In the Q&A one of the program chairs (Jeff Miller) pressed him on when we will see detailed results. So far we've only seen high level stuff and there has been considerable disappointment around the industry in terms of what we have seen so far. The summary stuff apparently was published in the Journal of Advertising Research December issue. He promised the detailed stuff in March.
Also in the Q&A someone pointed out that just as this is rolling out for panels the whole panel landscape is changing. How quickly they can evolve to deal with that probably is a critical success factor for the initiative.
Next we heard from Nallan Suresh and Michael Conklin from MarketTools. They have been doing some interesting work building regression models to understand what drives respondent engagement. The essence of their model is a combination of behavior and outcomes on debrief questions. Key findings:
To their credit, they do not advocate dealing with the problem by color and flash gadgets as many others have done.
This is good common sense stuff, and it's nice to see it backed up with data. As an example they brought a client to testify to his ability to move a fairly complex survey task from a CLT setting to online, simplifying it along the way. Lots of data to show that mostly, it worked. A nice story, but it seems like a bit of a non sequitur.
It's not their point, but the cynic in me wonders if maybe the kinds of people who show up at CLT testing are the same kind of people who sign up for panels.
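For what it's worth, the general shape of that kind of engagement model is easy to sketch. This is emphatically not MarketTools' model: the behavioral features (speeding, straightlining), the synthetic data, and the plain gradient-descent logistic fit are all stand-ins of my own for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Plain stochastic-gradient logistic regression; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

# Invented data: features are [sped_through, straightlined];
# outcome is engagement reported on a debrief question (1 = engaged).
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 25
y = [1, 0, 0, 0] * 25  # either behavior predicts disengagement here

w = fit_logistic(X, y)
predict = lambda xi: sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
print(predict([0, 0]) > 0.5, predict([1, 1]) > 0.5)  # -> True False
```

The point of the pattern, not this toy version of it, is that you can regress a debrief-reported outcome on observed survey behavior and use the fitted model to flag likely disengaged respondents.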
The last presenter in this segment was Adam Porter from e-Rewards/Research Now. He reported on some research designed to get a handle on what Rs view as a good survey versus a bad survey.
Not much of a surprise there but the better findings focused on the characteristics of a bad survey:
The positives were essentially the opposite of the negatives. But one key point: the single most problematic feature was restricted answer options: no DK, no way to refuse, no way to skip, no open end to express an opinion.
I’m blogging from the CASRO Panels Conference in New Orleans. (#caspan on Twitter) This is the third year for this event and based on the program it could be the strongest year yet. And make no mistake about it; this conference takes its title seriously. The sessions are overwhelmingly focused on the quality challenges that the panel paradigm faces while also giving some space to new developments. The MR blogosphere’s current obsession with social media is getting scant attention.
An aside: Jeffrey Henning is sitting behind me and also blogging the conference. So the bar is being set very high. He has told me his secret about how he manages to be so prolific and so smart. Unfortunately, I can’t act on either. Worse yet, I am really rushing and so expect an order of magnitude increase in typos.
Diane Bowers opened with encouraging words about the MR industry showing signs of recovery from 2009. Part of that is their survey data and some of it is optimism she’s picking up among CASRO members. Let’s hope so.
True to its theme the conference’s first speaker was the lady who was among the first to wave the warning flag: Kim Dedeker. She promised that her talk would be about ‘reliability’ rather than ‘validity.’ By that she means a system that produces ‘identical outcomes’ given the same inputs. It’s about consistency from survey to survey and not necessarily about accuracy. As a former client-side person she speaks with authority when she says that clients engage with us as part of managing risk on business decisions. If we can’t help them do that then we lose our credibility. I didn’t count them, but she used the term ‘science’ at least 10 times. That’s something clients look to us to provide and the application of science is what delivers on that consistency thing. She was asked a question about accuracy, which sort of gets to validity. Sometimes it’s important, but often it’s less important than reliability because many client companies have other sources to benchmark against. As long as they are seeing reasonable consistency over time, they feel reassured. But she also pointed out to all of us that it is absolutely essential that we keep evolving the science. Surveys may or may not be dead, but it’s hard to argue against the idea that how we do surveys must change, and change rather dramatically. How is the question we have yet to answer.
Next up was Jamie Baker-Prewitt (no relation) from Burke whose topic was the variation in buying patterns that may exist across different sample sources. She did a nice job of summarizing the research-on-research issues that we have all watched go by over the last five years or so. (I was a bit surprised to hear that we don’t have to worry about coverage error anymore because of high Internet penetration but will soldier on.) Her study looked at six samples—two classic panels, two river samples, and two social networking samples. The demographics of the samples were surprisingly uniform, although Facebook seemed to have delivered a much different group (more male, more lower income, older) than one might expect. The two social networks also delivered samples with people who spend more time online than the other samples. Time forced her to race through product awareness and use measures. It was impossible to keep up but there were lots of instances where there were not a whole lot of differences among these sources, some surprising and some not. In general the social networking samples tend to be outliers more than the others. The Facebook sample often stands out, especially in terms of brand awareness, where the FB respondents just are not as aware as others. Sample from FB took a real beating. Very different from the other sources on a variety of measures, but also very expensive and inefficient. The bottom line seems to be that there is some consistency across the standard panels and river but it was less so for social networking sample, especially FB. Someone asked about accuracy but she punted on that one. Remember, it’s about consistency.
The final presentation in this leg was Jackie Lorch from SSI. Very interesting. She started with the claim that panels as we have known them are dying and we need to be much more diverse in how we recruit. So she imagines a combination of panel invitations, river, sms messages, etc. with everyone coming into a routing hub where they answer some questions and then get routed to one of many online surveys. I expect she’s right about the need for this kind of multiple sourcing going forward. It’s the obvious answer to the-panels-are-not-sustainable argument that one hears over and over. But the interesting part is what happens in the hub where the sample sources get blended together. She starts from the premise that balancing people on demographics is not enough. We need more. We need to take into account the attitudinal and behavioral differences that are at the heart of why panels and online in general fail the representativeness test. So they have been doing a lot of work with various kinds of psychometric and segmentation ideas to try to create more representative samples than you can get just with demographic balancing. My first reaction was that it was like propensity weighting only on the front end. But the more she talked the clearer it seemed that it was model-based sampling, although she never used the term. Now I am not a sampler and if you are I suggest you stop now because I am about to send you into terminal eye rolling. Once again, I soldier on. You build a model of the distribution of key variables you need to create a representative sample of your target population and then make sure your sample is drawn to conform to it. This is respectable stuff, but also very difficult. As a sampler I know once said, “There is nothing wrong with model-based sampling; it’s just that there are a lot of bad models.” In other words, your sample is only as good as your model and getting the model right is hard.
Modeling to a specific outcome is one thing, but modeling to a whole range of possible and unknown outcomes is really, really difficult. Some of the people doing online political polling are using this approach. They know the right proportions of characteristics and behaviors to get in the sample. They have been able to do it because they study the same problem over and over, and it's one with a known outcome. But building a general model to cover all of the possible topics in an MR consumer study sounds like a really tough job. I wish them luck.
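For the curious, the mechanical half of the idea (drawing a sample to conform to a target distribution) is simple enough to sketch; the hard half, as noted above, is getting the targets right. The variable and target proportions below are invented for illustration, and this assumes the rounded targets sum to the sample size:

```python
import random
from collections import Counter

def draw_to_targets(pool, key, targets, n):
    """Draw n respondents so the sample matches target proportions on `key`."""
    pool = list(pool)              # copy so we don't reorder the caller's list
    random.shuffle(pool)
    need = {cell: round(share * n) for cell, share in targets.items()}
    sample = []
    for person in pool:
        cell = person[key]
        if need.get(cell, 0) > 0:  # still room in this cell of the model
            sample.append(person)
            need[cell] -= 1
        if len(sample) == n:
            break
    return sample

# Invented target: 30% heavy / 70% light online users in the final sample.
pool = ([{"usage": "heavy"} for _ in range(500)] +
        [{"usage": "light"} for _ in range(500)])
sample = draw_to_targets(pool, "usage", {"heavy": 0.3, "light": 0.7}, 100)
print(Counter(p["usage"] for p in sample))  # 30 heavy, 70 light
```

The sampler's warning applies directly: the code will faithfully hit whatever targets you feed it, so a bad model produces a confidently wrong sample.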
Unfortunately, I had a phone meeting and missed the rest of the afternoon. There was a discussion on the legal aspects of digital fingerprinting and a panel discussion about communities. The paper I wish I had heard was Pete Cape’s. He’s always interesting. His topic was “Conditioning effects in online communities.” His abstract says he will try to answer the question of whether surveys of online communities are reliable. I will have to ask him the answer when I see him today. But by far the two best papers I have ever heard on this topic sum things up pretty well. Kristoff de Wulf did one at ESOMAR in Dublin a couple of years back and showed how community members tend to either be in love with the brand to start, fall in love once they join, or become disenchanted and fall away. Last year in Montreux Ray Poynter put it this way (I am paraphrasing): “If you test a concept in your community and they hate it, go back to the drawing board. If they love it, go do some real research.”