"Is it legal?" is not enough

I just posted a link to this Computerworld article on my Twitter feed, but I think it's so important that I have decided to mention it here as well. The article describes the dangers brands are beginning to face from overly aggressive big data and data mining practices. The key point is that it's not just about what is legal; consumers can also be sensitive to what they view as privacy violations and overly aggressive marketing.

These are extra-legal areas where codes of conduct developed by industry and trade associations have traditionally protected both research agencies and their clients from public backlash. It has become fashionable in some quarters to argue that these quaint notions are holding back market research and providing an opening for new entrants to realign the competitive balance in the industry. This is a good reminder that respect for consumers never goes out of fashion.

Is research on research the real deal?

David Carr had a piece in last Sunday's New York Times about the difficulty of distinguishing journalism from activism. His first sentence sums up the issue pretty succinctly, "In a refracted media world where information comes from everywhere, the line between two 'isms' — journalism and activism — is becoming difficult to discern." His case in point is Glenn Greenwald, the Guardian reporter who has been breaking all the Snowden stories. Greenwald seems to be a guy with a very strong suspicion of government, and it shows in what he covers and how he writes about it. Cable news provides still better examples, where people who describe themselves as journalists routinely put a political agenda ahead of the objectivity that some of us expect (hope) from "the news."

We have a similar problem in market research when it comes to distinguishing between good methodological research and what we like to call research on research (RoR). Good methodological research is based in honest and objective scientific inquiry. Hypotheses are formed, the relevant literature reviewed, experiments designed and executed, data analyzed, hypotheses accepted or rejected, conclusions reached, and potential weaknesses in the research fully disclosed.  The best of these studies end up in peer-reviewed journals where they help us to build and refine research methods, brick by brick.

Much of RoR, on the other hand, has become something much different. It often features a point of view rather than a hypothesis, and the exploration of the data is a search for proof points rather than an objective analysis aimed at uncovering what the data can tell us. The end product typically is a white paper, designed to sell rather than inform. We might attribute some of the poor quality of RoR to a lack of training and skill, but I expect most of it comes back to the simple fact that MR is a business. Academics achieve success by doing good solid research that earns the respect of their peers. MR companies succeed by selling more of their stuff.

All of which is not to say that there is not some good RoR being done, studies that are based in the fundamentals of objective scientific inquiry. It's just that it's getting harder and harder to tell the difference. And given the methodological disruption that has come to characterize our industry over the last decade, that's a real problem for all of us.

Sir Martin on crowdsourcing of ads

The March issue of The Harvard Business Review has an interview with Martin Sorrell.  At one point the interviewer asks whether crowdsourced ads and algorithms are the future model of advertising.  Sir Martin’s response is interesting:

You are tapping into the knowledge and the information of people all over the world.  That’s fantastic. And the power of the web is that it opens up and plumbs people’s minds and mines all their knowledge.  But somebody has to assess it, and you can’t do that with an algorithm. There’s a judgment call here.  That’s the problem I see: Ultimately, somebody still has to decide what is the best piece of knowledge or the big idea that you got from the crowd. If you don’t get that right, it fails.

Measuring the right stuff

A few weeks back I saw a post by online usability specialist Jakob Nielsen titled, “User Satisfaction vs. Performance Metrics.”  His finding is pretty simple: Users generally prefer designs that are fast and easy to use, but satisfaction isn't 100% correlated with objective usability metrics.  Nielsen looked at results from about 300 usability tests in which he asked participants how satisfied they were with a design and compared that to some standard usability metrics measuring how well they performed a basic set of tasks using that design.  The correlation was around .5.  Not bad, but not great.  Digging deeper, he finds that in about 30% of the studies participants either liked the design but performed poorly or disliked the design but performed well.
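Nielsen's comparison boils down to correlating two per-study numbers: a satisfaction score and a performance score. A minimal sketch of that computation, with invented scores (these are illustrative figures, not Nielsen's actual data):

```python
# Sketch: correlating per-study satisfaction with task performance.
# All numbers below are made up for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-study averages: self-reported satisfaction (1-7 scale)
# and task success rate (0-1) for eight designs.
satisfaction = [5.1, 6.2, 4.0, 5.8, 3.5, 6.5, 4.8, 5.5]
success_rate = [0.62, 0.85, 0.55, 0.60, 0.70, 0.90, 0.50, 0.75]

r = pearson(satisfaction, success_rate)
print(f"correlation: {r:.2f}")
```

A correlation around .5, as Nielsen reports, means satisfaction explains only about a quarter of the variance in performance, which is why the two can disagree in a third of studies.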

I immediately thought of the studies we’ve all seen promoting the use of flash objects and other gadgets in surveys by pointing to the high marks they get on satisfaction and enjoyment as evidence that these devices generate better data. The premise here is that these measures are proxies for engagement and that engaged respondents give us better data.  Well, maybe and maybe not.  Nielsen has offered us one data point.  There is another in the experiment we reported on here, where we found that while the version of the survey with flash objects scored higher on enjoyment, respondents in that treatment showed evidence of disengagement at the same rate as those tortured with classic HTML: they failed the classic trap questions just as often.

A cynic might say that at least some of the validation studies we see are more about marketing than survey science.  A more generous view might be that we are still finding our way when it comes to evaluating new methods.  Many of the early online evangelists argued that we could not trust telephone surveys any more because of problems with coverage (wireless substitution) and depressingly low response rates.  To prove that online was better they often conducted tests showing that online results were as good as what we were getting from telephone.  A few researchers figured out that to be convincing you needed a different point of comparison.  Election results were good for electoral polling and others compared their online results to data collected by non-survey means, such as censuses or administrative records.  But most didn’t.  Studies promoting mobile often argue for their validity by showing that their results match up well with online.  There seems to be a spiral here and not in a good direction.

The bottom line is that we need to think a lot harder about how to validate new data collection methods.  We need to measure the right things.

Accuracy of US election polls

Nate Silver does a nice job this morning of summarizing the accuracy of and bias in the 2012 results of the 23 most prolific polling firms.   I’ve copied his table below. Before we look at it we need to remember that there is more involved in these numbers than different sampling methods.  The target population for most of these polls is likely voters, and polling firms all have a secret sauce for filtering those folks into their surveys.  Some of the error probably can be sourced to that step.

But to get back to the table, the first thing that struck me was the consistent Republican bias.  The second was the especially poor performance by two of the most respected electoral polling brands, Mason-Dixon and Gallup.  But my guess is that readers of this blog are going to look first at how the polls did by methodology.  In that regard there is some good news for Internet methodologies, although we probably should not make too much of it.
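The two summary columns in a table like Nate's reduce to simple arithmetic on each firm's final poll margins: average absolute error measures accuracy, and the signed mean error measures partisan lean. A sketch with invented polls (the firms and margins below are hypothetical; only the roughly 3.9-point actual national margin is real):

```python
# Sketch: the accuracy and bias statistics behind a table like Nate Silver's.
# Margin convention: Obama % minus Romney %. Poll numbers are invented.

actual_margin = 3.9  # approximate 2012 national popular-vote margin

# firm -> final poll margins (hypothetical)
final_polls = {
    "Firm A": [1.0, 2.0, 0.5],
    "Firm B": [4.0, 3.5, 5.0],
}

for firm, margins in final_polls.items():
    errors = [m - actual_margin for m in margins]
    avg_error = sum(abs(e) for e in errors) / len(errors)  # accuracy
    bias = sum(errors) / len(errors)                       # signed lean
    lean = "Republican" if bias < 0 else "Democratic"
    print(f"{firm}: avg error {avg_error:.1f}, bias {bias:+.1f} ({lean} lean)")
```

Under this convention a consistently negative bias, i.e. understating the Democratic margin, is exactly the Republican lean visible across the 2012 table.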

As far back as the US elections of 2000 Harris Interactive showed that with the right adjustments online panels could perform as well as RDD.  When the AAPOR Task Force on Online Panels (which I chaired) reviewed the broader literature on online panels we concluded this about their performance in electoral polling:

A number of publications have compared the accuracy of final pre-election polls forecasting election outcomes (Abate, 1998; Snell et al, 1999; Harris Interactive, 2004, 2008; Stirton and Robertson, 2005; Taylor, Bremer, Overmeyer, Siegel, and Terhanian, 2001; Twyman, 2008; Vavreck and Rivers, 2008).  In general, these publications document excellent accuracy of online nonprobability sample polls (with some notable exceptions), some instances of better accuracy in probability sample polls, and some instances of lower accuracy than probability sample polls. (POQ 74:4, p. 743)

So there is an old news aspect to Nate’s analysis, and one would hope that by 2012 the debate has moved on from the research parlor trick of predicting election outcomes to addressing the broader and more complicated problem of accurately measuring a larger set of attributes than the relatively straightforward question of whether people are going to vote for Candidate A or Candidate B.  In Nate’s table there are nine firms with an average error of 2 points or less, and four of the nine use an Internet methodology of some sort.  I say “of some sort” because as best I can determine there are three methodologies at play.  Two of the four (Google and Angus Reid) draw their samples to match population demographics (primarily age and gender).  IPSOS, on the other hand, tries to calibrate its samples using a combination of demographic, behavioral and attitudinal measures drawn from a variety of what it believes to be “high quality sources.”  (YouGov, which is further down the list, does something similar.)  RAND uses a probability-based method to recruit its panel.  So there are a variety of methodologies behind these numbers.
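The simplest version of the demographic-matching and calibration ideas described above is post-stratification: weight each respondent group by the ratio of its population share to its sample share. A sketch with invented figures (the age cells, population targets, and support numbers are all hypothetical; real calibration, such as raking over several variables at once, is considerably more involved):

```python
# Sketch: post-stratification weighting of a skewed online sample.
# All shares below are invented for illustration.

# Population share of each age group (assumed census-style targets)
population = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Share of each group in the (younger-skewing) online panel sample
sample = {"18-34": 0.45, "35-54": 0.35, "55+": 0.20}

# Weight for each respondent in a group: population share / sample share
weights = {g: population[g] / sample[g] for g in population}

# Hypothetical support for Candidate A by group
support = {"18-34": 0.60, "35-54": 0.50, "55+": 0.40}

unweighted = sum(sample[g] * support[g] for g in sample)
weighted = sum(sample[g] * weights[g] * support[g] for g in sample)
print(f"unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

Because younger respondents are over-represented and more favorable to Candidate A, the weighted estimate comes in lower than the raw one; the hard part Taylor points to is knowing which variables to calibrate on, since demographics alone often aren't enough.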

Back in 2007, Humphrey Taylor argued that the key to generating accurate estimates from online panels is understanding their biases and how to correct them.  I tried to echo that point in a post about #twittersurvey a few weeks back.  Ray Poynter commented on that post:

My feeling is that the breakthrough we need is more insight into when the reactions to a message or question are broadly homogeneous, and when it is heterogeneous . . . When most people think the same thing, the sample structure tends not to matter very much. . . However, when views, attitudes, beliefs differ we need to balance the sample, which means knowing something about the population. This is where Twitter and even online access panels create dangers.

I think Ray has said it pretty well.