The cost of privacy

My email this morning included a message from Hertz describing their new fleet of BMWs. I don’t rent from Hertz anymore and the emails they continue to send are mostly reminders that my driver’s license has expired, which was some while ago. But why the BMW pitch? Perhaps because in 2009 I treated myself to a BMW 3 series rental in Ireland which was great fun on the winding roads of the Dingle Peninsula and Ring of Kerry? Maybe they have a long memory? Or maybe the fact that I currently drive a BMW somehow found its way into my profile?

In any case, it reminded me of Pew’s recently released study, “Americans and Privacy.” A few relevant findings:

  • 72% of US adults believe that all or most of what they do online is tracked by companies.
  • 79% are either very or somewhat concerned about how their personal data is being used by those companies.
  • 59% say they understand very little or nothing about “what companies do with their data” and only 18% say they have a great deal or some control over their data.
  • 28% say they benefit a great deal or somewhat from the data companies collect on them and 81% say “the potential risks outweigh the potential benefits.”
  • 75% say there should be more “regulation of what companies can do with their customers’ personal information.”

I could go on, but I think these few examples make the point: the US public is beyond fed up with daily and routine violations of their privacy. They are especially concerned about the amount of personal information about them collected by social media companies (85%), advertisers (84%), and the companies they buy things from (80%).

The old saying, “On the Internet nobody knows you're a dog,” is no longer a thing.

The sad reality is that most, although not all, of this data collection and reuse is legal, at least in the US, and that’s not likely to change anytime soon. One frequently cited reason for not taking privacy and personal data protection more seriously is that it just costs too much.

Earlier this year the Information Technology and Innovation Foundation (ITIF) released a study, “The Costs of an Unnecessarily Stringent Federal Data Privacy Law.” By way of definition “unnecessarily stringent” means something similar to the GDPR or CCPA. And the report estimates that such a privacy regime would cost the US economy about $122 billion (sometimes they say “billion” but the tables say “million”) per year, or $483 per US adult. (By way of comparison, that’s more than 50% of what we spend on electricity every year.)  So what are those costs?

Around 10% would go to Data Protection Officers and upgraded data infrastructure, two major areas of complaint about the GDPR. But the lion’s share, 85% of the total, would go to two areas: Reduced Access to Data and Lower Ad Effectiveness.

In the case of the former, privacy requirements such as express consent, data minimization and purpose specification will reduce data sharing. In one of my favorite sentences in the report the authors write, “Unfortunately, opt-in requirements frame consumer choices in a way that leads to suboptimal data sharing because most users select the default option of not giving consent—for a number of irrational reasons.” So best we stop asking.

As for Lower Ad Effectiveness, the report tells us that “Targeted advertising is beneficial to consumers and businesses.” Such advertising allows businesses to be more efficient and increase sales. “Consumers benefit by gaining more utility from relevant ads.” More utility?

Sadly, I hear similar arguments from within market research in the form of complaining about the cost of compliance as GDPR goes global.

One of my favorite lines in the old CASRO Code of Conduct is this one: “Since research participants are the lifeblood of the research industry, it is essential that those approached to participate be treated with respect and sensitivity.” I worry that we’ve lost that sense of respect for those whose data we rely on, whether when collecting from them directly or harvesting their data from the cloud. Online panels have led to us thinking of respondents as a commodity and our increasing reliance on big data sources has caused us to stop thinking about them as people at all. In the privacy debate they are an abstraction in a one-sided cost benefit exercise.

There are recent surveys that show those people who are our lifeblood don’t think very highly of us these days. They don’t trust us with their data any more than they trust social media networks or advertisers and, whether rational or irrational, they are less and less inclined to cooperate with research requests. This is not a good thing, to say the least. It’s important that we figure out sooner rather than later whose side we are on.

Is Insights a Profession?

Yesterday, like 100 million or so other Americans, I tuned into the Super Bowl. I was not watching to see whether the Flying Elvis would beat a team that was only there because of one of the most glaring officiating errors in the history of American football. Rather, I tuned in to hear Romostradamus, the Twitter-endowed name for Tony Romo, the former quarterback turned color commentator who has become a phenomenon based on his ability to analyze a football game, explain strategy to the audience, and predict what play will be run with uncanny accuracy.

Writing in the New Yorker, Zach Helfand describes how Romo learned to do it: “In the course of his career, he watched hundreds of plays from the bench, lived through thousands more on the field, and then relived them many times over in the film room.” Frank Bruni writing in the Times echoes this theme of relentless study and preparation even to this day. In short, Romo approaches his job as a professional.

The Oxford Dictionary defines a profession as “a paid occupation, especially one that involves prolonged training and a formal qualification.” How many of us who fancy ourselves as “insight professionals” meet that test?

As I have said in other contexts, market researchers seem to take an almost perverse pride in describing how they “fell into” MR with little or no relevant education or training. Over the last decade in particular we have celebrated the inclusion of specialists from other fields—computer science, anthropology, social psychology and journalism, to name a few. As one well-known practitioner (whose name I withhold to protect his innocence) once observed, “We have allowed this industry to be taken over by venture capitalists and technology geeks.”

All this would be fine were there a concerted effort to teach these new entrants the foundational principles that underpin good research and actionable insights. What we have instead is some learning by doing and some basic orientation in how a single company does what it does.

In December I ended a four-year stint as Executive Director of the Market Research Institute International (MRII), a non-profit that develops online courses in market research in partnership with the University of Georgia. When I first joined I was astonished by the widespread lack of interest in education and training among individual practitioners and their employers, on both the client and supplier sides. And it’s not just me. This unfortunate reality is documented in a recent NewMR survey in which 39% of respondents reported receiving less than six hours of training per year. This in an industry that fancies itself as undergoing rapid and dramatic change, yet is doing precious little to address it in a meaningful way.

Tony Romo can do what he does because of relentless and ongoing study that continues to this day. That might be more than those of us who aspire to be insight professionals can or want to commit to, but continuing to do what we have been doing is not going to produce a future where MR can play the essential role it seeks to play.

Finding the soul of research

I stole the title of this post from Simon Chadwick's editorial in the November/December issue of Research World. It reminded me that I, like many young people, began my career as something of an idealist. My first two jobs were with nonprofits and then in 1984 I joined NORC at the University of Chicago, whose tagline was and still is, "Social Science Research in the Public Interest." I spent 11 years of my life there and learned an enormous amount before moving to what I still self-mockingly refer to as "the dark side" in 1995.

I was reminded of this while reading Simon's editorial and was especially struck by this sentence:

Polls are where research loses its soul; commercial MR is where it forgets it has one; and social research is where we find it.

Well said, Simon.

Big Data: Part 3

This post is the third and last, at least for now, in my series about MR’s struggles with big data. Its theme is simple: big data is hard.

For starters, the quality of the data is not what we are accustomed to. More often than not the data were collected for some other purpose, and the accuracy of individual items, their overall completeness, their consistency over time, their full documentation, and even their meaning all pose serious challenges to reuse. Readers familiar with the Total Survey Error (TSE) model will recognize that big data is vulnerable to all of the same deficiencies as surveys—gaps in coverage, missing data, poor measurement, etc. The key difference is that survey researchers, at least in theory, design and control the data-making process in a way that users of big data do not. For users of big data the first step is data munging, often a long and very difficult process with uncertain results.

Then there is the technology. We all have heard about the transition from a world where data are scarce and expensive to one where they are plentiful and cheap, but the reality is that taking big data seriously requires a significant investment in people and technology. There is more to big data than hiring a data scientist. The AAPOR Report on Big Data has a nice summary of the skills and technology needed to do big data. While the report does not put a price tag on the investment, it likely is well beyond what all but the very large market research companies can afford.

Much of the value of big data lies in the potential to merge multiple data sets together (e.g. customer transaction data with social media data or Internet of Things data), but that, too, can be an expensive and difficult process. The heart of this merging process is bits of computer code called ETLs that specify what data are extracted from the source databases, how they are edited and transformed for consistency, and how they are merged into the output database, typically some type of data warehouse. Take a moment and consider the difficulty of specifying all of those rules.

If you have ever written editing specs for a survey dataset then you have some inkling of the difficulty. Now consider that in a data merge from multiple sources you can have the same variable with different coding; the same variable name used to measure different things; differing rules for determining when an item is legitimately missing and when it is not; detailed rules for matching a record from one data source with a record from another; different entities (customers, products, stores, GPS coordinates, tweets) that need to be resolved; and so on. This is difficult, tedious, unglamorous, and error-prone work. Get it wrong, and you have a mess.
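A few lines of code make the tedium concrete. Below is a deliberately tiny sketch, with invented sources, field names, and codes, of the kind of transform-and-merge rules an ETL has to encode: two systems that code the same variable differently, use different key names, and share only some customers.

```python
# A tiny, hypothetical transform-and-merge: all field names, codes, and
# records here are invented for illustration.

CRM = [
    {"cust_id": "A17", "gender": "F", "spend": 120.0},
    {"cust_id": "B42", "gender": "M", "spend": None},  # missing: legitimate or not?
]
LOYALTY = [
    {"member": "A17", "sex": 2, "visits": 9},   # same person, different key name
    {"member": "C03", "sex": 1, "visits": 4},   # appears in one source only
]

SEX_CODES = {1: "M", 2: "F"}  # the loyalty system codes sex numerically

def merge_customers(crm, loyalty):
    """Extract from both sources, harmonize the coding, merge on customer id."""
    out = {}
    for rec in crm:
        out[rec["cust_id"]] = {"gender": rec["gender"], "spend": rec["spend"]}
    for rec in loyalty:
        row = out.setdefault(rec["member"], {"gender": None, "spend": None})
        coded = SEX_CODES.get(rec["sex"])
        if row["gender"] and coded and row["gender"] != coded:
            row["gender_conflict"] = True  # same variable, inconsistent values
        row["gender"] = row["gender"] or coded
        row["visits"] = rec["visits"]
    return out

merged = merge_customers(CRM, LOYALTY)
```

Even this toy version has to decide which source wins when values conflict, what a missing value means, and what to do with records that match nothing. Multiply that by hundreds of variables and a dozen sources and you have an ETL spec.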

To sum up this and the previous two posts, I worry that big data is a much bigger deal than most of us realize. We may fancy ourselves as pioneering in this space but it’s not clear to me that we understand just how hard this is going to be. For all of the talk about paradigm shifts and disruption, this is the real deal if for no other reason than it is the right methodology (if the word applies here) to support the other big disruption, a shift away from focusing on attitudes and opinions to focusing on behavior.

Back in 2013 I put up a post about a big data event put on by ESOMAR in which John Deighton from Harvard gave a talk that was the most compelling description I had heard of the threat of big data to traditional MR. The reaction in the room struck me as more whistling by the graveyard than taking the challenge for what it is. Two years later things don’t feel that much different. We are still in a state of denial. We had better get cracking. 

Big Data: Part 2

This second post in my series about MR’s ongoing struggle with big data is focused on our stubborn resistance to the analytic techniques that are an essential part of the big data paradigm. It’s hard to talk about those analytic challenges without referring to Chris Anderson’s 2008 Wired editorial, “The end of theory: The data deluge makes the scientific method obsolete.”

Faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. . . Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

This is a pretty good statement of the data science perspective and its faith in machine learning--the use of algorithms capable of finding patterns in data unguided by a set of analytic assumptions about the relationship among data items. To paraphrase Vasant Dhar, we are used to asking the question, “Do these data fit this model?” The data scientist asks the question, “What model fits these data?”
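To make the contrast concrete, here is a toy sketch with invented data and plain Python. The traditional question tests one pre-specified predictor; the data-science question scores every candidate and lets the data pick.

```python
# Toy illustration of the two questions; all data are made up.

def pearson(xs, ys):
    """Plain Pearson correlation, no libraries required."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

candidates = {                      # hypothetical customer variables
    "ad_exposure":  [1, 2, 3, 4, 5, 6],
    "store_visits": [2, 1, 4, 3, 6, 5],
    "age":          [55, 23, 41, 38, 29, 60],
}
spend = [10, 12, 20, 18, 31, 27]

# "Do these data fit this model?" -- test the one relationship we hypothesized.
r_hypothesis = pearson(candidates["ad_exposure"], spend)

# "What model fits these data?" -- search all candidates for the strongest pattern.
scores = {name: abs(pearson(xs, spend)) for name, xs in candidates.items()}
best = max(scores, key=scores.get)
```

In real data science the search space is thousands of variables and the algorithms far more sophisticated than a correlation scan, but the inversion of the question is the same.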

The research design and analytic approaches that are at the core of what we do developed at a time when data were scarce and expensive, when the analytic tools at our disposal were weak and underpowered. The combination of big data and rapidly expanding computing technology has changed that calculus.

So have the goals of MR. More than ever our clients look to us to predict consumer behavior, something we have often struggled with. We need better models. The promise of data science is precisely that: more data and better tools lead to better models.

All of this is anathema to many of us in the social sciences. But there also is a longstanding argument within the statistical profession about the value of algorithmic analysis methods. For example, in 2001 the distinguished statistician Leo Breiman described two cultures within the statistical profession.

One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. . . . If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

One can find similar arguments from statisticians going back to the 1960s.

There are dangers, of course, and arguments about correlation versus causality and endogeneity need to be taken seriously. (Check out Tyler Vigen’s spurious correlations website for some entertaining examples.) But any serious data scientist will be quick to note that doing this kind of analysis requires more than good math skills, massive computing power, and a library of machine learning algorithms. Domain knowledge and critical judgment are essential. Or, as Nate Silver reminds us, “Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise.”
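Vigen-style artifacts are easy to reproduce in miniature. The sketch below (simulated numbers, not Vigen’s data) builds two unrelated series that both happen to drift upward over time; their raw correlation is high purely because of the shared trend, and it largely vanishes once the trend is removed by differencing.

```python
import random

# Simulated illustration: two unrelated quantities that both trend
# upward over twenty "years." The only thing they share is time.
random.seed(7)
a = [10 + 0.8 * t + random.uniform(-1, 1) for t in range(20)]
b = [500 + 12 * t + random.uniform(-15, 15) for t in range(20)]

def pearson(xs, ys):
    """Plain Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

raw_r = pearson(a, b)  # high, but only because both series share a trend
diff_r = pearson([a[t + 1] - a[t] for t in range(19)],
                 [b[t + 1] - b[t] for t in range(19)])  # trend removed
```

An algorithm scanning for patterns will happily flag `a` and `b` as strongly related; it takes a human to ask whether the relationship survives once the obvious confounder is taken out.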