
Posts from February 2013

Sir Martin on crowdsourcing of ads

The March issue of The Harvard Business Review has an interview with Martin Sorrell.  At one point the interviewer asks whether crowdsourced ads and algorithms are the future model of advertising.  Sir Martin's response is interesting:

You are tapping into the knowledge and the information of people all over the world.  That’s fantastic. And the power of the web is that it opens up and plumbs people’s minds and mines all their knowledge.  But somebody has to assess it, and you can’t do that with an algorithm. There’s a judgment call here.  That’s the problem I see: Ultimately, somebody still has to decide what is the best piece of knowledge or the big idea that you got from the crowd. If you don’t get that right, it fails.

Big data -- not so scary

I've been reading Nate Silver's The Signal and the Noise.  It's not the sort of book I normally would read, but since Nate kept me from jumping off a tall building during the last election I felt I owed him the $27.95.  Given Nate's record predicting election outcomes you might think this is a book that reveals the hidden secrets of the black art of predicting things.  But it's not.  It's about how hard it is to make accurate predictions even when we have mountains of data from which to do it.  And it's causing me to look differently at the issue of Big Data and predictive analytics.

Nate spends a lot of pages on some of those things for which we have lots of data but still aren't good at predicting – the weather, earthquakes, economic growth, etc.  Consider economic growth.  We all have a sense of just how much economic data there is and how far back the time series run.  (Nate estimates around 4 million variables.)  But forecasts of growth are all over the map and even "consensus forecasts" routinely are just plain wrong.

Nate argues that predictions fail because we fall victim to two common errors.  The first is to overfit the prediction model into something that looks very sophisticated and plausible but either ignores important variables or simply fails to capture the underlying structure of the data.  Machine learning is especially susceptible to overfitting.  The second is the classic error of interpreting correlation as causation.  A good example is the Super Bowl indicator, which says that the direction of the stock market can be predicted based on who wins the Super Bowl.
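The overfitting point is easy to see in a toy sketch (all numbers below are invented for illustration): a degree-4 polynomial threaded through five noisy points has zero error on the data it was fit to, but extrapolates wildly, while a plain straight line that "ignores" the noise stays close to the truth.

```python
# Toy illustration of overfitting (made-up data): the true process is y = 2x,
# observed with small fixed "noise" perturbations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.1, -0.2, 0.3, -0.1, 0.2]
ys = [2 * x + e for x, e in zip(xs, noise)]

def lagrange_predict(xs, ys, x):
    """Degree-(n-1) polynomial through all n points: zero error on the training data."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Predict at a held-out point; the true process would give y = 12 at x = 6.
x_new, y_true = 6.0, 12.0
a, b = linear_fit(xs, ys)
simple_error = abs((a + b * x_new) - y_true)                    # small
overfit_error = abs(lagrange_predict(xs, ys, x_new) - y_true)   # large
```

The interpolating polynomial "explains" every wiggle of the noise, which is exactly why it looks sophisticated on the data it has seen and fails badly on data it hasn't.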

Ultimately, we need to be able to make good decisions about which data are important.  And we need to be able to look at what a model is saying, why it’s saying it, and judge whether it makes sense.  Finally, we need to understand the uncertainty in the prediction and communicate it.  That sounds a lot like MR, except for that last part about uncertainty.

Right now the possibility of a future world of petabytes, MPP architectures, neural networks and naïve Bayes is scaring the pants off a lot of people in the MR industry. It may well be very bad news for MR companies but maybe not so bad for the MR profession.  There always will be demand for people who understand data, consumers and the competitive challenges that client companies face in the marketplace. 

Or, as Nate writes, “Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise.”

Faster is better

It's the college basketball season and that means yours truly is spending way too much time in front of his TV.  One of the more annoying commercials that get repeated over and over is this one by AT&T, driving home the message that faster is better.  At least on your iPhone.  It's a sort of focus group with elementary school kids.  All staged, of course.  It reminded me of the current buzz about System 1 and System 2 thinking, probably best described in Daniel Kahneman's excellent book, Thinking, Fast and Slow, and what that might tell us about what is a "good" survey question.

Over about the last decade I had the good fortune to work with a group of old friends (who happen to be world-class survey methodologists) fielding experiments on web survey design.  The three of them have collaborated on a book to be published in April that pretty much sums up what they learned over the course of those experiments.  The experiments mostly consisted of varying the presentation of a set of questions to see what changes those different presentations produced in how people answered the questions.  One important variable analyzed in almost every case was response time.  Presentations that allowed respondents to answer quickly were generally judged to be better than those that took longer.

This sort of fits with what I think I have always known about good questionnaire design.  When we can design a question for which a respondent has a well-formed and easily retrieved answer (System 1 thinking) we get good data, at least in the sense that the answer is what the respondent believes to be true and probably acts on.  But the more respondents have to think (System 2) the shakier it gets.  Or, in some cases, they don't bother to think at all.  Look no further than customer sat questionnaires and the difference between the top-of-mind opinion you get when you ask about overall satisfaction first (System 1 thinking again) versus when you ask it later, after you've taken the poor respondent through the full attribute set (i.e., forced System 2 thinking).

The folks arguing for gamification of survey questions seem to think that the longer someone takes to answer a question the better the answer.  While that may be true for certain types of questions, in most cases it's probably a bad sign.  Faster probably really is better.