There has been a lot of buzz in MR circles about US electoral polling, and especially about the summary of poll accuracy produced by Nate Silver. As I wrote in this blog at the time Nate's piece appeared, it's only natural for researchers to zero in on data collection methodology, and that's pretty much happened in spades, most recently in today's Research-live bulletin. But I can't help but wonder whether all of these arguments about telephone, online, cell phones, robopolls, etc. miss the real point.
The political pros who are the consumers of these data (we might call them clients) see a different story, much of it traceable to consistent misjudgments about who would actually show up at the polls and vote. This interview with Obama pollster Joel Benenson makes two really important points. The first is that the likely voter models used by many pollsters no longer work in a world of sophisticated GOTV efforts like the one the Obama campaign engineered. The second, and perhaps more important, point is the folly of relying on a single data source. The Obama campaign built its prediction models from multiple sources, some good and some not so good. And they did it in a way that obviously was very effective.
The really interesting story here would be for someone to do a deep dive on the sampling methodologies and likely voter models of the major polling firms, although I don't expect to see that anytime soon.
So let's not make too much of a fuss over this or draw lessons that may not be worth learning. Putting it in MR terms, we might say that where there was failure it was not one of technique but the more fundamental error of not understanding the dynamics of that particular marketplace. And let's not overlook the fact that the most accurate poll of all (it missed by just 0.1%) came from the Columbus Dispatch, and it was a mail survey.