
Why Data Will Never Replace Thinking


Big data, it has been said, is making science obsolete. No longer do we need theories of genetics or linguistics or sociology, Wired editor Chris Anderson wrote in a manifesto four years ago: “With enough data, the numbers speak for themselves.”



Last year, at the Techonomy conference outside Tucson, I heard Vivek Ranadivé — founder and CEO of financial-data software provider TIBCO, subject of a Malcolm Gladwell article on how to win at girls’ basketball, and part owner of the Golden State Warriors — say pretty much the same thing:



I believe that math is trumping science. What I mean by that is you don’t really have to know why, you just have to know that if a and b happen, c will happen.



Anderson and Ranadivé are reacting to something real. If the scientific method is to observe, hypothesize, test, and analyze, the explosion of available data and computing power has made observation, testing, and analysis so cheap and easy in many fields that one can test far more hypotheses than was previously possible. Quick-and-dirty online “A/B tests,” in which companies like Google and Amazon show different offers or page layouts to different people and simply go with the approach that gets the best response, are becoming an established way of doing business.
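
To make the mechanics concrete, here is a minimal sketch of that kind of quick-and-dirty comparison, in Python. The traffic numbers, conversion counts, and significance threshold are invented for illustration, not taken from any real Google or Amazon experiment. Note that even this let-the-data-decide procedure rests on a hypothesis: that the two variants perform equally well.

    import math

    def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
        """Compare two page variants with a two-proportion z-test."""
        p_a = conversions_a / visitors_a
        p_b = conversions_b / visitors_b
        # Pooled rate under the null hypothesis that A and B convert equally;
        # even a quick-and-dirty test smuggles in a hypothesis like this one.
        p = (conversions_a + conversions_b) / (visitors_a + visitors_b)
        se = math.sqrt(p * (1 - p) * (1 / visitors_a + 1 / visitors_b))
        return p_a, p_b, (p_b - p_a) / se

    p_a, p_b, z = ab_test(200, 10_000, 260, 10_000)  # made-up traffic
    print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}")
    # |z| > 1.96 clears the conventional 95% confidence bar,
    # so here we would simply go with variant B.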



But does that really mean there are no hypotheses involved? At Techonomy, Ranadivé made his math-is-trumping-science comments after recommending that the Federal Open Market Committee, which sets monetary policy in the U.S., be replaced with a computer program. Said he:



The fact is, you can look at information in real time, and you can make minute adjustments, and you can build a closed-loop system, where you continuously change and adjust, and you make no mistakes, because you’re picking up signals all the time, and you can adjust.



As best I can tell, there are three hypotheses inherent in this replace-the-Fed-with-algorithms plan. The first is that you can build U.S. monetary policy into a closed-loop system; the second is that past correlations in economic and financial data can usually be counted on to hold up in the future; and the third is that when they don't, you'll always be able to make adjustments as new information becomes available.
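
To see what the first hypothesis amounts to, here is a toy closed-loop controller in Python. Everything in it, the 2% inflation target, the gain, the fabricated inflation readings, is invented for illustration; real monetary policy involves nothing this simple, which is rather the point.

    TARGET = 0.02  # assumed inflation target (illustrative)
    GAIN = 0.5     # how aggressively the loop reacts to the error signal

    def adjust(rate, observed_inflation):
        """One pass through the loop: read a signal, make a minute adjustment."""
        error = observed_inflation - TARGET
        return max(0.0, rate + GAIN * error)

    rate = 0.03
    for inflation in [0.025, 0.031, 0.018, 0.022]:  # fabricated readings
        rate = adjust(rate, inflation)
        print(f"inflation {inflation:.1%} -> policy rate {rate:.2%}")
    # The loop only "makes no mistakes" if past correlations persist
    # (hypothesis two) and every surprise shows up in the signals in time
    # to correct for it (hypothesis three).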


These feel like pretty dubious hypotheses to me, similar to the naive assumptions of financial modelers at ratings agencies and elsewhere that helped bring on the financial crisis of 2007 and 2008. (To be fair, Ranadivé is a bit more nuanced about this stuff in print.) But the bigger point is that they are hypotheses. And since they’d probably prove awfully expensive to test, they’ll presumably stay hypotheses for a while.



There are echoes here of a centuries-old debate, unleashed in the 1600s by protoscientist Sir Francis Bacon, over whether deduction from first principles or induction from observed reality is the best way to get at truth. In the 1930s, philosopher Karl Popper proposed a synthesis, in which the only scientific approach was to formulate hypotheses (using deduction, induction, or both) that were falsifiable. That is, they generated predictions that — if they failed to pan out — disproved the hypothesis.



Actual scientific practice is more complicated than that. But the element of hypothesis/prediction remains important, not just to science but to the pursuit of knowledge in general. We humans are quite capable of coming up with stories to explain just about anything after the fact. It’s only by trying to come up with our stories beforehand, then testing them, that we can reliably learn the lessons of our experiences — and our data. In the big-data era, those hypotheses can often be bare-bones and fleeting, but they’re still always there, whether we acknowledge them or not.



“The numbers have no way of speaking for themselves,” political forecaster Nate Silver writes, in response to Chris Anderson, near the beginning of his wonderful new doorstopper of a book, The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t. “We speak for them.” He continues:



Data-driven predictions can succeed — and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.



One key role we play in the process is choosing which data to look at. That this choice is often made for us by what happens to be easiest to measure doesn’t make it any less consequential, as Samuel Arbesman wrote in Sunday’s Boston Globe (warning: paywall):



Throughout history, in one field after another, science has made huge progress in precisely the areas where we can measure things — and lagged where we can’t.



In his book, Silver spends a lot of time on another crucial element: how we go about revising our views as new data comes in. He is a big believer in the Bayesian approach to probability, in which we all have our own subjective ideas about how things are going to pan out, but follow the same straightforward rules in revising those assessments as we get new information. It's a process that uses data to refine our thinking. But it doesn't work without some thinking first.
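
As a concrete illustration of that updating rule, here is a minimal Bayesian revision in Python. The prior and the likelihoods are invented for the example; nothing here comes from Silver's book.

    def bayes_update(prior, p_if_true, p_if_false):
        """Posterior belief in a hypothesis after one piece of evidence (Bayes' rule)."""
        numerator = prior * p_if_true
        return numerator / (numerator + (1 - prior) * p_if_false)

    belief = 0.30  # the subjective starting point: the "thinking first"
    for _ in range(3):  # three successive confirming observations
        belief = bayes_update(belief, p_if_true=0.8, p_if_false=0.3)
        print(f"revised belief: {belief:.2f}")  # 0.53, 0.75, 0.89

The arithmetic is mechanical; choosing the prior and deciding what counts as evidence is the thinking that no amount of data can do for us.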

