
The Economist | 2017.03.04 | Machine-learning census of Ame...

 Marsdry 2017-03-09

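Introduction: This post covers a science and technology article from The Economist of 4 March 2017, titled “A machine-learning census of America's cities”. It describes how researchers at Stanford University used machine learning, a currently popular branch of artificial intelligence (AI), to carry out a rough census.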



“WOULD it not be of great satisfaction to the king to know, at a designated moment every year, the number of his subjects?” A military engineer by the name of Sébastien le Prestre de Vauban posed this question to Louis XIV in 1686, pitching him the idea of a census. All France’s resources, the wealth and poverty of its towns and the disposition of its nobles would be counted, so that the king could control them better.


  • pitch: Try to persuade someone to buy or accept (something).

  • census: An official count or survey, especially of a population.


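Translation: In 1686 the French military engineer Vauban pitched the idea of a census to King Louis XIV, asking him: “Would it not be of great satisfaction to the king to know, at a designated moment every year, the number of his subjects?” If all of France's resources, the wealth and poverty of its towns and the disposition of its nobles could be counted, the king could rule more effectively.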


These days, such surveys are common. But they involve a lot of shoe-leather, and that makes them expensive. America, for instance, spends hundreds of millions of dollars every year on a socioeconomic investigation called the American Community Survey; the results can take half a decade to become available. Now, though, a team of researchers, led by Timnit Gebru of Stanford University in California, have come up with a cheaper, quicker method. Using powerful computers, machine-learning algorithms and mountains of data collected by Google, the team carried out a crude, probabilistic census of America’s cities in just two weeks.


  • shoe-leather: informal. Used in reference to the wear on shoes through walking. In this context it refers to survey work done on foot, door to door.


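Translation: Today such surveys are routine, but they require a great deal of door-to-door legwork, which makes them expensive. The United States, for example, spends hundreds of millions of dollars every year on a socioeconomic survey called the American Community Survey, and the results can take five years to become available. Now a research team led by Timnit Gebru of Stanford University in California has devised a cheaper, quicker method. Using powerful computers, machine-learning algorithms and the large volumes of data collected by Google, the team carried out a rough, probabilistic census of America's cities in just two weeks.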


First, the researchers trained their machine-learning model to recognise the make, model and year of many different types of cars. To do that they used a labelled data set, downloaded from automotive websites like Edmunds and Cars.com. Once the algorithm had learned to identify cars, it was turned loose on 50m images from 200 cities around America, all collected by Google’s Streetview vehicles, which provide imagery for the firm’s mapping applications. Streetview has photographed most of the public streets in America, and in among them the researchers spotted 22m different cars—around 8% of the number on America’s roads.


  • turn loose: to release; to set free.


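Translation: First, the researchers trained their machine-learning model to recognise the make, model and year of many different types of cars. The training data were labelled data sets downloaded from automotive websites such as Edmunds and Cars.com. Once the algorithm had learned to identify cars, the researchers applied it to 50m images from 200 cities across America, all collected by Google's Streetview vehicles. Google gathers this imagery for its mapping applications. Streetview has photographed most of the public streets in America, and among those images the model identified 22m different cars, around 8% of the number on America's roads.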


The computer classified those cars into one of 2,657 categories it had learned from studying the Edmunds and Cars.com data. The researchers then took data from the traditional census, and split them in half. One half was fed to the machine-learning algorithm, so it could hunt for correlations between the cars it saw on the roads in those neighbourhoods and such things as income levels, race and voting intentions. Once that was done, the algorithm was tested on the other half of the census data, to see if these correlations held true for neighbourhoods it had never seen before. They did. The sorts of cars you see in an area, in other words, turn out to be a reliable proxy for all sorts of other things, from education levels to political leanings. Seeing more sedans than pickup trucks, for instance, strongly suggests that a neighbourhood tends to vote for the Democrats.


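Translation: Using what it had learned from the automotive websites' data, the computer assigned each car to one of 2,657 categories. The researchers then split the traditional census data in half. One half was used to train the machine-learning algorithm, so that it could look for correlations between the cars in a neighbourhood and such things as income levels, race and voting intentions. Once training was complete, the other half of the census data served as a test set, to check whether the correlations the algorithm had found also held for neighbourhoods it had not seen. They did: the kinds of cars seen in an area turn out to be a reliable proxy variable for many of its demographic characteristics, from education levels to political leanings. For example, a neighbourhood with more sedans than pickup trucks is more likely to vote for the Democrats.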
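The split-in-half procedure described above is a standard hold-out validation. The following is only a minimal sketch of that idea, not the researchers' actual model: the data are synthetic, the single feature `sedan_share`, the target `vote_share`, and all function names are hypothetical, and a one-variable least-squares line stands in for whatever model the team really used.

```python
import random


def make_synthetic_neighbourhoods(n, seed=0):
    """Return made-up (sedan_share, vote_share) pairs, one per neighbourhood."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sedan_share = rng.uniform(0.2, 0.8)
        # Assumed linear relationship plus noise; purely illustrative.
        vote_share = 0.1 + 0.9 * sedan_share + rng.gauss(0, 0.05)
        rows.append((sedan_share, min(1.0, max(0.0, vote_share))))
    return rows


def fit_line(rows):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def holdout_correlation(rows, seed=1):
    """Fit on one random half, then score predictions on the unseen half."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    a, b = fit_line(train)
    predicted = [a + b * x for x, _ in test]
    actual = [y for _, y in test]
    return pearson(predicted, actual)


rows = make_synthetic_neighbourhoods(1000)
r = holdout_correlation(rows)
print(f"correlation on held-out neighbourhoods: {r:.2f}")
```

A high correlation on the held-out half is what "the correlations held true for neighbourhoods it had never seen before" means in practice. If the model had only memorised its training half, the score on the other half would drop.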


The system has limitations: unlike a census, it generates predictions, not facts, and the more fine-grained those predictions are the less certain they become. The researchers reckon their system is accurate to the level of a precinct, an American political division that contains about 1,000 people. And because those predictions rely on the specific, accurate data generated by traditional surveys, it seems unlikely ever to replace them.


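Translation: The model does have limitations. Unlike a census, it only makes predictions, so its results are not facts, and the more fine-grained the predictions are, the less certain they become. The researchers reckon the system is accurate down to the level of a precinct (an American political division containing about 1,000 people). And because the predictions depend on the accurate data produced by traditional surveys, the system seems unlikely ever to replace them.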


On the other hand, it is much cheaper and much faster. Dr Gebru’s system ran on a couple of hundred processors, a modest amount of hardware by the standards of artificial-intelligence research. It nevertheless managed to crunch through its 50m images in two weeks. A human, even one who could classify all the cars in an image in just ten seconds, would take 15 years to do the same.


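Translation: On the other hand, the method is much cheaper and much faster. Dr Gebru's system ran on a couple of hundred processors, a modest amount of hardware by the standards of artificial-intelligence research, yet it worked through its 50m images in two weeks. A single person, even one able to classify all the cars in an image in just ten seconds, would need 15 years to do the same job.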
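The 15-year figure can be checked with simple arithmetic. The sketch below assumes the person works nonstop, with no sleep or breaks, and takes "a couple of hundred processors" to mean 200, which is an assumption; the article gives no exact number.

```python
IMAGES = 50_000_000              # images processed, from the article
SECONDS_PER_IMAGE = 10           # human classification time, from the article
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Human working around the clock, one image every ten seconds.
human_years = IMAGES * SECONDS_PER_IMAGE / SECONDS_PER_YEAR

# Machine: the same images in two weeks.
MACHINE_SECONDS = 14 * 24 * 3600
images_per_second = IMAGES / MACHINE_SECONDS

# Assumed processor count; "a couple of hundred" is not an exact figure.
ASSUMED_PROCESSORS = 200
per_processor = images_per_second / ASSUMED_PROCESSORS

print(f"human: about {human_years:.1f} years of nonstop work")
print(f"machine: about {images_per_second:.0f} images per second overall")
print(f"machine: about {per_processor:.2f} images per second per processor")
```

Under these assumptions the human estimate comes out a little under 16 years, consistent with the article's rounded 15, and it would be several times longer at a normal working schedule.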


The other advantage of the AI approach is that it can be re-run whenever new data become available. As Dr Gebru points out, Streetview is not the only source of information out there. Self-driving cars, assuming they catch on, will use cameras, radar and the like to keep track of their surroundings. They should, therefore, produce even bigger data sets. (Vehicles made by Tesla, an electric-car firm, are capturing such information even now.) Other kinds of data, such as those from Earth-imaging satellites, which Google also uses to refresh its maps, could be fed into the models, too. De Vauban’s “designated moment” could soon become a constantly updated one.


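  • self-driving cars: new vocabulary; cars that drive themselves without a human driver.

  • be fed into: to supply data to a model; here, using data to train the model.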


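Translation: Another advantage of the AI approach is that it can be re-run whenever new data become available. As Dr Gebru points out, Streetview is not the only source of information. If self-driving cars catch on, they will use cameras, radar and other equipment to keep track of their surroundings, and so should produce even bigger data sets. (Cars made by Tesla, an electric-car firm, are already capturing such information.) Other kinds of data, such as those from the Earth-imaging satellites that Google uses to refresh its maps, could also be fed into the models. Vauban's “designated moment” could soon become a constantly updated one.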


Note: the English definitions of the words and phrases all come from https://en.
