
Peking University: Professor Wang Liwei, "Towards Understanding Deep Learning"

手写的从前2016 · 2018-01-22

Wang Liwei is a professor at the School of Information Science and Technology, Peking University. He received his B.S. (1999) and M.S. (2002) degrees from the Department of Electronic Engineering, Tsinghua University, and his Ph.D. from the Department of Mathematics, Peking University, in 2005. He joined the faculty of the School of Information Science and Technology, Peking University, in 2005. His research has long focused on machine learning, and he has published more than 100 papers in leading international journals and conferences in the field. In 2011 he was selected for AI's 10 to Watch, the first scholar from Asia chosen since the award was established. In 2012 he received one of the first NSFC Excellent Young Scientists Fund grants and was selected for the Program for New Century Excellent Talents. He has served as an area chair of NIPS and other leading international AI conferences, and as an editorial board member of several academic journals. He is currently a member of the CCF Technical Committee on Pattern Recognition and Artificial Intelligence and a member of the CAAI Technical Committee on Pattern Recognition.



Abstract: Deep learning has achieved great success in many applications. However, deep learning is a mystery from a learning theory point of view. In all typical deep learning tasks, the number of free parameters of the networks is at least an order of magnitude larger than the number of training data. This rules out the possibility of using any model-complexity-based learning theory (VC dimension, Rademacher complexity, etc.) to explain the good generalization ability of deep learning. Indeed, the best paper of ICLR 2017, "Understanding Deep Learning Requires Rethinking Generalization", conducted a series of carefully designed experiments and concluded that all previously well-known learning theories fail to explain the phenomenon of deep learning.
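The random-label experiment behind that conclusion is simple to reproduce. Below is a minimal sketch, assuming scikit-learn is available; the data sizes, network width, and optimizer defaults are illustrative choices rather than the setup of the original paper.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))           # 200 examples, 20 features
y_random = rng.integers(0, 2, size=200)  # labels are pure noise

# Two wide hidden layers give far more parameters than training points,
# mirroring the over-parameterized regime of typical deep learning tasks.
net = MLPClassifier(hidden_layer_sizes=(512, 512), max_iter=2000)
net.fit(X, y_random)
print("training accuracy on random labels:", net.score(X, y_random))

In this regime the network can drive training error toward zero even though the labels carry no signal, which is exactly why uniform complexity measures such as VC dimension or Rademacher complexity yield vacuous generalization bounds.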
In this talk, I will give two theories characterizing the generalization ability of Stochastic Gradient Langevin Dynamics (SGLD), a variant of the commonly used Stochastic Gradient Descent (SGD) algorithm in deep learning. Building upon tools from stochastic differential equations and partial differential equations, I show that SGLD has strong generalization power. The theory also explains several phenomena observed in deep learning experiments.
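As a concrete reference for the algorithm named above, here is a minimal sketch of the SGLD update on a toy least-squares problem. The step size eta, the inverse temperature beta, and the quadratic loss are illustrative assumptions, not the setting analyzed in the talk; the point is only the update rule itself: a mini-batch gradient step plus isotropic Gaussian noise of scale sqrt(2 * eta / beta).

import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression with n = 50 examples and d = 5 features.
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad_minibatch(w, batch_size=10):
    # Stochastic gradient of the mean squared error on a random mini-batch.
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return (2.0 / batch_size) * Xb.T @ (Xb @ w - yb)

eta, beta, steps = 0.01, 1e4, 2000
w = np.zeros(d)
for _ in range(steps):
    # SGLD update: an SGD step plus Gaussian noise scaled by sqrt(2 * eta / beta).
    w = w - eta * grad_minibatch(w) + np.sqrt(2.0 * eta / beta) * rng.normal(size=d)

print("distance to w_true:", np.linalg.norm(w - w_true))

Taking beta to infinity removes the noise term and recovers plain SGD; the injected Gaussian noise is the ingredient that stability-style generalization analyses of SGLD exploit.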

