semi-supervised learning

 slsally 2012-06-11

Category: Machine Learning

[Data Mining] Semi-Supervised Learning
Online resources, notes and reflections

Posted by 数据挖掘青年 on 2007-8-18 18:42:41

The following is an outline of Xiaojin Zhu's tutorial presented at ICML 2007; more material is available on his homepage. I haven't bothered to translate it into Chinese, and for many of the terms, although I understand what they mean, I don't yet know the exact Chinese rendering... Only now do I truly appreciate that the only way to learn cutting-edge knowledge is through the Internet; the books we usually read cover only stable, settled, older material...

Semi-supervised learning has been around for seven or eight years now, but in China it is only just getting started.

一、Introduction to semi-supervised learning

   What is semi-supervised learning and transductive learning?  Why can we ever learn a classifier from unlabeled data?  Does unlabeled data always help?  Which semi-supervised learning methods are out there?  Which one should I use?  Answers to these questions set the stage for a detailed look at individual algorithms.

二、Semi-supervised learning algorithms

   In fact we will focus on classification algorithms that use both labeled and unlabeled data.  Several families of algorithms will be discussed, each resting on different model assumptions:

1、Self-training

  Probably the earliest semi-supervised learning method.  Still extensively used in the natural language processing community.
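As a concrete illustration (not part of the original tutorial), a minimal self-training loop can be sketched in Python. The base learner here is a toy nearest-centroid classifier, and the function name, the `k`-most-confident heuristic, and the distance-based confidence are all this sketch's own assumptions:

```python
import numpy as np

def self_train(X_l, y_l, X_u, n_iter=10, k=1):
    """Self-training sketch: repeatedly fit a nearest-centroid classifier on
    the labeled set, label the k most confident unlabeled points, and move
    them into the labeled set."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(n_iter):
        if len(X_u) == 0:
            break
        classes = np.unique(y_l)
        centroids = np.stack([X_l[y_l == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(X_u[:, None, :] - centroids[None, :, :], axis=2)
        pred = classes[d.argmin(axis=1)]
        conf = -d.min(axis=1)            # closer to a centroid = more confident
        idx = np.argsort(conf)[-k:]      # the k most confident unlabeled points
        X_l = np.vstack([X_l, X_u[idx]])
        y_l = np.concatenate([y_l, pred[idx]])
        X_u = np.delete(X_u, idx, axis=0)
    return X_l, y_l
```

The loop is greedy: a wrong early pseudo-label can propagate, which is exactly the well-known weakness of self-training.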

2、Generative models

   Mixture of Gaussian or multinomial distributions, Hidden Markov Models, and pretty much any generative model can do semi-supervised learning.  We will also look into the EM algorithm, which is often used for training generative models when there is unlabeled data.
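A minimal sketch of the generative approach, assuming a two-component 1-D Gaussian mixture: EM runs over labeled and unlabeled points together, with the labeled responsibilities clamped to the known labels (the function name and initialization are this sketch's assumptions, not from the tutorial):

```python
import numpy as np

def ssl_gmm_em(x_l, y_l, x_u, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture on labeled + unlabeled data.
    Labeled points have their responsibilities fixed to their labels; only the
    unlabeled responsibilities are re-estimated in the E-step."""
    x = np.concatenate([x_l, x_u])
    n_l = len(x_l)
    r = np.zeros((len(x), 2))
    r[np.arange(n_l), y_l] = 1.0          # clamp labeled responsibilities
    r[n_l:] = 0.5                          # unlabeled start uniform
    mu = np.array([x_l[y_l == 0].mean(), x_l[y_l == 1].mean()])
    sigma = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # M-step: weighted parameter updates from current responsibilities
        for k in range(2):
            w = r[:, k]
            mu[k] = (w * x).sum() / w.sum()
            sigma[k] = np.sqrt((w * (x - mu[k]) ** 2).sum() / w.sum()) + 1e-9
            pi[k] = w.sum() / len(x)
        # E-step: recompute responsibilities for unlabeled points only
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) / sigma
        r_new = dens / dens.sum(axis=1, keepdims=True)
        r[n_l:] = r_new[n_l:]
    return mu, sigma, pi
```

The clamping step is what distinguishes this from fully unsupervised EM: the labeled data anchor the components so the mixture cannot swap or drift away from the intended classes.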

3、S3VMs

  Originally called Transductive SVMs, they are now called Semi-Supervised SVMs to emphasize the fact that they are capable of induction too, not just transduction.  The idea is simple and elegant: find a decision boundary in 'low density' regions.  However, the optimization problem behind it is difficult, so we will discuss the various optimization techniques for S3VMs, including the one used in SVM-light, the Convex-Concave Procedure (CCCP), Branch-and-Bound, the continuation method, etc.
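To make the low-density idea concrete, here is a crude subgradient-descent sketch of the linear S3VM objective (0.5‖w‖² + hinge loss on labeled points + a "hat loss" max(0, 1 − |f(x)|) pushing unlabeled points away from the boundary). This is only an illustration under these assumptions; it is not one of the serious optimizers named above, and the objective is non-convex, so descent can get stuck:

```python
import numpy as np

def s3vm_sgd(X_l, y_l, X_u, C=1.0, Cu=0.1, lr=0.01, epochs=300):
    """Subgradient descent on the (non-convex) linear S3VM objective:
    0.5||w||^2 + C * hinge(labeled) + Cu * hat-loss(unlabeled).
    Labels y_l must be in {-1, +1}."""
    w = np.zeros(X_l.shape[1]); b = 0.0
    for _ in range(epochs):
        gw = w.copy(); gb = 0.0              # gradient of the regularizer
        m = y_l * (X_l @ w + b)              # labeled margins
        viol = m < 1                          # margin violators
        gw -= C * (y_l[viol, None] * X_l[viol]).sum(axis=0)
        gb -= C * y_l[viol].sum()
        fu = X_u @ w + b                      # unlabeled scores
        s = np.where(fu >= 0, 1.0, -1.0)      # current pseudo-sign
        viol_u = np.abs(fu) < 1               # points inside the margin band
        gw -= Cu * (s[viol_u, None] * X_u[viol_u]).sum(axis=0)
        gb -= Cu * s[viol_u].sum()
        w -= lr * gw; b -= lr * gb
    return w, b
```

The hat loss is what makes the problem non-convex: its gradient flips as a point crosses the boundary, which is why the real methods (CCCP, Branch-and-Bound, continuation) exist at all.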

4、Graph-based methods

   Here one constructs a graph over the labeled and unlabeled examples, and assumes that two strongly-connected examples tend to have the same label.  The graph Laplacian matrix is a central quantity.  We will discuss representative algorithms, including manifold regularization.
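A minimal sketch of one representative graph-based method, the harmonic-function solution (in the spirit of Zhu et al.): build a Gaussian-weighted graph over all points, form the Laplacian L = D − W, and solve for the unlabeled soft labels. The Gaussian affinity with bandwidth `sigma` and the 0/1 label encoding are this sketch's assumptions:

```python
import numpy as np

def harmonic_label_prop(X_l, y_l, X_u, sigma=1.0):
    """Harmonic-function label propagation: f_u = L_uu^{-1} W_ul y_l,
    where W is a Gaussian affinity matrix and L = D - W is the graph
    Laplacian. y_l holds labels in {0, 1}; returns soft labels in [0, 1]."""
    X = np.vstack([X_l, X_u]); n_l = len(X_l)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)                   # no self-edges
    L = np.diag(W.sum(axis=1)) - W           # graph Laplacian
    f_u = np.linalg.solve(L[n_l:, n_l:], W[n_l:, :n_l] @ y_l)
    return f_u
```

The solution is "harmonic" in the sense that each unlabeled point's value is the weighted average of its neighbors' values, so labels diffuse along strongly connected regions of the graph, exactly the assumption stated above.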

5、Multiview learning

   Exemplified by the Co-Training algorithm, these methods employ multiple 'views' of the same problem, and require that different views produce similar classifications.
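A toy sketch of the Co-Training idea under simplifying assumptions: each view gets its own nearest-centroid classifier, and in each round the most confident predictions from either view are added to the shared labeled pool (real Co-Training assumes the two views are conditionally independent; the pooling rule and base learner here are this sketch's own choices):

```python
import numpy as np

def _centroid_predict(X_l, y_l, X_q):
    """Nearest-centroid prediction; returns (labels, confidences)."""
    classes = np.unique(y_l)
    cents = np.stack([X_l[y_l == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_q[:, None] - cents[None], axis=2)
    return classes[d.argmin(1)], -d.min(1)

def co_train(V1_l, V2_l, y_l, V1_u, V2_u, rounds=5, k=1):
    """Co-training sketch: two classifiers, one per feature view, each
    promoting its k most confident unlabeled examples to the labeled pool."""
    V1_l, V2_l, y_l = V1_l.copy(), V2_l.copy(), y_l.copy()
    for _ in range(rounds):
        if len(V1_u) == 0:
            break
        p1, c1 = _centroid_predict(V1_l, y_l, V1_u)
        p2, c2 = _centroid_predict(V2_l, y_l, V2_u)
        # each view nominates its k most confident examples
        idx = np.unique(np.concatenate([np.argsort(c1)[-k:],
                                        np.argsort(c2)[-k:]]))
        lbl = np.where(c1[idx] >= c2[idx], p1[idx], p2[idx])
        V1_l = np.vstack([V1_l, V1_u[idx]]); V2_l = np.vstack([V2_l, V2_u[idx]])
        y_l = np.concatenate([y_l, lbl])
        V1_u = np.delete(V1_u, idx, axis=0); V2_u = np.delete(V2_u, idx, axis=0)
    return y_l
```

The point of the two views is that each classifier's confident mistakes look like noise to the other, so the views can correct each other as long as they are sufficiently independent.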

6、Other approaches

   Metric based model selection, tree-based learning, information-based method, etc.

7、Related problems

   Regression with unlabeled data, clustering with side information, classification with positive and unlabeled data; dimensionality reduction with side information, inferring label missing mechanism, etc.

三、Semi-supervised learning in nature

    Long before computers came around and machine learning became a discipline, learning was occurring in nature.  Is semi-supervised learning part of it?  The research in this area has just begun.  We will look at a few case studies, ranging from infant word learning to the human visual system and human categorization behavior.

四、Challenges for the future

    There are many open questions.  What new algorithms / assumptions can we make?  How to efficiently perform semi-supervised learning for very large problems?  What special methods are needed for structured output domains?  Can we find a way to guarantee that unlabeled data would not decrease performance?  What can we borrow from natural learning?  We suggest these as a few potential research directions.

Researchers working on semi-supervised learning, whose homepages have more detailed introductions:

http://pages.cs./~jerryzhu/

http://www.kyb.tuebingen./~chapelle

from:http://bbs./blog/more.asp?name=DMman&id=27357
