High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show its inherent structure, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way. The simplest way to accomplish this dimensionality reduction is to take a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired: in a random projection, the more interesting structure within the data is likely to be lost. To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics for choosing an "interesting" linear projection of the data. These methods can be powerful, but they often miss important non-linear structure in the data.
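The contrast between a random projection and a principled linear one can be sketched with NumPy alone. The example below is illustrative only: the synthetic dataset, the helper `variance_retained`, and the use of retained variance as the measure of "interesting" are all assumptions made for this sketch (retained variance happens to be PCA's own rubric; other linear methods use other criteria).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 500 points in 50 dimensions
# whose variance is concentrated in two latent directions, plus small noise.
n_samples, n_features = 500, 50
latent = rng.normal(size=(n_samples, 2)) * np.array([5.0, 3.0])
mixing = rng.normal(size=(2, n_features))
X = latent @ mixing + 0.5 * rng.normal(size=(n_samples, n_features))
X = X - X.mean(axis=0)  # center the data before projecting


def variance_retained(X, W):
    """Fraction of total variance kept after projecting X onto the
    orthonormal columns of W (hypothetical helper for this sketch)."""
    return (X @ W).var(axis=0).sum() / X.var(axis=0).sum()


# Random projection: two random directions, orthonormalized with QR.
W_random, _ = np.linalg.qr(rng.normal(size=(n_features, 2)))

# PCA: the top two right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W_pca = Vt[:2].T

frac_random = variance_retained(X, W_random)
frac_pca = variance_retained(X, W_pca)
print(f"random projection keeps {frac_random:.1%} of the variance")
print(f"PCA projection keeps    {frac_pca:.1%} of the variance")
```

Among orthonormal two-dimensional projections, PCA keeps the largest possible share of the variance, so the random projection can never score higher on this particular measure.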
Manifold learning can be thought of as an attempt to generalize linear frameworks like PCA so that they are sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.
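As a rough illustration of what sensitivity to non-linear structure means, the sketch below compares PCA with one manifold learning method, Isomap, on scikit-learn's synthetic swiss-roll dataset. The choice of Isomap, the dataset, the parameter values, and the helper `abs_corr` are assumptions made for this example. The roll position `t` returned by `make_swiss_roll` is used only afterwards to check the result; neither method sees it during fitting, which matches the unsupervised setting described above.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# A 2-D sheet rolled up in 3-D; t is each point's position along the roll.
X, t = make_swiss_roll(n_samples=1000, random_state=0)

# Linear reduction: project onto the two directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear reduction: Isomap preserves distances measured along a
# nearest-neighbor graph, which approximately "unrolls" the sheet.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)


def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays (hypothetical helper)."""
    return abs(np.corrcoef(a, b)[0, 1])


corr_pca = abs_corr(X_pca[:, 0], t)
corr_iso = abs_corr(X_iso[:, 0], t)
print(f"|corr| of first PCA coordinate with roll position:    {corr_pca:.2f}")
print(f"|corr| of first Isomap coordinate with roll position: {corr_iso:.2f}")
```

If the neighbor graph follows the sheet without bridging adjacent layers of the roll, the first Isomap coordinate should track the position along the roll closely, while a linear projection necessarily overlays the layers.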