
How to choose algorithms for Microsoft Azure Machine Learning


    • Gary Ericson

The Machine Learning Algorithm Cheat Sheet

The Microsoft Azure Machine Learning Algorithm Cheat Sheet helps you choose the right machine learning algorithm for your predictive analytics solutions from the Microsoft Azure Machine Learning library of algorithms. This article walks you through how to use it.

Note

To download the cheat sheet and follow along with this article, go to Machine learning algorithm cheat sheet for Microsoft Azure Machine Learning Studio.

This cheat sheet has a very specific audience in mind: a beginning data scientist with undergraduate-level machine learning, trying to choose an algorithm to start with in Azure Machine Learning Studio. That means that it makes some generalizations and oversimplifications, but it points you in a safe direction. It also means that there are lots of algorithms not listed here. As Azure Machine Learning grows to encompass a more complete set of available methods, we'll add them.

These recommendations are compiled feedback and tips from many data scientists and machine learning experts. We didn't agree on everything, but I've tried to harmonize our opinions into a rough consensus. Most of the statements of disagreement begin with 'It depends…'

How to use the cheat sheet

Read the path and algorithm labels on the chart as 'For <path label>, use <algorithm>.' For example, 'For speed, use two-class logistic regression.' Sometimes more than one branch applies. Sometimes none of them is a perfect fit. They're intended to be rule-of-thumb recommendations, so don't worry about being exact. Several data scientists I talked with said that the only sure way to find the very best algorithm is to try all of them.

Here's an example from the Azure AI Gallery of an experiment that tries several algorithms against the same data and compares the results: Compare Multi-class Classifiers: Letter recognition.

Tip

To download and print a diagram that gives an overview of the capabilities of Machine Learning Studio, see Overview diagram of Azure Machine Learning Studio capabilities.

Flavors of machine learning

Supervised

Supervised learning algorithms make predictions based on a set of examples. For instance, historical stock prices can be used to hazard guesses at future prices. Each example used for training is labeled with the value of interest, in this case the stock price. A supervised learning algorithm looks for patterns in those value labels. It can use any information that might be relevant (the day of the week, the season, the company's financial data, the type of industry, the presence of disruptive geopolitical events), and each algorithm looks for different types of patterns. After the algorithm has found the best pattern it can, it uses that pattern to make predictions for unlabeled testing data: tomorrow's prices.

Supervised learning is a popular and useful type of machine learning. With one exception, all the modules in Azure Machine Learning are supervised learning algorithms. There are several specific types of supervised learning represented within Azure Machine Learning: classification, regression, and anomaly detection.

  • Classification. When the data are being used to predict a category, supervised learning is also called classification. This is the case when assigning an image as a picture of either a 'cat' or a 'dog'. When there are only two choices, it's called two-class or binomial classification. When there are more categories, as when predicting the winner of the NCAA March Madness tournament, the problem is known as multi-class classification.
  • Regression. When a value is being predicted, as with stock prices, supervised learning is called regression. A minimal sketch contrasting classification and regression follows this list.
  • Anomaly detection. Sometimes the goal is to identify data points that are simply unusual. In fraud detection, for example, any highly unusual credit card spending patterns are suspect. The possible variations are so numerous and the training examples so few that it isn't feasible to learn what fraudulent activity looks like. The approach that anomaly detection takes is to learn what normal activity looks like, using a history of non-fraudulent transactions, and then identify anything that is significantly different.
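
Azure Machine Learning Studio itself is a drag-and-drop tool, so the following minimal sketch uses scikit-learn analogues on synthetic data, purely to contrast the first two task types:

```python
# Illustrative only: scikit-learn stand-ins, not the Azure ML Studio modules.
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: predict a category from labeled examples.
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(Xc, yc)
print("predicted class:", clf.predict(Xc[:1]))

# Regression: predict a continuous value, such as a price.
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("predicted value:", reg.predict(Xr[:1]))
```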

Unsupervised

In unsupervised learning, data points have no labels associated with them. Instead, the goal of an unsupervised learning algorithm is to organize the data in some way or to describe its structure. This can mean grouping it into clusters, or finding different ways of looking at complex data so that it appears simpler or more organized.

Reinforcement learning

In reinforcement learning, the algorithm gets to choose an action in response to each data point. The learning algorithm also receives a reward signal a short time later, indicating how good the decision was. Based on this, the algorithm modifies its strategy in order to achieve the highest reward. Currently there are no reinforcement learning algorithm modules in Azure Machine Learning. Reinforcement learning is common in robotics, where the set of sensor readings at one point in time is a data point, and the algorithm must choose the robot's next action. It is also a natural fit for Internet of Things applications.
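
Since there is no Azure module for it, here is a toy, framework-free sketch of the loop just described (choose an action, observe a reward, update the strategy). The two-armed bandit and its payout probabilities are invented for illustration:

```python
import random

# Hidden reward probability per action (made up for this sketch).
true_payout = [0.3, 0.7]
estimates, counts = [0.0, 0.0], [0, 0]
epsilon = 0.1  # exploration rate

random.seed(0)
for step in range(1000):
    # Mostly exploit the best-looking action, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_payout[action] else 0.0
    counts[action] += 1
    # Incremental average: estimate += (reward - estimate) / n
    estimates[action] += (reward - estimates[action]) / counts[action]

print("estimated payouts:", [round(e, 2) for e in estimates])
```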

Considerations when choosing an algorithm

Accuracy

Getting the most accurate answer possible isn't always necessary. Sometimes an approximation is adequate, depending on what you want to use it for. If that's the case, you may be able to cut your processing time dramatically by sticking with more approximate methods. Another advantage of more approximate methods is that they naturally tend to avoid overfitting.

Training time

The number of minutes or hours necessary to train a model varies a great deal between algorithms. Training time is often closely tied to accuracy; one typically accompanies the other. In addition, some algorithms are more sensitive to the number of data points than others. When time is limited, it can drive the choice of algorithm, especially when the data set is large.

Linearity

Lots of machine learning algorithms make use of linearity. Linear classification algorithms assume that classes can be separated by a straight line (or its higher-dimensional analog). These include logistic regression and support vector machines (as implemented in Azure Machine Learning). Linear regression algorithms assume that data trends follow a straight line. These assumptions aren't bad for some problems, but on others they bring accuracy down.

Non-linear class boundary: relying on a linear classification algorithm would result in low accuracy

Data with a nonlinear trend: using a linear regression method would generate much larger errors than necessary

Despite their dangers, linear algorithms are very popular as a first line of attack. They tend to be algorithmically simple and fast to train.
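
A quick way to see the cost of the linearity assumption is to compare a linear and a non-linear classifier on data with a curved class boundary. This sketch uses scikit-learn stand-ins (not the Azure modules) and synthetic concentric-circle data:

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Concentric circles cannot be split by a straight line.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

linear = LogisticRegression()
nonlinear = RandomForestClassifier(random_state=0)
print("linear accuracy:   ", cross_val_score(linear, X, y).mean())
print("nonlinear accuracy:", cross_val_score(nonlinear, X, y).mean())
```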

Number of parameters

Parameters are the knobs a data scientist gets to turn when setting up an algorithm. They are numbers that affect the algorithm's behavior, such as error tolerance or the number of iterations, or options between variants of how the algorithm behaves. The training time and accuracy of the algorithm can sometimes be quite sensitive to getting just the right settings. Typically, algorithms with large numbers of parameters require the most trial and error to find a good combination.

Alternatively, there is a parameter sweeping module block in Azure Machine Learning that automatically tries all parameter combinations at whatever granularity you choose. While this is a great way to make sure you've spanned the parameter space, the time required to train a model increases exponentially with the number of parameters.
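
Outside of Studio, the same sweep idea can be sketched with scikit-learn's GridSearchCV; the learner and grid values below are illustrative, not the Azure parameter sweep module itself. Note how the number of combinations multiplies with each added parameter:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# 3 x 3 x 2 = 18 combinations; every extra parameter multiplies the
# sweep, which is why the cost grows exponentially.
grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.5],
    "max_depth": [2, 4],
}
sweep = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=3)
sweep.fit(X, y)
print("best settings:", sweep.best_params_)
```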

The upside is that having many parameters typically indicates that an algorithm has greater flexibility. It can often achieve very good accuracy, provided you can find the right combination of parameter settings.

Number of features

For certain types of data, the number of features can be very large compared to the number of data points. This is often the case with genetics or textual data. A large number of features can bog down some learning algorithms, making training time unfeasibly long. Support vector machines are particularly well suited to this case (see below).

Special cases

Some learning algorithms make particular assumptions about the structure of the data or the desired results. If you can find one that fits your needs, it can give you more useful results, more accurate predictions, or faster training times.

| Algorithm | Accuracy | Training time | Linearity | Parameters | Notes |
| --- | :---: | :---: | :---: | :---: | --- |
| **Two-class classification** | | | | | |
| logistic regression | | ● | ● | 5 | |
| decision forest | ● | ○ | | 6 | |
| decision jungle | ● | ○ | | 6 | Low memory footprint |
| boosted decision tree | ● | ○ | | 6 | Large memory footprint |
| neural network | ● | | | 9 | Additional customization is possible |
| averaged perceptron | ○ | ○ | ● | 4 | |
| support vector machine | | ○ | ● | 5 | Good for large feature sets |
| locally deep support vector machine | ○ | | | 8 | Good for large feature sets |
| Bayes' point machine | | ○ | ● | 3 | |
| **Multi-class classification** | | | | | |
| logistic regression | | ● | ● | 5 | |
| decision forest | ● | ○ | | 6 | |
| decision jungle | ● | ○ | | 6 | Low memory footprint |
| neural network | ● | | | 9 | Additional customization is possible |
| one-v-all | - | - | - | - | See properties of the two-class method selected |
| **Regression** | | | | | |
| linear | | ● | ● | 4 | |
| Bayesian linear | | ○ | ● | 2 | |
| decision forest | ● | ○ | | 6 | |
| boosted decision tree | ● | ○ | | 5 | Large memory footprint |
| fast forest quantile | ● | ○ | | 9 | Distributions rather than point predictions |
| neural network | ● | | | 9 | Additional customization is possible |
| Poisson | | | | 5 | Technically log-linear. For predicting counts |
| ordinal | | | | 0 | For predicting rank-ordering |
| **Anomaly detection** | | | | | |
| support vector machine | | ○ | | 2 | Especially good for large feature sets |
| PCA-based anomaly detection | | ○ | ● | 3 | |
| K-means | | ○ | ● | 4 | A clustering algorithm |

Algorithm properties:

●: shows excellent accuracy, fast training times, and the use of linearity

○: shows good accuracy and moderate training times

Algorithm notes

Linear regression

As mentioned previously, linear regression fits a line (or plane, or hyperplane) to the data set. It's a workhorse, simple and fast, but it may be overly simplistic for some problems. Check here for a linear regression tutorial.

Data with a linear trend
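
A minimal sketch of the idea, using scikit-learn's LinearRegression on made-up data with a linear trend (y ≈ 2x + 1 plus noise); the fitted slope and intercept should land near those values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noisy data following an invented linear trend y = 2x + 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
```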

Logistic regression

Although it confusingly includes 'regression' in the name, logistic regression is actually a powerful tool for two-class and multiclass classification. It's fast and simple. The fact that it uses an 'S'-shaped curve instead of a straight line makes it a natural fit for dividing data into groups. Logistic regression gives linear class boundaries, so when you use it, make sure a linear approximation is something you can live with.

A logistic regression applied to two-class data with just one feature: the class boundary is the point at which the logistic curve is just as close to both classes
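
To make the S-curve concrete, here is a small scikit-learn sketch on synthetic one-feature data; the class means (0 and 3) are invented, and the boundary is recovered from the fitted weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two classes along one feature; the fitted S-curve gives P(class=1 | x).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

model = LogisticRegression().fit(X, y)
print("P(class=1) at x=1.5:", model.predict_proba([[1.5]])[0, 1])
# The boundary sits where w*x + b = 0, i.e. where the probability is 0.5.
print("boundary at x =", -model.intercept_[0] / model.coef_[0, 0])
```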

Trees, forests, and jungles

Decision forests (regression, two-class, and multiclass), decision jungles (two-class and multiclass), and boosted decision trees (regression and two-class) are all based on decision trees, a foundational machine learning concept. There are many variants of decision trees, but they all do the same thing: subdivide the feature space into regions with mostly the same label. These can be regions of consistent category or of constant value, depending on whether you are doing classification or regression.

A decision tree subdivides a feature space into regions of roughly uniform values

Because a feature space can be subdivided into arbitrarily small regions, it's easy to imagine dividing it finely enough to have one data point per region. This is an extreme example of overfitting. To avoid it, a large set of trees is constructed with special mathematical care taken that the trees are not correlated. The average of this 'decision forest' is a tree that avoids overfitting. Decision forests can use a lot of memory. Decision jungles are a variant that consumes less memory at the expense of a slightly longer training time.
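
The tree-versus-forest contrast can be sketched with scikit-learn stand-ins (not the Azure modules): an unconstrained single tree is free to overfit, while the averaged forest generalizes better on held-out folds:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# An unconstrained tree can carve out one region per data point
# (overfitting); averaging many decorrelated trees smooths that out.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("single tree:", cross_val_score(tree, X, y).mean())
print("forest:     ", cross_val_score(forest, X, y).mean())
```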

Boosted decision trees avoid overfitting by limiting how many times they can subdivide and how few data points are allowed in each region. The algorithm constructs a sequence of trees, each of which learns to compensate for the error left by the tree before. The result is a very accurate learner that tends to use a lot of memory. For the full technical description, check out Friedman's original paper.

Fast forest quantile regression is a variation of decision trees for the special case where you want to know not only the typical (median) value of the data within a region, but also its distribution in the form of quantiles.
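
Scikit-learn has no direct twin of the fast forest quantile module, but gradient boosting with a quantile loss illustrates the same idea of predicting quantiles of the distribution rather than a single point. The data and quantile levels below are made up:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data whose noise grows with x, so the spread matters.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = X.ravel() + rng.normal(scale=X.ravel() * 0.3)

# One model per quantile: the 10th and 90th percentiles here.
low = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(X, y)
high = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)
print("80% interval at x=8:",
      low.predict([[8.0]])[0], "to", high.predict([[8.0]])[0])
```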

Neural networks and perceptrons

Neural networks are brain-inspired learning algorithms covering multiclass, two-class, and regression problems. They come in an infinite variety, but the neural networks within Azure Machine Learning all take the form of directed acyclic graphs. That means that input features are passed forward (never backward) through a sequence of layers before being turned into outputs. In each layer, inputs are weighted in various combinations, summed, and passed on to the next layer. This combination of simple calculations results in the ability to learn sophisticated class boundaries and data trends, seemingly by magic. Many-layered networks of this sort perform the 'deep learning' that fuels so much tech reporting and science fiction.

This high performance doesn't come for free, though. Neural networks can take a long time to train, particularly for large data sets with lots of features. They also have more parameters than most algorithms, which means that parameter sweeping expands the training time a great deal. And for those overachievers who wish to specify their own network structure, the possibilities are inexhaustible.

The boundaries learned by neural networks can be complex and irregular

The two-class averaged perceptron is neural networks' answer to skyrocketing training times. It uses a network structure that gives linear class boundaries. It is almost primitive by today's standards, but it has a long history of working robustly and is small enough to learn quickly.
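
The trade-off reads clearly in a small sketch with scikit-learn stand-ins: a plain Perceptron trains almost instantly but can only draw a straight boundary, while a small multi-layer network learns the curved one at a higher training cost. The layer sizes are arbitrary choices:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Two interleaved half-moons: not linearly separable.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

fast_linear = Perceptron()
deep = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0)
print("perceptron:", cross_val_score(fast_linear, X, y).mean())
print("neural net:", cross_val_score(deep, X, y).mean())
```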

SVMs

Support vector machines (SVMs) find the boundary that separates classes by as wide a margin as possible. When the two classes can't be clearly separated, the algorithms find the best boundary they can. As written in Azure Machine Learning, the two-class SVM does this with a straight line only. (In SVM-speak, it uses a linear kernel.) Because it makes this linear approximation, it is able to run fairly quickly. Where it really shines is with feature-intense data, like text or genomic data. In these cases SVMs are able to separate classes more quickly and with less overfitting than most other algorithms, in addition to requiring only a modest amount of memory.

A typical support vector machine class boundary maximizes the margin separating two classes
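
Here is a hedged sketch of the feature-intense text case using scikit-learn's LinearSVC (a linear-kernel SVM standing in for the Azure module) on a tiny invented corpus; TF-IDF expands even four short strings into a feature space wider than the number of examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Made-up corpus: 1 = spam, 0 = not spam.
texts = ["cheap meds buy now", "meeting at noon",
         "win cash now", "lunch tomorrow?"]
labels = [1, 0, 1, 0]

# TF-IDF turns each document into a sparse, high-dimensional vector.
X = TfidfVectorizer().fit_transform(texts)
clf = LinearSVC().fit(X, labels)
print("features per example:", X.shape[1])
```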

Another product of Microsoft Research, the two-class locally deep SVM is a non-linear variant of SVM that retains most of the speed and memory efficiency of the linear version. It is ideal for cases where the linear approach doesn't give accurate enough answers. The developers kept it fast by breaking down the problem into a bunch of small linear SVM problems. Read the full description for the details on how they pulled off this trick.

Using a clever extension of nonlinear SVMs, the one-class SVM draws a boundary that tightly outlines the entire data set. It is useful for anomaly detection. Any new data points that fall far outside that boundary are unusual enough to be noteworthy.
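
A minimal anomaly-detection sketch with scikit-learn's OneClassSVM, standing in for the Azure module: train only on invented "normal" points, then score new points against the learned boundary:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Train on "normal" data only (made-up 2-D points around the origin).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0, scale=1, size=(200, 2))

detector = OneClassSVM(nu=0.05).fit(normal)
# +1 = looks normal, -1 = flagged as an anomaly.
print(detector.predict([[0.1, -0.2], [6.0, 6.0]]))
```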

Bayesian methods

Bayesian methods have a highly desirable quality: they avoid overfitting. They do this by making some assumptions beforehand about the likely distribution of the answer. Another byproduct of this approach is that they have very few parameters. Azure Machine Learning has Bayesian algorithms for both classification (Two-class Bayes' point machine) and regression (Bayesian linear regression). Note that these assume that the data can be split or fit with a straight line.
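
As an illustration, scikit-learn's BayesianRidge (an analogue of Bayesian linear regression, not the Azure module itself) shows the two qualities just mentioned: essentially no knobs to set, and predictions that carry an uncertainty estimate:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# A prior over the weights regularizes the fit automatically.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + rng.normal(scale=1.0, size=50)

model = BayesianRidge().fit(X, y)
mean, std = model.predict([[5.0]], return_std=True)
print(f"prediction: {mean[0]:.2f} +/- {std[0]:.2f}")
```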

On a historical note, Bayes' point machines were developed at Microsoft Research. They have some exceptionally beautiful theoretical work behind them. The interested student is directed to the original article in JMLR and an insightful blog by Chris Bishop.

Specialized algorithms

If you have a very specific goal, you may be in luck. Within the Azure Machine Learning collection, there are algorithms that specialize in:

    • rank prediction (ordinal regression)
    • count prediction (Poisson regression)
    • anomaly detection (including one based on principal component analysis and one based on support vector machines)
    • clustering (K-means)

PCA-based anomaly detection: the vast majority of the data falls into a stereotypical distribution; points deviating dramatically from that distribution are suspect

A data set grouped into five clusters using K-means
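
A matching sketch with scikit-learn's KMeans on synthetic blob data, mirroring the five-cluster figure above:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Group unlabeled points into five clusters.
X, _ = make_blobs(n_samples=500, centers=5, random_state=0)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [list(km.labels_).count(c) for c in range(5)])
```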

There is also an ensemble one-v-all multiclass classifier, which breaks the N-class classification problem into N-1 two-class classification problems. The accuracy, training time, and linearity properties are determined by the two-class classifiers used.

A pair of two-class classifiers combine to form a three-class classifier
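
The wrapper pattern can be sketched with scikit-learn's OneVsRestClassifier. Note that this implementation trains one binary learner per class (N of them), while the module description above counts N-1; the exact decomposition varies by implementation. The choice of base learner here is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# One-v-all wraps any two-class learner, so accuracy, speed, and
# linearity all come from the learner you pick.
X, y = load_iris(return_X_y=True)
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print("underlying two-class models:", len(ova.estimators_))
```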

Azure Machine Learning also includes access to a powerful machine learning framework under the title of Vowpal Wabbit. VW defies categorization here, since it can learn both classification and regression problems, and can even learn from partially unlabeled data. You can configure it to use any one of a number of learning algorithms, loss functions, and optimization algorithms. It was designed from the ground up to be efficient, parallel, and extremely fast. It handles ridiculously large feature sets with little apparent effort. Started and led by Microsoft Research's own John Langford, VW is a Formula One entry in a field of stock car algorithms. Not every problem fits VW, but if yours does, it may be worth your while to climb the learning curve on its interface. It's also available as stand-alone open source code in several languages.

More help with algorithms
