
Libsvm Study Notes

 Z2ty6osc12zs6c 2018-05-20

Using Libsvm

tools

subset - subset.py

  • Extracting a subset of the data:
    Training large data is time consuming. Sometimes one should work on a smaller subset first. The python script subset.py randomly selects a specified number of samples.
  • Keeping the same class distribution as the original data:
    For classification data, we provide a stratified selection to ensure the same class distribution in the subset.

Usage

subset.py [options] dataset number [output1] [output2]

This script selects a subset of the given data set.

options:
-s method : method of selection (default 0)
0 – stratified selection (classification only)
1 – random selection

output1 : the subset (optional)
output2 : the rest of data (optional)
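The stratified selection that `-s 0` performs can be sketched as follows. This is a simplified re-implementation for illustration, not the actual subset.py code; the function name and test data are made up:

```python
import random
from collections import defaultdict

def stratified_subset(labels, n, seed=0):
    """Pick n sample indices so each class keeps roughly its original share."""
    random.seed(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    picked = []
    for y, idx in by_class.items():
        k = round(n * len(idx) / len(labels))  # proportional quota per class
        picked.extend(random.sample(idx, min(k, len(idx))))
    return sorted(picked)

labels = [1] * 80 + [-1] * 20            # 80% positive, 20% negative
sub = stratified_subset(labels, 10)
print(sum(labels[i] == 1 for i in sub),
      sum(labels[i] == -1 for i in sub))  # → 8 2
```

The subset preserves the 80/20 class ratio, which is exactly what plain random selection (`-s 1`) does not guarantee on small samples.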


Parameter Selection Tools - grid.py

  • Cross Validation
    grid.py is a parameter selection tool for C-SVM classification using the RBF (radial basis function) kernel.

Usage

grid.py [grid_options] [svm_options] dataset

grid_options :
-log2c {begin,end,step | "null"} :
set the range of c (default -5,15,2)
begin,end,step – c_range = 2^{begin,…,begin+k*step,…,end}
"null" – do not grid with c
-log2g {begin,end,step | "null"} : set the range of g (default 3,-15,-2)
begin,end,step – g_range = 2^{begin,…,begin+k*step,…,end}
"null" – do not grid with g
-v n : n-fold cross validation (default 5)
-svmtrain pathname : set svm executable path and name
-gnuplot {pathname | "null"} :
pathname – set gnuplot executable path and name
"null" – do not plot
-out {pathname | "null"} : (default dataset.out)
pathname – set output file path and name
"null" – do not output file
-png pathname : set graphic output file path and name (default dataset.png)
-resume [pathname] : resume the grid task using an existing output file (default pathname is dataset.out)

Use this option only if some parameters have been checked for the SAME data.

svm_options : additional options for svm-train
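The (c, g) grid that grid.py walks is just the cross product of two log2 ranges. A minimal sketch of that expansion, using the default ranges listed above (`log2_range` is a hypothetical helper, not part of grid.py):

```python
def log2_range(begin, end, step):
    """Expand begin,end,step into a list of exponents, in either direction."""
    out, v = [], begin
    while (step > 0 and v <= end) or (step < 0 and v >= end):
        out.append(v)
        v += step
    return out

# defaults: -log2c -5,15,2  and  -log2g 3,-15,-2
cs = [2.0 ** e for e in log2_range(-5, 15, 2)]
gs = [2.0 ** e for e in log2_range(3, -15, -2)]
grid = [(c, g) for c in cs for g in gs]
print(len(cs), len(gs), len(grid))   # → 11 10 110
```

So with the defaults, grid.py runs 110 cross-validation jobs, one per (c, g) pair, and reports the pair with the best accuracy.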

Example

  • Calling grid in Python

    In addition to using grid.py as a command-line tool, you can use it as a
    Python module.

>>> from grid import *
>>> rate, param = find_parameters('../heart_scale', '-log2c -1,1,1 -log2g -1,1,1')
[local] 0.0 0.0 rate=74.8148 (best c=1.0, g=1.0, rate=74.8148)
[local] 0.0 -1.0 rate=77.037 (best c=1.0, g=0.5, rate=77.037)
.
.
[local] -1.0 -1.0 rate=78.8889 (best c=0.5, g=0.5, rate=78.8889)
.
.
>>> rate
78.8889
>>> param
{'c': 0.5, 'g': 0.5}

Format Checking Tools - checkdata.py

Usage

Usage: checkdata.py dataset

Example

> cat bad_data
1 3:1 2:4
> python checkdata.py bad_data
line 1: feature indices must be in an ascending order, previous/current features 3:1 2:4
Found 1 lines with error.
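The rule that triggered the message above (feature indices must be strictly ascending within a line) can be sketched as follows. This is a simplified re-implementation of one check, not checkdata.py itself:

```python
def indices_ascending(line):
    """Return True if the feature indices on a LIBSVM-format line ascend."""
    feats = [tok for tok in line.split()[1:] if ':' in tok]
    idx = [int(tok.split(':')[0]) for tok in feats]
    return all(a < b for a, b in zip(idx, idx[1:]))

print(indices_ascending('1 3:1 2:4'))   # → False (the bad_data line above)
print(indices_ascending('1 2:4 3:1'))   # → True
```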

libsvm's main programs

svm-train

Usage

svm-train [options] training_set_file [model_file]

options:
-s svm_type : set type of SVM (default 0)
0 – C-SVC (multi-class classification)
1 – nu-SVC (multi-class classification)
2 – one-class SVM
3 – epsilon-SVR (regression)
4 – nu-SVR (regression)
-t kernel_type : set type of kernel function (default 2)
0 – linear: u'*v
1 – polynomial: (gamma*u'*v + coef0)^degree
2 – radial basis function: exp(-gamma*|u-v|^2)
3 – sigmoid: tanh(gamma*u'*v + coef0)
4 – precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train an SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n: n-fold cross validation mode
-q : quiet mode (no outputs)

option -v randomly splits the data into n parts and calculates cross
validation accuracy/mean squared error on them.
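The random split that -v performs can be sketched as dealing shuffled sample indices into n folds of near-equal size (a hypothetical helper, not libsvm code):

```python
import random

def n_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]

folds = n_fold_indices(100, 5)
print([len(f) for f in folds])   # → [20, 20, 20, 20, 20]
```

Each fold serves once as the validation set while the remaining folds are used for training; the reported number is the average accuracy (or mean squared error for regression) over the n rounds.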

Example
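The four built-in kernels from the -t option, written out as plain Python so the formulas above can be checked directly. The gamma of 0.5 used here mirrors the 1/num_features default for two-feature inputs; the function names are illustrative, not libsvm's API:

```python
import math

def linear(u, v):
    """-t 0: u'*v"""
    return sum(a * b for a, b in zip(u, v))

def polynomial(u, v, gamma=0.5, coef0=0.0, degree=3):
    """-t 1: (gamma*u'*v + coef0)^degree"""
    return (gamma * linear(u, v) + coef0) ** degree

def rbf(u, v, gamma=0.5):
    """-t 2: exp(-gamma*|u-v|^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid(u, v, gamma=0.5, coef0=0.0):
    """-t 3: tanh(gamma*u'*v + coef0)"""
    return math.tanh(gamma * linear(u, v) + coef0)

u = [1.0, 0.0]
print(rbf(u, u))   # identical points → 1.0
```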


svm-predict

Usage

svm-predict [options] test_file model_file output_file

options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported

Example
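Conceptually, svm-predict evaluates the SVM decision function for each test point and outputs the sign. A sketch with a hypothetical trained RBF model (the support vectors, coefficients, and rho below are made-up numbers, not output from a real model file):

```python
import math

def rbf(u, v, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision_value(x, svs, sv_coef, rho, gamma):
    """f(x) = sum_i coef_i * K(sv_i, x) - rho; the predicted label is sign(f(x))."""
    return sum(c * rbf(sv, x, gamma) for c, sv in zip(sv_coef, svs)) - rho

# made-up two-support-vector binary model
svs = [[1.0, 1.0], [-1.0, -1.0]]
sv_coef = [0.8, -0.8]
f = decision_value([0.9, 1.1], svs, sv_coef, rho=0.0, gamma=0.5)
print(1 if f > 0 else -1)   # → 1 (the point sits near the positive support vector)
```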


svm-scale

Usage

svm-scale [options] data_filename

options:
-l lower : x scaling lower limit (default -1)
-u upper : x scaling upper limit (default +1)
-y y_lower y_upper : y scaling limits (default: no y scaling)
-s save_filename : save scaling parameters to save_filename
-r restore_filename : restore scaling parameters from restore_filename

Example
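The per-feature linear map behind -l/-u can be sketched as follows (a simplified illustration of the scaling formula, not svm-scale itself; it ignores the sparse-format details):

```python
def scale(xs, lower=-1.0, upper=1.0):
    """Map one feature column linearly so min -> lower and max -> upper."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [lower for _ in xs]   # constant column: degenerate case
    return [lower + (upper - lower) * (x - lo) / (hi - lo) for x in xs]

print(scale([0.0, 5.0, 10.0]))   # → [-1.0, 0.0, 1.0]
```

The -s/-r pair matters because the (lo, hi) learned from the training set must be saved and reused on the test set; rescaling the test set by its own min/max would put train and test data on different scales.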


Tips for practical use

    • Scale your data. For example, scale each attribute to [0,1] or [-1,+1].
    • For C-SVC, consider using the model selection tool in the tools directory.
    • nu in nu-SVC/one-class-SVM/nu-SVR approximates the fraction of training
      errors and support vectors.
    • If data for classification are unbalanced (e.g. many positive and
      few negative), try different penalty parameters C by -wi.
    • Specify larger cache size (i.e., larger -m) for huge problems.
