
Xu Cui » SVM (support vector machine) with libsvm

2011-01-16

I have been learning SVM lately and tried libsvm. It's a good package.

Linear kernel example (support vectors are circled):

[Figure: linear kernel]

Nonlinear examples (radial basis kernel):

[Figure: nonlinear, circle]

[Figure: nonlinear, two circles]

[Figure: nonlinear, quadrant]

3-class example:

[Figure: linear, 3 classes]
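
The plots above were generated with the MatLab script linked in the resources at the end of this post. As a rough sketch of the same idea (the random data and variable names here are mine, not from that script), you can train on 2D points and circle the support vectors that libsvm returns in model.SVs:

    % Minimal sketch: linear SVM on random 2D data, support vectors circled.
    % Assumes the libsvm MatLab interface (svmtrain) is on the path.
    n = 100;
    d = [randn(n,2)+1; randn(n,2)-1];      % two Gaussian clouds
    l = [ones(n,1); -ones(n,1)];           % labels +1 / -1
    model = svmtrain(l, d, '-t 0 -c 1');   % -t 0: linear kernel
    plot(d(l==1,1), d(l==1,2), 'r.'); hold on;
    plot(d(l==-1,1), d(l==-1,2), 'b.');
    sv = full(model.SVs);                  % support vectors stored in the model
    plot(sv(:,1), sv(:,2), 'ko', 'MarkerSize', 10);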

Basic procedure to use libsvm:

  1. Preprocess your data. This includes normalization (scaling all values to between 0 and 1) and converting non-numeric values to numeric ones. You can use the following code to normalize each feature column (from the libsvm webpage):
    % scale each column of 'data' to [0, 1]
    (data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
  2. Find optimal parameter values. For the linear kernel, there is one parameter, C (the penalty parameter). For the commonly used radial kernel, there are two parameters (C and gamma). Different parameter values yield different accuracy rates. To avoid overfitting, use n-fold cross validation. For example, 5-fold cross validation trains the SVM model on 4/5 of the data and tests it on the remaining 1/5, rotating through all five folds. The options -c, -g, and -v control the parameter C, gamma, and n-fold cross validation, respectively. A piece of code from the libsvm website is:
    bestcv = 0;
    for log2c = -1:3
        for log2g = -4:1
            cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
            cv = svmtrain(heart_scale_label, heart_scale_inst, cmd);  % with -v, returns CV accuracy
            if (cv >= bestcv)
                bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
            end
            fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
        end
    end
  3. You may have to run the above code several times with different ranges of parameter values to find the optimal ones. For example, start from a wide range at coarse resolution, then fine-tune over smaller regions at higher resolution (a combined sketch follows this list).
  4. After finding the optimal parameter values, use all data to train your model with your optimal parameter values.
    cmd = ['-t 2 -c ', num2str(bestc), ' -g ', num2str(bestg)];   % -t 2: radial kernel
    model = svmtrain(l, d, cmd);   % l: full label vector, d: full data matrix
  5. If you have new data, you may use this model to classify the new data.
    % dd: the new data; its true labels are unknown, so pass zeros as
    % placeholders (the reported accuracy is then meaningless)
    [predicted_label, accuracy, decision_values] = svmpredict(zeros(size(dd,1),1), dd, model);
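
Putting steps 2-4 together, here is a minimal sketch of the coarse-then-fine search mentioned in step 3. The ranges and step sizes are my own illustrative choices, and l and d stand for the label vector and the scaled data matrix:

    % Coarse pass: wide range, big steps.
    bestcv = 0;
    for log2c = -5:2:15
        for log2g = -15:2:3
            cv = svmtrain(l, d, ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)]);
            if (cv >= bestcv), bestcv = cv; bestc = 2^log2c; bestg = 2^log2g; end
        end
    end
    % Fine pass: zoom in around the best point with smaller steps.
    for log2c = log2(bestc)-1 : 0.25 : log2(bestc)+1
        for log2g = log2(bestg)-1 : 0.25 : log2(bestg)+1
            cv = svmtrain(l, d, ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)]);
            if (cv >= bestcv), bestcv = cv; bestc = 2^log2c; bestg = 2^log2g; end
        end
    end
    % Step 4: retrain on all data with the best parameters found.
    model = svmtrain(l, d, ['-t 2 -c ', num2str(bestc), ' -g ', num2str(bestg)]);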

Commonly used options

  • -v n: n-fold cross validation
  • -t 0: linear kernel
  • -t 2: radial basis (default)
  • -s 0: SVC type = C-SVC
  • -c: C parameter value, default 1
  • -g: gamma parameter value, default 1/(number of features)

libsvm performance

I tested it on different data sizes and recorded the time spent (in seconds).

Computer: 2×2.66 GHz processor, 12 GB memory; OS: Windows XP running in VMware on Mac OS X 10.5.

data size   # features   svmtrain (s)   svmpredict (s)
100         2            0.00           0.00
100         6            0.00           0.00
100         10           0.00           0.00
100         20           0.00           0.00
100         50           0.01           0.00
100         100          0.02           0.01
500         2            0.02           0.01
500         6            0.03           0.02
500         10           0.05           0.03
500         20           0.08           0.03
500         50           0.46           0.07
500         100          0.56           0.12
1000        2            0.07           0.04
1000        6            0.10           0.06
1000        10           0.15           0.10
1000        20           0.36           0.14
1000        50           1.09           0.30
1000        100          3.07           0.50

It’s fairly fast.
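
Timings like these can be collected with tic/toc around the two calls. Here is a sketch on random data (the sizes come from the table above; the rest is my own scaffolding):

    % Time svmtrain/svmpredict on random data of a given size (sketch;
    % loop nrow and nfeat over the sizes in the table to reproduce it).
    nrow = 1000; nfeat = 100;
    d = rand(nrow, nfeat);                 % random features already in [0, 1]
    l = double(rand(nrow,1) > 0.5);        % random binary labels
    tic; model = svmtrain(l, d, '-t 2'); t_train = toc;
    tic; svmpredict(l, d, model);        t_pred = toc;
    fprintf('svmtrain %.2f s, svmpredict %.2f s\n', t_train, t_pred);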

Resources:

MatLab code to generate the plots above: cuixu_test_svm1

SVM basics: http://en.wikipedia.org/wiki/Support_vector_machine

Download libsvm for matlab at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab

The meaning of libsvm output is explained at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f804
