Libsvm - Cross-Validation Accuracy Equals the Label Ratio

Question

I use the Python interface for libsvm. After choosing the best C and gamma parameters (RBF kernel) with a grid search, I train the model and cross-validate it (5-fold, if that is relevant), and the accuracy I get is exactly the same as the ratio of labels in my training dataset.

I have 3947 samples; 2898 of them have a label of -1, and the rest have a label of 1. So 73.4229% of the samples have the label -1.

When I train the model and run 5-fold cross-validation, this is what I get:

optimization finished, #iter = 1529
nu = 0.531517 obj = -209.738688,
rho = 0.997250 nSV = 1847, nBSV = 1534
Total nSV = 1847
Cross Validation Accuracy = 73.4229%

Does this mean that the SVM is not taking the features into account? Or is the data at fault here? Are the two numbers related at all? I just cannot get past 73.4229%. In addition, the number of support vectors should be much smaller than the size of the dataset, but in this case it is not (1847 out of 3947).

In general, what does it mean when the cross-validation accuracy matches the label ratio in the dataset?

python svm libsvm

Siddhant Aug 31 '12 at 10:36

1 answer

user1871307 · Accepted Answer · 2012-12-03T11:15:00+0000

Your dataset is unbalanced, which means that a large percentage of the samples belongs to one class. This leads to what is called the default classifier or majority-class classifier, where high accuracy is achieved simply by classifying everything as the majority class. So you are correct that the features are not being taken into account, and the cause is the data.
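To check whether a reported accuracy is just the majority-class baseline, you can compute that baseline directly from the labels. This is a minimal sketch using only the class counts given in the question; the helper name `majority_baseline_accuracy` is made up for illustration and is not part of libsvm.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Return (majority_label, accuracy) for a classifier that
    always predicts the most frequent label."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Class counts from the question: 2898 samples labelled -1 out of 3947.
labels = [-1] * 2898 + [1] * (3947 - 2898)

label, acc = majority_baseline_accuracy(labels)
print(f"majority label = {label}, baseline accuracy = {100 * acc:.4f}%")
```

If the cross-validation accuracy equals this baseline to every printed digit, the model is most likely predicting the majority label for every sample.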

See the libsvm README, and also this related question: https://stats.stackexchange.com/questions/20948/best-way-to-handle-unbalanced-multiclass-dataset-with-svm
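One remedy that libsvm itself offers for C-SVC is the `-wi` option, which multiplies C for class i by a given weight. The sketch below builds such an option string from the label counts. Weighting each class by the inverse of its relative frequency is an assumption chosen for illustration, not something the answer prescribes, and the helper name `libsvm_weight_options` is hypothetical.

```python
from collections import Counter

def libsvm_weight_options(labels):
    """Build libsvm '-wi weight' options so that each class's C is
    scaled by (largest class size / this class's size)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return " ".join(
        f"-w{label} {largest / n:.4f}"
        for label, n in sorted(counts.items())
    )

# Class counts from the question: 2898 of -1 and 1049 of +1.
labels = [-1] * 2898 + [1] * 1049
weight_opts = libsvm_weight_options(labels)
print(weight_opts)

# The string can then be appended to the training options, e.g.
# (C and gamma are placeholders for the grid-search results):
#   svm_train(y, x, f"-t 2 -c {C} -g {gamma} -v 5 {weight_opts}")
```

With weights like these, errors on the minority class cost more than errors on the majority class, so predicting -1 everywhere is no longer the cheapest solution. It is worth rerunning the grid search with the weights in place, since the best C and gamma may change.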
