Applying attribute selection separately to the training set and the test set can select different attributes in each, which makes the two sets incompatible. To make sure both sets have the same attributes, apply attribute selection to the entire data set first. Once you have selected the most useful attributes, split the data into a training set and a test set.
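To illustrate the order of operations, here is a minimal Python sketch (not Weka code). The helper names `project` and `split`, the toy data, and the chosen attribute indices are all hypothetical; the point is only that one selection is applied to the whole data set before splitting, so both parts end up with identical columns.

```python
import random

def project(rows, keep):
    """Keep only the columns whose indices are listed in `keep`."""
    return [[row[i] for i in keep] for row in rows]

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Toy data: 8 instances with 4 attributes each (made-up values).
data = [[i, i * 2, i % 3, 7] for i in range(8)]

# Assume attribute selection on the full data set kept attributes 0 and 2.
selected = [0, 2]

# Reduce the whole data set once, then split it.
reduced = project(data, selected)
train, test = split(reduced)
```

Because `project` runs once on the full data set, `train` and `test` cannot disagree about which attributes they contain.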
As for the value of -N, I would use your total number of attributes. This produces a ranked list of all your attributes, so you can inspect the score of every attribute yourself. You can then pick a clear threshold that separates the attributes carrying useful information for training the classifier from those that add nothing, and set that threshold with the -T option.
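The rank-then-threshold idea can be sketched in plain Python as follows. This is not Weka's implementation: information gain on nominal attributes is an assumed scoring method, and `entropy`, `info_gain`, `rank_attributes`, and `select` are hypothetical helper names. Ranking every attribute corresponds to setting -N to the total number of attributes, and the `threshold` argument plays the role of -T.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(values, labels):
    """Information gain of one nominal attribute with respect to the class."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def rank_attributes(columns, labels):
    """Score every attribute and return (score, index) pairs, best first."""
    scores = [(info_gain(col, labels), idx) for idx, col in enumerate(columns)]
    return sorted(scores, key=lambda s: -s[0])

def select(ranked, threshold):
    """Keep the indices of attributes whose score exceeds the threshold."""
    return [idx for score, idx in ranked if score > threshold]

# Toy data (made up): three nominal attributes, stored column-wise.
labels = ["y", "y", "n", "n"]
columns = [
    ["a", "a", "b", "b"],  # separates the classes perfectly
    ["x", "y", "x", "y"],  # carries no information about the class
    ["p", "p", "p", "q"],  # weakly informative
]

ranked = rank_attributes(columns, labels)
for score, idx in ranked:
    print(f"attribute {idx}: {score:.3f}")

# After looking at the ranked scores, choose a cut-off by eye.
kept = select(ranked, threshold=0.1)
```

Printing the full ranking first is what lets you see where the scores drop off before committing to a threshold.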
Sicco