Scikit-learn: Problems creating a custom CountVectorizer and ChiSquare

I have the following code (based on the samples here), but it does not work:

[...]
def my_analyzer(s):
    return s.split()
my_vectorizer = CountVectorizer(analyzer=my_analyzer)
X_train = my_vectorizer.fit_transform(traindata)

ch2 = SelectKBest(chi2,k=1)
X_train = ch2.fit_transform(X_train,Y_train)
[...]

When calling fit_transform, the following error is given:

AttributeError: 'function' object has no attribute 'analyze'

According to the documentation, CountVectorizer should be constructed as follows: vectorizer = CountVectorizer(tokenizer=my_tokenizer). However, if I do that, I get a different error: "got an unexpected keyword argument 'tokenizer'".

My installed version of scikit-learn is 0.10.

1 answer

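You are looking at the documentation for the 0.11 release (the development version), where the vectorizer has changed. In 0.10 there is no tokenizer argument, and analyzer must be an object that implements an analyze method: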

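from sklearn.feature_extraction.text import CountVectorizer

class MyAnalyzer(object):
    # scikit-learn 0.10 calls analyzer.analyze(document) on each document
    @staticmethod
    def analyze(s):
        return s.split()

v = CountVectorizer(analyzer=MyAnalyzer())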
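
The analyze requirement can be illustrated without scikit-learn installed. The sketch below is not scikit-learn code: tokenize_all is a hypothetical stand-in that only mimics a vectorizer calling analyzer.analyze(doc), and CallableAnalyzer is a hypothetical adapter that wraps a plain function such as my_analyzer from the question. It assumes that the 0.10 vectorizer looks up analyze on whatever object it is given, which is what the error message suggests.

```python
# Stand-in for how an old-style vectorizer might use its analyzer.
# This is NOT scikit-learn code; it only mimics the duck-typing contract.
def tokenize_all(analyzer, documents):
    return [analyzer.analyze(doc) for doc in documents]


class CallableAnalyzer(object):
    """Hypothetical adapter: wraps a plain function in an object with analyze()."""

    def __init__(self, func):
        self.func = func

    def analyze(self, s):
        return self.func(s)


def my_analyzer(s):
    # Same whitespace tokenizer as in the question.
    return s.split()


docs = ["spam ham", "ham eggs ham"]

# Passing the bare function fails, because a function has no analyze attribute.
try:
    tokenize_all(my_analyzer, docs)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Wrapping the function satisfies the contract.
tokens = tokenize_all(CallableAnalyzer(my_analyzer), docs)

print(error_message)
print(tokens)
```

Under the same assumption, an adapter like this could presumably be passed as CountVectorizer(analyzer=CallableAnalyzer(my_analyzer)) on 0.10, which keeps the tokenizing function itself unchanged if you later upgrade.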

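Note that http://scikit-learn.org/dev holds the documentation for the development version, while http://scikit-learn/stable holds the documentation for the current stable release.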
