How to extend the SciPy sparse matrix returned by sklearn's TfidfVectorizer to hold more features

I am working on a text classification problem using scikit-learn classifiers and text feature extractors, in particular the TfidfVectorizer class.

The problem is that I have two kinds of features: the first are the n-grams captured by TfidfVectorizer, and the others are domain-specific features that I extract from each document. I need to combine both into a single feature vector per document. To do this, I need to extend the scipy sparse matrix returned by TfidfVectorizer, adding new dimensions to each row that hold the domain features for that document. However, I can't find a neat way to do this, where by neat I mean one that does not convert the sparse matrix into a dense one, because that simply won't fit in memory.

Maybe I'm missing a feature in scikit-learn or something else, since I'm new to both scipy and scikit-learn.

+5
1

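You can use scipy.sparse.hstack to append the domain-specific feature columns to the TF-IDF matrix without converting it to a dense array. Also take a look at the "FeatureUnion" class in scikit-learn.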
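As a rough illustration of the hstack approach, here is a minimal sketch. The sample documents and the `domain_features` helper are hypothetical stand-ins for whatever you actually extract per document. The only assumption is that the extra features can be arranged as one row per document, in the same order as the rows of the TF-IDF matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; replace with your own documents.
docs = [
    "the quick brown fox",
    "jumped over the lazy dog",
    "the dog barked",
]


def domain_features(doc):
    # Hypothetical domain-specific features: token count and character count.
    return [len(doc.split()), len(doc)]


# Sparse n-gram features, shape (n_docs, n_ngrams).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_tfidf = vectorizer.fit_transform(docs)

# Domain features, one row per document, wrapped as a sparse matrix.
X_domain = csr_matrix(
    np.array([domain_features(d) for d in docs], dtype=np.float64)
)

# Column-wise concatenation; the TF-IDF part is never densified.
X = hstack([X_tfidf, X_domain], format="csr")

print(X_tfidf.shape, X_domain.shape, X.shape)
```

The combined matrix `X` can be passed to any scikit-learn classifier that accepts sparse input. Because the domain features are on a different scale from the TF-IDF weights, you may want to scale them before stacking.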

+5
