How many features can scikit-learn handle?

I have a CSV file of size [66k, 56k] (rows, columns). It's a sparse matrix. I know numpy can handle a matrix of this size. I would like to know, based on your experience, how many features scikit-learn algorithms can handle comfortably?
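A rough back-of-the-envelope sketch of what that shape means in memory, dense versus scipy's CSR format. The density value is an assumption (the question doesn't say how sparse the matrix is), and the CSR estimate assumes float64 values with int32 index arrays, which is what scipy typically uses at this size.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Memory for a dense float64 array of this shape."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, data_itemsize=8, index_itemsize=4):
    """Approximate CSR memory: values + column indices + row pointers (int32 indices assumed)."""
    return nnz * (data_itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


n_rows, n_cols = 66_000, 56_000
density = 0.01  # assumption: fraction of non-zero cells, not given in the question
nnz = int(n_rows * n_cols * density)
print(f"dense: {dense_bytes(n_rows, n_cols) / 1e9:.1f} GB")
print(f"CSR at {density:.0%} density: {csr_bytes(n_rows, nnz) / 1e9:.2f} GB")
```

At 1% density this comes to roughly 30 GB dense versus under half a gigabyte in CSR, which is why keeping the data sparse end to end matters more than the raw feature count.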

2 answers

It depends on the estimator. At these sizes, linear models still work well, while SVMs will probably take forever to train (and forget about random forests, since they won't handle sparse matrices).

I've personally used LinearSVC, LogisticRegression and SGDClassifier on sparse matrices of about 300k × 3.3 million without any problems. See @amueller's scikit-learn cheat sheet for picking the right estimator for your task.
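A minimal sketch of fitting those three estimators directly on a scipy sparse matrix, without ever densifying it. The data is synthetic (random sparse features with labels from a hidden linear rule), and the helper names `make_sparse_problem` and `compare_linear_models` are hypothetical; the demo shape is kept small so it runs quickly.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def make_sparse_problem(n_rows, n_cols, density=0.001, seed=0):
    """Hypothetical helper: synthetic CSR matrix plus labels from a hidden linear rule."""
    rng = np.random.default_rng(seed)
    nnz = int(n_rows * n_cols * density)
    rows = rng.integers(0, n_rows, nnz)
    cols = rng.integers(0, n_cols, nnz)
    vals = rng.random(nnz)
    # Duplicate (row, col) pairs are summed when the CSR matrix is built.
    X = sp.csr_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))
    w = rng.standard_normal(n_cols)
    y = (X @ w > 0).astype(int)
    return X, y


def compare_linear_models(X, y, seed=0):
    """Fit each linear model on the sparse matrix and return held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    models = {
        "LinearSVC": LinearSVC(dual=True, max_iter=5000),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "SGDClassifier": SGDClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)  # all three accept scipy.sparse input as-is
        scores[name] = model.score(X_te, y_te)
    return scores


if __name__ == "__main__":
    # Small demo shape; the question's 66k x 56k goes through the same code, only slower.
    X, y = make_sparse_problem(2_000, 20_000, density=0.001)
    for name, acc in compare_linear_models(X, y).items():
        print(f"{name}: {acc:.3f}")
```

`dual=True` is chosen for LinearSVC because the demo has more features than samples; the accuracies themselves mean little on random data, the point is that nothing gets converted to a dense array.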

Full disclosure: I'm a scikit-learn core developer.


A linear model (regression, SGD, Bayes) will probably be your best bet if you need to retrain your model frequently.
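If "train often" means new data keeps arriving, one option (my illustration, not something the answer spells out) is incremental updates via `partial_fit`, which both `SGDClassifier` and `MultinomialNB` support on sparse input. The batches below are synthetic and `sparse_batch` / `update_models` are hypothetical helpers; `MultinomialNB` assumes non-negative features such as counts.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB


def sparse_batch(rng, n_rows, n_cols, w, density=0.005):
    """Hypothetical stand-in for a new chunk of sparse, non-negative data."""
    nnz = int(n_rows * n_cols * density)
    rows = rng.integers(0, n_rows, nnz)
    cols = rng.integers(0, n_cols, nnz)
    X = sp.csr_matrix((rng.random(nnz), (rows, cols)), shape=(n_rows, n_cols))
    y = (X @ w > 0).astype(int)  # labels from a hidden linear rule
    return X, y


def update_models(n_batches=5, n_rows=500, n_cols=5_000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_cols)  # same hidden rule for every batch
    classes = np.array([0, 1])  # partial_fit needs the full label set on the first call
    models = [SGDClassifier(random_state=seed), MultinomialNB()]
    for _ in range(n_batches):
        X, y = sparse_batch(rng, n_rows, n_cols, w)
        for model in models:
            # Update the existing model with the new batch instead of refitting from scratch.
            model.partial_fit(X, y, classes=classes)
    return models
```

Each update only touches the new batch, so the cost of "retraining" stays proportional to the new data rather than the whole history.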

Before running any models, though, you could try the following:

1) Feature reduction. Are there features in your data that could easily be removed? For example, if your data is text or ratings, there are plenty of well-known options.
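A minimal sketch of one common way to cut the feature count on a sparse matrix, assuming non-negative features (counts or ratings): first drop columns that are almost never non-zero, then keep the `k` columns with the highest chi-squared score against the labels. The thresholds (`min_nnz`, `k`) and helper names are illustrative assumptions, not recommendations for this dataset.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_selection import SelectKBest, chi2


def drop_rare_columns(X, min_nnz=5):
    """Keep only columns with at least `min_nnz` non-zero entries."""
    X = sp.csc_matrix(X, copy=True)  # column-major makes per-column counts cheap
    X.eliminate_zeros()  # make sure stored entries are genuine non-zeros
    nnz_per_col = np.diff(X.indptr)
    keep = np.flatnonzero(nnz_per_col >= min_nnz)
    return X[:, keep].tocsr(), keep


def reduce_features(X, y, min_nnz=5, k=1_000):
    """Rare-column pruning followed by chi2 selection; returns matrix and original column ids."""
    X_kept, kept_cols = drop_rare_columns(X, min_nnz)
    k = min(k, X_kept.shape[1])
    # chi2 requires non-negative features and works directly on sparse input.
    selector = SelectKBest(chi2, k=k)
    X_red = selector.fit_transform(X_kept, y)
    return X_red, kept_cols[selector.get_support(indices=True)]
```

Returning the original column indices makes it easy to map the surviving features back to the CSV header. For raw text, the vectorizer's own `min_df`, `max_df` and `stop_words` options do a similar job before the matrix is even built.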

2) . , , .


