Scikit-learn pipeline: grid search over transformer parameters for data generation

Question


I would like to use the first step of a scikit-learn pipeline to generate a toy dataset, so that I can evaluate the performance of my analysis. The simplest example solution I came up with looks like this:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.base import TransformerMixin
from sklearn import cluster

class FeatureGenerator(TransformerMixin):

    def __init__(self, num_features=None):
        self.num_features = num_features

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X, **transform_params):
        return np.array(
            range(self.num_features*self.num_features)
        ).reshape(self.num_features,
                  self.num_features)

    def get_params(self, deep=True):
        return {"num_features": self.num_features}

    def set_params(self, **parameters):
        self.num_features = parameters["num_features"]
        return self
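As an aside, the hand-written get_params and set_params can usually be inherited rather than written out. The following is only a minimal sketch, assuming a reasonably recent scikit-learn in which BaseEstimator derives both methods from the __init__ signature; the class name FeatureGeneratorBase is made up for this illustration and is not part of the question's code:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class FeatureGeneratorBase(TransformerMixin, BaseEstimator):
    """Hypothetical variant of FeatureGenerator: ignores X and emits toy data."""

    def __init__(self, num_features=None):
        # BaseEstimator reads the parameter names from this signature.
        self.num_features = num_features

    def fit(self, X, y=None):
        # Nothing to learn; the data is generated in transform().
        return self

    def transform(self, X):
        n = self.num_features
        return np.arange(n * n).reshape(n, n)


gen = FeatureGeneratorBase(num_features=3)
params = gen.get_params()  # inherited, not hand-written
# clone() is what a grid search uses to copy an estimator before changing it.
gen_copy = clone(gen).set_params(num_features=4)
shape = gen_copy.transform(None).shape
```

Because clone and set_params work on this class, it can be tuned by a grid search in the same way as the hand-written version.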

In action, the FeatureGenerator transformer would be called like this, for example:

pipeline = Pipeline([
    ('pick_features', FeatureGenerator(100)),
    ('kmeans', cluster.KMeans())
])

pipeline = pipeline.fit(None)
classes = pipeline.predict(None)
print(classes)

I run into trouble as soon as I try to run a grid search over this pipeline:

parameter_sets = {
    'pick_features__num_features' : [10,20,30],
    'kmeans__n_clusters' : [2,3,4]
}

pipeline = Pipeline([
    ('pick_features', FeatureGenerator()),
    ('kmeans', cluster.KMeans())
])

g_search_estimator = GridSearchCV(pipeline, parameter_sets)

g_search_estimator.fit(None,None)
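For reference, the step__parameter keys in parameter_sets expand into one candidate per combination of values. A small sketch of that expansion, assuming scikit-learn 0.18 or later (where ParameterGrid lives in sklearn.model_selection); the variable names candidates and n_candidates are only illustrative:

```python
from sklearn.model_selection import ParameterGrid

parameter_sets = {
    'pick_features__num_features': [10, 20, 30],
    'kmeans__n_clusters': [2, 3, 4],
}

# Each candidate is a dict that the grid search passes to pipeline.set_params().
candidates = list(ParameterGrid(parameter_sets))
n_candidates = len(candidates)
```

Each of these candidates is then fitted and scored once per cross-validation fold.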

The grid search takes samples and labels as input, and it is not as robust as the pipeline, which does not complain about None as an input parameter:

TypeError: Expected sequence or array-like, got <type 'NoneType'>

This makes sense, because the grid search needs to split the dataset into different CV partitions.
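To see why an array-like is required, here is a minimal sketch of the splitting step on its own. It assumes scikit-learn 0.18 or later (KFold in sklearn.model_selection) and uses a made-up placeholder_X whose values are irrelevant; only its length matters to the splitter:

```python
import numpy as np
from sklearn.model_selection import KFold

placeholder_X = np.zeros((3, 1))  # three dummy samples, values never used

# The splitter only needs to know how many samples there are in order to
# hand out train/test index arrays for each fold.
folds = [(train.tolist(), test.tolist())
         for train, test in KFold(n_splits=3).split(placeholder_X)]
```

With None as input there is no sample count to split, which is where the error above comes from.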


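Question: is there a way to set the Xs and ys of the GridSearch from inside the first transformer? If not, is there another way to run the GridSearch on generated data, or to customize GridSearchCV for this?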


python scikit-learn cross-validation grid-search

Milla Well 27 . '15 14:32


ldirer · Answer 1 · 2015-07-27T20:42:10+0000

A quick and dirty solution is to pass dummy data to fit:

g_search_estimator.fit([1., 1., 1.],[1., 0., 0.])
g_search_estimator.best_params_

Output:

[tons of int64 to float64 conversion warnings]
{'kmeans__n_clusters': 4, 'pick_features__num_features': 10}

Note that you need 3 samples, because a 3-fold cross-validation is performed (the default in this version).
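The sample-count requirement of k-fold splitting can be checked directly. This is a small sketch, assuming scikit-learn 0.18 or later; too_few and split_failed are illustrative names:

```python
import numpy as np
from sklearn.model_selection import KFold

too_few = np.zeros((2, 1))  # only two placeholder samples

try:
    # split() is lazy, so list() is needed to actually trigger the check.
    list(KFold(n_splits=3).split(too_few))
    split_failed = False
except ValueError:
    # Raised because there are fewer samples than folds.
    split_failed = True
```

So the placeholder input must contain at least as many samples as there are folds.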

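The error you get comes from a check performed by the GridSearchCV object itself, so it is raised before your transformer has a chance to do anything. So I would say "no" to your first question:

Is there a way to set the Xs and ys of the GridSearch from inside the first transformer?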

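EDIT:
I realize this was unnecessarily confusing; the following three lines are equivalent:

g_search_estimator.fit([1., 1., 1.], [1., 0., 0.])
g_search_estimator.fit([1., 1., 1.], None)
g_search_estimator.fit([1., 1., 1.])

In other words, the y values passed above are arbitrary and play no role here.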

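Some explanation of how the grid search scores the different grid points: when scoring=None is passed to the GridSearchCV constructor (this is the default, so it is what you have here), the grid search uses the estimator's own score method. For KMeans, the default score is essentially the negative of the sum of squared distances to the cluster centers.
This is an unsupervised metric, so y is not needed here.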
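The default KMeans score can be inspected directly. A minimal sketch, using made-up random data (toy_X) and an explicit n_init so that the behaviour does not depend on the installed scikit-learn version's default:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
toy_X = rng.rand(30, 2)  # made-up data, for illustration only

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(toy_X)

# score() needs no y: it is the negative of the k-means objective on X,
# so values closer to zero mean tighter clusters.
default_score = km.score(toy_X)
objective = km.inertia_  # sum of squared distances to the closest centers
```

On the training data, default_score should match -objective up to floating-point error. Since more clusters always reduce this objective, the grid search will tend to prefer the largest n_clusters under the default score.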

To wrap up, you will always be able to:

set the Xs of the GridSearch from inside the first transformer

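Just "transform" the input X into something unrelated; nothing will complain about it. You do still need some input, called random_X here. If you want to use a custom metric (which is often the case), you will also need a y.
In that case, pass a y along with X, together with your scoring function:

# scoring is a GridSearchCV constructor argument, not a fit() argument
g_search_estimator = GridSearchCV(pipeline, parameter_sets, scoring=my_scoring_function)
g_search_estimator.fit(random_X, y)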
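Putting the pieces together, here is one possible end-to-end sketch. It is not the original poster's code: it assumes scikit-learn 0.18 or later (sklearn.model_selection), sets cv=3 explicitly, and uses a hypothetical ToyFeatureGenerator plus a made-up silhouette-based my_scoring_function that does not need y:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ToyFeatureGenerator(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for FeatureGenerator: ignores X, emits toy data."""

    def __init__(self, num_features=10):
        self.num_features = num_features

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        n = self.num_features
        return np.arange(n * n, dtype=float).reshape(n, n)


def my_scoring_function(estimator, X, y=None):
    """Illustrative scorer: silhouette of the clustering on the generated data."""
    generated = estimator.named_steps["pick_features"].transform(X)
    labels = estimator.named_steps["kmeans"].predict(generated)
    return silhouette_score(generated, labels)


pipeline = Pipeline([
    ("pick_features", ToyFeatureGenerator()),
    ("kmeans", KMeans(n_init=10, random_state=0)),
])
parameter_sets = {
    "pick_features__num_features": [10, 20, 30],
    "kmeans__n_clusters": [2, 3, 4],
}

random_X = np.zeros((3, 1))  # placeholder input; only its length matters
g_search_estimator = GridSearchCV(
    pipeline, parameter_sets, scoring=my_scoring_function, cv=3
)
g_search_estimator.fit(random_X)
best = g_search_estimator.best_params_
```

Note that every fold sees the same generated data here, so the cross-validation does not provide a real train/test separation; the grid search is only being used as a convenient way to enumerate and rank parameter combinations.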

