TensorCross

pip install tensorcross

Cross Validation, Grid Search and Random Search for tf.data.Datasets in TensorFlow 2.0+ and Python 3.7+.

Motivation

TensorFlow currently provides the tf.keras.wrappers.scikit_learn.KerasClassifier and KerasRegressor classes, which turn a tf.keras model into a sklearn estimator. However, this approach only works if your x and y data are numpy.ndarray objects. If you want to use the tf.data.Dataset class instead, you cannot use the sklearn wrappers. This Python package aims to help with this use case.

API

Dataset and TensorFlow Model for the Examples

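    import numpy as np
    import tensorflow as tf

    # Three samples with one feature (x) and one target (y) each.
    dataset = tf.data.Dataset.from_tensor_slices(
        (np.array([1, 2, 3], dtype=np.float32).reshape(-1, 1),  # x
         np.array([-1, -2, -3], dtype=np.float32).reshape(-1, 1))  # y
    )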

    def build_model(
        optimizer: tf.keras.optimizers.Optimizer,
        learning_rate: float
    ) -> tf.keras.models.Model:
        x_input = tf.keras.layers.Input(shape=(1,))  # one feature per sample, matching the dataset
        y_pred = tf.keras.layers.Dense(units=1)(x_input)
        model = tf.keras.models.Model(inputs=[x_input], outputs=[y_pred])

        opt = optimizer(learning_rate=learning_rate)

        model.compile(
            loss="mse", optimizer=opt, metrics=["mse"]
        )

        return model

The dataset must be a tf.data.Dataset object, and you have to define a function (or any callable) that returns a compiled tf.keras.models.Model object. The search classes, e.g. GridSearch, call this function to build and then train one model per parameter combination.

GridSearch Example

Assuming you have a tf.data.Dataset object and a build_model function defined as above, you can run a GridSearch as follows:

    from tensorcross.model_selection import GridSearch
    from tensorcross.utils import dataset_split

    train_dataset, val_dataset = dataset_split(
        dataset=dataset,
        split_fraction=(1 / 3)
    )

    param_grid = {
        "optimizer": [
            tf.keras.optimizers.Adam,
            tf.keras.optimizers.RMSprop
        ],
        "learning_rate": [0.001, 0.0001]
    }

    grid_search = GridSearch(
        model_fn=build_model,
        param_grid=param_grid,
        verbose=1,
        num_features=1,
        num_targets=1
    )

    grid_search.fit(
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        epochs=1,
        verbose=1
    )

    grid_search.summary()

This would result in the following console output:

    --------------------------------------------------
    Best score: 1.1800532341003418 using params: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------
    Idx: 0 - Score: 0.2754371166229248 with param: {
        'learning_rate': 0.001, 'optimizer': 'Adam'
    }
    Idx: 1 - Score: 1.1800532341003418 with param: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    Idx: 2 - Score: 0.055416107177734375 with param: {
        'learning_rate': 0.0001, 'optimizer': 'Adam'
    }
    Idx: 3 - Score: 0.12417340278625488 with param: {
        'learning_rate': 0.0001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------
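The four indexed entries in the summary correspond to the four combinations of the two optimizers and two learning rates in param_grid. As a rough illustration of how such a grid can be expanded, here is a minimal sketch using only the standard library. The helper name expand_grid is hypothetical, and sorting the keys is an assumption chosen to mirror the ordering seen in the output; this is not necessarily how TensorCross implements it.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_grid(param_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of the values in param_grid."""
    # Sorting the keys is an assumption made to reproduce the ordering
    # seen in the summary output (learning_rate before optimizer).
    keys = sorted(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))


# Strings stand in for the optimizer classes so the sketch runs
# without TensorFlow installed.
param_grid = {
    "optimizer": ["Adam", "RMSprop"],
    "learning_rate": [0.001, 0.0001],
}

combinations = list(expand_grid(param_grid))
for idx, params in enumerate(combinations):
    print(f"Idx: {idx} - params: {params}")
```

Each resulting dict could then be unpacked into the model-building function, for example build_model(**params), to obtain one compiled model per combination.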

GridSearchCV Example

Assuming you have a tf.data.Dataset object and a build_model function defined as above, you can run a GridSearchCV as follows:

    from tensorcross.model_selection import GridSearchCV

    param_grid = {
        "optimizer": [
            tf.keras.optimizers.Adam,
            tf.keras.optimizers.RMSprop
        ],
        "learning_rate": [0.001, 0.0001]
    }

    grid_search_cv = GridSearchCV(
        model_fn=build_model,
        param_grid=param_grid,
        n_folds=2,
        verbose=1,
        num_features=1,
        num_targets=1
    )

    grid_search_cv.fit(
        dataset=dataset,
        epochs=1,
        verbose=1
    )

    grid_search_cv.summary()

This would result in the following console output:

    --------------------------------------------------
    Best score: 1.1800532341003418 using params: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------
    Idx: 0 - Score: 0.2754371166229248 with param: {
        'learning_rate': 0.001, 'optimizer': 'Adam'
    }
    Idx: 1 - Score: 1.1800532341003418 with param: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    Idx: 2 - Score: 0.055416107177734375 with param: {
        'learning_rate': 0.0001, 'optimizer': 'Adam'
    }
    Idx: 3 - Score: 0.12417340278625488 with param: {
        'learning_rate': 0.0001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------
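To make the n_folds argument more concrete, the sketch below shows one common way to partition sample indices into folds, where each fold serves once as the validation set. The helper k_fold_indices is hypothetical and works on plain indices rather than on a tf.data.Dataset; it illustrates the general k-fold idea under these assumptions and is not TensorCross's implementation.

```python
from typing import List, Tuple


def k_fold_indices(
    num_samples: int, n_folds: int
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, val_indices) pairs, one per fold."""
    indices = list(range(num_samples))
    # Spread the remainder over the first folds so that fold sizes
    # differ by at most one sample.
    base, extra = divmod(num_samples, n_folds)
    folds = []
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


# Three samples and n_folds=2, matching the example dataset above.
splits = k_fold_indices(num_samples=3, n_folds=2)
for fold, (train, val) in enumerate(splits):
    print(f"Fold {fold}: train={train} val={val}")
```

In a cross-validated grid search, each parameter combination would typically be trained once per fold, with the per-fold validation scores combined into a single score per combination.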