Using `bbo` for auto-tuning of Machine Learning models

In this example, we will use bbo as a stand-alone black-box optimization library to find the optimal hyper-parameters of two Machine Learning models (a Support Vector Machine and a RandomForestClassifier) on the breast_cancer dataset.

Loading the dataset

The breast_cancer dataset is a classic and very easy binary classification dataset. We will use it as an example for tuning the models. After loading the data, we will use the train_test_split function to divide the dataset into a train dataset (on which we will train the model) and a test dataset (on which we will evaluate the model to get its accuracy on unseen data). As bbo is a minimizer, we will use the opposite value of the accuracy as the optimization target.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()

X_train, X_test, y_train, y_test = train_test_split(data["data"], data["target"], test_size=0.33, random_state=42)
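Because bbo minimizes, maximizing the accuracy is expressed as minimizing its opposite. The toy sketch below is plain Python and independent of bbo; the candidate names and accuracy values are made up for illustration. It shows that the candidate with the lowest negated accuracy is also the one with the highest accuracy.

```python
# Hypothetical accuracies for three candidate parametrizations (illustrative values only).
accuracies = {"candidate_a": 0.91, "candidate_b": 0.96, "candidate_c": 0.64}

# A minimizer sees the opposite of the accuracy as its cost.
costs = {name: -accuracy for name, accuracy in accuracies.items()}

# Minimizing the cost selects the same candidate as maximizing the accuracy.
best_by_cost = min(costs, key=costs.get)
best_by_accuracy = max(accuracies, key=accuracies.get)
print(best_by_cost, best_by_accuracy)
```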

Design black-box class

To use bbo, we need to design two classes that act as black-boxes that can be tuned: each has a compute method that takes a parametrization as input, trains the model with this parametrization on the train data, evaluates the model on the test data and returns the opposite of the accuracy.

We will tune two sklearn models.

SVM: we will look for the optimum value of:
- C
- kernel
- degree
- gamma
- coef0
- shrinking
- probability
- tol

Random forest: we will look for the optimum of:
- n_estimators
- criterion
- max_depth
- min_samples_split
- min_weight_fraction_leaf
- max_features

For each model, we will define the class with the compute method and the corresponding parametric_grid (i.e. the values that can be tested by the optimizer).
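To make the contract concrete, here is a minimal sketch of the interface described above: an object with a compute method that maps a parametrization to a cost, plus one list of admissible values per parameter. ToyBlackBox and toy_grid are hypothetical names invented for this illustration, and the exhaustive search stands in for the optimizer only to show how the two pieces fit together; it is not how bbo explores the grid.

```python
from itertools import product


class ToyBlackBox:
    """Hypothetical black-box: any object exposing compute(parameters) -> float."""

    def compute(self, parameters):
        # Values are cast explicitly, mirroring what a real model wrapper would do.
        x = float(parameters[0])
        mode = str(parameters[1])
        offset = 0.0 if mode == "centered" else 1.0
        return (x - 2.0) ** 2 + offset


# One list of admissible values per parameter (the "parametric grid").
toy_grid = [[0, 1, 2, 3, 4], ["centered", "shifted"]]

# Exhaustive search: evaluate every combination and keep the lowest cost.
black_box = ToyBlackBox()
best_parameters = min(product(*toy_grid), key=black_box.compute)
best_cost = black_box.compute(best_parameters)
print(best_parameters, best_cost)
```

Exhaustive search is only feasible here because the toy grid has ten combinations; the grids used for the real models are far larger.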

from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import numpy as np

class OptimizableSVM:
    """
    Optimizable class to find the optimal parametrization of a SVM model.
    """
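    def compute(self, parameters):
        """
        Trains the SVM with the given parametrization and returns the
        opposite of its accuracy on the test data.
        """
        # The optimizer may pass values as strings, and bool("False") is True,
        # so booleans are recovered by comparing their string form.
        parameters_dict = {
            "C": float(parameters[0]),
            "kernel": str(parameters[1]),
            "degree": int(parameters[2]),
            "gamma": str(parameters[3]),
            "coef0": float(parameters[4]),
            "shrinking": str(parameters[5]) == "True",
            "probability": str(parameters[6]) == "True",
            "tol": float(parameters[7]),
        }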
        svc = SVC(**parameters_dict)
        svc.fit(X_train, y_train)
        return -svc.score(X_test, y_test)

# Define parametric grid
c = np.arange(1, 20, 1)
kernel = np.array(["linear", "poly", "rbf", "sigmoid"])
degree = np.arange(1, 4, 1)
gamma = np.array(['scale', 'auto'])
coef = np.arange(0, 1, 0.01)
shrinking = np.array([True, False])
probability = np.array([True, False])
tol = np.arange(0.01, 0.05, 0.01)

svm_parametric_grid = np.array([c, kernel, degree, gamma, coef, shrinking, probability, tol], dtype=object)

class OptimizableRandomForest:
    """
    Model that will act as a black-box.
    """
    def compute(self, parameters):
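        # Values may arrive as strings, so each one is cast to the type sklearn expects
        # (max_depth is a number of tree levels, hence an integer).
        parameters_dict = {
            "n_estimators": int(parameters[0]),
            "criterion": str(parameters[1]),
            "max_depth": int(parameters[2]),
            "min_samples_split": int(parameters[3]),
            "min_weight_fraction_leaf": float(parameters[4]),
            "max_features": str(parameters[5]),
        }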
        random_forest = RandomForestClassifier(**parameters_dict)
        random_forest.fit(X_train, y_train)
        return -random_forest.score(X_test, y_test)

n_estimators = np.arange(50, 200, 20)
criterion = np.array(["gini", "entropy"])
max_depth = np.arange(5, 10, 1)
min_samples_split = np.arange(2, 10, 1)
min_weight_fraction_leaf = np.arange(0, 0.4, 0.1)
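# Note: "auto" was accepted by the scikit-learn version used for this run;
# scikit-learn 1.3 and later reject it for RandomForestClassifier.
max_features = np.array(["auto", "sqrt", "log2"])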

rf_parametric_grid = np.array([n_estimators, criterion, max_depth, min_samples_split, min_weight_fraction_leaf, max_features], dtype=object)
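To get a feel for why a heuristic is used rather than an exhaustive search, the sketch below counts the combinations in each grid. The per-parameter sizes are read off the grids defined above (for instance, np.arange(1, 20, 1) yields 19 values); the dictionary names are introduced here for illustration only.

```python
from math import prod

# Number of admissible values per parameter, read off the grids defined above.
svm_grid_sizes = {
    "C": 19, "kernel": 4, "degree": 3, "gamma": 2,
    "coef0": 100, "shrinking": 2, "probability": 2, "tol": 4,
}
rf_grid_sizes = {
    "n_estimators": 8, "criterion": 2, "max_depth": 5,
    "min_samples_split": 8, "min_weight_fraction_leaf": 4, "max_features": 3,
}

# The size of the search space is the product of the per-parameter sizes.
svm_combinations = prod(svm_grid_sizes.values())
rf_combinations = prod(rf_grid_sizes.values())
print(svm_combinations)  # 729600
print(rf_combinations)   # 7680
```

Training one model per combination would be prohibitively slow for the SVM grid, which is why only a small budget of evaluations is spent on it.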

Setup optimizer

We then need to set up the optimizer. As genetic algorithms are currently the only heuristic that supports qualitative variables, we will use them, with single-point crossover and tournament selection. The mutation rate is set to 0.3. We will use 10 initial data points and a maximum of 20 iterations.
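For readers unfamiliar with these operators, the following is a generic sketch of single-point crossover, tournament selection and a "move to a neighboring grid value" mutation. It is written from the textbook definitions and is not bbo's implementation: the function names, the tournament size and the exact mutation rule are assumptions made for illustration, and the costs are made up.

```python
import random


def single_point_crossover_sketch(parent_1, parent_2, rng):
    """Child takes the genes of parent_1 up to a random cut point, then those of parent_2."""
    cut = rng.randint(1, len(parent_1) - 1)
    return parent_1[:cut] + parent_2[cut:]


def tournament_pick_sketch(population, costs, tournament_size, rng):
    """Draw a few individuals at random and return the one with the lowest cost."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = min(contenders, key=lambda index: costs[index])
    return population[winner]


def mutate_to_neighbor_sketch(chromosome, grid, mutation_rate, rng):
    """With probability mutation_rate, shift each gene one step along its grid."""
    mutated = []
    for gene, values in zip(chromosome, grid):
        if rng.random() < mutation_rate:
            index = values.index(gene)
            # Step left or right, clamped so the index stays inside the grid.
            index = min(max(index + rng.choice([-1, 1]), 0), len(values) - 1)
            gene = values[index]
        mutated.append(gene)
    return mutated


rng = random.Random(0)
grid = [[1, 2, 3, 4, 5], ["linear", "poly", "rbf"], [0.01, 0.02, 0.03]]
population = [[1, "linear", 0.01], [5, "rbf", 0.03], [3, "poly", 0.02]]
costs = [-0.64, -0.95, -0.96]  # made-up costs (opposite of accuracy)

parent_1 = tournament_pick_sketch(population, costs, 2, rng)
parent_2 = tournament_pick_sketch(population, costs, 2, rng)
child = mutate_to_neighbor_sketch(
    single_point_crossover_sketch(parent_1, parent_2, rng), grid, 0.3, rng
)
print(child)
```

Because every operator only picks or shifts values that already exist in the grid, the scheme works for qualitative parameters such as the kernel name as well as for numeric ones.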

from bbo.optimizer import BBOptimizer
from bbo.heuristics.genetic_algorithm.mutations import mutate_chromosome_to_neighbor
from bbo.heuristics.genetic_algorithm.selections import tournament_pick
from bbo.heuristics.genetic_algorithm.crossover import single_point_crossover

svm_model = OptimizableSVM()
svm_bb = BBOptimizer(
    black_box=svm_model,  # the black-box to optimize
    parameter_space=svm_parametric_grid,  # the grid on which to perform the optimization
    initial_sample_size=10,  # the initial size of the sample
    heuristic="genetic_algorithm",  # the name of the heuristic to use
    max_iteration=20,  # the maximum number of iterations
    time_out=200,  # in seconds, the maximum elapsed time
    # the following arguments are specific to genetic algorithms:
    mutation_method=mutate_chromosome_to_neighbor,  # the mutation function
    mutation_rate=0.3,  # the mutation rate
    crossover_method=single_point_crossover,  # the crossover function
    selection_method=tournament_pick,  # the selection function
)

svm_bb.optimize()

2020-11-09 08:30:39.667 | DEBUG    | bbo.optimizer:_initialize:487 - Initializing parameter space
2020-11-09 08:30:39.677 | DEBUG    | bbo.optimizer:_initialize:488 - Parameter space given by user: [array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])
 array(['linear', 'poly', 'rbf', 'sigmoid'], dtype='<U7') array([1, 2, 3])
 array(['scale', 'auto'], dtype='<U5')
 array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
       0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
       0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
       0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
       0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
       0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
       0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
       0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
       0.99])
 array([ True, False]) array([ True, False])
 array([0.01, 0.02, 0.03, 0.04])]
2020-11-09 08:30:39.680 | DEBUG    | bbo.initial_parametrizations:hybrid_lhs_uniform_sampling:106 - Selected number of initial parameters: 10
2020-11-09 08:30:39.686 | DEBUG    | bbo.optimizer:_initialize:494 - Selected initial parameter space: {initial_parameters}
2020-11-09 08:30:39.688 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [7 'rbf' 1 'auto' 0.86 True False 0.01]
2020-11-09 08:30:39.737 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.6436170212765957
2020-11-09 08:30:39.740 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [19 'poly' 2 'scale' 0.34 False True 0.02]
2020-11-09 08:30:39.763 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:30:39.766 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['6' 'sigmoid' '2' 'auto' '0.7000000000000001' 'True' 'False' '0.03']
2020-11-09 08:30:39.824 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.6436170212765957
2020-11-09 08:30:39.827 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['3' 'poly' '3' 'scale' '0.9400000000000001' 'True' 'False' '0.03']
2020-11-09 08:30:39.848 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9468085106382979
2020-11-09 08:30:39.851 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['12' 'poly' '2' 'scale' '0.09' 'True' 'False' '0.01']
2020-11-09 08:30:39.879 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9468085106382979
2020-11-09 08:30:39.881 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['2' 'linear' '1' 'scale' '0.9500000000000001' 'True' 'False' '0.02']
2020-11-09 08:30:52.758 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:30:52.761 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
2020-11-09 08:30:58.003 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:30:58.005 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
2020-11-09 08:30:58.035 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:30:58.037 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['4' 'sigmoid' '1' 'auto' '0.6' 'True' 'False' '0.01']
2020-11-09 08:30:58.123 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.6436170212765957
2020-11-09 08:30:58.125 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['1' 'sigmoid' '1' 'scale' '0.31' 'False' 'True' '0.01']
2020-11-09 08:30:58.282 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.5638297872340425
2020-11-09 08:30:58.308 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
2020-11-09 08:31:00.907 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:00.910 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['5' 'rbf' '1' 'auto' '0.59' 'False' 'False' '0.02']
2020-11-09 08:31:01.121 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.6436170212765957
2020-11-09 08:31:01.125 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03']
2020-11-09 08:31:04.498 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:31:04.498 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '2' 'scale' '0.19' 'True' 'False' '0.03']
2020-11-09 08:31:04.531 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:04.534 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
2020-11-09 08:31:04.571 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:31:04.574 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
2020-11-09 08:31:07.742 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:07.745 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
2020-11-09 08:31:11.671 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:11.680 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['6' 'poly' '1' 'scale' '0.59' 'True' 'False' '0.01']
2020-11-09 08:31:11.703 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:11.708 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.58' 'True' 'False' '0.02']
2020-11-09 08:31:14.617 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:14.621 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.18' 'True' 'False' '0.03']
2020-11-09 08:31:16.683 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:31:16.686 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.01']
2020-11-09 08:31:22.471 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:22.474 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['5' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03']
2020-11-09 08:31:25.239 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:25.243 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.5700000000000001' 'True' 'True' '0.02']
2020-11-09 08:31:28.550 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:31:28.553 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.19' 'False' 'False' '0.01']
2020-11-09 08:31:32.507 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:31:32.510 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '1' 'auto' '0.58' 'True' 'True' '0.03']
2020-11-09 08:31:35.890 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:31:35.895 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'linear' '1' 'scale' '0.19' 'False' 'False' '0.02']
2020-11-09 08:32:05.377 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9468085106382979
2020-11-09 08:32:05.383 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.03']
2020-11-09 08:32:08.375 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:32:08.383 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['9' 'poly' '2' 'auto' '0.56' 'True' 'False' '0.03']
2020-11-09 08:36:33.061 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9361702127659575
2020-11-09 08:36:33.066 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['5' 'poly' '2' 'scale' '0.19' 'True' 'False' '0.03']
2020-11-09 08:36:33.083 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:36:33.083 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['10' 'poly' '1' 'scale' '0.18' 'True' 'False' '0.03']
2020-11-09 08:36:33.109 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383

array(['9', 'poly', '1', 'auto', '0.19', 'False', 'False', '0.01'],
      dtype=object)
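In the log above, most parametrizations are printed as arrays of strings (for example 'False' rather than False). When values arrive in that form, Python's bool() is a trap: any non-empty string is truthy, so bool('False') is True. A small helper, given here as a hypothetical sketch (to_bool is not part of bbo), parses such values explicitly:

```python
def to_bool(value):
    """Convert True/False or their string forms 'True'/'False' to a real boolean."""
    text = str(value)
    if text not in ("True", "False"):
        raise ValueError(f"Cannot interpret {value!r} as a boolean")
    return text == "True"


print(bool("False"))     # True: any non-empty string is truthy
print(to_bool("False"))  # False
print(to_bool(True))     # True
```

Numeric values are not affected in the same way, since int("9") and float("0.19") parse their string argument.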

rf_model = OptimizableRandomForest()
rf_bb = BBOptimizer(
    black_box=rf_model,  # the black-box to optimize
    parameter_space=rf_parametric_grid,  # the grid on which to perform the optimization
    initial_sample_size=10,  # the initial size of the sample
    heuristic="genetic_algorithm",  # the name of the heuristic to use
    max_iteration=20,  # the maximum number of iterations
    time_out=200,  # in seconds, the maximum elapsed time
    # the following arguments are specific to genetic algorithms:
    mutation_method=mutate_chromosome_to_neighbor,  # the mutation function
    mutation_rate=0.3,  # the mutation rate
    crossover_method=single_point_crossover,  # the crossover function
    selection_method=tournament_pick,  # the selection function
)

rf_bb.optimize()

2020-11-09 08:36:33.175 | DEBUG    | bbo.optimizer:_initialize:487 - Initializing parameter space
2020-11-09 08:36:33.178 | DEBUG    | bbo.optimizer:_initialize:488 - Parameter space given by user: [array([ 50,  70,  90, 110, 130, 150, 170, 190])
 array(['gini', 'entropy'], dtype='<U7') array([5, 6, 7, 8, 9])
 array([2, 3, 4, 5, 6, 7, 8, 9]) array([0. , 0.1, 0.2, 0.3])
 array(['auto', 'sqrt', 'log2'], dtype='<U4')]
2020-11-09 08:36:33.180 | DEBUG    | bbo.initial_parametrizations:hybrid_lhs_uniform_sampling:106 - Selected number of initial parameters: 10
2020-11-09 08:36:33.186 | DEBUG    | bbo.optimizer:_initialize:494 - Selected initial parameter space: {initial_parameters}
2020-11-09 08:36:33.187 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [70 'entropy' 7 3 0.0 'auto']
2020-11-09 08:36:33.660 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:33.663 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'gini' 8 8 0.30000000000000004 'log2']
2020-11-09 08:36:34.366 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:34.369 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' '8' '9' '0.0' 'auto']
2020-11-09 08:36:34.853 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:34.856 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['70' 'gini' '5' '9' '0.0' 'auto']
2020-11-09 08:36:35.209 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:35.212 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['190' 'entropy' '9' '4' '0.2' 'auto']
2020-11-09 08:36:36.183 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:36:36.198 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['170' 'entropy' '7' '8' '0.2' 'log2']
2020-11-09 08:36:36.933 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9468085106382979
2020-11-09 08:36:36.933 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['50' 'entropy' '9' '4' '0.1' 'log2']
2020-11-09 08:36:37.162 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:37.164 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['130' 'entropy' '5' '6' '0.0' 'sqrt']
2020-11-09 08:36:38.029 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:36:38.031 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['50' 'gini' '8' '8' '0.2' 'auto']
2020-11-09 08:36:38.361 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9361702127659575
2020-11-09 08:36:38.364 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['70' 'entropy' '8' '8' '0.1' 'auto']
2020-11-09 08:36:38.802 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9468085106382979
2020-11-09 08:36:38.805 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['50' 'entropy' '7' '2' '0.1' 'auto']
2020-11-09 08:36:39.129 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9361702127659575
2020-11-09 08:36:39.132 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'gini' 8 '4' '0.1' 'log2']
2020-11-09 08:36:39.973 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:36:39.977 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' '8' '4' '0.1' 'log2']
2020-11-09 08:36:40.514 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9468085106382979
2020-11-09 08:36:40.519 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['130' 'gini' '8' '5' '0.0' 'log2']
2020-11-09 08:36:41.362 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9521276595744681
2020-11-09 08:36:41.365 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'entropy' 7 3 0.0 'auto']
2020-11-09 08:36:42.383 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9680851063829787
2020-11-09 08:36:42.387 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'entropy' 7 3 0.30000000000000004 'log2']
2020-11-09 08:36:43.444 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9414893617021277
2020-11-09 08:36:43.447 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'entropy' 7 3 '0.0' 'auto']
2020-11-09 08:36:44.502 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9680851063829787
2020-11-09 08:36:44.505 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'entropy' '9' '4' '0.1' 'log2']
2020-11-09 08:36:45.462 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:36:45.467 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'entropy' 7 3 0.0 'auto']
2020-11-09 08:36:46.501 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9680851063829787
2020-11-09 08:36:46.505 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['150' 'gini' '8' '8' '0.30000000000000004' 'log2']
2020-11-09 08:36:47.175 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:47.179 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'gini' 8 8 0.0 'auto']
2020-11-09 08:36:47.875 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9680851063829787
2020-11-09 08:36:47.878 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' 8 8 0.0 'auto']
2020-11-09 08:36:48.510 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9680851063829787
2020-11-09 08:36:48.513 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['70' 'gini' '5' 3 0.0 'auto']
2020-11-09 08:36:48.976 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9680851063829787
2020-11-09 08:36:48.979 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' '6' '4' '0.0' 'auto']
2020-11-09 08:36:49.572 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:36:49.575 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' 8 3 0.0 'auto']
2020-11-09 08:36:50.174 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:50.179 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' '8' '7' '0.1' 'auto']
2020-11-09 08:36:50.740 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9361702127659575
2020-11-09 08:36:50.744 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['170' 'gini' '8' '2' '0.0' 'auto']
2020-11-09 08:36:51.633 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9574468085106383
2020-11-09 08:36:51.636 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' 7 3 '0.0' 'auto']
2020-11-09 08:36:52.111 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9787234042553191
2020-11-09 08:36:52.116 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization ['90' 'entropy' 7 3 0.0 'auto']
2020-11-09 08:36:52.552 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085
2020-11-09 08:36:52.555 | DEBUG    | bbo.optimizer:_optimization_step:473 - Evaluating performance of parametrization [150 'gini' 7 3 '0.0' 'auto']
2020-11-09 08:36:53.463 | DEBUG    | bbo.optimizer:_optimization_step:476 - Corresponding performance: -0.9627659574468085

array(['90', 'entropy', 7, 3, '0.0', 'auto'], dtype=object)

Read results

The results of the optimization can be read using the summarize method on each optimizer object. The fitness history can also be plotted to look at the convergence trajectory; since the fitness is the opposite of the accuracy, it is negated before plotting so that the curve reads as an accuracy.

from matplotlib import pyplot as plt

svm_bb.summarize()

------ Optimization loop summary ------
Number of iterations: 30
Elapsed time: 353.4460906982422
Best parameters: ['9' 'poly' '1' 'auto' '0.19' 'False' 'False' '0.01']
Best fitness value: -0.9627659574468085
Some statistics are yet unavailable for mixedtypes variables.
--- Heuristic specific summary ---
Number of mutations: 10
Family tree:
['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03'] + ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03'] + ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['5' 'rbf' '1' 'auto' '0.59' 'False' 'False' '0.02']
['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02'] + ['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
|_> ['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
|_> ['9' 'poly' '2' 'scale' '0.19' 'True' 'False' '0.03']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
|_> ['9' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02'] + ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['6' 'poly' '1' 'scale' '0.59' 'True' 'False' '0.01']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['9' 'poly' '1' 'auto' '0.58' 'True' 'False' '0.02']
['9' 'poly' '1' 'auto' '0.58' 'True' 'False' '0.02'] + ['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03']
|_> ['9' 'poly' '1' 'auto' '0.18' 'True' 'False' '0.03']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.01']
['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03'] + ['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03']
|_> ['5' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03']
['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.01'] + ['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
|_> ['9' 'poly' '1' 'auto' '0.5700000000000001' 'True' 'True' '0.02']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.01']
|_> ['9' 'poly' '1' 'auto' '0.19' 'False' 'False' '0.01']
['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02'] + ['9' 'poly' '1' 'auto' '0.5700000000000001' 'True' 'True' '0.02']
|_> ['9' 'poly' '1' 'auto' '0.58' 'True' 'True' '0.03']
['9' 'poly' '1' 'auto' '0.19' 'False' 'False' '0.01'] + ['9' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['9' 'linear' '1' 'scale' '0.19' 'False' 'False' '0.02']
['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.01'] + ['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03']
|_> ['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.03']
['9' 'poly' '1' 'auto' '0.5700000000000001' 'True' 'True' '0.02'] + ['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['9' 'poly' '2' 'auto' '0.56' 'True' 'False' '0.03']
['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02'] + ['9' 'poly' '2' 'scale' '0.19' 'True' 'False' '0.03']
|_> ['5' 'poly' '2' 'scale' '0.19' 'True' 'False' '0.03']
['9' 'poly' '1' 'auto' '0.19' 'True' 'False' '0.03'] + ['5' 'rbf' '2' 'scale' '0.19' 'True' 'False' '0.03']
|_> ['10' 'poly' '1' 'scale' '0.18' 'True' 'False' '0.03']
['8' 'poly' '1' 'auto' '0.58' 'False' 'False' '0.01'] + ['5' 'poly' '1' 'auto' '0.59' 'False' 'False' '0.02']
|_> ['8' 'poly' '2' 'auto' '0.58' 'True' 'False' '0.01']
None

plt.plot(-svm_bb.history["fitness"])
plt.title("Accuracy as a function of the number of iterations for SVM model")

Text(0.5, 1.0, 'Accuracy as a function of the number of iterations for SVM model')

rf_bb.summarize()

------ Optimization loop summary ------
Number of iterations: 30
Elapsed time: 20.292949199676514
Best parameters: ['90' 'entropy' 7 3 '0.0' 'auto']
Best fitness value: -0.9787234042553191
Some statistics are yet unavailable for mixedtypes variables.
--- Heuristic specific summary ---
Number of mutations: 7
Family tree:
['50' 'entropy' '9' '4' '0.1' 'log2'] + [70 'entropy' 7 3 0.0 'auto']
|_> ['50' 'entropy' '7' '2' '0.1' 'auto']
[150 'gini' 8 8 0.30000000000000004 'log2'] + ['50' 'entropy' '9' '4' '0.1' 'log2']
|_> [150 'gini' 8 '4' '0.1' 'log2']
[70 'entropy' 7 3 0.0 'auto'] + ['50' 'entropy' '9' '4' '0.1' 'log2']
|_> ['90' 'entropy' '8' '4' '0.1' 'log2']
[150 'gini' 8 8 0.30000000000000004 'log2'] + ['50' 'entropy' '9' '4' '0.1' 'log2']
|_> ['130' 'gini' '8' '5' '0.0' 'log2']
[150 'gini' 8 8 0.30000000000000004 'log2'] + [70 'entropy' 7 3 0.0 'auto']
|_> [150 'entropy' 7 3 0.0 'auto']
[150 'entropy' 7 3 0.0 'auto'] + [150 'gini' 8 8 0.30000000000000004 'log2']
|_> [150 'entropy' 7 3 0.30000000000000004 'log2']
[150 'entropy' 7 3 0.0 'auto'] + ['70' 'gini' '5' '9' '0.0' 'auto']
|_> [150 'entropy' 7 3 '0.0' 'auto']
[150 'entropy' 7 3 0.0 'auto'] + ['50' 'entropy' '9' '4' '0.1' 'log2']
|_> [150 'entropy' '9' '4' '0.1' 'log2']
[150 'entropy' 7 3 '0.0' 'auto'] + [70 'entropy' 7 3 0.0 'auto']
|_> [150 'entropy' 7 3 0.0 'auto']
[150 'entropy' 7 3 0.0 'auto'] + [150 'gini' 8 8 0.30000000000000004 'log2']
|_> ['150' 'gini' '8' '8' '0.30000000000000004' 'log2']
[150 'gini' 8 8 0.30000000000000004 'log2'] + [150 'entropy' 7 3 0.0 'auto']
|_> [150 'gini' 8 8 0.0 'auto']
['90' 'entropy' '8' '9' '0.0' 'auto'] + [150 'gini' 8 8 0.0 'auto']
|_> ['90' 'entropy' 8 8 0.0 'auto']
['70' 'gini' '5' '9' '0.0' 'auto'] + [150 'entropy' 7 3 0.0 'auto']
|_> ['70' 'gini' '5' 3 0.0 'auto']
['90' 'entropy' 8 8 0.0 'auto'] + [150 'entropy' 7 3 0.0 'auto']
|_> ['90' 'entropy' '6' '4' '0.0' 'auto']
['90' 'entropy' 8 8 0.0 'auto'] + ['70' 'gini' '5' 3 0.0 'auto']
|_> ['90' 'entropy' 8 3 0.0 'auto']
['90' 'entropy' 8 8 0.0 'auto'] + [150 'entropy' 7 3 0.0 'auto']
|_> ['90' 'entropy' '8' '7' '0.1' 'auto']
[150 'gini' 8 8 0.0 'auto'] + [150 'entropy' 7 3 '0.0' 'auto']
|_> ['170' 'gini' '8' '2' '0.0' 'auto']
['90' 'entropy' 8 8 0.0 'auto'] + [150 'entropy' 7 3 '0.0' 'auto']
|_> ['90' 'entropy' 7 3 '0.0' 'auto']
['90' 'entropy' 7 3 '0.0' 'auto'] + [150 'entropy' 7 3 0.0 'auto']
|_> ['90' 'entropy' 7 3 0.0 'auto']
[150 'gini' 8 8 0.0 'auto'] + ['90' 'entropy' 7 3 '0.0' 'auto']
|_> [150 'gini' 7 3 '0.0' 'auto']
['90' 'entropy' 7 3 '0.0' 'auto'] + [150 'entropy' 7 3 0.0 'auto']
|_> ['90' 'entropy' 7 3 0.0 'auto']
None

plt.plot(-rf_bb.history["fitness"])
plt.title("Accuracy as a function of the number of iterations for Random Forest model")

Text(0.5, 1.0, 'Accuracy as a function of the number of iterations for Random Forest model')
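The raw fitness history jumps up and down because each evaluation is a different parametrization. A common complement is the best value found so far, which never gets worse. The sketch below computes it in plain Python from the first seven fitness values of the SVM log, rounded to four decimals; the variable names are chosen for this illustration.

```python
from itertools import accumulate

# First fitness values from the SVM log above, rounded to four decimals.
fitness = [-0.6436, -0.9521, -0.6436, -0.9468, -0.9468, -0.9521, -0.9574]

# Best (lowest) fitness seen so far at each evaluation.
best_so_far = list(accumulate(fitness, min))

# Negate to read the curve as an accuracy.
best_accuracy = [-value for value in best_so_far]
print(best_accuracy)
```

If history["fitness"] is a NumPy array, as the negation in the plotting code suggests, np.minimum.accumulate would give the same running minimum in one call.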
