
Comparing Estimators

With the specialized switch optimizer, the user can allocate the same computational budget to each learning algorithm in a final Switch element, so that the pipeline's hyperparameters are optimized separately, and equally thoroughly, for every algorithm.

The user chooses one hyperparameter optimization strategy, which is then applied to the pipeline once per learning algorithm, each time in that algorithm's own hyperparameter space. Because every algorithm is optimized within the same pipeline and with the same settings, the learning algorithms remain comparable.
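
The equal-budget idea can be illustrated outside PHOTONAI with plain scikit-learn. The following is only a rough sketch under stated assumptions, not how the switch optimizer is implemented: it uses random search instead of `sk_opt`, a single cross-validation loop instead of nested cross-validation, and the budget `N_CONFIGURATIONS`, the discretized value lists, and the `candidates` dictionary are illustrative choices that do not come from the original example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: a small illustrative budget, identical for every estimator.
N_CONFIGURATIONS = 5

# Each estimator gets its own, separate hyperparameter space.
candidates = {
    'RandomForestClassifier': (
        RandomForestClassifier(n_estimators=20, random_state=0),
        {'clf__min_samples_split': [2, 3, 4],
         'clf__max_features': ['sqrt', 'log2'],
         'clf__bootstrap': [True, False]}),
    'SVC': (
        SVC(),
        {'clf__C': [0.5, 1.0, 5.0, 10.0, 25.0],
         'clf__kernel': ['linear', 'rbf']}),
}

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=3, shuffle=True, random_state=0)

best_scores = {}
for name, (estimator, space) in candidates.items():
    # Same preprocessing, same search strategy, same number of tested
    # configurations: only the final estimator differs between runs.
    pipe = Pipeline([('scaler', StandardScaler()), ('clf', estimator)])
    search = RandomizedSearchCV(pipe, space, n_iter=N_CONFIGURATIONS,
                                scoring='accuracy', cv=cv, random_state=0)
    search.fit(X, y)
    best_scores[name] = search.best_score_

for name, score in best_scores.items():
    print(f'{name}: {score:.3f}')
```

Since both searches test the same number of configurations under identical conditions, a difference in the best scores is not caused by one algorithm having received more tuning effort than the other.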

An alternative strategy is to optimize estimator selection within a unified hyperparameter space, e.g. by applying the smac3 optimizer. In a unified hyperparameter space, an exploration phase is followed by a phase in which only the most promising algorithms receive further computational time, so some learning algorithms receive more computational resources than others. This strategy can automatically select the best algorithm, but for that reason it is less suitable for comparing algorithms.

With the last line of code in this example, the user requests a comparative performance metrics table that shows, for each estimator, the mean validation performance of the best configurations found in each outer fold.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold

from photonai.base import Hyperpipe, PipelineElement, Switch
from photonai.optimization import FloatRange, IntegerRange

my_pipe = Hyperpipe('hp_switch_optimizer',
                    inner_cv=KFold(n_splits=5),
                    outer_cv=KFold(n_splits=3),
                    optimizer='switch',
                    optimizer_params={'name': 'sk_opt', 'n_configurations': 50},
                    metrics=['accuracy', 'precision', 'recall', 'balanced_accuracy'],
                    best_config_metric='accuracy',
                    project_folder='./tmp',
                    verbosity=1)

my_pipe.add(PipelineElement('StandardScaler'))

my_pipe += PipelineElement('PCA',
                           hyperparameters={'n_components': IntegerRange(10, 30)},
                           test_disabled=True)

# set up two learning algorithms in a switch; each one is optimized separately
estimator_selection = Switch('estimators')

estimator_selection += PipelineElement('RandomForestClassifier',
                                       criterion='gini',
                                       hyperparameters={'min_samples_split': IntegerRange(2, 4),
                                                        'max_features': ['sqrt', 'log2'],
                                                        'bootstrap': [True, False]})
estimator_selection += PipelineElement('SVC',
                                       hyperparameters={'C': FloatRange(0.5, 25),
                                                        'kernel': ['linear', 'rbf']})

my_pipe += estimator_selection

X, y = load_breast_cancer(return_X_y=True)
my_pipe.fit(X, y)

my_pipe.results_handler.get_mean_of_best_validation_configs_per_estimator()