
Documentation for Hyperpipe

The PHOTONAI Hyperpipe class creates a custom machine learning pipeline. In addition, it defines the parameters of the analysis, such as the cross-validation scheme, the hyperparameter optimization strategy, and the performance metrics of interest.

So-called PHOTONAI PipelineElements can be added to the Hyperpipe, each of them being a data-processing method or a learning algorithm. By choosing, combining, and arranging these elements with the PHOTONAI classes, both simple and complex pipeline architectures can be designed rapidly.

The PHOTONAI Hyperpipe automates the nested training, test, and hyperparameter optimization procedures.

The Hyperpipe:

  • monitors the nested cross-validated training and test procedure,
  • communicates with the hyperparameter optimization strategy,
  • streams information between the pipeline elements,
  • logs all obtained results and evaluates the performance,
  • guides the hyperparameter optimization process by a so-called best config metric, which is used to select the best-performing hyperparameter configuration.
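The nested procedure described above can be sketched with plain scikit-learn splitters. This is a simplified illustration of the fold structure, not PHOTONAI internals:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, KFold

X, y = np.arange(100).reshape(50, 2), np.arange(50) % 2

outer_cv = ShuffleSplit(test_size=0.2, n_splits=3, random_state=0)
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)

n_inner_evaluations = 0
for train_idx, test_idx in outer_cv.split(X):
    # the outer split separates the test set used for final evaluation
    for inner_train, inner_val in inner_cv.split(train_idx):
        # the inner split generates the validation set on which each
        # hyperparameter configuration is scored
        n_inner_evaluations += 1

# 3 outer folds x 5 inner folds = 15 evaluations per configuration
```

Each hyperparameter configuration is thus evaluated on every inner fold, while the outer test sets are reserved for estimating the generalization performance of the selected configuration.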

Attributes:

Name Type Description
optimum_pipe PhotonPipeline

An sklearn pipeline object that is fitted to the training data according to the best hyperparameter configuration found. Currently, we don't create an ensemble of all best hyperparameter configs over all folds. We find the best config by comparing the test error across outer folds. The hyperparameter config of the best fold is used as the optimal model and is then trained on the complete set.

best_config dict

Dictionary containing the hyperparameters of the best configuration. Contains the parameters in the sklearn encoding model_name__parameter_name: parameter_value.
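For illustration, such a dictionary uses sklearn's double-underscore parameter encoding. The values below are made up, not the output of a real run:

```python
# Illustrative best_config in sklearn's double-underscore encoding
# (made-up values, not a real optimization result):
best_config = {'SVC__C': 3.14, 'SVC__kernel': 'rbf'}

# splitting a key on the first '__' recovers the element name
# and the parameter name
element, param = 'SVC__C'.split('__', 1)
```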

results MDBHyperpipe

Object containing all information about the performed hyperparameter search. Holds the training and test metrics for all outer folds, inner folds and configurations, as well as additional information.

elements list

Contains all PipelineElement or Hyperpipe objects that are added to the pipeline.

Examples:

from photonai.base import Hyperpipe, PipelineElement
from photonai.optimization import FloatRange
from sklearn.model_selection import ShuffleSplit, KFold
from sklearn.datasets import load_breast_cancer

hyperpipe = Hyperpipe('myPipe',
                      optimizer='random_grid_search',
                      optimizer_params={'limit_in_minutes': 5},
                      outer_cv=ShuffleSplit(test_size=0.2, n_splits=3),
                      inner_cv=KFold(n_splits=10, shuffle=True),
                      metrics=['accuracy', 'precision', 'recall', "f1_score"],
                      best_config_metric='accuracy',
                      eval_final_performance=True,
                      verbosity=0)

hyperpipe += PipelineElement("SVC", hyperparameters={"C": FloatRange(1, 100)})

X, y = load_breast_cancer(return_X_y=True)
hyperpipe.fit(X, y)

__init__(self, name, inner_cv=None, outer_cv=None, optimizer='grid_search', optimizer_params=None, metrics=None, best_config_metric=None, eval_final_performance=None, use_test_set=True, test_size=0.2, project_folder='', calculate_metrics_per_fold=True, calculate_metrics_across_folds=False, random_seed=None, verbosity=0, learning_curves=False, learning_curves_cut=None, output_settings=None, performance_constraints=None, permutation_id=None, cache_folder=None, nr_of_processes=1, allow_multidim_targets=False) special

Initialize the object.

Parameters:

Name Type Description Default
name Optional[str]

Name of hyperpipe instance.

required
inner_cv Union[sklearn.model_selection._split.BaseCrossValidator, sklearn.model_selection._split.BaseShuffleSplit, sklearn.model_selection._split._RepeatedSplits]

Cross validation strategy to test hyperparameter configurations, generates the validation set.

None
outer_cv Union[sklearn.model_selection._split.BaseCrossValidator, sklearn.model_selection._split.BaseShuffleSplit, sklearn.model_selection._split._RepeatedSplits]

Cross validation strategy to use for the hyperparameter search itself, generates the test set.

None
optimizer str

Hyperparameter optimization algorithm.

  • In case a string literal is given:

    • "grid_search": Optimizer that iteratively tests all possible hyperparameter combinations.
    • "random_grid_search": A variation of grid search that randomly picks hyperparameter combinations from all possible combinations.
    • "sk_opt": Scikit-Optimize, based on Bayesian optimization.
    • "random_search": Randomly chooses hyperparameters from a grid-free domain.
    • "smac": SMAC, based on Bayesian optimization.
    • "nevergrad": Nevergrad, based on evolutionary learning.
  • In case an object is given, it is expected to have the following methods:

    • ask: returns a hyperparameter configuration in the form of a dictionary containing key->value pairs in the sklearn parameter encoding model_name__parameter_name: parameter_value
    • prepare: takes a list of pipeline elements and their particular hyperparameters to prepare the hyperparameter space
    • tell: receives a tested config and the respective performance in order to calculate a smart next configuration to process
'grid_search'
metrics Optional[List[Union[Callable, keras.metrics.Metric, Type[keras.metrics.Metric], str]]]

Metrics that should be calculated for the training, validation, and test set. Use the pre-imported metrics from sklearn and PHOTONAI, or register your own.

  • Metrics for classification:
    • accuracy: sklearn.metrics.accuracy_score
    • matthews_corrcoef: sklearn.metrics.matthews_corrcoef
    • confusion_matrix: sklearn.metrics.confusion_matrix,
    • f1_score: sklearn.metrics.f1_score
    • hamming_loss: sklearn.metrics.hamming_loss
    • log_loss: sklearn.metrics.log_loss
    • precision: sklearn.metrics.precision_score
    • recall: sklearn.metrics.recall_score
  • Metrics for regression:
    • mean_squared_error: sklearn.metrics.mean_squared_error
    • mean_absolute_error: sklearn.metrics.mean_absolute_error
    • explained_variance: sklearn.metrics.explained_variance_score
    • r2: sklearn.metrics.r2_score
  • Other metrics
    • pearson_correlation: photon_core.framework.Metrics.pearson_correlation
    • variance_explained: photon_core.framework.Metrics.variance_explained_score
    • categorical_accuracy: photon_core.framework.Metrics.categorical_accuracy_score
None
best_config_metric Union[Callable, keras.metrics.Metric, Type[keras.metrics.Metric], str]

The metric that should be maximized or minimized in order to choose the best hyperparameter configuration.

None
eval_final_performance bool

DEPRECATED! Use "use_test_set" instead!

None
use_test_set bool

Whether the metrics should be calculated on the test set; otherwise the test set is separated but not used.

True
project_folder str

The output folder in which all files generated by the PHOTONAI project are saved to.

''
test_size float

The amount of the data that is left out as a test set if no outer_cv is given and use_test_set is set to True.

0.2
calculate_metrics_per_fold bool

If True, the metrics are calculated for each inner fold. If False, calculate_metrics_across_folds must be True.

True
calculate_metrics_across_folds bool

If True, the metrics are calculated across all inner folds. If False, calculate_metrics_per_fold must be True.

False
random_seed int

Random Seed.

None
verbosity int

The level of verbosity: 0 is least talkative and logs only warnings and errors, 1 adds info, and 2 adds debug.

0
learning_curves bool

Enables the learning curve procedure: evaluates the learning process over different input sizes. Depends on learning_curves_cut.

False
learning_curves_cut FloatRange

The tested relative cuts of the data size.

None
performance_constraints list

Objects that indicate whether a configuration should be tested further, e.g. stopping early when a config's inner folds do not perform better than the dummy estimator.

None
permutation_id str

String identifier for permutation tests.

None
cache_folder str

Folder path for multi-processing.

None
nr_of_processes int

Determines the number of outer folds that are computed in parallel.

1
allow_multidim_targets bool

Allows multidimensional targets.

False
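The difference between calculate_metrics_per_fold and calculate_metrics_across_folds can be seen with a small accuracy example on two unequal inner folds (made-up data, plain NumPy instead of PHOTONAI internals):

```python
import numpy as np

fold1_true, fold1_pred = np.array([1, 0, 1, 1]), np.array([1, 0, 0, 1])  # 3/4 correct
fold2_true, fold2_pred = np.array([0, 1]), np.array([0, 0])              # 1/2 correct

# per fold: compute the metric within each fold, then average the fold values
per_fold = np.mean([np.mean(fold1_true == fold1_pred),
                    np.mean(fold2_true == fold2_pred)])   # (0.75 + 0.5) / 2 = 0.625

# across folds: pool all inner-fold predictions, compute the metric once
pooled_true = np.concatenate([fold1_true, fold2_true])
pooled_pred = np.concatenate([fold1_pred, fold2_pred])
across_folds = np.mean(pooled_true == pooled_pred)        # 4/6 ≈ 0.667
```

With unequal fold sizes the two options weight samples differently: per-fold averaging gives every fold equal weight, while across-fold calculation gives every sample equal weight.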
Source code in photonai/base/hyperpipe.py
def __init__(self, name: Optional[str],
             inner_cv: Union[BaseCrossValidator, BaseShuffleSplit, _RepeatedSplits] = None,
             outer_cv: Union[BaseCrossValidator, BaseShuffleSplit, _RepeatedSplits, None] = None,
             optimizer: str = 'grid_search',
             optimizer_params: dict = None,
             metrics: Optional[List[Union[Scorer.Metric_Type, str]]] = None,
             best_config_metric: Optional[Union[Scorer.Metric_Type, str]] = None,
             eval_final_performance: bool = None,
             use_test_set: bool = True,
             test_size: float = 0.2,
             project_folder: str = '',
             calculate_metrics_per_fold: bool = True,
             calculate_metrics_across_folds: bool = False,
             random_seed: int = None,
             verbosity: int = 0,
             learning_curves: bool = False,
             learning_curves_cut: FloatRange = None,
             output_settings: OutputSettings = None,
             performance_constraints: list = None,
             permutation_id: str = None,
             cache_folder: str = None,
             nr_of_processes: int = 1,
             allow_multidim_targets: bool = False):
    """
    Initialize the object.

    Parameters:
        name:
            Name of hyperpipe instance.

        inner_cv:
            Cross validation strategy to test hyperparameter configurations, generates the validation set.

        outer_cv:
            Cross validation strategy to use for the hyperparameter search itself, generates the test set.

        optimizer:
            Hyperparameter optimization algorithm.

            - In case a string literal is given:
                - "grid_search": Optimizer that iteratively tests all possible hyperparameter combinations.
                - "random_grid_search": A variation of the grid search optimization that randomly picks
                    hyperparameter combinations from all possible hyperparameter combinations.
                - "sk_opt": Scikit-Optimize based on theories of Baysian optimization.
                - "random_search": randomly chooses hyperparameter from grid-free domain.
                - "smac": SMAC based on theories of Baysian optimization.
                - "nevergrad": Nevergrad based on theories of evolutionary learning.

            - In case an object is given:
                expects the object to have the following methods:
                - `ask`: returns a hyperparameter configuration in the form of a dictionary containing
                    key->value pairs in the sklearn parameter encoding `model_name__parameter_name: parameter_value`
                - `prepare`: takes a list of pipeline elements and their particular hyperparameters to prepare the
                             hyperparameter space
                - `tell`: gets a tested config and the respective performance in order to
                    calculate a smart next configuration to process

        metrics:
            Metrics that should be calculated for the training, validation, and test set.
            Use the pre-imported metrics from sklearn and photonai, or register your own.

            - Metrics for `classification`:
                - `accuracy`: sklearn.metrics.accuracy_score
                - `matthews_corrcoef`: sklearn.metrics.matthews_corrcoef
                - `confusion_matrix`: sklearn.metrics.confusion_matrix,
                - `f1_score`: sklearn.metrics.f1_score
                - `hamming_loss`: sklearn.metrics.hamming_loss
                - `log_loss`: sklearn.metrics.log_loss
                - `precision`: sklearn.metrics.precision_score
                - `recall`: sklearn.metrics.recall_score
            - Metrics for `regression`:
                - `mean_squared_error`: sklearn.metrics.mean_squared_error
                - `mean_absolute_error`: sklearn.metrics.mean_absolute_error
                - `explained_variance`: sklearn.metrics.explained_variance_score
                - `r2`: sklearn.metrics.r2_score
            - Other metrics
                - `pearson_correlation`: photon_core.framework.Metrics.pearson_correlation
                - `variance_explained`:  photon_core.framework.Metrics.variance_explained_score
                - `categorical_accuracy`: photon_core.framework.Metrics.categorical_accuracy_score

        best_config_metric:
            The metric that should be maximized or minimized in order to choose
            the best hyperparameter configuration.

        eval_final_performance [bool, default=True]:
            DEPRECATED! Use "use_test_set" instead!

        use_test_set [bool, default=True]:
            Whether the metrics should be calculated on the test set;
            otherwise the test set is separated but not used.

        project_folder:
            The output folder in which all files generated by the
            PHOTONAI project are saved to.

        test_size:
            The amount of the data that is left out as a test set if no outer_cv is given and
            use_test_set is set to True.

        calculate_metrics_per_fold:
            If True, the metrics are calculated for each inner_fold.
            If False, calculate_metrics_across_folds must be True.

        calculate_metrics_across_folds:
            If True, the metrics are calculated across all inner_fold.
            If False, calculate_metrics_per_fold must be True.

        random_seed:
            Random Seed.

        verbosity:
            The level of verbosity: 0 is least talkative and logs only warnings
            and errors, 1 adds info, and 2 adds debug.

        learning_curves:
            Enables the learning curve procedure: evaluates the learning process over
            different input sizes. Depends on learning_curves_cut.

        learning_curves_cut:
            The tested relative cuts of the data size.

        performance_constraints:
            Objects that indicate whether a configuration should
            be tested further. For example, the inner fold of a config
            does not perform better than the dummy performance.

        permutation_id:
            String identifier for permutation tests.

        cache_folder:
            Folder path for multi-processing.

        nr_of_processes:
            Determines the number of outer folds that are computed in parallel.

        allow_multidim_targets:
            Allows multidimensional targets.

    """

    self.name = re.sub(r'\W+', '', name)

    if eval_final_performance is not None:
        depr_warning = "Hyperpipe parameter eval_final_performance is deprecated. It's called use_test_set now."
        use_test_set = eval_final_performance
        logger.warning(depr_warning)
        raise DeprecationWarning(depr_warning)

    # ====================== Cross Validation ===========================
    # check if both calculate_metrics_per_folds and calculate_metrics_across_folds is False
    if not calculate_metrics_across_folds and not calculate_metrics_per_fold:
        raise NotImplementedError("Apparently, you've set calculate_metrics_across_folds=False and "
                                  "calculate_metrics_per_fold=False. In this case PHOTONAI does not calculate "
                                  "any metrics which doesn't make any sense. Set at least one to True.")
    if inner_cv is None:
        msg = "PHOTONAI requires an inner_cv split. Please enable inner cross-validation. " \
              "As exmaple: Hyperpipe(...inner_cv = KFold(n_splits = 3), ...). " \
              "Ensure you import the cross_validation object first."
        logger.error(msg)
        raise AttributeError(msg)

    # use default cut 'FloatRange(0, 1, 'range', 0.2)' if learning_curves = True but learning_curves_cut is None
    if learning_curves and learning_curves_cut is None:
        learning_curves_cut = FloatRange(0, 1, 'range', 0.2)
    elif not learning_curves and learning_curves_cut is not None:
        learning_curves_cut = None

    self.cross_validation = Hyperpipe.CrossValidation(inner_cv=inner_cv,
                                                      outer_cv=outer_cv,
                                                      use_test_set=use_test_set,
                                                      test_size=test_size,
                                                      calculate_metrics_per_fold=calculate_metrics_per_fold,
                                                      calculate_metrics_across_folds=calculate_metrics_across_folds,
                                                      learning_curves=learning_curves,
                                                      learning_curves_cut=learning_curves_cut)

    # ====================== Data ===========================
    self.data = Hyperpipe.Data()

    # ====================== Output Folder and Log File Management ===========================
    if output_settings:
        self.output_settings = output_settings
    else:
        self.output_settings = OutputSettings()

    if project_folder == '':
        self.project_folder = os.getcwd()
    else:
        self.project_folder = project_folder

    self.output_settings.set_project_folder(self.project_folder)

    # update output options to add pipe name and timestamp to results folder
    self._verbosity = 0
    self.verbosity = verbosity
    self.output_settings.set_log_file()

    # ====================== Result Logging ===========================
    self.results_handler = None
    self.results = None
    self.best_config = None

    # ====================== Pipeline ===========================
    self.elements = []
    self._pipe = None
    self.optimum_pipe = None
    self.preprocessing = None

    # ====================== Performance Optimization ===========================
    if optimizer_params is None:
        optimizer_params = {}
    self.optimization = Optimization(metrics=metrics,
                                     best_config_metric=best_config_metric,
                                     optimizer_input=optimizer,
                                     optimizer_params=optimizer_params,
                                     performance_constraints=performance_constraints)

    # self.optimization.sanity_check_metrics()

    # ====================== Caching and Parallelization ===========================
    self.nr_of_processes = nr_of_processes
    if cache_folder:
        self.cache_folder = os.path.join(cache_folder, self.name)
    else:
        self.cache_folder = None

    # ====================== Internals ===========================

    self.permutation_id = permutation_id
    self.allow_multidim_targets = allow_multidim_targets
    self.is_final_fit = False

    # ====================== Random Seed ===========================
    self.random_state = random_seed
    if random_seed is not None:
        import random
        random.seed(random_seed)

add(self, pipe_element)

Add an element to the machine learning pipeline. Returns self.

Parameters:

Name Type Description Default
pipe_element PipelineElement

The object to add to the machine learning pipeline, being either a transformer or an estimator.

required
Source code in photonai/base/hyperpipe.py
def add(self, pipe_element: PipelineElement):
    """
    Add an element to the machine learning pipeline.
    Returns self.

    Parameters:
        pipe_element:
            The object to add to the machine learning pipeline,
            being either a transformer or an estimator.

    """
    self.__iadd__(pipe_element)

copy_me(self)

Helper function to copy an entire Hyperpipe

Returns:

Type Description

Hyperpipe

Source code in photonai/base/hyperpipe.py
def copy_me(self):
    """
    Helper function to copy an entire Hyperpipe

    Returns:
        Hyperpipe

    """
    signature = inspect.getfullargspec(OutputSettings.__init__)[0]
    settings = OutputSettings()
    for attr in signature:
        if hasattr(self.output_settings, attr):
            setattr(settings, attr, getattr(self.output_settings, attr))
    self.output_settings.initialize_log_file()

    # create new Hyperpipe instance
    pipe_copy = Hyperpipe(name=self.name,
                          inner_cv=deepcopy(self.cross_validation.inner_cv),
                          outer_cv=deepcopy(self.cross_validation.outer_cv),
                          best_config_metric=self.optimization.best_config_metric,
                          metrics=self.optimization.metrics,
                          optimizer=self.optimization.optimizer_input_str,
                          optimizer_params=self.optimization.optimizer_params,
                          project_folder=self.project_folder,
                          output_settings=settings)

    signature = inspect.getfullargspec(self.__init__)[0]
    for attr in signature:
        if hasattr(self, attr) and attr != 'output_settings':
            setattr(pipe_copy, attr, getattr(self, attr))

    if hasattr(self, 'preprocessing') and self.preprocessing:
        preprocessing = Preprocessing()
        for element in self.preprocessing.elements:
            preprocessing += element.copy_me()
        pipe_copy += preprocessing
    if hasattr(self, 'elements'):
        for element in self.elements:
            pipe_copy += element.copy_me()
    return pipe_copy

fit(self, data, targets, **kwargs)

Starts the hyperparameter search and/or fits the pipeline to the data and targets.

Manages the nested cross validated hyperparameter search:

  1. Filters the data according to filter strategy (1) and according to the imbalanced_data_strategy (2)
  2. requests new configurations from the hyperparameter search strategy, the optimizer,
  3. initializes the testing of a specific configuration,
  4. communicates the result to the optimizer,
  5. repeats 2-4 until optimizer delivers no more configurations to test
  6. finally searches for the best config in all tested configs,
  7. trains the pipeline with the best config and evaluates the performance on the test set

Parameters:

Name Type Description Default
data ndarray

The array-like training and test data with shape=[N, D], where N is the number of samples and D is the number of features.

required
targets ndarray

The truth array-like values with shape=[N], where N is the number of samples.

required
**kwargs

Keyword arguments, passed to Outer_Fold_Manager.fit.

{}

Returns:

Type Description

Fitted Hyperpipe.

Source code in photonai/base/hyperpipe.py
def fit(self, data: np.ndarray, targets: np.ndarray, **kwargs):
    """
    Starts the hyperparameter search and/or fits the pipeline to the data and targets.

    Manages the nested cross validated hyperparameter search:

    1. Filters the data according to filter strategy (1) and according to the imbalanced_data_strategy (2)
    2. requests new configurations from the hyperparameter search strategy, the optimizer,
    3. initializes the testing of a specific configuration,
    4. communicates the result to the optimizer,
    5. repeats 2-4 until optimizer delivers no more configurations to test
    6. finally searches for the best config in all tested configs,
    7. trains the pipeline with the best config and evaluates the performance on the test set

    Parameters:
        data:
            The array-like training and test data with shape=[N, D],
            where N is the number of samples and D is the number of features.

        targets:
            The truth array-like values with shape=[N],
            where N is the number of samples.

        **kwargs:
            Keyword arguments, passed to Outer_Fold_Manager.fit.


    Returns:
        Fitted Hyperpipe.

    """
    # switch to result output folder
    start = datetime.datetime.now()
    self.output_settings.update_settings(self.name, start.strftime("%Y-%m-%d_%H-%M-%S"))

    logger.photon_system_log('=' * 101)
    logger.photon_system_log('PHOTONAI ANALYSIS: ' + self.name)
    logger.photon_system_log('=' * 101)
    logger.info("Preparing data and PHOTONAI objects for analysis...")

    # loop over outer cross validation
    if self.nr_of_processes > 1:
        hyperpipe_client = Client(threads_per_worker=1, n_workers=self.nr_of_processes, processes=False)

    try:
        # check data
        self.data.input_data_sanity_checks(data, targets, **kwargs)
        # create photon pipeline
        self._prepare_pipeline()
        # initialize the progress monitors
        self._prepare_result_logging(start)
        # apply preprocessing
        self.preprocess_data()

        if not self.is_final_fit:

            # Outer Folds
            outer_folds = FoldInfo.generate_folds(self.cross_validation.outer_cv,
                                                  self.data.X, self.data.y, self.data.kwargs,
                                                  self.cross_validation.use_test_set,
                                                  self.cross_validation.test_size)

            self.cross_validation.outer_folds = {f.fold_id: f for f in outer_folds}
            delayed_jobs = []

            # Run Dummy Estimator
            dummy_estimator = self._prepare_dummy_estimator()

            if self.cache_folder is not None:
                logger.info("Removing cache files...")
                CacheManager.clear_cache_files(self.cache_folder, force_all=True)

            # loop over outer cross validation
            for i, outer_f in enumerate(outer_folds):

                # 1. generate OuterFolds Object
                outer_fold = MDBOuterFold(fold_nr=outer_f.fold_nr)
                outer_fold_computer = OuterFoldManager(self._pipe,
                                                       self.optimization,
                                                       outer_f.fold_id,
                                                       self.cross_validation,
                                                       cache_folder=self.cache_folder,
                                                       cache_updater=self.recursive_cache_folder_propagation,
                                                       dummy_estimator=dummy_estimator,
                                                       result_obj=outer_fold)
                # 2. monitor outputs
                self.results.outer_folds.append(outer_fold)

                if self.nr_of_processes > 1:
                    result = dask.delayed(Hyperpipe.fit_outer_folds)(outer_fold_computer,
                                                                     self.data.X,
                                                                     self.data.y,
                                                                     self.data.kwargs,
                                                                     self.cache_folder)
                    delayed_jobs.append(result)
                else:
                    try:
                        # 3. fit
                        outer_fold_computer.fit(self.data.X, self.data.y, **self.data.kwargs)
                        # 4. save outer fold results
                        self.results_handler.save()
                    finally:
                        # 5. clear cache
                        CacheManager.clear_cache_files(self.cache_folder)

            if self.nr_of_processes > 1:
                dask.compute(*delayed_jobs)
                self.results_handler.save()

            # evaluate hyperparameter optimization results for best config
            self._finalize_optimization()

            # clear complete cache ?
            CacheManager.clear_cache_files(self.cache_folder, force_all=True)

        ###############################################################################################
        else:
            self.preprocess_data()
            self._pipe.fit(self.data.X, self.data.y, **kwargs)
    except Exception as e:
        logger.error(e)
        logger.error(traceback.format_exc())
        traceback.print_exc()
        raise e
    finally:
        if self.nr_of_processes > 1:
            hyperpipe_client.close()
    return self

get_permutation_feature_importances(self, **kwargs)

Fits a model for the best config of each outer fold (using the training data of that fold). Then calls sklearn.inspection.permutation_importance with the test data and the given kwargs (e.g. n_repeats). Returns mean of "importances_mean" and of "importances_std" of all outer folds.

Parameters:

Name Type Description Default
X_val

The array-like data with shape=[M, D], where M is the number of samples and D is the number of features. D must correspond to the number of trained dimensions of the fit method.

required
y_val

The array-like true targets.

required
**kwargs

Keyword arguments, passed to sklearn.permutation_importance.

{}

Returns:

Type Description

Dictionary with average of "mean" and "std" for all outer folds, respectively.
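The aggregation step can be sketched with made-up importances for three outer folds and two features (illustrative numbers, not a real PHOTONAI run):

```python
import numpy as np

# hypothetical permutation importances from three outer folds,
# two features each (not output of a real analysis):
importance_list = {
    'mean': [np.array([0.30, 0.10]), np.array([0.28, 0.12]), np.array([0.32, 0.08])],
    'std':  [np.array([0.02, 0.01]), np.array([0.03, 0.02]), np.array([0.01, 0.03])],
}

# average the per-fold "importances_mean" and "importances_std" feature-wise
mean_importances = np.mean(np.array(importance_list['mean']), axis=0)
std_importances = np.mean(np.array(importance_list['std']), axis=0)
```

Each entry of the returned dictionary is therefore a feature-wise average over the outer folds.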

Source code in photonai/base/hyperpipe.py
def get_permutation_feature_importances(self, **kwargs):
    """
    Fits a model for the best config of each outer fold (using the training data of that fold).
    Then calls sklearn.inspection.permutation_importance with the test data and the given kwargs (e.g. n_repeats).
    Returns mean of "importances_mean" and of "importances_std" of all outer folds.

    Parameters:
        X_val:
            The array-like data with shape=[M, D],
            where M is the number of samples and D is the number
            of features. D must correspond to the number
            of trained dimensions of the fit method.

        y_val:
            The array-like true targets.

        **kwargs:
            Keyword arguments, passed to sklearn.permutation_importance.

    Returns:
        Dictionary with average of "mean" and "std" for all outer folds, respectively.

    """

    importance_list = {'mean': list(), 'std': list()}
    pipe_copy = self.optimum_pipe.copy_me()
    logger.photon_system_log("")
    logger.photon_system_log("Computing permutation importances. This may take a while.")
    logger.stars()
    for outer_fold in self.results.outer_folds:

        if outer_fold.best_config.best_config_score is None:
            raise ValueError("Cannot compute permutation importances when use_test_set is false")


        # prepare data
        train_indices = outer_fold.best_config.best_config_score.training.indices
        test_indices = outer_fold.best_config.best_config_score.validation.indices

        train_X, train_y, train_kwargs = PhotonDataHelper.split_data(self.data.X,
                                                                     self.data.y,
                                                                     self.data.kwargs,
                                                                     indices=train_indices)

        test_X, test_y, test_kwargs = PhotonDataHelper.split_data(self.data.X,
                                                                  self.data.y,
                                                                  self.data.kwargs,
                                                                  indices=test_indices)
        # set pipe to config
        pipe_copy.set_params(**outer_fold.best_config.config_dict)
        logger.photon_system_log("Permutation Importances: Fitting model for outer fold " + str(outer_fold.fold_nr))
        pipe_copy.fit(train_X, train_y, **train_kwargs)

        logger.photon_system_log("Permutation Importances: Calculating performances for outer fold "
                                 + str(outer_fold.fold_nr))
        outer_fold_perm_imps = permutation_importance(pipe_copy, test_X, test_y, **kwargs)
        importance_list['mean'].append(outer_fold_perm_imps["importances_mean"])
        importance_list['std'].append(outer_fold_perm_imps["importances_std"])

    mean_importances = np.mean(np.array(importance_list["mean"]), axis=0)
    std_importances = np.mean(np.array(importance_list["std"]), axis=0)
    logger.stars()

    return {'mean': mean_importances, 'std': std_importances}
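
The loop above delegates the per-fold work to scikit-learn's `permutation_importance`. A minimal standalone sketch of one such per-fold call, using a plain sklearn pipeline as a stand-in for the fitted `optimum_pipe` (the data and model here are illustrative, not part of PHOTONAI):

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for one outer fold: fit on the training split,
# then permute features on the held-out test split.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
train_X, train_y = X[:150], y[:150]
test_X, test_y = X[150:], y[150:]

pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression())])
pipe.fit(train_X, train_y)

# One entry per feature; a larger mean means performance drops more
# when that feature is shuffled.
result = permutation_importance(pipe, test_X, test_y,
                                n_repeats=10, random_state=0)
print(result["importances_mean"].shape)  # (5,)
```

`get_permutation_feature_importances` repeats exactly this step for every outer fold and then averages the resulting `importances_mean` and `importances_std` arrays feature-wise.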

inverse_transform_pipeline(self, hyperparameters, data, targets, data_to_inverse)

Inverse transform data for a pipeline with specific hyperparameter configuration.

  1. Copy the sklearn pipeline,
  2. set its parameters,
  3. fit the pipeline to the data and targets,
  4. inverse-transform the data with that pipeline.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| hyperparameters | dict | The concrete configuration settings for the pipeline elements. | required |
| data | ndarray | The training data to which the pipeline is fitted. | required |
| targets | ndarray | The truth values for training. | required |
| data_to_inverse | ndarray | The data that should be inverse-transformed after training. | required |

Returns:

| Type | Description |
|------|-------------|
| ndarray | Inverse-transformed data as array. |

Source code in photonai/base/hyperpipe.py
def inverse_transform_pipeline(self, hyperparameters: dict,
                               data: np.ndarray,
                               targets: np.ndarray,
                               data_to_inverse: np.ndarray) -> np.ndarray:
    """
    Inverse transform data for a pipeline with specific hyperparameter configuration.

    1. Copy the sklearn pipeline,
    2. set its parameters,
    3. fit the pipeline to the data and targets,
    4. inverse-transform the data with that pipeline.

    Parameters:
        hyperparameters:
            The concrete configuration settings for the pipeline elements.

        data:
            The training data to which the pipeline is fitted.

        targets:
            The truth values for training.

        data_to_inverse:
            The data that should be inverse-transformed after training.

    Returns:
        Inverse-transformed data as array.

    """
    copied_pipe = self.pipe.copy_me()
    copied_pipe.set_params(**hyperparameters)
    copied_pipe.fit(data, targets)
    return copied_pipe.inverse_transform(data_to_inverse)
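
The copy/set-params/fit/inverse-transform pattern can be sketched with a plain sklearn pipeline (using `sklearn.base.clone` in place of PHOTONAI's `copy_me`, and a transformer-only pipeline so that `inverse_transform` is defined for every step — an illustrative analogue, not PHOTONAI's own code path):

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

pipe = Pipeline([("scaler", StandardScaler()),
                 ("pca", PCA(n_components=2))])

copied = clone(pipe)                          # 1. copy the pipeline
copied.set_params(pca__n_components=2)        # 2. set a concrete configuration
copied.fit(X)                                 # 3. fit to the data
reduced = copied.transform(X)                 #    shape (100, 2)
restored = copied.inverse_transform(reduced)  # 4. map back to input space
print(restored.shape)  # (100, 4)
```

As in the method above, the inverse transformation runs through the pipeline steps in reverse order, so the result lives in the original feature space.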

load_optimum_pipe(file, password=None) staticmethod

Load the optimum pipe from file. As a static method, it can be called without instantiating Hyperpipe. Backend call: PhotonModelPersistor.load_optimum_pipe.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| file | str | File path to the .photon file containing the zipped, trained pipeline. | required |
| password | str | Password for reading the file. | None |

Returns:

| Type | Description |
|------|-------------|
| PhotonPipeline | Pipeline with all trained PipelineElements. |

Source code in photonai/base/hyperpipe.py
@staticmethod
def load_optimum_pipe(file: str, password: str = None) -> PhotonPipeline:
    """
    Load the optimum pipe from file.
    As a static method, it can be called without instantiating Hyperpipe.
    Backend call: PhotonModelPersistor.load_optimum_pipe.

    Parameters:
        file:
            File path specifying .photon file to load
            trained pipeline from zipped file.

        password:
            Password for reading the file.

    Returns:
        Returns pipeline with all trained PipelineElements.

    """
    return PhotonModelPersistor.load_optimum_pipe(file, password)
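
PHOTONAI persists the trained pipeline in its own zipped `.photon` format via `PhotonModelPersistor`, so a saved model can be reloaded and used without re-running the hyperparameter search. The general persist-and-reload idea is the same as pickling a fitted sklearn estimator; a generic sketch of that pattern (plain `pickle`, not the `.photon` format):

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a model once, write it to disk, reload it later.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# The reloaded estimator predicts identically to the original.
assert (loaded.predict(X) == model.predict(X)).all()
```

With PHOTONAI, the analogous call is `Hyperpipe.load_optimum_pipe("my_analysis/photon_best_model.photon")` (path illustrative), which returns the trained `PhotonPipeline` directly.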

predict(self, data, **kwargs)

Use the optimum pipe to predict the input data.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | ndarray | The array-like prediction data with shape=[M, D], where M is the number of samples and D is the number of features. D must match the number of features seen during fit. | required |
| **kwargs | | Keyword arguments, passed to optimum_pipe.predict. | {} |

Returns:

| Type | Description |
|------|-------------|
| ndarray | Predicted targets calculated on input data with trained model. |

Source code in photonai/base/hyperpipe.py
def predict(self, data: np.ndarray, **kwargs) -> np.ndarray:
    """
    Use the optimum pipe to predict the input data.

    Parameters:
        data:
            The array-like prediction data with shape=[M, D],
            where M is the number of samples and D is the number
            of features. D must match the number of features
            seen during fit.

        **kwargs:
            Keyword arguments, passed to optimum_pipe.predict.

    Returns:
        Predicted targets calculated on input data with trained model.

    """
    # Todo: if local_search = true then use optimized pipe here?
    if self._pipe:
        return self.optimum_pipe.predict(data, **kwargs)

predict_proba(self, data, **kwargs)

Use the optimum pipe to predict the probabilities from the input data.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | ndarray | The array-like prediction data with shape=[M, D], where M is the number of samples and D is the number of features. D must match the number of features seen during fit. | required |
| **kwargs | | Keyword arguments, passed to optimum_pipe.predict_proba. | {} |

Returns:

| Type | Description |
|------|-------------|
| ndarray | Probabilities calculated from input data on fitted model. |

Source code in photonai/base/hyperpipe.py
def predict_proba(self, data: np.ndarray, **kwargs) -> np.ndarray:
    """
    Use the optimum pipe to predict the probabilities from the input data.

    Parameters:
        data:
            The array-like prediction data with shape=[M, D],
            where M is the number of samples and D is the number
            of features. D must match the number of features
            seen during fit.

        **kwargs:
            Keyword arguments, passed to optimum_pipe.predict_proba.

    Returns:
        Probabilities calculated from input data on fitted model.

    """
    if self._pipe:
        return self.optimum_pipe.predict_proba(data, **kwargs)
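
Both `predict` and `predict_proba` require the input to have the same feature dimensionality D that was seen during fit. A standalone sklearn sketch of this shape contract (PHOTONAI's `optimum_pipe` behaves analogously; the data and classifier are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=120, n_features=6,
                           n_classes=2, random_state=1)
clf = LogisticRegression().fit(X, y)   # trained on D = 6 features

new_data = X[:10]                      # shape [M, D] = [10, 6]
preds = clf.predict(new_data)          # one target per sample: shape (10,)
probas = clf.predict_proba(new_data)   # one row per sample, one column
                                       # per class: shape (10, 2)
print(preds.shape, probas.shape)
```

Passing data with a different number of columns than was used during fit raises an error, which is what the "D must match" requirement above refers to.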