Documentation for ImbalancedDataTransformer

Balances the input data by applying the chosen resampling strategy. The strategy object is instantiated from the method name, which is passed as a string. The transformer is built on imbalanced-learn (imblearn); see its documentation for details on the individual methods.

Examples:

from photonai.base import PipelineElement
from photonai.optimization import Categorical

tested_methods = Categorical(['RandomOverSampler', 'SMOTEENN', 'SVMSMOTE',
                      'BorderlineSMOTE', 'SMOTE', 'ClusterCentroids'])
PipelineElement('ImbalancedDataTransformer',
                hyperparameters={'method_name': tested_methods},
                test_disabled=True)

__init__(self, method_name='RandomUnderSampler', **kwargs)

Instantiates an object that transforms the data into balanced groups according to the given method.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| method_name | str | Imbalanced learning strategy; one of the values listed below. | 'RandomUnderSampler' |
| **kwargs | | Any parameters to pass to the imbalance strategy object. | {} |

Possible values for method_name:

  • oversampling strategies:

    • ADASYN,
    • BorderlineSMOTE,
    • KMeansSMOTE,
    • RandomOverSampler,
    • SMOTE,
    • SMOTENC,
    • SVMSMOTE,
  • undersampling strategies:

    • ClusterCentroids,
    • RandomUnderSampler,
    • NearMiss,
    • InstanceHardnessThreshold,
    • CondensedNearestNeighbour,
    • EditedNearestNeighbours,
    • RepeatedEditedNearestNeighbours,
    • AllKNN,
    • NeighbourhoodCleaningRule,
    • OneSidedSelection,
  • combined strategies:

    • SMOTEENN,
    • SMOTETomek.

Source code in photonai/modelwrapper/imbalanced_data_transformer.py
def __init__(self, method_name: str = 'RandomUnderSampler', **kwargs):
    """
    Instantiates an object that transforms the data into balanced groups according to the given method.

    Parameters:
        method_name:
            Imbalanced learning strategy. Possible values with

            - an oversampling strategy are:
                - ADASYN,
                - BorderlineSMOTE,
                - KMeansSMOTE,
                - RandomOverSampler,
                - SMOTE,
                - SMOTENC,
                - SVMSMOTE,

            - an undersampling strategy are:
                - ClusterCentroids,
                - RandomUnderSampler,
                - NearMiss,
                - InstanceHardnessThreshold,
                - CondensedNearestNeighbour,
                - EditedNearestNeighbours,
                - RepeatedEditedNearestNeighbours,
                - AllKNN,
                - NeighbourhoodCleaningRule,
                - OneSidedSelection,

            - a combined strategy are:
                - SMOTEENN,
                - SMOTETomek.

        **kwargs:
            Any parameters to pass to the imbalance strategy object.

    """
    if not __found__:
        raise ModuleNotFoundError("Module imblearn not found or not installed as expected. "
                                  "Please install the requirements.txt in PHOTON main folder.")

    self.method_name = method_name
    self.needs_y = True

    imbalance_type = ''
    for group, possible_strategies in ImbalancedDataTransformer.IMBALANCED_DICT.items():
        if self.method_name in possible_strategies:
            imbalance_type = group

    if imbalance_type == "oversampling":
        home = over_sampling
    elif imbalance_type == "undersampling":
        home = under_sampling
    elif imbalance_type == "combine" or imbalance_type == "combination":
        home = combine
    else:
        msg = "Imbalance Type not found. Can be oversampling, undersampling or combine. " \
              "Oversampling: method_name one of {}. Undersampling: method_name one of {}. " \
              "Combine: method_name one of {}.".format(str(self.IMBALANCED_DICT["oversampling"]),
                                                       str(self.IMBALANCED_DICT["undersampling"]),
                                                       str(self.IMBALANCED_DICT["combine"]))
        logger.error(msg)
        raise ValueError(msg)

    desired_class = getattr(home, method_name)
    self.method = desired_class(**kwargs)
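
The lookup above can be sketched without imblearn installed. This is a minimal illustration, not the library code: the three classes and the `HOMES` mapping are hypothetical stand-ins for the imblearn submodules, and the abbreviated `IMBALANCED_DICT` layout (group name mapped to method names) is an assumption inferred from how the constructor iterates over it.

```python
from types import SimpleNamespace


# Stand-in sampler classes (hypothetical); the real ones come from imblearn.
class SMOTE:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


class RandomUnderSampler:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


class SMOTEENN:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Stand-ins for the over_sampling, under_sampling and combine modules.
over_sampling = SimpleNamespace(SMOTE=SMOTE)
under_sampling = SimpleNamespace(RandomUnderSampler=RandomUnderSampler)
combine = SimpleNamespace(SMOTEENN=SMOTEENN)

# Assumed, abbreviated layout: group name -> list of method names.
IMBALANCED_DICT = {
    "oversampling": ["SMOTE"],
    "undersampling": ["RandomUnderSampler"],
    "combine": ["SMOTEENN"],
}
HOMES = {
    "oversampling": over_sampling,
    "undersampling": under_sampling,
    "combine": combine,
}


def build_sampler(method_name="RandomUnderSampler", **kwargs):
    """Find the group listing method_name, then instantiate the class with kwargs."""
    for group, names in IMBALANCED_DICT.items():
        if method_name in names:
            desired_class = getattr(HOMES[group], method_name)
            return desired_class(**kwargs)
    raise ValueError("Imbalance Type not found for method_name={!r}".format(method_name))


sampler = build_sampler("SMOTE", k_neighbors=3)
print(type(sampler).__name__, sampler.kwargs)
```

The keyword arguments are handed to the sampler's constructor unchanged, and an unknown name raises a ValueError before any class is looked up.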

fit_transform(self, X, y=None, **kwargs)

Calls fit_resample(X, y) on the underlying imblearn sampler and returns the resampled data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | The input samples of shape [n_samples, n_features]. | required |
| y | ndarray | The input targets of shape [n_samples, 1]. | None |
| **kwargs | | Ignored input. | {} |

Returns:

| Type | Description |
|---|---|
| (ndarray, ndarray) | Transformed data. |

Source code in photonai/modelwrapper/imbalanced_data_transformer.py
def fit_transform(self, X: np.ndarray, y: np.ndarray = None, **kwargs) -> (np.ndarray, np.ndarray):
    """
    Call of the underlying imblearn.fit_resample(X, y).

    Parameters:
        X:
            The input samples of shape [n_samples, n_features].

        y:
            The input targets of shape [n_samples, 1].

        **kwargs:
            Ignored input.

    Returns:
        Transformed data.

    """
    return self.method.fit_resample(X, y)
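
To illustrate what a resampling call returns, here is a pure-Python sketch of random undersampling. It is not imblearn's implementation: `random_undersample` and its `seed` parameter are hypothetical names, and plain lists stand in for NumPy arrays. The point is that both the samples and the targets come back, with a changed number of rows.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so every class matches the smallest one."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    keep = []
    for label in sorted(set(y)):
        indices = [i for i, target in enumerate(y) if target == label]
        keep.extend(rng.sample(indices, n_min))
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = random_undersample(X, y)
print(len(X_res), Counter(y_res))
```

Because the number of samples changes, X and y have to be resampled together, which is why the method returns a pair rather than X alone.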

transform(self, X, y=None, **kwargs)

Forwards to self.fit_transform, so the data passed in are resampled on every call.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | The input samples of shape [n_samples, n_features]. | required |
| y | ndarray | The input targets of shape [n_samples, 1]. | None |
| **kwargs | | Ignored input. | {} |

Returns:

| Type | Description |
|---|---|
| (ndarray, ndarray) | Transformed data. |

Source code in photonai/modelwrapper/imbalanced_data_transformer.py
def transform(self, X: np.ndarray, y: np.ndarray = None, **kwargs) -> (np.ndarray, np.ndarray):
    """
    Forwarding to the self.fit_transform method.

    Parameters:
        X:
            The input samples of shape [n_samples, n_features].

        y:
            The input targets of shape [n_samples, 1].

        **kwargs:
            Ignored input.

    Returns:
        Transformed data.

    """
    return self.fit_transform(X, y)
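
The forwarding chain (transform, then fit_transform, then fit_resample) can be reproduced with a small stand-in. Both classes below are hypothetical: `FirstNPerClass` is a deterministic toy sampler, and `ResamplingWrapper` only mirrors the two methods shown above, not the full PHOTONAI class.

```python
class FirstNPerClass:
    """Toy sampler (hypothetical): keeps the first n samples of each class."""

    def __init__(self, n=1):
        self.n = n

    def fit_resample(self, X, y):
        seen, keep = {}, []
        for i, label in enumerate(y):
            seen[label] = seen.get(label, 0) + 1
            if seen[label] <= self.n:
                keep.append(i)
        return [X[i] for i in keep], [y[i] for i in keep]


class ResamplingWrapper:
    """Mirrors the forwarding: transform -> fit_transform -> fit_resample."""

    def __init__(self, method):
        self.method = method
        self.needs_y = True  # the targets are required for resampling

    def fit_transform(self, X, y=None, **kwargs):
        return self.method.fit_resample(X, y)

    def transform(self, X, y=None, **kwargs):
        return self.fit_transform(X, y)


wrapper = ResamplingWrapper(FirstNPerClass(n=1))
X = [[1], [2], [3], [4]]
y = ["a", "a", "a", "b"]
X_res, y_res = wrapper.transform(X, y)
print(X_res, y_res)
```

With a deterministic sampler, transform and fit_transform give identical results for the same input; with a randomized sampler the two calls may differ unless a fixed random state is passed through the keyword arguments.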