Remove confounders
In some situations, especially in the life sciences, we are interested in
the predictive value of certain features but want to exclude
the contribution of confounding variables. To do that, simple linear models
can be used to regress out the effect of a confounder from each of the features.
However, to keep the training and test sets independent, this has to be done
within the cross-validation framework: the regression is estimated on the training
data only and then applied to the test data. Adding a ConfounderRemoval PipelineElement
to a PHOTONAI pipeline ensures exactly that when regressing out confounding effects.
The confounder variables are passed to the Hyperpipe in the .fit() method (see
example below).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from photonai.base import Hyperpipe, PipelineElement
# WE USE THE BREAST CANCER SET FROM SKLEARN
data = load_breast_cancer()
y = data.target
# now let's assume we want to regress out the effect of mean_radius and mean_texture
X = data.data[:, 2:]
mean_radius = data.data[:, 0]
mean_texture = data.data[:, 1]
# BUILD HYPERPIPE
pipe = Hyperpipe('confounder_removal_example',
                 optimizer='grid_search',
                 metrics=['accuracy', 'precision', 'recall'],
                 best_config_metric='accuracy',
                 outer_cv=KFold(n_splits=5),
                 inner_cv=KFold(n_splits=3),
                 verbosity=1,
                 project_folder='./tmp/')
# there are two ways of specifying multiple confounders
# first, you can pass them to fit() as a keyword argument named "confounder",
# with a data matrix or list as value (add all pipeline elements before calling fit)
# pipe += PipelineElement('ConfounderRemoval', {}, standardize_covariates=True, test_disabled=False)
# pipe += PipelineElement('SVC')
# pipe.fit(X, y, confounder=[mean_radius, mean_texture])
# second, you can also specify the names of the variables that should be used in the confounder removal step
pipe += PipelineElement('ConfounderRemoval', {},
                        standardize_covariates=True,
                        test_disabled=True,
                        confounder_names=['mean_radius', 'mean_texture'])
pipe += PipelineElement('SVC')
# those names must be keys in the kwargs dictionary
pipe.fit(X, y, mean_radius=mean_radius, mean_texture=mean_texture)
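To make the idea of regressing out confounders inside cross-validation more concrete, here is a minimal NumPy sketch. It is not PHOTONAI's actual implementation: the helper names `fit_confound_model` and `remove_confounds`, the synthetic data, and the single train/test split are all assumptions made for illustration. The point it shows is that the linear model is estimated on the training samples only and then reused, unchanged, on the test samples.

```python
import numpy as np


def fit_confound_model(X_train, C_train):
    """Least-squares fit of every feature on the confounders plus an intercept.

    Hypothetical helper for illustration; returns one coefficient column per feature.
    """
    design = np.column_stack([np.ones(len(C_train)), C_train])
    coef, *_ = np.linalg.lstsq(design, X_train, rcond=None)
    return coef


def remove_confounds(X, C, coef):
    """Subtract the confounder-predicted part from the features."""
    design = np.column_stack([np.ones(len(C)), C])
    return X - design @ coef


# Synthetic data (an assumption): 5 features contaminated by 2 confounders.
rng = np.random.default_rng(0)
C = rng.normal(size=(100, 2))
X = rng.normal(size=(100, 5)) + C @ rng.normal(size=(2, 5))

# One train/test split standing in for a single cross-validation fold.
train, test = np.arange(80), np.arange(80, 100)

# The coefficients are estimated on the training samples only ...
coef = fit_confound_model(X[train], C[train])

# ... and then applied to both sets, so no test information leaks into the fit.
X_train_clean = remove_confounds(X[train], C[train], coef)
X_test_clean = remove_confounds(X[test], C[test], coef)
```

Fitting the same regression on all samples before splitting would let the test data influence the correction, which is the dependence between training and test set that the text above warns about.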