Getting Started

This guide walks through running an analysis with the CPMRegression class. It describes, step by step, how to set up, configure, and execute an analysis, and explains the inputs and parameters involved.

Step 1: Prepare Your Data

To run an analysis, you need the following inputs:

Connectome Data (X): A 2D array (numpy array or pandas DataFrame) of shape (n_samples, n_features) containing connectome edge values for each subject.
Target Variable (y): A 1D array or pandas Series of shape (n_samples,) containing the outcome variable (e.g., clinical scores, behavioral measures).
Covariates: A 2D array or pandas DataFrame of shape (n_samples, n_covariates) containing variables to control for (e.g., age, sex).

All inputs must have the same number of rows (n_samples), with rows in the same subject order.
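If your connectomes are stored as one symmetric matrix per subject, one common way to obtain the (n_samples, n_features) layout is to keep the upper triangle of each matrix. The sketch below assumes that storage format and uses synthetic data. The helper names vectorize_connectomes and check_inputs are illustrative and are not part of the toolbox.

```python
import numpy as np


def vectorize_connectomes(matrices):
    """Flatten (n_samples, n_nodes, n_nodes) symmetric matrices to (n_samples, n_edges)."""
    matrices = np.asarray(matrices)
    n_nodes = matrices.shape[1]
    # k=1 skips the diagonal, so each edge is kept exactly once
    rows, cols = np.triu_indices(n_nodes, k=1)
    return matrices[:, rows, cols]


def check_inputs(X, y, covariates):
    """Raise if the three inputs disagree on the number of samples."""
    if len({len(X), len(y), len(covariates)}) != 1:
        raise ValueError(
            f"Inconsistent n_samples: X={len(X)}, y={len(y)}, covariates={len(covariates)}"
        )


rng = np.random.default_rng(0)
n_subjects, n_nodes = 20, 5
raw = rng.normal(size=(n_subjects, n_nodes, n_nodes))
matrices = (raw + raw.transpose(0, 2, 1)) / 2   # symmetrize each subject's matrix

X = vectorize_connectomes(matrices)             # shape (20, 10): 5 nodes give 10 edges
y = rng.normal(size=n_subjects)                 # shape (20,)
covariates = rng.normal(size=(n_subjects, 2))   # shape (20, 2)
check_inputs(X, y, covariates)                  # passes silently when sizes agree
```

Running a check like this before the analysis catches mismatched subject counts early, although it cannot detect rows that are present but in a different order.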

Step 2: Configure the Analysis

Cross-Validation

The CPMRegression class uses an outer cross-validation loop for performance evaluation and an optional inner cross-validation loop for hyperparameter optimization.

Outer CV (cv): Defines the cross-validation strategy (e.g., KFold).
Inner CV (inner_cv): Used for optimizing hyperparameters during edge selection. Can be left as None if not needed.

Example:

Python

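from sklearn.model_selection import KFold

# Outer loop: estimates out-of-sample performance
outer_cv = KFold(n_splits=10, shuffle=True, random_state=42)
# Inner loop (optional): tunes hyperparameters such as the p-threshold; use None to skip
inner_cv = KFold(n_splits=5, shuffle=True, random_state=42)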
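To make the relationship between the two loops concrete, the following sketch shows how an inner splitter conventionally operates only on the training portion of each outer fold. It illustrates nested cross-validation in general using scikit-learn; how CPMRegression iterates internally is not described in this guide, and the sample count is arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 100
indices = np.arange(n_samples)

outer_cv = KFold(n_splits=10, shuffle=True, random_state=42)
inner_cv = KFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
for outer_train, outer_test in outer_cv.split(indices):
    # The inner loop only ever sees the outer training subjects,
    # so the outer test subjects stay untouched during tuning.
    for inner_train, inner_val in inner_cv.split(outer_train):
        # split() returns positions within outer_train; map them back to subject indices
        tune_train = outer_train[inner_train]
        tune_val = outer_train[inner_val]
        assert not set(tune_val) & set(outer_test)
    fold_sizes.append((len(outer_train), len(outer_test)))

print(fold_sizes[0])  # (90, 10)
```

Keeping hyperparameter tuning inside the outer training data is what prevents the reported performance from being optimistically biased.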

Edge Selection

The toolbox implements univariate edge selection, allowing users to specify the method for evaluating and selecting edges based on statistical tests.

Edge Statistics

Choose from the following methods for computing edge statistics:

pearson: Pearson correlation
pearson_partial: Pearson partial correlation (controlling for covariates)
spearman: Spearman rank correlation
spearman_partial: Spearman partial correlation (controlling for covariates)
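As an illustration of what "controlling for covariates" means, a partial Pearson correlation can be obtained by regressing the covariates out of both the edge values and the target, then correlating the residuals. The sketch below is a minimal NumPy version of that idea on synthetic data; the function names are hypothetical, and the toolbox's own implementation may differ.

```python
import numpy as np


def residualize(values, covariates):
    """Remove the part of `values` that a linear model of the covariates explains."""
    design = np.column_stack([np.ones(len(covariates)), covariates])  # add intercept
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


def partial_pearson(X, y, covariates):
    """Pearson correlation between each column of X and y after residualizing both."""
    X_res = residualize(X, covariates)
    y_res = residualize(y, covariates)
    X_c = X_res - X_res.mean(axis=0)
    y_c = y_res - y_res.mean()
    return (X_c.T @ y_c) / (np.linalg.norm(X_c, axis=0) * np.linalg.norm(y_c))


rng = np.random.default_rng(0)
age = rng.normal(size=200)
edge = age + 0.1 * rng.normal(size=200)   # edge driven almost entirely by age
y = age + 0.1 * rng.normal(size=200)      # outcome also driven by age

r_plain = np.corrcoef(edge, y)[0, 1]                              # inflated by shared age effect
r_partial = partial_pearson(edge[:, None], y, age[:, None])[0]    # close to zero
```

In this synthetic case the plain correlation is high only because both variables track age; once age is regressed out, little association remains. Applying the same steps to rank-transformed data is one common way to define a partial Spearman correlation.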

p-Thresholds

Set a single value (e.g., 0.05) or provide multiple values (e.g., [0.01, 0.05, 0.1]).
If multiple thresholds are specified, the toolbox will optimize for the best p-threshold during inner cross-validation.

FDR Correction

Optionally, apply FDR correction for multiple comparisons by setting the correction argument of PThreshold to 'fdr_by' (passed as a list in the example below, i.e., correction=['fdr_by']).
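The name fdr_by matches the label statsmodels uses for the Benjamini-Yekutieli procedure. Assuming that is the correction meant here, the sketch below shows how the procedure decides which p-values survive. The function and the example p-values are illustrative only and are not taken from the toolbox.

```python
import numpy as np


def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a boolean mask of p-values rejected under the BY procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    harmonic = np.sum(1.0 / np.arange(1, m + 1))         # c(m) = 1 + 1/2 + ... + 1/m
    critical = np.arange(1, m + 1) * alpha / (m * harmonic)
    below = p[order] <= critical
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])                  # largest rank meeting its cutoff
        reject[order[: k + 1]] = True                     # reject everything up to that rank
    return reject


p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
mask = benjamini_yekutieli(p_values, alpha=0.05)
print(mask.tolist())  # [True, True, False, False, False]
```

Without correction, four of these five edges would pass p < 0.05; with the correction only the two smallest p-values do, which shows how much stricter edge selection can become when many edges are tested.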

Example:

Python

from cpm.edge_selection import UnivariateEdgeSelection, PThreshold

edge_statistic = 'pearson'
univariate_edge_selection = UnivariateEdgeSelection(
    edge_statistic=[edge_statistic],
    edge_selection=[PThreshold(threshold=[0.05], correction=['fdr_by'])]
)

Step 3: Set Up the CPMRegression Object

Create an instance of the CPMRegression class with the required inputs:

Python

from cpm.cpm_analysis import CPMRegression

cpm = CPMRegression(
    results_directory="results/",
    cv=outer_cv,
    inner_cv=inner_cv,  # Optional
    edge_selection=univariate_edge_selection,
    select_stable_edges=True,
    stability_threshold=0.8,
    impute_missing_values=True,
    n_permutations=100
)

Key Parameters

results_directory: Directory where results will be saved.
cv: Outer cross-validation strategy.
inner_cv: Inner cross-validation strategy for hyperparameter optimization (optional).
edge_selection: Configuration for univariate edge selection.
select_stable_edges: Whether to select stable edges across folds (True or False).
stability_threshold: Minimum proportion of folds in which an edge must be selected to be considered stable.
impute_missing_values: Whether to impute missing values (True or False).
n_permutations: Number of permutations for permutation testing.
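To illustrate how select_stable_edges and stability_threshold interact, the sketch below computes, for each edge, the proportion of folds in which it was selected and keeps those at or above the threshold. The selection record is made up, and treating the threshold as inclusive is an assumption based on the word "minimum" above.

```python
import numpy as np

# Hypothetical selection record: rows are CV folds, columns are edges.
# True means the edge passed the p-threshold in that fold.
selected = np.array([
    [True,  True,  False, True],
    [True,  False, False, True],
    [True,  True,  False, True],
    [True,  True,  True,  False],
    [True,  True,  False, True],
])

stability_threshold = 0.8
selection_rate = selected.mean(axis=0)              # proportion of folds per edge
stable_edges = selection_rate >= stability_threshold

print(stable_edges.tolist())  # [True, True, False, True]
```

Here the third edge was selected in only one of five folds and is dropped, while the edges selected in four or five folds are kept.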

Step 4: Run the Analysis

Call the run method to perform the analysis:

Python

X = ...  # Load your connectome data (numpy array or pandas DataFrame)
y = ...  # Load your target variable (numpy array or pandas Series)
covariates = ...  # Load your covariates (numpy array or pandas DataFrame)

cpm.run(X=X, y=y, covariates=covariates)

This will:

Perform edge selection based on the specified method and thresholds.
Train and evaluate models for each cross-validation fold.
Save results, including predictions, metrics, and permutation-based significance tests, to the results_directory.
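For readers unfamiliar with permutation-based significance, the sketch below shows one standard way to turn permuted scores into a p-value: compare the score obtained on the real targets with scores obtained after shuffling the targets. The function name and all numbers are hypothetical, and the exact statistic the toolbox reports is not specified in this guide.

```python
import numpy as np


def permutation_p_value(observed, permuted):
    """One-sided p-value: how often a permuted score matches or beats the observed one."""
    permuted = np.asarray(permuted, dtype=float)
    # The +1 terms count the observed score as one of the permutations,
    # which keeps the p-value strictly above zero.
    return (np.sum(permuted >= observed) + 1) / (permuted.size + 1)


rng = np.random.default_rng(0)
observed_r = 0.45                                        # hypothetical score on the real targets
permuted_r = rng.normal(loc=0.0, scale=0.1, size=100)    # hypothetical scores on shuffled targets
p = permutation_p_value(observed_r, permuted_r)
```

With n_permutations=100, the smallest p-value this formula can produce is 1/101, so more permutations are needed if finer resolution matters.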

Step 5: Review Results

After the analysis, you can find the results in the results_directory, including:

Cross-validation metrics (e.g., mean absolute error, R²).
Model predictions for each fold.
Edge stability and significance.
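As a reminder of what the example metrics measure, the sketch below computes mean absolute error and R² (here taken as the coefficient of determination) from a small set of made-up predictions. Whether the toolbox defines R² this way or as a squared correlation is not stated in this guide.

```python
import numpy as np

y_true = np.array([10.0, 12.0, 14.0, 16.0])   # hypothetical observed scores
y_pred = np.array([11.0, 12.0, 13.0, 18.0])   # hypothetical model predictions

mae = np.mean(np.abs(y_true - y_pred))              # average size of the errors
ss_res = np.sum((y_true - y_pred) ** 2)             # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total variation around the mean
r2 = 1.0 - ss_res / ss_tot

print(mae, round(r2, 3))  # 1.0 0.7
```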

You can load and inspect these results for further analysis.
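Because this guide does not list the output file names, a reasonable first step is simply to see what the run produced. The sketch below lists every file under the results directory using only the standard library. The helper name is illustrative, and "results" is the directory used in the Step 3 example.

```python
from pathlib import Path


def list_result_files(results_directory):
    """Return all files under the results directory as sorted relative paths."""
    root = Path(results_directory)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


results_dir = Path("results")
if results_dir.is_dir():                 # skip quietly if no analysis has been run yet
    for name in list_result_files(results_dir):
        print(name)
```

From there, individual files can be opened with whichever reader matches their format.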

By following these steps, you can quickly set up and execute a connectome-based predictive modeling analysis using the CPMRegression class. For further customization, refer to the API documentation.