cvasl.vendor.open_nested_combat package¶
Submodules¶
cvasl.vendor.open_nested_combat.nest module¶
- cvasl.vendor.open_nested_combat.nest.GMMSplit(dat, caseno, filepath)¶
The following is from Hannah Horng's open nested ComBat library (https://github.com/hannah-horng/opnested-combat). As the library is unreleased and unversioned, we use the MIT-licensed functions directly so that they are under version control.
According to Dr. Horng’s documentation, this function
“Completes Gaussian Mixture model fitting and ComBat harmonization by the resulting sample grouping. The assumption here is that there is an unknown batch effect causing bimodality such that we can estimate the sample groupings for this hidden batch effect from the distribution. This function will take in a dataset, determine the best 2-component Gaussian mixture model, and use the resulting sample grouping to harmonize the data with ComBat.” [needs better citation]
Arguments¶
dat : DataFrame of original data with shape (features, samples)
caseno : DataFrame/Series containing sample IDs (should be aligned with dat and covars), used to return sample grouping assignments
filepath : root directory path for saving the grouping and corresponding kernel density plots
Returns¶
new_dat : DataFrame with shape (features, samples) that has been sequentially harmonized with Nested ComBat
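The grouping idea behind this function can be illustrated on a single feature. The following is a minimal sketch, not the vendored implementation: it fits a two-component 1-D Gaussian mixture with a hand-written EM loop over a plain list of values and returns a hard group label per sample. The names `gaussian_pdf` and `two_component_groups` are hypothetical, and the ComBat harmonization step that would use these labels as the batch variable is deliberately left out.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def two_component_groups(values, n_iter=50):
    """Assign each value to one of two groups with a 2-component mixture (EM).

    Hypothetical helper for illustration only. Group 0 starts at the lower
    quartile and group 1 at the upper quartile of the data.
    """
    n = len(values)
    ordered = sorted(values)
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    overall = sum(values) / n
    spread = sum((v - overall) ** 2 for v in values) / n
    variances = [max(spread, 1e-6)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in values:
            dens = [weights[k] * gaussian_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: update weight, mean, and variance of each component.
        for k in range(2):
            mass = sum(r[k] for r in resp)
            if mass < 1e-12:
                continue
            weights[k] = mass / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / mass
            sq = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values))
            variances[k] = max(sq / mass, 1e-6)  # floor keeps the density finite
    return [0 if r[0] >= r[1] else 1 for r in resp]


# A clearly bimodal toy feature: four samples near 0 and four near 10.
feature = [0.0, 0.1, -0.1, 0.2, 10.0, 10.1, 9.9, 10.2]
groups = two_component_groups(feature)
print(groups)
```

In the documented workflow, labels like these play the role of the hidden batch variable that is passed on to ComBat.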
- cvasl.vendor.open_nested_combat.nest.OPNestedComBat(dat, covars, batch_list, filepath, categorical_cols=None, continuous_cols=None, return_estimates=False)¶
This function is from Hannah Horng's open nested ComBat library (https://github.com/hannah-horng/opnested-combat). As the library is unreleased and unversioned, we use the MIT-licensed functions directly so that they are under version control. A few minimal changes were made for formatting correctness.
According to Dr. Horng’s documentation, this function “Completes sequential OPNested ComBat harmonization on an input DataFrame. Order is determined by running through all possible permutations of the order, then picking the order with the lowest number of features with significant differences in distribution.”
Arguments¶
dat : DataFrame of original data with shape (features, samples)
covars : DataFrame with shape (samples, covariates) corresponding to original data. All variables should be label-encoded (i.e. strings converted to integer designations)
batch_list : list of strings indicating batch effect column names within covars (e.g. [‘Manufacturer’, ‘CE’…])
filepath : root directory path for saving KS test p-values and kernel density plots created during harmonization
categorical_cols : string or list of strings of categorical variables to adjust for
continuous_cols : string or list of strings of continuous variables to adjust for
return_estimates : if True, function will return both output_df and final_estimates
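Label encoding, as required for covars, means replacing each distinct string with an integer code. A minimal sketch using only the standard library is shown here; `label_encode` is a hypothetical helper, a plain list stands in for a covars column, and the scanner names are made-up example values.

```python
def label_encode(values):
    """Map each distinct value to an integer code, in order of first appearance.

    Hypothetical helper for illustration; returns the codes and the mapping.
    """
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


# Example column of string labels (assumed values, not from the library).
manufacturer = ["GE", "Siemens", "GE", "Philips"]
codes, mapping = label_encode(manufacturer)
print(codes)
print(mapping)
```

Keeping the mapping alongside the codes makes it possible to translate the integer designations back to the original labels later.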
Returns¶
output_df : DataFrame with shape (features, samples) that has been sequentially harmonized with Nested ComBat
final_estimates : list of dictionaries of estimates from iterative harmonization, used if user is deriving estimates from training data that need to be applied to a separate validation dataset
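The order selection described for this function (try every permutation of the batch effects, keep the one with the fewest significantly different features) can be sketched as follows. This is an illustration under stated assumptions, not the vendored code: `pick_best_order` is a hypothetical name, and `count_significant` is a hypothetical callable standing in for "harmonize in this order, then count features whose distributions still differ significantly".

```python
from itertools import permutations


def pick_best_order(batch_list, count_significant):
    """Return the ordering of batch_list with the fewest significant features.

    count_significant is a hypothetical callable that takes a tuple of batch
    column names (one ordering) and returns an integer count.
    """
    best_order, best_count = None, None
    for order in permutations(batch_list):
        n_sig = count_significant(order)
        if best_count is None or n_sig < best_count:
            best_order, best_count = list(order), n_sig
    return best_order, best_count


# Toy scores for the two possible orderings (made-up numbers).
toy_scores = {
    ("Manufacturer", "CE"): 5,
    ("CE", "Manufacturer"): 2,
}
best_order, best_count = pick_best_order(["Manufacturer", "CE"], toy_scores.__getitem__)
print(best_order, best_count)
```

Because the number of permutations grows factorially with the length of batch_list, this exhaustive search is only practical for a small number of batch effects.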
- cvasl.vendor.open_nested_combat.nest.feature_ad(dat, output_df, covars, batch_list, filepath)¶
This function is from Hannah Horng's open nested ComBat library (https://github.com/hannah-horng/opnested-combat). As the library is unreleased and unversioned, we use the MIT-licensed functions directly so that they are under version control. There are minimal changes for linting purposes.
According to Dr. Horng’s documentation, this function “Computes AD test p-values separated by batch effect groups for a dataset (intended to assess differences in distribution to all batch effects in batch_list following harmonization [with] NestedComBat)”
Arguments¶
dat : DataFrame of original data with shape (samples, features)
output_df : DataFrame of harmonized data with shape (samples, features)
covars : DataFrame with shape (samples, covariates) corresponding to original data. All variables should be label-encoded (i.e. strings converted to integer designations)
batch_list : list of strings indicating batch effect column names within covars (e.g. [‘Manufacturer’, ‘CE’…])
filepath : write destination for kernel density plots and p-values
If a feature has the same value for every sample, the AD test cannot be completed for that feature.
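One way to guard against this is to screen out constant features before running the test. The following is a minimal sketch using only the standard library, with a dict of lists standing in for a DataFrame; `split_constant_features` and the feature names are hypothetical, and this check is not part of the vendored function.

```python
def split_constant_features(features):
    """Separate features that vary from features with a single repeated value.

    features: dict mapping feature name -> list of sample values.
    Returns (testable, constant) as two lists of feature names.
    """
    testable, constant = [], []
    for name, values in features.items():
        if len(set(values)) <= 1:
            constant.append(name)
        else:
            testable.append(name)
    return testable, constant


# Made-up example data: the second feature is constant across samples.
data = {
    "feature_a": [41.2, 39.8, 44.1, 40.5],
    "feature_b": [1.0, 1.0, 1.0, 1.0],
}
testable, constant = split_constant_features(data)
print(testable, constant)
```

Only the names in `testable` would then be passed on for distribution testing, while those in `constant` can be reported separately.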