cvasl.harmony module
Copyright 2023 Netherlands eScience Center and the Amsterdam University Medical Center. Licensed under the Apache License, version 2.0. See LICENSE for details.
This file contains functions for processing csv and tsv files as they relate to specific common harmonization algorithms. Most separated-values processing is in the seperated module. However, this module has been made so that it can be called in environments compatible with common harmonization algorithms, which often require older versions of Python, pandas and numpy than were usual in 2023.
- cvasl.harmony.compare_harm_multi_site_violins(unharmonized_df, harmonized_df, feature_list, batch_column='site')
Create violin plots comparing unharmonized and harmonized multisite data for each feature in feature_list, with sites taken from batch_column.
- cvasl.harmony.compare_harm_one_site_violins(unharmonized_df, harmonized_df, feature_list, chosen_feature='sex')
Create violin plots comparing unharmonized and harmonized single-site data for each feature in feature_list, split on a binary feature of choice (chosen_feature, which defaults to sex).
- cvasl.harmony.increment_keys(input_dict, chosen_value=1)
This function increments all keys in a dictionary by a chosen value (chosen_value, default 1).
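A minimal sketch of the behaviour described above. It assumes numeric keys and assumes a new dictionary is returned rather than the input being modified in place, because the source does not say which. `increment_keys_sketch` is a hypothetical name, not the cvasl implementation:

```python
def increment_keys_sketch(input_dict, chosen_value=1):
    """Hypothetical sketch: return a new dict with every (numeric) key shifted by chosen_value."""
    # Values are carried over unchanged; only the keys move
    return {key + chosen_value: value for key, value in input_dict.items()}


print(increment_keys_sketch({0: "a", 1: "b"}))  # {1: 'a', 2: 'b'}
```

Building a new dictionary avoids the key collisions an in-place update can hit, for example moving key 0 onto key 1 before key 1 itself has been moved.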
- cvasl.harmony.log_out_columns(dataframe, column_list)
This function replaces the values in the specified columns of a dataframe with their logarithms, which can change the overall distributions.
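A minimal sketch of the transformation. It assumes the natural logarithm (the source does not state the base) and assumes a modified copy is returned instead of the input being changed. `log_out_columns_sketch` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def log_out_columns_sketch(dataframe, column_list):
    """Hypothetical sketch: return a copy with each listed column replaced by its natural log."""
    result = dataframe.copy()  # leave the caller's frame untouched
    for column in column_list:
        # np.log works element-wise; zeros become -inf and negatives become NaN
        result[column] = np.log(result[column])
    return result


frame = pd.DataFrame({"wmh_count": [1.0, np.e, np.e ** 2], "sex": [0, 1, 0]})
logged = log_out_columns_sketch(frame, ["wmh_count"])
print(logged["wmh_count"].round(6).tolist())  # [0.0, 1.0, 2.0]
```

Right-skewed measures such as lesion counts are the typical use. Columns containing zeros or negative values need handling before the transform.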
- cvasl.harmony.make_topper(btF, row0, row1)
This function builds the top rows for a harmonized dataframe out of the btF part produced by the prep_for_neurocombat function, e.g. prep_for_neurocombat(dataframename1, dataframename2).
- cvasl.harmony.negative_harm_outcomes(folder, file_extension, number_columns=['sex', 'gm_vol', 'wm_vol', 'csf_vol', 'gm_icvratio', 'gmwm_icvratio', 'wmhvol_wmvol', 'wmh_count', 'deepwm_b_cov', 'aca_b_cov', 'mca_b_cov', 'pca_b_cov', 'totalgm_b_cov', 'deepwm_b_cbf', 'aca_b_cbf', 'mca_b_cbf', 'pca_b_cbf', 'totalgm_b_cbf'])
Given a directory, this function searches all subdirectories for files with the given file extension. If all of these files are harmonization outcome files, it returns a list of the files that contain negative values and prints information about the negatives in each file.
- cvasl.harmony.prep_for_neurocombat(dataframe1, dataframe2)
This function takes two dataframes in the cvasl format, then turns them into the items needed for the neurocombat algorithm with re-identification.
- Parameters:
dataframe1 – frame variable
dataframe2 – frame variable
- Returns:
dataframes for the neurocombat algorithm, and integer lengths
- Return type:
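The neuroCombat package's `neuroCombat` function expects the data as a features-by-subjects matrix, together with a covariates table that holds the batch labels. The sketch below shows one way to build those items from several frames while keeping subject IDs for re-identification. Because it accepts any number of frames, it covers both the two-way and the five-way case. The column names (`participant_id`, `batch`) and the helper `prep_frames_for_combat` are assumptions, not cvasl's actual format or return values:

```python
import pandas as pd


def prep_frames_for_combat(frames, feature_columns, covariate_columns=(),
                           id_column="participant_id", batch_column="batch"):
    """Hypothetical sketch: stack frames and split them into the pieces
    a ComBat-style harmonization needs."""
    # Tag every row with the index of the frame (site) it came from
    combined = pd.concat(
        [frame.assign(**{batch_column: i}) for i, frame in enumerate(frames)],
        ignore_index=True,
    )
    # neuroCombat expects features as rows and subjects as columns
    data = combined[list(feature_columns)].T
    covars = combined[[batch_column, *covariate_columns]]
    ids = combined[id_column]  # kept so harmonized columns can be re-identified
    lengths = [len(frame) for frame in frames]
    return data, covars, ids, lengths
```

Under these assumptions, the result could be passed as `neuroCombat(dat=data, covars=covars, batch_col="batch")`, and `lengths` could then be used to slice the harmonized columns back into the original frames.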
- cvasl.harmony.prep_for_neurocombat_5way(dataframe1, dataframe2, dataframe3, dataframe4, dataframe5)
This function takes five dataframes in the cvasl format, then turns them into the items needed for the neurocombat algorithm with re-identification.
- Parameters:
dataframe1 – frame variable
dataframe2 – frame variable
dataframe3 – frame variable
dataframe4 – frame variable
dataframe5 – frame variable
- Returns:
dataframes for the neurocombat algorithm, and integer lengths
- Return type:
- cvasl.harmony.show_diff_on_var(dataset1, name_dataset1, dataset2, name_dataset2, var1, var2)
- cvasl.harmony.show_diff_on_var3(dataset1, name_dataset1, dataset2, name_dataset2, dataset3, name_dataset3, var1, var2)
- cvasl.harmony.show_diff_on_var5(dataset1, name_dataset1, dataset2, name_dataset2, dataset3, name_dataset3, dataset4, name_dataset4, dataset5, name_dataset5, var1, var2)
- cvasl.harmony.split_frame_half_balanced_by_column(frame, column)
This function is made for a dataframe you want to split on a column with continuous values, e.g. age. It returns two dataframes in which the values in this column are about equally distributed; e.g. if age is the column variable, the average age will be similar in both frames.
- Parameters:
frame – frame variable
column (Series) – column name
- Returns:
two dataframes evenly distributed on the values in the specified column
- Return type:
pandas.DataFrame
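One simple way to get two halves with similar distributions is to sort by the column and deal the rows alternately into the two frames. This is presented as an assumption, not as cvasl's actual method, and `split_half_balanced_sketch` is a hypothetical name:

```python
import pandas as pd


def split_half_balanced_sketch(frame, column):
    """Hypothetical sketch: sort by column, then alternate rows between two frames."""
    ordered = frame.sort_values(column, kind="mergesort")  # stable sort keeps tie order
    # Even positions go to the first frame, odd positions to the second
    return ordered.iloc[0::2], ordered.iloc[1::2]


ages = pd.DataFrame({"age": range(20, 80)})
first, second = split_half_balanced_sketch(ages, "age")
print(first["age"].mean(), second["age"].mean())
```

The first frame always receives the lower value of each adjacent pair, so its mean sits slightly below the second's. Randomly swapping within each pair would remove that small systematic offset.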
- cvasl.harmony.top_and_bottom_by_column(frame, column)
This is useful in cases where you want to split on a column with continuous values, e.g. age, and you want the highest and lowest values separated.
- Parameters:
frame – frame variable
column (Series) – column name
- Returns:
dataframes unevenly distributed on values in specified column
- Return type:
pandas.DataFrame
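A minimal sketch of this contrasting split, assuming the sorted frame is cut at its median position so that one frame holds the lowest values and the other the highest. The name `top_and_bottom_sketch` and the (bottom, top) return order are assumptions:

```python
import pandas as pd


def top_and_bottom_sketch(frame, column):
    """Hypothetical sketch: return (bottom, top), the lower and upper halves by column."""
    ordered = frame.sort_values(column, kind="mergesort")
    middle = len(ordered) // 2  # with an odd row count the top half gets the extra row
    return ordered.iloc[:middle], ordered.iloc[middle:]


ages = pd.DataFrame({"age": range(20, 80)})
bottom, top = top_and_bottom_sketch(ages, "age")
print(bottom["age"].max(), top["age"].min())  # 49 50
```
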