cvasl.file_handler module

Copyright 2023 Netherlands eScience Center and the Amsterdam University Medical Center. Licensed under the Apache License, version 2.0. See LICENSE for details.

This module provides a way for users to configure all data paths instead of hard-coding them, along with functions to check data integrity and to create synthetic data. Data integrity is checked with hash functions that track files. Synthetic data can be made with several methods.

class cvasl.file_handler.Config

Bases: object

This class lets you configure the file layout for data on a home computer or remote workspace; the configuration is then loaded into a variable. By setting up and modifying a .json file in the appropriate directory, users avoid the need for any hard-coded paths to data. If you do not set up a JSON file, your storage space defaults to the test_data folder in the repository.
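The lookup and path expansion described above can be sketched roughly as follows. This is an illustration only, not the Config implementation: first_config and expand_layout are hypothetical helpers, and the assumption is that each '{}' in a layout template is filled with a single root directory.

```python
import json
from pathlib import Path


def first_config(candidates):
    """Hypothetical helper (not part of cvasl): return the parsed contents
    of the first existing JSON file among the candidates, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            with path.open() as handle:
                return json.load(handle)
    return None


def expand_layout(layout, root):
    """Hypothetical helper: fill each '{}' template with a root directory."""
    return {name: template.format(root) for name, template in layout.items()}


# Assumed example values; the real defaults are listed on the class itself.
layout = {"bids": "{}", "raw_data": "{}/raw_data"}
expanded = expand_layout(layout, "/data/study")
print(expanded)  # {'bids': '/data/study', 'raw_data': '/data/study/raw_data'}
```

When no candidate file exists, first_config returns None, which is the point at which a caller would fall back to a default such as the test_data folder.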

default_layout = {'bids': '{}', 'cvage': '{}/derivates/cvage', 'cvage_inputs': '{}/derivates/cvage/cvasl_inputs', 'cvage_outputs': '{}/derivates/cvage/cvasl_outputs', 'derivatives': '{}/derivatives', 'explore_asl': '{}/derivatives/explore_asl', 'raw_data': '{}/raw_data'}
default_locations = ('./config.json', '/home/runner/.cvasl/config.json', '/etc/cvasl/config.json')
classmethod from_file(location=None, overrides=None)
get_directory(directory, value=None)
load(location)
classmethod no_file(overrides)
parse(found)
parse_overrides(overrides=None, source='<command line>')
pprint(stream)
required_directories = ('bids',)
usage()

This is essentially a notice message shown when the computer does not have paths configured, or files created, so that the data paths of a config.json can be used. Until you set one up, it will default to test_data.

validate()
cvasl.file_handler.extract_common_columns(list_tsv_files)

This function takes a group of tsv files and extracts the columns common to all of them.

Parameters:

list_tsv_files (list) – list of filenames of tsv files

Returns:

the column names common to all files

Return type:

set

cvasl.file_handler.find_where_column(list_tsv_files, column_list)

A function to find which tsv files contain the specified columns.

Parameters:
  • list_tsv_files (list) – list of filenames of tsv files

  • column_list (list) – list of columns as strings

Returns:

list of lists of tsv names

Return type:

list
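As a rough illustration of this kind of lookup, the sketch below works on headers that are already in memory. It is not the cvasl implementation: files_containing is a hypothetical name, and returning one inner list of file names per requested column is an assumption about the shape of the result.

```python
def files_containing(headers_by_file, column_list):
    """Hypothetical helper: for each requested column, list the files
    whose header contains that column."""
    return [
        [name for name, columns in headers_by_file.items() if column in columns]
        for column in column_list
    ]


# Assumed example headers, keyed by file name.
headers_by_file = {
    "a.tsv": ["participant_id", "age", "sex"],
    "b.tsv": ["participant_id", "age"],
}
found = files_containing(headers_by_file, ["age", "sex"])
print(found)  # [['a.tsv', 'b.tsv'], ['a.tsv']]
```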

cvasl.file_handler.hash_folder(origin_folder1, file_extension, made, force=False)

Hashing function to be used by command line.

Parameters:
  • origin_folder1 (str) – The string of the folder with files to hash

  • file_extension (str) – File extension

  • made (str) – file directory where csv with hashes will be put

cvasl.file_handler.hash_rash(origin_folder1, file_extension)

Hashing function to check that files are not corrupted, or to verify whether files have changed.

Parameters:
  • origin_folder1 (str) – The string of the folder with files to hash

  • file_extension (str) – File extension, written without period

Returns:

Dataframe with hashes for what is in folder

Return type:

DataFrame
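The general idea of hashing every file with a given extension can be sketched with the standard library alone. This is not the cvasl implementation: hash_files is a hypothetical name, SHA-256 and the recursive search are assumptions, and the sketch returns a list of dictionaries rather than a DataFrame.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_files(folder, file_extension):
    """Hypothetical helper: hash every file under folder whose name ends
    with the extension (given without a period)."""
    rows = []
    for path in sorted(Path(folder).rglob(f"*.{file_extension}")):
        if path.is_file():
            # SHA-256 is an assumption; the real function may use another hash.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rows.append({"file_name": str(path), "hash": digest})
    return rows


# Demonstration on a temporary folder holding one matching file.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "scan.tsv").write_bytes(b"a\tb\n1\t2\n")
    Path(folder, "notes.txt").write_bytes(b"ignored")
    rows = hash_files(folder, "tsv")

print(len(rows))  # 1
```

Running the same function again later and comparing the hashes is one way to detect that a file has been altered or corrupted.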

cvasl.file_handler.intersect_all(*sets)

A function that, given a group of sets, returns the elements common to all sets.

Parameters:

*sets (list) – group of sets, or a list of lists, passed unpacked

Returns:

the elements common to all sets

Return type:

set
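A minimal sketch of intersecting an arbitrary number of unpacked collections is shown below. It is not the cvasl implementation: intersect_all_sketch is a hypothetical name, and returning an empty set when no arguments are given is an assumption.

```python
def intersect_all_sketch(*sets):
    """Hypothetical helper: return the elements common to every argument."""
    if not sets:
        return set()  # assumed behaviour for an empty call
    first, *rest = sets
    # set.intersection accepts any iterables, so lists and tuples work too.
    return set(first).intersection(*rest)


groups = [["age", "sex", "site"], ["age", "site"], ["site", "age", "gm_vol"]]
common = intersect_all_sketch(*groups)  # note the unpacking with *
print(sorted(common))  # ['age', 'site']
```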

cvasl.file_handler.make_columns(list_tsv_files)

This function takes the column titles out of each tsv file in a list.

Parameters:

list_tsv_files (list) – list of filenames of tsv files

Returns:

list of lists of column names

Return type:

list
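Reading only the header row of each file can be sketched with the standard csv module. This is not the cvasl implementation: read_headers is a hypothetical name, and the sketch assumes that the first row of each tsv file holds the column titles.

```python
import csv
import tempfile
from pathlib import Path


def read_headers(list_tsv_files):
    """Hypothetical helper: return one list of column names per file."""
    headers = []
    for filename in list_tsv_files:
        with open(filename, newline="") as handle:
            # Only the first row is read; an empty file yields an empty list.
            headers.append(next(csv.reader(handle, delimiter="\t"), []))
    return headers


# Demonstration on two small temporary tsv files.
with tempfile.TemporaryDirectory() as folder:
    first = Path(folder, "first.tsv")
    second = Path(folder, "second.tsv")
    first.write_text("participant_id\tage\nsub-01\t34\n")
    second.write_text("participant_id\tsex\nsub-01\tF\n")
    headers = read_headers([first, second])

print(headers)  # [['participant_id', 'age'], ['participant_id', 'sex']]
```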

cvasl.file_handler.unduplicate_dfs(list_of_dataframes)

This function takes a list of dataframes and should return only the dataframes that are not duplicates of each other. It still needs improvement (see TODO).