Core components

Core classes for managing remodeling operations, backups, and validation.

Dispatcher

The main orchestrator for executing remodeling operations.

class remodel.dispatcher.Dispatcher(operation_list, data_root=None, backup_name='default_back', hed_versions=None)

Bases: object

Controller for applying operations to tabular files and saving the results.

REMODELING_SUMMARY_PATH = 'remodel/summaries'
__init__(operation_list, data_root=None, backup_name='default_back', hed_versions=None)

Constructor for the dispatcher.

Parameters:
  • operation_list (list) – List of valid unparsed operations.

  • data_root (str or None) – Root directory for the dataset. If None, then backups are not made.

  • hed_versions (str, list, HedSchema, or HedSchemaGroup) – The HED schema.

Raises:
  • HedFileError – If the specified backup does not exist.

  • ValueError – If any of the operations cannot be parsed correctly.

get_summaries(file_formats=None) list[dict]

Return the summaries as dictionaries of strings suitable for saving or archiving.

Parameters:

file_formats (list or None) – List of formats for the context files (‘.json’ and ‘.txt’ are allowed). If None, defaults to [‘.txt’, ‘.json’].

Returns:

A list of dictionaries of summaries keyed to filenames.

Return type:

list[dict]

get_data_file(file_designator) DataFrame

Get the correct data file given the file designator.

Parameters:

file_designator (str, DataFrame) – A DataFrame or the full path of the data file in the original dataset.

Returns:

DataFrame after reading the path.

Return type:

pd.DataFrame

Raises:

HedFileError – If a valid file cannot be found.

Notes

  • If a string is passed and there is a backup manager, the string must correspond to the full path of the file in the original dataset. In this case, the corresponding backup file is read and returned.

  • If a string is passed and there is no backup manager, the data file corresponding to the file_designator is read and returned.

  • If a Pandas DataFrame, return a copy.

get_summary_save_dir() str

Return the directory in which to save the summaries.

Returns:

The data_root combined with the remodeling summary path.

Return type:

str

Raises:

HedFileError – If this dispatcher does not have a data_root.

run_operations(file_path, sidecar=None, verbose=False) DataFrame

Run the dispatcher operations on a file.

Parameters:
  • file_path (str or DataFrame) – Full path of the file to be remodeled or a DataFrame.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

  • verbose (bool) – If True, print out progress reports.

Returns:

The processed dataframe.

Return type:

pd.DataFrame

save_summaries(save_formats=None, individual_summaries='separate', summary_dir=None, task_name='')

Save the summary files in the specified formats.

Parameters:
  • save_formats (list or None) – A list of formats [“.txt”, “.json”]. If None, defaults to [‘.json’, ‘.txt’].

  • individual_summaries (str) – “separate”, “consolidated”, or “none”.

  • summary_dir (str or None) – Directory for saving summaries.

  • task_name (str) – Name of task if summaries separated by task or “” if not separated.

Notes

  • The summaries are saved in the dataset derivatives/remodeling folder if no summary_dir is provided.

  • “consolidated” means that the overall summary and summaries of individual files are in one summary file.

  • “separate” means that the summaries of individual files are in separate files.

  • “none” means that only the overall summary is produced.

static parse_operations(operation_list) list

Return a parsed list of remodeler operations.

Parameters:

operation_list (list) – List of JSON remodeler operations.

Returns:

List of Python objects containing parsed remodeler operations.

Return type:

list

static prep_data(df) DataFrame

Make a copy and replace all n/a entries in the data frame by np.nan for processing.

Parameters:

df (DataFrame) – The DataFrame to be processed.

Returns:

A copy of the DataFrame with n/a entries replaced by np.nan.

Return type:

DataFrame

static post_proc_data(df) DataFrame

Replace all nan entries with ‘n/a’ for BIDS compliance.

Parameters:

df (DataFrame) – The DataFrame to be processed.

Returns:

DataFrame with np.nan replaced by ‘n/a’.

Return type:

pd.DataFrame
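
prep_data and post_proc_data together form a round trip: ‘n/a’ entries become np.nan for processing and are restored afterwards for BIDS compliance. The sketch below is a plain-Python analogue on a list of row dictionaries, not the library code. The helper names prep_rows and post_proc_rows are hypothetical, and math.nan stands in for np.nan.

```python
import math


def prep_rows(rows):
    """Return a copy of rows with 'n/a' entries replaced by NaN."""
    return [
        {key: (math.nan if value == "n/a" else value) for key, value in row.items()}
        for row in rows
    ]


def post_proc_rows(rows):
    """Return a copy of rows with NaN entries replaced by 'n/a'."""
    def is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    return [
        {key: ("n/a" if is_nan(value) else value) for key, value in row.items()}
        for row in rows
    ]


rows = [{"onset": 1.5, "trial_type": "go"}, {"onset": 2.0, "trial_type": "n/a"}]
prepared = prep_rows(rows)          # 'n/a' is now NaN, so numeric code can skip it
restored = post_proc_rows(prepared)  # NaN is written back as 'n/a'
```

With a DataFrame, the same two steps would typically be a value replacement followed by a fill of missing values; the original rows are left untouched in both directions here because each helper builds a new list.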

static errors_to_str(messages, title='', sep='\n') str

Return an error string representing error messages in a list.

Parameters:
  • messages (list of dict) – List of error dictionaries each representing a single error.

  • title (str) – If provided the title is concatenated at the top.

  • sep (str) – Character used between lines in concatenation.

Returns:

Single string representing the messages.

Return type:

str
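
As an illustration of how title and sep interact, the following sketch joins one formatted line per error dictionary. It is not the library implementation: the function name errors_to_str_sketch and the dictionary keys index and error_msg are assumptions made for this example.

```python
def errors_to_str_sketch(messages, title="", sep="\n"):
    """Join one formatted line per error dictionary, with an optional title on top."""
    lines = [
        # The keys 'index' and 'error_msg' are assumed for illustration only.
        f"Operation[{msg.get('index')}]: {msg.get('error_msg')}"
        for msg in messages
    ]
    body = sep.join(lines)
    return title + sep + body if title else body


messages = [
    {"index": 0, "error_msg": "missing 'parameters'"},
    {"index": 2, "error_msg": "unknown operation"},
]
report = errors_to_str_sketch(messages, title="Errors found:")
```

Here report holds three lines: the title followed by one line per error. Passing a different sep, such as "; ", yields a single-line report.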

static get_schema(hed_versions) HedSchema | HedSchemaGroup | None

Return the schema objects represented by the hed_versions.

Parameters:

hed_versions (str, list, HedSchema, HedSchemaGroup) – If str, interpreted as a version number.

Returns:

Objects loaded from the hed_versions specification.

Return type:

Union[HedSchema, HedSchemaGroup, None]

BackupManager

Manages dataset backups before and during remodeling operations.

class remodel.backup_manager.BackupManager(data_root, backups_root=None)

Bases: object

Manager for file backups for remodeling tools.

DEFAULT_BACKUP_NAME = 'default_back'
RELATIVE_BACKUP_LOCATION = './derivatives/remodel/backups'
BACKUP_DICTIONARY = 'backup_lock.json'
BACKUP_ROOT = 'backup_root'
__init__(data_root, backups_root=None)

Constructor for the backup manager.

Parameters:
  • data_root (str) – Full path of the root of the data directory.

  • backups_root (str or None) – Full path to the root where backups subdirectory is located.

Raises:

HedFileError – If the data_root does not correspond to a real directory.

Notes: The backup_root will have remodeling/backups appended.

create_backup(file_list, backup_name=None, verbose=False) bool

Create a new backup from file_list.

Parameters:
  • file_list (list) – Full paths of the files to be in the backup.

  • backup_name (str or None) – Name of the backup. If None, uses the default name.

  • verbose (bool) – If True, print out the files that are being backed up.

Returns:

True if the backup was successful. False if a backup of that name already exists.

Return type:

bool

Raises:
  • HedFileError – For missing or incorrect files.

  • OS-related error – If copying a file fails at the operating-system level.

get_backup(backup_name) dict | None

Return the dictionary corresponding to backup_name.

Parameters:

backup_name (str) – Name of the backup to be retrieved.

Returns:

The dictionary with the backup info.

Return type:

Union[dict, None]

Notes

The dictionary with backup information has keys that are the paths of the backed up files relative to the backup root. The values in this dictionary are the dates on which the particular file was backed up.

get_backup_files(backup_name, original_paths=False) list

Return a list of full paths of files contained in the backup.

Parameters:
  • backup_name (str) – Name of the backup.

  • original_paths (bool) – If True return the original paths.

Returns:

Full paths of the original files backed up (original_paths=True) or the paths in the backup.

Return type:

list

Raises:

HedFileError – If no backup named backup_name exists.

get_backup_path(backup_name, file_name) str

Retrieve the file from the backup or throw an error.

Parameters:
  • backup_name (str) – Name of the backup.

  • file_name (str) – Full path of the file to be retrieved.

Returns:

Full path of the corresponding file in the backup.

Return type:

str
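
The mapping from an original file to its backup copy can be pictured as follows. This is a sketch under stated assumptions, not the library code: it assumes the backup lives at backups_root/backup_name/BACKUP_ROOT/ followed by the file's path relative to data_root. The function name backup_path_sketch and the example paths are hypothetical, and posixpath is used so the result is the same on every platform.

```python
import posixpath

BACKUP_ROOT = "backup_root"  # value of the class attribute listed above


def backup_path_sketch(data_root, backups_root, backup_name, file_name):
    """Map a file's full path in the dataset to its assumed location in a backup."""
    # The key is the file's path relative to the dataset root.
    file_key = posixpath.relpath(file_name, data_root)
    if file_key.startswith(".."):
        raise ValueError(f"{file_name} is not inside {data_root}")
    # Assumed layout: <backups_root>/<backup_name>/<BACKUP_ROOT>/<relative path>
    return posixpath.join(backups_root, backup_name, BACKUP_ROOT, file_key)


path = backup_path_sketch(
    data_root="/data/ds001",
    backups_root="/data/ds001/derivatives/remodel/backups",
    backup_name="default_back",
    file_name="/data/ds001/sub-01/sub-01_task-rest_events.tsv",
)
```

Keeping the relative path intact inside the backup means the original directory structure can be recreated on restore without any extra bookkeeping.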

get_file_key(file_name)
restore_backup(backup_name='default_back', task_names=None, verbose=True)

Restore the files from backup_name to the main directory.

Parameters:
  • backup_name (str) – Name of the backup to restore.

  • task_names (list or None) – A list of task names to restore. If None, defaults to empty list (all tasks).

  • verbose (bool) – If True, print out the file names being restored.

static get_task(task_names, file_path) str

Return the task if the file name contains a task_xxx where xxx is in task_names.

Parameters:
  • task_names (list) – List of task names (without the task_ prefix).

  • file_path (str) – Path of the filename to be tested.

Returns:

The task name or ‘’ if there is no task_xxx or xxx is not in task_names.

Return type:

str
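
A minimal sketch of this kind of lookup is shown below. It is an illustration rather than the library code: the function name get_task_sketch is hypothetical, and the matching rule is an assumption that accepts the marker written either as task_xxx or in the BIDS-style form task-xxx, and requires the name to end at a non-alphanumeric character.

```python
import os
import re


def get_task_sketch(task_names, file_path):
    """Return the first name in task_names that appears as a task marker in the file name."""
    base = os.path.basename(file_path)
    for name in task_names or []:
        # Assumed marker forms: task_<name> or task-<name>, not followed by more letters or digits.
        if re.search(rf"task[-_]{re.escape(name)}(?![A-Za-z0-9])", base):
            return name
    return ""


found = get_task_sketch(["rest", "nback"], "/data/sub-01/sub-01_task-nback_events.tsv")
missing = get_task_sketch(["rest"], "/data/sub-01/sub-01_events.tsv")
```

The lookahead keeps a task named rest from matching a file whose task is restful; whether the real method applies the same boundary rule is not stated in this reference.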

make_backup(task, backup_name=None, verbose=False) bool

Make a backup copy of the files in the task file list.

Parameters:
  • task (dict) – Dictionary representing the remodeling task.

  • backup_name (str or None) – Name of the backup. If None, uses the default name.

  • verbose (bool) – If True, print out the files that are being backed up.

Returns:

True if the backup was successful. False if a backup of that name already exists.

Return type:

bool

Raises:
  • HedFileError – For missing or incorrect files.

  • OS-related error – If copying a file fails at the operating-system level.

RemodelerValidator

Validates remodeling operation specifications against JSON schema.

class remodel.remodeler_validator.RemodelerValidator

Bases: object

Validator for remodeler input files.

MESSAGE_STRINGS = {'0': {'minItems': 'There are no operations defined. Specify at least 1 operation for the remodeler to execute.', 'type': 'Operations must be contained in a list or array. This is also true for a single operation.'}, '1': {'additionalProperties': "Operation dictionary {operation_index} contains an unexpected field '{added_property}'. Every operation dictionary must specify the type of operation, a description, and the operation parameters.", 'required': "Operation dictionary {operation_index} is missing '{missing_value}'. Every operation dictionary must specify the type of operation, a description, and the operation parameters.", 'type': 'Each operation must be defined in a dictionary: {instance} is not a dictionary object.'}, '2': {'additionalProperties': "Operation {operation_index}: Operation parameters for {operation_name} contain an unexpected field '{added_property}'.", 'dependentRequired': 'Operation {operation_index}: The parameter {missing_value} is missing: {missing_value} is a required parameter of {operation_name} when {dependent_on} is specified.', 'enum': '{instance} is not a known remodeler operation. See the documentation for valid operations.', 'required': 'Operation {operation_index}: The parameter {missing_value} is missing. {missing_value} is a required parameter of {operation_name}.', 'type': 'Operation {operation_index}: {instance} is not a {validator_value}. {operation_field} should be of type {validator_value}.'}, 'more': {'additionalProperties': "Operation {operation_index}: Operation parameters for {parameter_path} contain an unexpected field '{added_property}'.", 'enum': 'Operation {operation_index}: Operation parameter {parameter_path} in the {operation_name} operation contains and unexpected value. Value should be one of {validator_value}.', 'minItems': 'Operation {operation_index}: The list in {parameter_path} in the {operation_name} operation should have at least {validator_value} item(s).', 'minProperties': 'Operation {operation_index}: The dictionary in {parameter_path} in the {operation_name} operation should have at least {validator_value} key(s).', 'required': 'Operation {operation_index}: The field {missing_value} is missing in {parameter_path}. {missing_value} is a required parameter of {parameter_path}.', 'type': 'Operation {operation_index}: The value of {parameter_path} in the {operation_name} operation should be {validator_value}. {instance} is not a {validator_value}.', 'uniqueItems': 'Operation {operation_index}: The list in {parameter_path} in the {operation_name} operation should only contain unique items.'}}
BASE_ARRAY = {'items': {}, 'minItems': 1, 'type': 'array'}
OPERATION_DICT = {'additionalProperties': False, 'allOf': [], 'properties': {'description': {'type': 'string'}, 'operation': {'default': 'convert_columns', 'enum': [], 'type': 'string'}, 'parameters': {'properties': {}, 'type': 'object'}}, 'required': ['operation', 'description', 'parameters'], 'type': 'object'}
PARAMETER_SPECIFICATION_TEMPLATE = {'if': {'properties': {'operation': {'const': ''}}, 'required': ['operation']}, 'then': {'properties': {'parameters': {}}}}
__init__()

Constructor for the remodeler validator.

validate(operations) list[str]

Validate remodeler operations against the JSON schema specification and operation-specific requirements.

Parameters:

operations (list[dict]) – List of dictionaries with input operations to run through the remodeler.

Returns:

List with the error messages for errors identified by the validator.

Return type:

list[str]
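
The top-level rules encoded in BASE_ARRAY and OPERATION_DICT (a non-empty list of dictionaries, each with exactly the keys operation, description, and parameters) can be sketched by hand as below. This is a simplified stand-in for illustration, not the validator itself: it does not check operation names or parameter contents, the function name check_operations is hypothetical, the operation names and parameters in the sample are illustrative only, and the zero-based index in the messages is an assumption.

```python
REQUIRED_KEYS = ("operation", "description", "parameters")  # from OPERATION_DICT['required']


def check_operations(operations):
    """Return a list of messages for top-level structural problems in an operation list."""
    if not isinstance(operations, list):
        return ["Operations must be contained in a list or array."]
    if not operations:
        return ["There are no operations defined."]
    errors = []
    for index, op in enumerate(operations):
        if not isinstance(op, dict):
            errors.append(f"Operation {index}: {op!r} is not a dictionary object.")
            continue
        for key in REQUIRED_KEYS:
            if key not in op:
                errors.append(f"Operation dictionary {index} is missing '{key}'.")
        for key in op:
            if key not in REQUIRED_KEYS:
                errors.append(f"Operation dictionary {index} contains an unexpected field '{key}'.")
    return errors


operations = [
    {"operation": "remove_columns", "description": "Drop unused columns.",
     "parameters": {"column_names": ["value"], "ignore_missing": True}},
    {"operation": "rename_columns", "parameters": {}, "comment": "draft"},
]
errors = check_operations(operations)
```

In this sample the first dictionary passes and the second produces two messages: one for the missing description and one for the unexpected comment field. An empty errors list corresponds to the validator reporting no problems.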