Operations

Remodeling and analysis operations for transforming tabular data.

Base classes

All operations inherit from these base classes.

BaseOp

class remodel.operations.base_op.BaseOp(parameters)

Bases: ABC

Base class for operations. All remodeling operations should extend this class.

__init__(parameters)

Constructor for the BaseOp class. Should be extended by operations.

Parameters:

parameters (dict) – A dictionary specifying the appropriate parameters for the operation.

abstract property NAME
abstract property PARAMS
abstract do_op(dispatcher, df, name, sidecar=None)

Base class method to be overridden by each operation.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The tabular file to be remodeled.

  • name (str) – Unique identifier for the data – often the original file path.

  • sidecar (Sidecar or file-like) – A JSON sidecar needed for HED operations.

abstract static validate_input_data(parameters)

Validates whether operation parameters meet op-specific criteria beyond those captured in the JSON schema.

Example: A check to see whether two input arrays are the same length.

Notes: The minimum implementation should return an empty list to indicate no errors were found.

If additional validation is necessary, the method should perform the validation and return a list of user-friendly error strings.
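As a concrete sketch of such an override (names are hypothetical, modeled on the length check that factor_column performs), the method returns an empty list on success or a list of error strings otherwise:

```python
def validate_input_data(parameters):
    # Op-specific check beyond the JSON schema: if factor_names is given,
    # it must be the same length as factor_values.
    names = parameters.get("factor_names")
    values = parameters.get("factor_values", [])
    if names is not None and len(names) != len(values):
        return ["factor_names must be the same length as factor_values"]
    return []

print(validate_input_data({}))  # []
```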

BaseSummary

class remodel.operations.base_summary.BaseSummary(sum_op)

Bases: ABC

Abstract base class for summary contents. Should not be instantiated.

Parameters:

sum_op (BaseOp) – Operation corresponding to this summary.

DISPLAY_INDENT = '   '
INDIVIDUAL_SUMMARIES_PATH = 'individual_summaries'
__init__(sum_op)
get_summary_details(include_individual=True) dict

Return a dictionary with the details for individual files and the overall dataset.

Parameters:

include_individual (bool) – If True, summaries for individual files are included.

Returns:

A dictionary with ‘Dataset’ and ‘Individual files’ keys.

Return type:

dict

Notes

  • The ‘Dataset’ value is either a string or a dictionary with the overall summary.

  • The ‘Individual files’ value is a dictionary whose keys are file names and whose values are their corresponding summaries.

Subclasses are expected to implement merge_all_info and get_details_dict to support this.

get_summary(individual_summaries='separate')

Return a summary dictionary with the information.

Parameters:

individual_summaries (str) – “separate”, “consolidated”, or “none”

Returns:

Dictionary with “Dataset” and “Individual files” keys.

Return type:

dict

Notes: The individual_summaries value is processed as follows:
  • “separate” means that individual summaries are in separate files.

  • “consolidated” means that the individual summaries are in the same file as the overall summary.

  • “none” means that only the overall summary is produced.

get_individual(summary_details, separately=True)

Return a dictionary of the individual file summaries.

Parameters:
  • summary_details (dict) – Dictionary of the individual file summaries.

  • separately (bool) – If True (the default), each individual summary has a header for separate output.

get_text_summary_details(include_individual=True) dict

Return a text summary of the information represented by this summary.

Parameters:

include_individual (bool) – If True (the default), individual summaries are in “Individual files”.

Returns:

Dictionary with “Dataset” and “Individual files” keys.

Return type:

dict

get_text_summary(individual_summaries='separate') dict

Return a complete text summary by assembling the individual pieces.

Parameters:

individual_summaries (str) – One of the values “separate”, “consolidated”, or “none”.

Returns:

Complete text summary.

Return type:

dict

Notes: The options are:
  • “none”: Has only the “Dataset” key.

  • “consolidated”: Has “Dataset” and “Individual files” keys, each of whose values is a string.

  • “separate”: Has “Dataset” and “Individual files” keys; the value of “Individual files” is a dict.

save(save_dir, file_formats=None, individual_summaries='separate', task_name='')

Save the summaries using the format indicated.

Parameters:
  • save_dir (str) – Name of the directory to save the summaries in.

  • file_formats (list or None) – List of file formats to use for saving. If None, defaults to [‘.txt’].

  • individual_summaries (str) – Save one file or multiple files based on setting.

  • task_name (str) – If this summary corresponds to files from a task, the task_name is used in filename.

save_visualizations(save_dir, file_formats=None, individual_summaries='separate', task_name='')

Save summary visualizations, if any, using the format indicated.

Parameters:
  • save_dir (str) – Name of the directory to save the summaries in.

  • file_formats (list or None) – List of file formats to use for saving. If None, defaults to [‘.svg’].

  • individual_summaries (str) – Save one file or multiple files based on setting.

  • task_name (str) – If this summary corresponds to files from a task, the task_name is used in filename.

static dump_summary(filename, summary)
abstract get_details_dict(summary_info)

Return the summary-specific information.

Parameters:

summary_info (object) – Summary to return info from.

Returns:

dictionary with the results.

Return type:

dict

Notes

Abstract method to be implemented by each individual summary.

The expected return value is a dictionary of the form:

{“Name”: “”, “Total events”: 0, “Total files”: 0, “Files”: [], “Specifics”: {}}

abstract merge_all_info()

Return merged information.

Returns:

Consolidated summary of information.

Return type:

object

Notes

Abstract method to be implemented by each individual summary.

abstract update_summary(summary_dict)

Update the summary for a given tabular input.

Parameters:

summary_dict (dict)

Data transformation operations

Operations that modify or reorganize tabular data.

ConvertColumnsOp

class remodel.operations.convert_columns_op.ConvertColumnsOp(parameters)

Bases: BaseOp

Convert specified columns to have specified data type.

Required remodeling parameters:
  • column_names (list): The list of columns to convert.

  • convert_to (str): Name of type to convert to. (One of ‘str’, ‘int’, ‘float’, ‘fixed’.)

Optional remodeling parameters:
  • decimal_places (int): Number of decimal places to keep (for fixed only).


NAME = 'convert_columns'
PARAMS = {'additionalProperties': False, 'if': {'properties': {'convert_to': {'const': 'fixed'}}}, 'properties': {'column_names': {'description': 'List of names of the columns whose types are to be converted to the specified type.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'convert_to': {'description': 'Data type to convert the columns to.', 'enum': ['str', 'int', 'float', 'fixed'], 'type': 'string'}, 'decimal_places': {'description': 'The number of decimal points if converted to fixed.', 'type': 'integer'}}, 'required': ['column_names', 'convert_to'], 'then': {'required': ['decimal_places']}, 'type': 'object'}
__init__(parameters)

Constructor for the convert columns operation.

Parameters:

parameters (dict) – Parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None)

Convert the specified column to a specified type.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

Returns:

A new DataFrame with the specified columns converted.

Return type:

DataFrame

static validate_input_data(operations)

Additional validation required of operation parameters not performed by JSON schema validator.
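A minimal sketch of the ‘fixed’ conversion’s effect using plain pandas (the parameter dict follows the PARAMS schema above; the data and column names are made up):

```python
import pandas as pd

# Parameters as they might appear for a convert_columns operation.
parameters = {"column_names": ["response_time"], "convert_to": "fixed",
              "decimal_places": 2}

df = pd.DataFrame({"response_time": [0.4567, 1.234], "event": ["go", "stop"]})

# 'fixed' keeps a fixed number of decimal places in each listed column.
for col in parameters["column_names"]:
    df[col] = df[col].round(parameters["decimal_places"])

print(df["response_time"].tolist())  # [0.46, 1.23]
```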

FactorColumnOp

class remodel.operations.factor_column_op.FactorColumnOp(parameters)

Bases: BaseOp

Append to tabular file columns of factors based on column values.

Required remodeling parameters:
  • column_name (str): The name of a column in the DataFrame to compute factors from.

Optional remodeling parameters:
  • factor_names (list): Names to use as the factor columns.

  • factor_values (list): Values in the column column_name to create factors for.

Notes

  • If no factor_values are provided, factors are computed for each of the unique values in column_name column.

  • If factor_names are provided, then factor_values must also be provided and the two lists must be the same size.

NAME = 'factor_column'
PARAMS = {'additionalProperties': False, 'dependentRequired': {'factor_names': ['factor_values']}, 'properties': {'column_name': {'description': 'Name of the column for which to create one-hot factors for unique values.', 'type': 'string'}, 'factor_names': {'description': 'Names of the resulting factor columns. If given must be same length as factor_values', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'factor_values': {'description': 'Specific unique column values to compute factors for (otherwise all unique values).', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['column_name'], 'type': 'object'}
__init__(parameters)

Constructor for the factor column operation.

Parameters:

parameters (dict) – Parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Create factor columns based on values in a specified column.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new DataFrame with the factor columns appended.

Return type:

DataFrame

static validate_input_data(parameters)

Check that factor_names and factor_values have same length if given.
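A sketch of what factor_column produces when no factor_values are given: one 0/1 factor column per unique value of the column (the data and factor-column naming here are illustrative, not the library’s exact output):

```python
import pandas as pd

df = pd.DataFrame({"trial_type": ["go", "stop", "go"]})

# One factor column per unique value of trial_type.
for value in df["trial_type"].unique():
    df[f"trial_type.{value}"] = (df["trial_type"] == value).astype(int)

print(df["trial_type.go"].tolist())   # [1, 0, 1]
print(df["trial_type.stop"].tolist())  # [0, 1, 0]
```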

MergeConsecutiveOp

class remodel.operations.merge_consecutive_op.MergeConsecutiveOp(parameters)

Bases: BaseOp

Merge consecutive rows of a columnar file that have the same column value.

Required remodeling parameters:
  • column_name (str): The name of the column whose consecutive values are to be compared (the merge column).

  • event_code (str or int or float): The particular value in the merge column to be merged.

  • set_durations (bool): If true, set the duration of the merged event to the extent of the merged events.

  • ignore_missing (bool): If true, missing match_columns are ignored.

Optional remodeling parameters:
  • match_columns (list): A list of columns whose values have to be matched for two events to be the same.

Notes

This operation is meant for time-based tabular files that have an onset column.

NAME = 'merge_consecutive'
PARAMS = {'additionalProperties': False, 'properties': {'column_name': {'description': 'The name of the column to check for repeated consecutive codes.', 'type': 'string'}, 'event_code': {'description': 'The event code to match for duplicates.', 'type': ['string', 'number']}, 'ignore_missing': {'description': 'If true, missing match columns are ignored.', 'type': 'boolean'}, 'match_columns': {'description': 'List of columns whose values must also match to be considered a repeat.', 'items': {'type': 'string'}, 'type': 'array'}, 'set_durations': {'description': 'If true, then the duration should be computed based on start of first to end of last.', 'type': 'boolean'}}, 'required': ['column_name', 'event_code', 'set_durations', 'ignore_missing'], 'type': 'object'}
__init__(parameters)

Constructor for the merge consecutive operation.

Parameters:

parameters (dict) – Actual values of the parameters for the operation.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Merge consecutive rows with the same column value.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

Dataframe

Raises:

ValueError

  • If dataframe does not have the anchor column and ignore_missing is False.

  • If a match column is missing and ignore_missing is False.

  • If the durations were to be set and the dataframe did not have an onset column.

  • If the durations were to be set and the dataframe did not have a duration column.

static validate_input_data(parameters)

Verify that the column name is not in match columns.

Parameters:

parameters (dict) – Dictionary of parameters of actual implementation.
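A sketch of the duration computation when set_durations is true: the merged event spans from the onset of the first merged row to the end of the last one (the data and event code are made up):

```python
import pandas as pd

df = pd.DataFrame({"onset": [1.0, 2.0, 3.0, 5.0],
                   "duration": [0.5, 0.5, 0.5, 0.5],
                   "code": ["a", "a", "a", "b"]})

# The consecutive rows matching event_code "a" collapse into one row.
run = df[df["code"] == "a"]
merged_duration = (run["onset"].iloc[-1] + run["duration"].iloc[-1]
                   - run["onset"].iloc[0])
print(merged_duration)  # 2.5
```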

NumberGroupsOp

class remodel.operations.number_groups_op.NumberGroupsOp(parameters)

Bases: BaseOp

Implementation in progress.

NAME = 'number_groups'
PARAMS = {'additionalProperties': False, 'properties': {'number_column_name': {'type': 'string'}, 'overwrite': {'type': 'boolean'}, 'source_column': {'type': 'string'}, 'start': {'additionalProperties': False, 'properties': {'inclusion': {'enum': ['include', 'exclude'], 'type': 'string'}, 'values': {'type': 'array'}}, 'required': ['values', 'inclusion'], 'type': 'object'}, 'stop': {'additionalProperties': False, 'properties': {'inclusion': {'enum': ['include', 'exclude'], 'type': 'string'}, 'values': {'type': 'array'}}, 'required': ['values', 'inclusion'], 'type': 'object'}}, 'required': ['number_column_name', 'source_column', 'start', 'stop'], 'type': 'object'}
__init__(parameters)

Constructor for the BaseOp class. Should be extended by operations.

Parameters:

parameters (dict) – A dictionary specifying the appropriate parameters for the operation.

do_op(dispatcher, df, name, sidecar=None)

Add numbers to groups of events in dataframe.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

Returns:

A new dataframe after processing.

Return type:

Dataframe

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

NumberRowsOp

class remodel.operations.number_rows_op.NumberRowsOp(parameters)

Bases: BaseOp

Implementation in progress.

NAME = 'number_rows'
PARAMS = {'additionalProperties': False, 'properties': {'match_value': {'additionalProperties': False, 'properties': {'column': {'type': 'string'}, 'value': {'type': ['string', 'number']}}, 'required': ['column', 'value'], 'type': 'object'}, 'number_column_name': {'type': 'string'}, 'overwrite': {'type': 'boolean'}}, 'required': ['number_column_name'], 'type': 'object'}
__init__(parameters)

Constructor for the BaseOp class. Should be extended by operations.

Parameters:

parameters (dict) – A dictionary specifying the appropriate parameters for the operation.

do_op(dispatcher, df, name, sidecar=None)

Add numbers to the events dataframe.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

Returns:

A new dataframe after processing.

Return type:

Dataframe

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

RemapColumnsOp

class remodel.operations.remap_columns_op.RemapColumnsOp(parameters)

Bases: BaseOp

Map values in m columns of a columnar file into new combinations in n columns.

Required remodeling parameters:
  • source_columns (list): The key columns to map (m key columns).

  • destination_columns (list): The destination columns to have the mapped values (n destination columns).

  • map_list (list): A list of lists with the mapping.

  • ignore_missing (bool): If True, entries whose key column values are not in map_list are ignored.

Optional remodeling parameters:
  • integer_sources (list): Source columns that should be treated as integers rather than strings.

Notes

Each map_list element is a list of length m + n, with the m key values followed by the n mapped values.

TODO: Allow wildcards

NAME = 'remap_columns'
PARAMS = {'additionalProperties': False, 'properties': {'destination_columns': {'description': 'The columns to insert new values based on a key lookup of the source columns.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}, 'ignore_missing': {'description': 'If true, insert missing source columns in the result, filled with n/a, else error.', 'type': 'boolean'}, 'integer_sources': {'description': 'A list of source column names whose values are to be treated as integers.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'map_list': {'description': 'An array of k lists each with m+n entries corresponding to the k unique keys.', 'items': {'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'source_columns': {'description': 'The columns whose values are combined to provide the remap keys.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}}, 'required': ['source_columns', 'destination_columns', 'map_list', 'ignore_missing'], 'type': 'object'}
__init__(parameters)

Constructor for the remap columns operation.

Parameters:

parameters (dict) – Parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Remap new columns from combinations of others.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

Dataframe

Raises:

ValueError

  • If ignore_missing is False and source values from the data are not in the map.

static validate_input_data(parameters)

Validates whether operation parameters meet op-specific criteria beyond those captured in the JSON schema.

Example: A check to see whether two input arrays are the same length.

Notes: The minimum implementation should return an empty list to indicate no errors were found.

If additional validation is necessary, the method should perform the validation and return a list of user-friendly error strings.
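A hypothetical remap_columns parameter dict with m=2 source columns and n=1 destination column; each map_list row holds the 2 key values followed by the 1 mapped value (all names and values here are illustrative):

```python
parameters = {
    "source_columns": ["type", "stop_kind"],
    "destination_columns": ["condition"],
    "map_list": [["go", "n/a", "go_trial"],
                 ["stop", "early", "early_stop"]],
    "ignore_missing": True,
}

# Each map_list row has m key values followed by n mapped values.
m = len(parameters["source_columns"])
n = len(parameters["destination_columns"])
assert all(len(row) == m + n for row in parameters["map_list"])

# The mapping acts as a lookup from key tuples to destination values.
lookup = {tuple(row[:m]): row[m:] for row in parameters["map_list"]}
print(lookup[("stop", "early")])  # ['early_stop']
```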

RemoveColumnsOp

class remodel.operations.remove_columns_op.RemoveColumnsOp(parameters)

Bases: BaseOp

Remove columns from a columnar file.

Required remodeling parameters:
  • column_names (list): The names of the columns to be removed.

  • ignore_missing (bool): If True, names in column_names that are not columns in df are ignored.

NAME = 'remove_columns'
PARAMS = {'additionalProperties': False, 'properties': {'column_names': {'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'ignore_missing': {'type': 'boolean'}}, 'required': ['column_names', 'ignore_missing'], 'type': 'object'}
__init__(parameters)

Constructor for remove columns operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Remove indicated columns from a dataframe.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

pd.DataFrame

Raises:

KeyError

  • If ignore_missing is False and a column not in the data is to be removed.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.
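A sketch of the operation’s effect with plain pandas; ignore_missing=True behaves like pandas’ errors="ignore" (data is made up):

```python
import pandas as pd

df = pd.DataFrame({"onset": [0.5], "extra": ["x"]})

# Drop the listed columns, silently skipping names that are absent.
df = df.drop(columns=["extra", "not_there"], errors="ignore")
print(list(df.columns))  # ['onset']
```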

RemoveRowsOp

class remodel.operations.remove_rows_op.RemoveRowsOp(parameters)

Bases: BaseOp

Remove rows from a columnar file based on the values in a specified column.

Required remodeling parameters:
  • column_name (str): The name of the column to be tested.

  • remove_values (list): The values to test for row removal.

NAME = 'remove_rows'
PARAMS = {'additionalProperties': False, 'properties': {'column_name': {'description': 'Name of the key column to determine which rows to remove.', 'type': 'string'}, 'remove_values': {'description': 'List of key values for rows to remove.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['column_name', 'remove_values'], 'type': 'object'}
__init__(parameters)

Constructor for remove rows operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Remove rows with the values indicated in the column.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

Dataframe

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.
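A sketch of the operation’s effect with plain pandas: keep only the rows whose column value is not in remove_values (data is made up):

```python
import pandas as pd

df = pd.DataFrame({"trial_type": ["go", "stop", "go"], "onset": [1, 2, 3]})

# Remove rows where trial_type is one of the remove_values.
df = df[~df["trial_type"].isin(["stop"])].reset_index(drop=True)
print(df["onset"].tolist())  # [1, 3]
```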

RenameColumnsOp

class remodel.operations.rename_columns_op.RenameColumnsOp(parameters)

Bases: BaseOp

Rename columns in a tabular file.

Required remodeling parameters:
  • column_mapping (dict): The names of the columns to be renamed with values to be remapped to.

  • ignore_missing (bool): If true, names in column_mapping that are not columns in the data are ignored.

NAME = 'rename_columns'
PARAMS = {'additionalProperties': False, 'properties': {'column_mapping': {'description': 'Mapping between original column names and their respective new names.', 'minProperties': 1, 'patternProperties': {'.*': {'type': 'string'}}, 'type': 'object'}, 'ignore_missing': {'description': "If true ignore column_mapping keys that don't correspond to columns, otherwise error.", 'type': 'boolean'}}, 'required': ['column_mapping', 'ignore_missing'], 'type': 'object'}
__init__(parameters)

Constructor for rename columns operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters

do_op(dispatcher, df, name, sidecar=None) DataFrame

Rename columns as specified in column_mapping dictionary.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

pd.DataFrame

Raises:

KeyError – When ignore_missing is False and column_mapping has columns not in the data.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.
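A sketch of the operation’s effect with plain pandas; pandas’ rename skips mapping keys that are not columns, matching ignore_missing=True (data is made up):

```python
import pandas as pd

df = pd.DataFrame({"resp": [1], "onset": [0.0]})

# Apply the column_mapping; the absent key is silently ignored.
df = df.rename(columns={"resp": "response", "absent": "ignored"})
print(list(df.columns))  # ['response', 'onset']
```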

ReorderColumnsOp

class remodel.operations.reorder_columns_op.ReorderColumnsOp(parameters)

Bases: BaseOp

Reorder columns in a columnar file.

Required parameters:
  • column_order (list): The names of the columns to be reordered.

  • ignore_missing (bool): If True, names in column_order that are not columns in df are skipped; otherwise an error is raised.

  • keep_others (bool): If True, columns not in column_order are placed at end.

NAME = 'reorder_columns'
PARAMS = {'additionalProperties': False, 'properties': {'column_order': {'description': 'A list of column names in the order you wish them to be.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'ignore_missing': {'description': "If true, ignore column_order columns that aren't in file, otherwise error.", 'type': 'boolean'}, 'keep_others': {'description': 'If true columns not in column_order are placed at end, otherwise ignored.', 'type': 'boolean'}}, 'required': ['column_order', 'ignore_missing', 'keep_others'], 'type': 'object'}
__init__(parameters)

Constructor for reorder columns operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Reorder columns as specified in event dictionary.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

Dataframe

Raises:

ValueError – When ignore_missing is False and column_order has columns not in the data.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.
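A sketch of the operation’s effect with keep_others=True: the column_order columns come first and the remaining columns follow in their original order (data is made up):

```python
import pandas as pd

df = pd.DataFrame({"trial": [1], "onset": [0.0], "duration": [0.5]})
column_order = ["onset", "duration"]

# keep_others=True: append the columns not listed in column_order.
others = [c for c in df.columns if c not in column_order]
df = df[column_order + others]
print(list(df.columns))  # ['onset', 'duration', 'trial']
```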

SplitRowsOp

class remodel.operations.split_rows_op.SplitRowsOp(parameters)

Bases: BaseOp

Split rows in a columnar file with onset and duration columns into multiple rows based on a specified column.

Required remodeling parameters:
  • anchor_column (str): The column in which the names of new items are stored.

  • new_events (dict): Mapping of new values based on values in the original row.

  • remove_parent_row (bool): If true, the original row that was split is removed.

Notes

  • In specifying the onset and duration for a new row, you can give numeric values or the names of columns as strings.

NAME = 'split_rows'
PARAMS = {'additionalProperties': False, 'properties': {'anchor_column': {'description': 'The column containing the keys for the new rows. (Original rows will have own keys.)', 'type': 'string'}, 'new_events': {'description': 'A map describing how the rows for the new codes will be created.', 'minProperties': 1, 'patternProperties': {'.*': {'additionalProperties': False, 'properties': {'copy_columns': {'description': 'List of columns whose values to copy for the new row.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'duration': {'description': 'List of items to add to compute the duration of the new row.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'onset_source': {'description': 'List of items to add to compute the onset time of the new row.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}}, 'required': ['onset_source', 'duration'], 'type': 'object'}}, 'type': 'object'}, 'remove_parent_row': {'description': 'If true, the row from which these rows were split is removed, otherwise it stays.', 'type': 'boolean'}}, 'required': ['anchor_column', 'new_events', 'remove_parent_row'], 'type': 'object'}
__init__(parameters)

Constructor for the split rows operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Split a row representing a particular event into multiple rows.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

Dataframe

Raises:

TypeError – If the onset or duration values are invalid.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.
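A hypothetical split_rows parameter dict and a sketch of how a new row’s onset is assembled from onset_source: string entries name columns of the original row, numeric entries contribute their own value (all names and data are illustrative):

```python
parameters = {
    "anchor_column": "trial_type",
    "new_events": {
        "response": {"onset_source": ["onset", "response_time"],
                     "duration": [0],
                     "copy_columns": ["trial_number"]},
    },
    "remove_parent_row": False,
}

row = {"onset": 2.0, "response_time": 0.5, "trial_number": 7}
spec = parameters["new_events"]["response"]

# Sum column values (for string entries) and literal numbers.
new_onset = sum(row[item] if isinstance(item, str) else item
                for item in spec["onset_source"])
print(new_onset)  # 2.5
```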

HED-Specific Operations

Operations for working with HED-annotated data.

FactorHedTagsOp

class remodel.operations.factor_hed_tags_op.FactorHedTagsOp(parameters)

Bases: BaseOp

Append to a columnar file columns of factors based on HED tag queries.

Required remodeling parameters:
  • queries (list): Queries to be applied successively as filters.

Optional remodeling parameters:
  • expand_context (bool): If true, expand the context based on Onset, Offset, and Duration.

  • query_names (list): Column names for the query factors.

  • remove_types (list): Structural HED tags to be removed (such as Condition-variable or Task).
Notes

  • If query names are not provided, query1, query2, … are used.

  • If query names are provided, the list must have the same length as the number of queries.

  • When the context is expanded, the effects of events with temporal extent are accounted for.

NAME = 'factor_hed_tags'
PARAMS = {'additionalProperties': False, 'properties': {'expand_context': {'description': 'If true, the assembled HED tags include the effects of temporal extent (e.g., Onset).', 'type': 'boolean'}, 'queries': {'description': 'List of HED tag queries to compute one-hot factors for.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'query_names': {'description': 'Optional column names for the queries.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'remove_types': {'descriptions': 'List of type tags to remove from before querying (e.g., Condition-variable, Task).', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'replace_defs': {'description': 'If true, Def tags are replaced with definition contents.', 'type': 'boolean'}}, 'required': ['queries'], 'type': 'object'}
__init__(parameters)

Constructor for the factor HED tags operation.

Parameters:

parameters (dict) – Actual values of the parameters for the operation.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Create factor columns based on HED tag queries.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

Returns:

A new dataframe after processing.

Return type:

DataFrame

Raises:

ValueError – If a name for a new query factor column is already a column.

static validate_input_data(parameters) list

Parse and validate the queries and return any issues found in parsing them.

Parameters:

parameters (dict) – Dictionary representing the actual operation values.

Returns:

List of issues in parsing queries.

Return type:

list
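A hypothetical factor_hed_tags parameter dict following the PARAMS schema above (the queries are illustrative HED search strings; actually running them requires a HED schema and sidecar):

```python
parameters = {
    "queries": ["sensory-event", "agent-action"],
    "query_names": ["is_sensory", "is_action"],
    "remove_types": ["Condition-variable", "Task"],
    "expand_context": True,
}

# If query_names is given, it must match the number of queries;
# otherwise names query1, query2, ... are generated.
assert len(parameters["query_names"]) == len(parameters["queries"])
print(parameters["query_names"])  # ['is_sensory', 'is_action']
```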

FactorHedTypeOp

class remodel.operations.factor_hed_type_op.FactorHedTypeOp(parameters)

Bases: BaseOp

Append to columnar file the factors computed from type variables.

Required remodeling parameters:
  • type_tag (str): HED tag used to find the factors (most commonly condition-variable).

Optional remodeling parameters:
  • type_values (list): If provided, specifies which factor values to include.

NAME = 'factor_hed_type'
PARAMS = {'additionalProperties': False, 'properties': {'type_tag': {'description': 'Type tag to use for computing factor vectors (e.g., Condition-variable or Task).', 'type': 'string'}, 'type_values': {'description': 'If provided, only compute one-hot factors for these values of the type tag.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['type_tag'], 'type': 'object'}
__init__(parameters)

Constructor for the factor HED type operation.

Parameters:

parameters (dict) – Actual values of the parameters for the operation.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Factor columns based on HED type and append to tabular data.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

Returns:

A new DataFrame that includes the factors.

Return type:

DataFrame

Notes

  • If column_name is not a column in df, df is just returned.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

SummarizeDefinitionsOp

class remodel.operations.summarize_definitions_op.SummarizeDefinitionsOp(parameters)

Bases: BaseOp

Summarize the definitions used in the dataset based on Def and Def-expand.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

Optional remodeling parameters:
  • append_timecode (bool): If False (default), the timecode is not appended to the summary filename.

The purpose is to produce a summary of the definitions used in a dataset.

NAME = 'summarize_definitions'
PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}
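A minimal parameter dict for this operation, with hypothetical names; a quick required-key check mirrors what the JSON schema enforces:

```python
# Illustrative parameters for summarize_definitions.
REQUIRED = ["summary_name", "summary_filename"]
parameters = {
    "summary_name": "definitions",
    "summary_filename": "definitions_summary",
    # append_timecode omitted: it defaults to False
}
missing = [key for key in REQUIRED if key not in parameters]
```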
SUMMARY_TYPE = 'type_defs'
__init__(parameters)

Constructor for the summary of definitions used in the dataset.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Create summaries of definitions.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the relevant summary.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

SummarizeHedTagsOp

class remodel.operations.summarize_hed_tags_op.SummarizeHedTagsOp(parameters)

Bases: BaseOp

Summarize the HED tags in a collection of tabular files.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

  • tags (dict): Specifies how to organize the tag output.

Optional remodeling parameters:
  • append_timecode (bool): If True, the timecode is appended to the base filename when summary is saved.

  • include_context (bool): If True, context of events is included in summary.

  • remove_types (list): A list of type tags such as Condition-variable or Task to exclude from summary.

  • replace_defs (bool): If True, the def tag is replaced by the contents of the definitions.

The purpose of this op is to produce a summary of the occurrences of HED tags organized in a specified manner.

Notes: The tags template is a dictionary whose keys are the organization titles (not necessarily tags) for the output and whose values are lists of tags; a tag that appears, or whose child appears, is listed under the corresponding title.

NAME = 'summarize_hed_tags'
PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'include_context': {'description': 'If true, tags for events that unfold over time are counted at each intermediate time.', 'type': 'boolean'}, 'remove_types': {'description': 'A list of special tags such as Condition-variable whose influence is to be removed.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'replace_defs': {'description': 'If true, then the Def tags are replaced with actual definitions for the count.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'tags': {'description': 'A dictionary with the template for how output of tags should be organized.', 'patternProperties': {'.*': {'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'additionalProperties': False, 'minProperties': 1}, 'type': 'object'}}, 'required': ['summary_name', 'summary_filename', 'tags'], 'type': 'object'}
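Per the notes above, the tags template maps section titles to tag lists. A hypothetical parameter dict (all titles and tags here are illustrative, not prescribed by the library):

```python
# Illustrative parameters for summarize_hed_tags.
parameters = {
    "summary_name": "hed_tag_counts",
    "summary_filename": "tag_counts",
    "tags": {
        # key: output section title; value: tags (or parents of tags)
        # gathered under that title
        "Sensory events": ["Sensory-event", "Sensory-presentation"],
        "Agent actions": ["Agent-action", "Move"],
        "Objects": ["Item"],
    },
    "remove_types": ["Condition-variable", "Task"],  # optional
    "replace_defs": True,                            # optional
}

# Per the schema, each template value must be a non-empty list of unique strings.
for title, tag_list in parameters["tags"].items():
    assert tag_list and len(set(tag_list)) == len(tag_list)
```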
SUMMARY_TYPE = 'hed_tag_summary'
__init__(parameters)

Constructor for the summarize_hed_tags operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Summarize the HED tags present in the dataset.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Only needed for HED operations.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the context.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

SummarizeHedTypeOp

class remodel.operations.summarize_hed_type_op.SummarizeHedTypeOp(parameters)

Bases: BaseOp

Summarize a HED type tag in a collection of tabular files.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

  • type_tag (str): Type tag to summarize (e.g., Condition-variable or Task).

Optional remodeling parameters:
  • append_timecode (bool): If true, the timecode is appended to the base filename when summary is saved.

The purpose of this op is to produce a summary of the occurrences of the specified type tag. This summary is often used with Condition-variable to produce a summary of the experimental design.

NAME = 'summarize_hed_type'
PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'type_tag': {'description': 'Type tag (such as Condition-variable or Task to design summaries for..', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename', 'type_tag'], 'type': 'object'}
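An illustrative parameter dict for an experimental-design summary (the names are made up):

```python
# Illustrative parameters for summarize_hed_type.
parameters = {
    "summary_name": "experimental_design",
    "summary_filename": "design_summary",
    "type_tag": "Condition-variable",  # the type tag whose values are summarized
}
```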
SUMMARY_TYPE = 'hed_type_summary'
__init__(parameters)

Constructor for the summarize HED type operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Summarize a specified HED type variable such as Condition-variable.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be summarized.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Usually required unless the event file has a HED column.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the relevant summary.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

SummarizeHedValidationOp

class remodel.operations.summarize_hed_validation_op.SummarizeHedValidationOp(parameters)

Bases: BaseOp

Validate the HED tags in a dataset and report errors.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

  • check_for_warnings (bool): If True, warnings are included as well as errors.

Optional remodeling parameters:
  • append_timecode (bool): If true, the timecode is appended to the base filename when summary is saved.

The purpose of this op is to produce a summary of the HED validation errors in a file.

NAME = 'summarize_hed_validation'
PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'check_for_warnings': {'description': 'If true warnings as well as errors are reported.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename', 'check_for_warnings'], 'type': 'object'}
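Note that, unlike most flags, check_for_warnings is listed in the schema's required keys, so it must appear explicitly. A hypothetical parameter dict:

```python
# Illustrative parameters for summarize_hed_validation.
parameters = {
    "summary_name": "validation",
    "summary_filename": "validation_summary",
    "check_for_warnings": True,  # required: also report warnings, not just errors
}
assert isinstance(parameters["check_for_warnings"], bool)
```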
SUMMARY_TYPE = 'hed_validation'
__init__(parameters)

Constructor for the summarize HED validation operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Validate the dataframe with the accompanying sidecar, if any.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be validated.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Usually needed unless the only HED tags are in the HED column of the event file.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the relevant summary.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

SummarizeSidecarFromEventsOp

class remodel.operations.summarize_sidecar_from_events_op.SummarizeSidecarFromEventsOp(parameters)

Bases: BaseOp

Create a JSON sidecar from column values in a collection of tabular files.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

Optional remodeling parameters:
  • append_timecode (bool): If True, the timecode is appended to the base filename when the summary is saved.

  • skip_columns (list): Names of columns to skip in the summary.

  • value_columns (list): Names of columns to treat as value columns rather than categorical columns.

The purpose is to produce a JSON sidecar template for annotating a dataset with HED tags.

NAME = 'summarize_sidecar_from_events'
PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'type': 'boolean'}, 'skip_columns': {'description': 'List of columns to skip in generating the sidecar.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'value_columns': {'description': 'List of columns to provide a single annotation with placeholder for the values.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}
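An illustrative parameter dict for sidecar generation; the column names are hypothetical BIDS-style event columns, not values mandated by the operation:

```python
# Illustrative parameters for summarize_sidecar_from_events.
parameters = {
    "summary_name": "generate_sidecar",
    "summary_filename": "generated_sidecar",
    "skip_columns": ["onset", "duration", "sample"],  # omitted from the sidecar
    "value_columns": ["response_time"],               # annotated with a placeholder
}
```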
SUMMARY_TYPE = 'events_to_sidecar'
__init__(parameters)

Constructor for summarize sidecar from events operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None)

Extract a sidecar from an events file.

Parameters:
  • dispatcher (Dispatcher) – The dispatcher object for managing the operations.

  • df (DataFrame) – The tabular file to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the associated summary if applicable.

static validate_input_data(parameters)

Additional validation required of operation parameters not performed by JSON schema validator.

Analysis Operations

Operations for analyzing and summarizing tabular data.

SummarizeColumnNamesOp

class remodel.operations.summarize_column_names_op.SummarizeColumnNamesOp(parameters)

Bases: BaseOp

Summarize the column names in a collection of tabular files.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

Optional remodeling parameters:
  • append_timecode (bool): If False (default), the timecode is not appended to the summary filename.

The purpose is to check that all the tabular files have the same columns in the same order.

NAME = 'summarize_column_names'
PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}
SUMMARY_TYPE = 'column_names'
__init__(parameters)

Constructor for summarize column names operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Create a column name summary for df.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the relevant summary.

static validate_input_data(parameters) list

Additional validation required of operation parameters not performed by JSON schema validator.

SummarizeColumnValuesOp

class remodel.operations.summarize_column_values_op.SummarizeColumnValuesOp(parameters)

Bases: BaseOp

Summarize the values in the columns of a columnar file.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

Optional remodeling parameters:
  • append_timecode (bool): If True, the timecode is appended to the summary filename (default False).

  • max_categorical (int): Maximum number of unique values to include in summary for a categorical column.

  • skip_columns (list): Names of columns to skip in the summary.

  • value_columns (list): Names of columns to treat as value columns rather than categorical columns.

  • values_per_line (int): The number of values output per line in the summary.

The purpose is to produce a summary of the values in a tabular file.

NAME = 'summarize_column_values'
PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'max_categorical': {'description': 'Maximum number of unique column values to show in text description.', 'type': 'integer'}, 'skip_columns': {'description': 'List of columns to skip when creating the summary.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'value_columns': {'description': 'Columns to be annotated with a single HED annotation and placeholder.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'values_per_line': {'description': 'Number of items per line to display in the text file.', 'type': 'integer'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}
SUMMARY_TYPE = 'column_values'
VALUES_PER_LINE = 5
MAX_CATEGORICAL = 50
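An illustrative parameter dict that overrides the class defaults VALUES_PER_LINE (5) and MAX_CATEGORICAL (50) shown above; the names are made up:

```python
# Illustrative parameters for summarize_column_values.
parameters = {
    "summary_name": "column_values",
    "summary_filename": "column_values_summary",
    "max_categorical": 10,           # override the class default of 50
    "values_per_line": 3,            # override the class default of 5
    "skip_columns": ["onset", "duration"],
}
```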
__init__(parameters)

Constructor for the summarize column values operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

do_op(dispatcher, df, name, sidecar=None) DataFrame

Create a summary of the column values in df.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the relevant summary.

static validate_input_data(parameters) list

Additional validation required of operation parameters not performed by JSON schema validator.

Operation registry

The valid_operations module maintains a registry of all available operations.

remodel.operations.valid_operations.valid_operations = {operation_name: OperationClass}

Dictionary mapping operation names to their implementation classes. Each key is a string operation name used in JSON specifications, and each value is the corresponding operation class.
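The name-based dispatch this registry enables can be sketched with stand-in classes (the real registry maps these names to the operation classes documented on this page):

```python
# Stand-in operation classes; the real ones extend BaseOp.
class ColumnNamesOp:
    NAME = "summarize_column_names"
    def __init__(self, parameters):
        self.parameters = parameters

class ColumnValuesOp:
    NAME = "summarize_column_values"
    def __init__(self, parameters):
        self.parameters = parameters

# Registry keyed by the string names used in JSON remodel specifications.
registry = {cls.NAME: cls for cls in (ColumnNamesOp, ColumnValuesOp)}

# Resolve a spec entry to its class and instantiate the operation.
spec = {"operation": "summarize_column_names",
        "parameters": {"summary_name": "cols", "summary_filename": "cols"}}
op = registry[spec["operation"]](spec["parameters"])
```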