Operations¶
Remodeling and analysis operations for transforming tabular data.
Base classes¶
All operations inherit from these base classes.
BaseOp¶
- class BaseOp(parameters)[source]¶
Bases:
ABC
Base class for operations. All remodeling operations should extend this class.
- abstract property NAME¶
- abstract property PARAMS¶
- abstractmethod do_op(dispatcher, df, name, sidecar=None)[source]¶
Base class method to be overridden by each operation.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The tabular file to be remodeled.
name (str) – Unique identifier for the data – often the original file path.
sidecar (Sidecar or file-like) – A JSON sidecar needed for HED operations.
- abstractmethod static validate_input_data(parameters)[source]¶
Validate whether the operation parameters meet op-specific criteria beyond those captured in the JSON schema.
Example: A check to see whether two input arrays are the same length.
- Notes: The minimum implementation should return an empty list to indicate that no errors were found.
If additional validation is necessary, the method should perform the validation and return a list of user-friendly error strings.
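As a sketch of the contract described above, a hypothetical operation whose factor_names and factor_values lists must match in length might implement validate_input_data like this (the parameter names are illustrative, not from the library):

```python
# Hypothetical validate_input_data for an op requiring that factor_names
# and factor_values, when both given, have the same length.
def validate_input_data(parameters):
    names = parameters.get("factor_names", None)
    values = parameters.get("factor_values", None)
    if names and not values:
        return ["factor_names cannot be given without factor_values"]
    if names and values and len(names) != len(values):
        return ["factor_names must be the same length as factor_values"]
    return []   # empty list means no errors were found
```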
BaseSummary¶
- class BaseSummary(sum_op)[source]¶
Bases:
ABC
Abstract base class for summary contents. Should not be instantiated.
- Parameters:
sum_op (BaseOp) – Operation corresponding to this summary.
- DISPLAY_INDENT = ' '¶
- INDIVIDUAL_SUMMARIES_PATH = 'individual_summaries'¶
- abstractmethod get_details_dict(summary_info)[source]¶
Return the summary-specific information.
- Parameters:
summary_info (object) – Summary to return info from.
- Returns:
A dictionary with the results.
- Return type:
dict
Notes
Abstract method to be implemented by each individual summary.
The expected return value is a dictionary of the form:
{"Name": "", "Total events": 0, "Total files": 0, "Files": [], "Specifics": {}}
- get_individual(summary_details, separately=True)[source]¶
Return a dictionary of the individual file summaries.
- get_summary(individual_summaries='separate')[source]¶
Return a summary dictionary with the information.
- Parameters:
individual_summaries (str) – “separate”, “consolidated”, or “none”
- Returns:
Dictionary with “Dataset” and “Individual files” keys.
- Return type:
dict
- Notes: The individual_summaries value is processed as follows:
"separate": individual summaries are in separate files.
"consolidated": individual summaries are in the same file as the overall summary.
"none": only the overall summary is produced.
- get_summary_details(include_individual=True) dict[source]¶
Return a dictionary with the details for individual files and the overall dataset.
- Parameters:
include_individual (bool) – If True, summaries for individual files are included.
- Returns:
A dictionary with ‘Dataset’ and ‘Individual files’ keys.
- Return type:
dict
Notes
The 'Dataset' value is either a string or a dictionary with the overall summary.
The 'Individual files' value is a dictionary whose keys are file names and whose values are their corresponding summaries.
Users are expected to provide merge_all_info and get_details_dict functions to support this.
- get_text_summary(individual_summaries='separate') dict[source]¶
Return a complete text summary by assembling the individual pieces.
- Parameters:
individual_summaries (str) – One of the values “separate”, “consolidated”, or “none”.
- Returns:
Complete text summary.
- Return type:
dict
- Notes: The options are:
"none": has only the "Dataset" key.
"consolidated": has "Dataset" and "Individual files" keys; the value of each is a string.
"separate": has "Dataset" and "Individual files" keys; the value of "Individual files" is a dictionary.
- get_text_summary_details(include_individual=True) dict[source]¶
Return a text summary of the information represented by this summary.
- abstractmethod merge_all_info()[source]¶
Return merged information.
- Returns:
Consolidated summary of information.
- Return type:
Notes
Abstract method to be implemented by each individual summary.
- save(save_dir, file_formats=None, individual_summaries='separate', task_name='')[source]¶
Save the summaries using the format indicated.
- Parameters:
save_dir (str) – Name of the directory to save the summaries in.
file_formats (list or None) – List of file formats to use for saving. If None, defaults to [‘.txt’].
individual_summaries (str) – Save one file or multiple files based on setting.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in the filename.
- save_visualizations(save_dir, file_formats=None, individual_summaries='separate', task_name='')[source]¶
Save summary visualizations, if any, using the format indicated.
- Parameters:
save_dir (str) – Name of the directory to save the summaries in.
file_formats (list or None) – List of file formats to use for saving. If None, defaults to [‘.svg’].
individual_summaries (str) – Save one file or multiple files based on setting.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in the filename.
Data transformation operations¶
Operations that modify or reorganize tabular data.
ConvertColumnsOp¶
- class ConvertColumnsOp(parameters)[source]¶
Bases:
BaseOp
Convert specified columns to have the specified data type.
- Required remodeling parameters:
column_names (list): The list of columns to convert.
convert_to (str): Name of type to convert to. (One of ‘str’, ‘int’, ‘float’, ‘fixed’.)
- Optional remodeling parameters:
decimal_places (int): Number of decimal places to keep (for 'fixed' only).
- NAME = 'convert_columns'¶
- PARAMS = {'additionalProperties': False, 'if': {'properties': {'convert_to': {'const': 'fixed'}}}, 'properties': {'column_names': {'description': 'List of names of the columns whose types are to be converted to the specified type.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'convert_to': {'description': 'Data type to convert the columns to.', 'enum': ['str', 'int', 'float', 'fixed'], 'type': 'string'}, 'decimal_places': {'description': 'The number of decimal points if converted to fixed.', 'type': 'integer'}}, 'required': ['column_names', 'convert_to'], 'then': {'required': ['decimal_places']}, 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Convert the specified column to a specified type.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new DataFrame with the specified columns converted.
- Return type:
DataFrame
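The conversions above can be sketched with plain pandas. This is an illustration of the transformation, not the library's implementation, and the column names and values are made up:

```python
import pandas as pd

# Hedged sketch of what convert_columns does, using plain pandas.
df = pd.DataFrame({"onset": ["0.5", "1.25"], "response": ["0.42", "0.61"]})

converted = df.copy()
for col in ["onset", "response"]:                    # column_names parameter
    converted[col] = converted[col].astype(float)    # convert_to: 'float'

# For convert_to 'fixed', round to decimal_places and format as strings:
fixed = converted["onset"].round(2).map(lambda v: f"{v:.2f}")
```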
FactorColumnOp¶
- class FactorColumnOp(parameters)[source]¶
Bases:
BaseOp
Append to tabular file columns of factors based on column values.
- Required remodeling parameters:
column_name (str): The name of a column in the DataFrame to compute factors from.
- Optional remodeling parameters:
factor_names (list): Names to use as the factor columns.
factor_values (list): Values in the column column_name to create factors for.
Notes
If no factor_values are provided, factors are computed for each unique value in the column_name column.
If factor_names are provided, then factor_values must also be provided, and the two lists must be the same length.
- NAME = 'factor_column'¶
- PARAMS = {'additionalProperties': False, 'dependentRequired': {'factor_names': ['factor_values']}, 'properties': {'column_name': {'description': 'Name of the column for which to create one-hot factors for unique values.', 'type': 'string'}, 'factor_names': {'description': 'Names of the resulting factor columns. If given must be same length as factor_values', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'factor_values': {'description': 'Specific unique column values to compute factors for (otherwise all unique values).', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['column_name'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create factor columns based on values in a specified column.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new DataFrame with the factor columns appended.
- Return type:
DataFrame
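The one-hot factoring this operation performs can be sketched in plain pandas. The column and factor names below are hypothetical:

```python
import pandas as pd

# Sketch of factor_column's one-hot factoring, not the library code.
df = pd.DataFrame({"trial_type": ["go", "stop", "go", "go"]})
factor_values = ["go", "stop"]                # values to factor
factor_names = ["go_factor", "stop_factor"]   # resulting column names
for value, name in zip(factor_values, factor_names):
    df[name] = (df["trial_type"] == value).astype(int)
```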
MergeConsecutiveOp¶
- class MergeConsecutiveOp(parameters)[source]¶
Bases:
BaseOp
Merge consecutive rows of a columnar file that have the same column value.
- Required remodeling parameters:
column_name (str): The name of the column whose consecutive values are to be compared (the merge column).
event_code (str or int or float): The particular value in the merge column to be merged.
set_durations (bool): If true, the duration of the merged row is set to the extent of the merged rows.
ignore_missing (bool): If true, missing match_columns are ignored.
- Optional remodeling parameters:
match_columns (list): A list of columns whose values have to be matched for two events to be the same.
Notes
This operation is meant for time-based tabular files that have an onset column.
- NAME = 'merge_consecutive'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_name': {'description': 'The name of the column to check for repeated consecutive codes.', 'type': 'string'}, 'event_code': {'description': 'The event code to match for duplicates.', 'type': ['string', 'number']}, 'ignore_missing': {'description': 'If true, missing match columns are ignored.', 'type': 'boolean'}, 'match_columns': {'description': 'List of columns whose values must also match to be considered a repeat.', 'items': {'type': 'string'}, 'type': 'array'}, 'set_durations': {'description': 'If true, then the duration should be computed based on start of first to end of last.', 'type': 'boolean'}}, 'required': ['column_name', 'event_code', 'set_durations', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Merge consecutive rows with the same column value.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
If the dataframe does not have the anchor column and ignore_missing is False.
If a match column is missing and ignore_missing is False.
If set_durations is true and the dataframe does not have an onset column.
If set_durations is true and the dataframe does not have a duration column.
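The merge with set_durations=True can be sketched in plain pandas: rows with the same consecutive code collapse into one row whose duration spans the run. This ignores match_columns and error handling, and the data are hypothetical:

```python
import pandas as pd

# Rough sketch of merging consecutive rows that share the same code,
# recomputing duration to span the merged rows (set_durations=True).
df = pd.DataFrame({"onset": [1.0, 2.0, 3.0, 5.0],
                   "duration": [0.5, 0.5, 0.5, 0.5],
                   "code": ["A", "A", "A", "B"]})
df["end"] = df["onset"] + df["duration"]
run_id = (df["code"] != df["code"].shift()).cumsum()   # label runs of equal codes
merged = df.groupby(run_id).agg(
    onset=("onset", "first"), end=("end", "max"),
    code=("code", "first")).reset_index(drop=True)
merged["duration"] = merged["end"] - merged["onset"]
merged = merged.drop(columns="end")
```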
NumberGroupsOp¶
- class NumberGroupsOp(parameters)[source]¶
Bases:
BaseOp
Implementation in progress.
- NAME = 'number_groups'¶
- PARAMS = {'additionalProperties': False, 'properties': {'number_column_name': {'type': 'string'}, 'overwrite': {'type': 'boolean'}, 'source_column': {'type': 'string'}, 'start': {'additionalProperties': False, 'properties': {'inclusion': {'enum': ['include', 'exclude'], 'type': 'string'}, 'values': {'type': 'array'}}, 'required': ['values', 'inclusion'], 'type': 'object'}, 'stop': {'additionalProperties': False, 'properties': {'inclusion': {'enum': ['include', 'exclude'], 'type': 'string'}, 'values': {'type': 'array'}}, 'required': ['values', 'inclusion'], 'type': 'object'}}, 'required': ['number_column_name', 'source_column', 'start', 'stop'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Add numbers to groups of events in dataframe.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
NumberRowsOp¶
- class NumberRowsOp(parameters)[source]¶
Bases:
BaseOp
Implementation in progress.
- NAME = 'number_rows'¶
- PARAMS = {'additionalProperties': False, 'properties': {'match_value': {'additionalProperties': False, 'properties': {'column': {'type': 'string'}, 'value': {'type': ['string', 'number']}}, 'required': ['column', 'value'], 'type': 'object'}, 'number_column_name': {'type': 'string'}, 'overwrite': {'type': 'boolean'}}, 'required': ['number_column_name'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Add numbers to events in the dataframe.
- Parameters:
dispatcher (Dispatcher) – Manages operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
RemapColumnsOp¶
- class RemapColumnsOp(parameters)[source]¶
Bases:
BaseOp
Map values in m columns of a columnar file into new combinations in n columns.
- Required remodeling parameters:
source_columns (list): The key columns to map (m key columns).
destination_columns (list): The destination columns to have the mapped values (n destination columns).
map_list (list): A list of lists with the mapping.
ignore_missing (bool): If True, entries whose key column values are not in map_list are ignored.
- Optional remodeling parameters:
integer_sources (list): Source columns that should be treated as integers rather than strings.
Notes
Each element of map_list is a list of length m + n, with the m key values followed by the n mapped values.
TODO: Allow wildcards
- NAME = 'remap_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'destination_columns': {'description': 'The columns to insert new values based on a key lookup of the source columns.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}, 'ignore_missing': {'description': 'If true, insert missing source columns in the result, filled with n/a, else error.', 'type': 'boolean'}, 'integer_sources': {'description': 'A list of source column names whose values are to be treated as integers.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'map_list': {'description': 'An array of k lists each with m+n entries corresponding to the k unique keys.', 'items': {'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'source_columns': {'description': 'The columns whose values are combined to provide the remap keys.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}}, 'required': ['source_columns', 'destination_columns', 'map_list', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Remap new columns from combinations of others.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
If ignore_missing is False and source values from the data are not in the map.
- static validate_input_data(parameters)[source]¶
Validate whether the operation parameters meet op-specific criteria beyond those captured in the JSON schema.
Example: A check to see whether two input arrays are the same length.
- Notes: The minimum implementation should return an empty list to indicate that no errors were found.
If additional validation is necessary, the method should perform the validation and return a list of user-friendly error strings.
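The key lookup this operation performs can be sketched in plain pandas: the m source columns form a key that selects values for the n destination columns. The data below are hypothetical and this is not the library code:

```python
import pandas as pd

# Sketch of the remap_columns lookup with m=2 source columns and
# n=1 destination column.
map_list = [["go", "1", "left"],     # m key values followed by n values
            ["stop", "2", "right"]]
mapping = {tuple(row[:2]): row[2:] for row in map_list}
df = pd.DataFrame({"type": ["go", "stop"], "code": ["1", "2"]})
df["side"] = [mapping[key][0] for key in zip(df["type"], df["code"])]
```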
RemoveColumnsOp¶
- class RemoveColumnsOp(parameters)[source]¶
Bases:
BaseOp
Remove columns from a columnar file.
- Required remodeling parameters:
column_names (list): The names of the columns to be removed.
ignore_missing (bool): If True, names in column_names that are not columns in df are ignored.
- NAME = 'remove_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_names': {'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'ignore_missing': {'type': 'boolean'}}, 'required': ['column_names', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Remove indicated columns from a dataframe.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
pd.DataFrame
- Raises:
KeyError –
If ignore_missing is False and a column not in the data is to be removed.
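This operation corresponds closely to pandas DataFrame.drop, with ignore_missing=True mapping to errors="ignore". A sketch with made-up columns, not the library code:

```python
import pandas as pd

# Sketch of remove_columns using DataFrame.drop; errors="ignore" plays
# the role of ignore_missing=True.
df = pd.DataFrame({"onset": [0.5], "sample": [10], "value": [3]})
trimmed = df.drop(columns=["sample", "not_there"], errors="ignore")
```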
RemoveRowsOp¶
- class RemoveRowsOp(parameters)[source]¶
Bases:
BaseOp
Remove rows from a columnar file based on the values in a specified column.
- Required remodeling parameters:
column_name (str): The name of column to be tested.
remove_values (list): The values to test for row removal.
- NAME = 'remove_rows'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_name': {'description': 'Name of the key column to determine which rows to remove.', 'type': 'string'}, 'remove_values': {'description': 'List of key values for rows to remove.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['column_name', 'remove_values'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Remove rows with the values indicated in the column.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
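The row filtering can be sketched in plain pandas: drop rows whose column_name value appears in remove_values. Column and values below are hypothetical:

```python
import pandas as pd

# Sketch of remove_rows as a boolean filter, not the library code.
df = pd.DataFrame({"trial_type": ["go", "n/a", "stop", "n/a"]})
remove_values = ["n/a"]
kept = df[~df["trial_type"].isin(remove_values)].reset_index(drop=True)
```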
RenameColumnsOp¶
- class RenameColumnsOp(parameters)[source]¶
Bases:
BaseOp
Rename columns in a tabular file.
- Required remodeling parameters:
column_mapping (dict): Mapping from current column names to their new names.
ignore_missing (bool): If true, names in column_mapping that are not columns in df are ignored.
- NAME = 'rename_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_mapping': {'description': 'Mapping between original column names and their respective new names.', 'minProperties': 1, 'patternProperties': {'.*': {'type': 'string'}}, 'type': 'object'}, 'ignore_missing': {'description': "If true ignore column_mapping keys that don't correspond to columns, otherwise error.", 'type': 'boolean'}}, 'required': ['column_mapping', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Rename columns as specified in column_mapping dictionary.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
pd.DataFrame
- Raises:
KeyError – When ignore_missing is False and column_mapping has columns not in the data.
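This operation maps onto pandas DataFrame.rename; the ignore_missing=False case corresponds to checking the mapping keys first. A sketch with hypothetical column names:

```python
import pandas as pd

# Sketch of rename_columns, not the library implementation.
df = pd.DataFrame({"resp": [1], "rt": [0.42]})
column_mapping = {"resp": "response", "rt": "response_time"}
missing = [k for k in column_mapping if k not in df.columns]
if missing:                      # ignore_missing=False behavior
    raise KeyError(f"columns not in data: {missing}")
renamed = df.rename(columns=column_mapping)
```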
ReorderColumnsOp¶
- class ReorderColumnsOp(parameters)[source]¶
Bases:
BaseOp
Reorder columns in a columnar file.
- Required remodeling parameters:
column_order (list): The names of the columns in the desired order.
ignore_missing (bool): If True, names in column_order that are not in df are ignored; otherwise an error is raised.
keep_others (bool): If True, columns not in column_order are placed at the end.
- NAME = 'reorder_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_order': {'description': 'A list of column names in the order you wish them to be.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'ignore_missing': {'description': "If true, ignore column_order columns that aren't in file, otherwise error.", 'type': 'boolean'}, 'keep_others': {'description': 'If true columns not in column_order are placed at end, otherwise ignored.', 'type': 'boolean'}}, 'required': ['column_order', 'ignore_missing', 'keep_others'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Reorder columns as specified in column_order.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
ValueError – When ignore_missing is False and column_order has columns not in the data.
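The reordering with ignore_missing=True and keep_others=True can be sketched in plain pandas: requested columns come first, and the remaining columns keep their original order. The column names are hypothetical:

```python
import pandas as pd

# Sketch of reorder_columns, not the library code.
df = pd.DataFrame({"duration": [0.5], "onset": [1.0], "extra": [7]})
column_order = ["onset", "duration", "not_present"]
ordered = [c for c in column_order if c in df.columns]  # ignore_missing=True
others = [c for c in df.columns if c not in ordered]    # keep_others=True
reordered = df[ordered + others]
```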
SplitRowsOp¶
- class SplitRowsOp(parameters)[source]¶
Bases:
BaseOp
Split rows in a columnar file with onset and duration columns into multiple rows, based on a specified column.
- Required remodeling parameters:
anchor_column (str): The column in which the names of new items are stored.
new_events (dict): Mapping of new values based on values in the original row.
remove_parent_row (bool): If true, the original row that was split is removed.
Notes
In specifying onset and duration for the new row, you can give values or the names of columns as strings.
- NAME = 'split_rows'¶
- PARAMS = {'additionalProperties': False, 'properties': {'anchor_column': {'description': 'The column containing the keys for the new rows. (Original rows will have own keys.)', 'type': 'string'}, 'new_events': {'description': 'A map describing how the rows for the new codes will be created.', 'minProperties': 1, 'patternProperties': {'.*': {'additionalProperties': False, 'properties': {'copy_columns': {'description': 'List of columns whose values to copy for the new row.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'duration': {'description': 'List of items to add to compute the duration of the new row.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'onset_source': {'description': 'List of items to add to compute the onset time of the new row.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}}, 'required': ['onset_source', 'duration'], 'type': 'object'}}, 'type': 'object'}, 'remove_parent_row': {'description': 'If true, the row from which these rows were split is removed, otherwise it stays.', 'type': 'boolean'}}, 'required': ['anchor_column', 'new_events', 'remove_parent_row'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Split a row representing a particular event into multiple rows.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
TypeError – If the onset or duration is invalid.
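The splitting idea can be sketched in plain pandas: each parent row spawns new rows whose onsets are computed by summing column values and constants. The new_events spec below is hypothetical, and this is not the library implementation:

```python
import pandas as pd

# Sketch of split_rows: a "response" row is derived from each parent row,
# with onset_source = ["onset", "response_delay"] and duration 0.
df = pd.DataFrame({"onset": [1.0], "duration": [2.0], "response_delay": [0.5]})
new_rows = []
for _, row in df.iterrows():
    new_rows.append({"onset": row["onset"] + row["response_delay"],
                     "duration": 0.0, "event": "response"})
parent = df.assign(event="trial")[["onset", "duration", "event"]]
result = pd.concat([parent, pd.DataFrame(new_rows)], ignore_index=True)
result = result.sort_values("onset").reset_index(drop=True)
```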
HED-Specific Operations¶
Operations for working with HED-annotated data.
FactorHedTypeOp¶
- class FactorHedTypeOp(parameters)[source]¶
Bases:
BaseOp
Append to a columnar file the factors computed from HED type variables.
- Required remodeling parameters:
type_tag (str): HED tag used to find the factors (most commonly condition-variable).
- Optional remodeling parameters:
type_values (list): If provided, specifies which factor values to include.
- NAME = 'factor_hed_type'¶
- PARAMS = {'additionalProperties': False, 'properties': {'type_tag': {'description': 'Type tag to use for computing factor vectors (e.g., Condition-variable or Task).', 'type': 'string'}, 'type_values': {'description': 'If provided, only compute one-hot factors for these values of the type tag.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['type_tag'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Factor columns based on HED type and append to tabular data.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new DataFrame that includes the factors.
- Return type:
DataFrame
Notes
If column_name is not a column in df, df is just returned.
SummarizeDefinitionsOp¶
- class SummarizeDefinitionsOp(parameters)[source]¶
Bases:
BaseOp
Summarize the definitions used in the dataset, based on Def and Def-expand tags.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If False (default), the timecode is not appended to the summary filename.
The purpose is to produce a summary of the definitions used in a dataset.
- NAME = 'summarize_definitions'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'type_defs'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create summaries of definitions.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
SummarizeHedTypeOp¶
- class SummarizeHedTypeOp(parameters)[source]¶
Bases:
BaseOp
Summarize a HED type tag in a collection of tabular files.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
type_tag (str): Type tag to summarize (e.g., condition-variable or task tags).
- Optional remodeling parameters:
append_timecode (bool): If true, the timecode is appended to the base filename when summary is saved.
The purpose of this op is to produce a summary of the occurrences of the specified tag. This summary is often used with condition-variable to produce a summary of the experimental design.
- NAME = 'summarize_hed_type'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'type_tag': {'description': 'Type tag (such as Condition-variable or Task to design summaries for..', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename', 'type_tag'], 'type': 'object'}¶
- SUMMARY_TYPE = 'hed_type_summary'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Summarize a specified HED type variable such as Condition-variable.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be summarized.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Usually required unless the events file has a HED column.
- Returns:
A copy of df
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
SummarizeHedValidationOp¶
- class SummarizeHedValidationOp(parameters)[source]¶
Bases:
BaseOp
Validate the HED tags in a dataset and report errors.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
check_for_warnings (bool): If true, include warnings as well as errors.
- Optional remodeling parameters:
append_timecode (bool): If true, the timecode is appended to the base filename when summary is saved.
The purpose of this op is to produce a summary of the HED validation errors in a file.
- NAME = 'summarize_hed_validation'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'check_for_warnings': {'description': 'If true warnings as well as errors are reported.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename', 'check_for_warnings'], 'type': 'object'}¶
- SUMMARY_TYPE = 'hed_validation'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Validate the dataframe with the accompanying sidecar, if any.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be validated.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Usually needed unless the only HED tags are in the HED column of the events file.
- Returns:
A copy of df
- Return type:
pd.DataFrame
- Side effect:
Updates the relevant summary.
SummarizeSidecarFromEventsOp¶
- class SummarizeSidecarFromEventsOp(parameters)[source]¶
Bases:
BaseOp
Create a JSON sidecar from column values in a collection of tabular files.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If true, the timecode is appended to the base filename when the summary is saved.
skip_columns (list): Names of columns to skip in the summary.
value_columns (list): Names of columns to treat as value columns rather than categorical columns.
The purpose is to produce a JSON sidecar template for annotating a dataset with HED tags.
- NAME = 'summarize_sidecar_from_events'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'type': 'boolean'}, 'skip_columns': {'description': 'List of columns to skip in generating the sidecar.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'value_columns': {'description': 'List of columns to provide a single annotation with placeholder for the values.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'events_to_sidecar'¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Extract a sidecar from an events file.
- Parameters:
dispatcher (Dispatcher) – The dispatcher object for managing the operations.
df (DataFrame) – The tabular file to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the associated summary if applicable.
Analysis Operations¶
Operations for analyzing and summarizing tabular data.
SummarizeColumnNamesOp¶
- class SummarizeColumnNamesOp(parameters)[source]¶
Bases:
BaseOp
Summarize the column names in a collection of tabular files.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If True, append the timecode to the summary filename so that each run has a unique name (default False).
The purpose is to verify that all the tabular files have the same columns in the same order.
- NAME = 'summarize_column_names'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'column_names'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create a column name summary for df.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
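The essence of this check can be sketched by grouping files by their exact (order-sensitive) column list; a dataset with consistent columns yields a single group. This is a minimal illustration, not the operation's actual internals.

```python
import pandas as pd

def summarize_column_names(named_frames):
    """Group files by their exact column tuple (sketch of the summary logic)."""
    patterns = {}
    for name, df in named_frames:
        key = tuple(df.columns)  # order matters
        patterns.setdefault(key, []).append(name)
    return patterns

frames = [("run1.tsv", pd.DataFrame(columns=["onset", "duration", "trial"])),
          ("run2.tsv", pd.DataFrame(columns=["onset", "duration", "trial"])),
          ("run3.tsv", pd.DataFrame(columns=["onset", "trial", "duration"]))]
patterns = summarize_column_names(frames)
```

Here run3.tsv forms its own group because its columns appear in a different order, which is exactly the inconsistency this summary is meant to surface.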
SummarizeColumnValuesOp¶
- class SummarizeColumnValuesOp(parameters)[source]¶
Bases:
BaseOp
Summarize the values in the columns of a columnar file.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If True, append the timecode to the summary filename so that each run has a unique name (default False).
max_categorical (int): Maximum number of unique values to include in summary for a categorical column.
skip_columns (list): Names of columns to skip in the summary.
value_columns (list): Names of columns to treat as value columns rather than categorical columns.
values_per_line (int): The number of values output per line in the summary.
The purpose is to produce a summary of the values in a tabular file.
- MAX_CATEGORICAL = 50¶
- NAME = 'summarize_column_values'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'max_categorical': {'description': 'Maximum number of unique column values to show in text description.', 'type': 'integer'}, 'skip_columns': {'description': 'List of columns to skip when creating the summary.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'value_columns': {'description': 'Columns to be annotated with a single HED annotation and placeholder.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'values_per_line': {'description': 'Number of items per line to display in the text file.', 'type': 'integer'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'column_values'¶
- VALUES_PER_LINE = 5¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create a summary of the column values in df.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
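A rough sketch of what this summary computes: categorical columns are tallied with value counts (capped at max_categorical unique values), while value columns are reduced to a simple count. The exact statistics reported by the real operation may differ; this only illustrates the categorical/value split.

```python
import pandas as pd

def column_value_summary(df, skip_columns=(), value_columns=(), max_categorical=50):
    """Per-column value summary (sketch): counts of unique values for
    categorical columns, a simple count for value columns."""
    summary = {}
    for col in df.columns:
        if col in skip_columns:
            continue
        if col in value_columns:
            # Value columns: just report how many non-null entries exist.
            summary[col] = {"count": int(df[col].count())}
        else:
            # Categorical columns: tally values, capped at max_categorical.
            counts = df[col].value_counts()
            summary[col] = dict(counts.head(max_categorical))
    return summary

df = pd.DataFrame({"event_type": ["go", "stop", "go", "go"],
                   "rt": [0.4, 0.6, 0.5, 0.7]})
summary = column_value_summary(df, value_columns=("rt",))
```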
Operation registry¶
The valid_operations module maintains a registry of all available operations.
- valid_operations = {operation_name: OperationClass}¶
Dictionary mapping operation names to their implementation classes. Each key is a string operation name used in JSON specifications, and each value is the corresponding operation class.
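The registry enables a simple name-to-class dispatch pattern: look up the operation class by the "operation" key of a JSON remodel spec, check the spec's parameters against the required keys in the class's PARAMS JSON schema, then instantiate. The sketch below uses a hypothetical stand-in class and a minimal required-key check rather than full JSON-schema validation.

```python
class DummySummarizeColumnNamesOp:
    """Hypothetical stand-in for a real operation class."""
    NAME = "summarize_column_names"
    PARAMS = {"type": "object",
              "required": ["summary_name", "summary_filename"]}

    def __init__(self, parameters):
        self.parameters = parameters

# Registry mapping operation names to implementation classes.
valid_operations = {DummySummarizeColumnNamesOp.NAME: DummySummarizeColumnNamesOp}

def make_op(spec):
    """Instantiate the operation named in a JSON remodel spec (sketch)."""
    op_class = valid_operations[spec["operation"]]
    missing = [k for k in op_class.PARAMS["required"]
               if k not in spec["parameters"]]
    if missing:
        raise ValueError(f"{spec['operation']} missing parameters: {missing}")
    return op_class(spec["parameters"])

op = make_op({"operation": "summarize_column_names",
              "parameters": {"summary_name": "columns",
                             "summary_filename": "columns_summary"}})
```

In the real library, schema validation and the op-specific validate_input_data check run before instantiation; this sketch collapses both into one required-key test.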