Operations¶
Remodeling and analysis operations for transforming tabular data.
Base classes¶
All operations inherit from these base classes.
BaseOp¶
- class BaseOp(parameters)[source]¶
Bases:
ABC
Base class for operations. All remodeling operations should extend this class.
- abstract property NAME¶
- abstract property PARAMS¶
- abstractmethod do_op(dispatcher, df, name, sidecar=None)[source]¶
Base class method to be overridden by each operation.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The tabular file to be remodeled.
name (str) – Unique identifier for the data – often the original file path.
sidecar (Sidecar or file-like) – A JSON sidecar needed for HED operations.
- abstractmethod static validate_input_data(parameters)[source]¶
Validate whether the operation parameters meet op-specific criteria beyond those captured in the JSON schema.
Example: A check to see whether two input arrays are the same length.
- Notes: The minimum implementation should return an empty list to indicate that no errors were found.
If additional validation is necessary, the method should perform the validation and return a list of user-friendly error strings.
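As a sketch of the contract described above, a hypothetical operation whose factor_names and factor_values lists must match in length might implement validate_input_data like this (the parameter names are illustrative, not from the library):

```python
# Hypothetical validate_input_data for an op requiring that factor_names
# and factor_values, when both given, have the same length.
def validate_input_data(parameters):
    names = parameters.get("factor_names", None)
    values = parameters.get("factor_values", None)
    if names and not values:
        return ["factor_names cannot be given without factor_values"]
    if names and values and len(names) != len(values):
        return ["factor_names must be the same length as factor_values"]
    return []   # empty list means no errors were found
```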
BaseSummary¶
- class BaseSummary(sum_op)[source]¶
Bases:
ABC
Abstract base class for summary contents. Should not be instantiated.
- Parameters:
sum_op (BaseOp) – Operation corresponding to this summary.
- DISPLAY_INDENT = ' '¶
- INDIVIDUAL_SUMMARIES_PATH = 'individual_summaries'¶
- abstractmethod get_details_dict(summary_info)[source]¶
Return the summary-specific information.
- Parameters:
summary_info (object) – Summary to return info from.
- Returns:
A dictionary with the results.
- Return type:
dict
Notes
Abstract method to be implemented by each individual summary.
The expected return value is a dictionary of the form:
{"Name": "", "Total events": 0, "Total files": 0, "Files": [], "Specifics": {}}
- get_individual(summary_details, separately=True)[source]¶
Return a dictionary of the individual file summaries.
- get_summary(individual_summaries='separate')[source]¶
Return a summary dictionary with the information.
- Parameters:
individual_summaries (str) – “separate”, “consolidated”, or “none”
- Returns:
Dictionary with “Dataset” and “Individual files” keys.
- Return type:
dict
- Notes: The individual_summaries value is processed as follows:
"separate": individual summaries are in separate files.
"consolidated": individual summaries are in the same file as the overall summary.
"none": only the overall summary is produced.
- get_summary_details(include_individual=True) dict[source]¶
Return a dictionary with the details for individual files and the overall dataset.
- Parameters:
include_individual (bool) – If True, summaries for individual files are included.
- Returns:
A dictionary with ‘Dataset’ and ‘Individual files’ keys.
- Return type:
dict
Notes
The 'Dataset' value is either a string or a dictionary with the overall summary.
The 'Individual files' value is a dictionary whose keys are file names and whose values are their corresponding summaries.
Users are expected to provide merge_all_info and get_details_dict functions to support this.
- get_text_summary(individual_summaries='separate') dict[source]¶
Return a complete text summary by assembling the individual pieces.
- Parameters:
individual_summaries (str) – One of the values “separate”, “consolidated”, or “none”.
- Returns:
Complete text summary.
- Return type:
dict
- Notes: The options are:
"none": has only the "Dataset" key.
"consolidated": has "Dataset" and "Individual files" keys; the value of each is a string.
"separate": has "Dataset" and "Individual files" keys; the value of "Individual files" is a dictionary.
- get_text_summary_details(include_individual=True) dict[source]¶
Return a text summary of the information represented by this summary.
- abstractmethod merge_all_info()[source]¶
Return merged information.
- Returns:
Consolidated summary of information.
- Return type:
Notes
Abstract method to be implemented by each individual summary.
- save(save_dir, file_formats=None, individual_summaries='separate', task_name='')[source]¶
Save the summaries using the format indicated.
- Parameters:
save_dir (str) – Name of the directory to save the summaries in.
file_formats (list or None) – List of file formats to use for saving. If None, defaults to [‘.txt’].
individual_summaries (str) – Save one file or multiple files based on setting.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in the filename.
- save_visualizations(save_dir, file_formats=None, individual_summaries='separate', task_name='')[source]¶
Save summary visualizations, if any, using the format indicated.
- Parameters:
save_dir (str) – Name of the directory to save the summaries in.
file_formats (list or None) – List of file formats to use for saving. If None, defaults to [‘.svg’].
individual_summaries (str) – Save one file or multiple files based on setting.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in the filename.
Data transformation operations¶
Operations that modify or reorganize tabular data.
ConvertColumnsOp¶
- class ConvertColumnsOp(parameters)[source]¶
Bases:
BaseOp
Convert specified columns to have the specified data type.
- Required remodeling parameters:
column_names (list): The list of columns to convert.
convert_to (str): Name of type to convert to. (One of ‘str’, ‘int’, ‘float’, ‘fixed’.)
- Optional remodeling parameters:
decimal_places (int): Number of decimal places to keep (for 'fixed' only).
- NAME = 'convert_columns'¶
- PARAMS = {'additionalProperties': False, 'if': {'properties': {'convert_to': {'const': 'fixed'}}}, 'properties': {'column_names': {'description': 'List of names of the columns whose types are to be converted to the specified type.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'convert_to': {'description': 'Data type to convert the columns to.', 'enum': ['str', 'int', 'float', 'fixed'], 'type': 'string'}, 'decimal_places': {'description': 'The number of decimal points if converted to fixed.', 'type': 'integer'}}, 'required': ['column_names', 'convert_to'], 'then': {'required': ['decimal_places']}, 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Convert the specified column to a specified type.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new DataFrame with the specified columns converted.
- Return type:
DataFrame
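The conversions above can be sketched with plain pandas. This is an illustration of the transformation, not the library's implementation, and the column names and values are made up:

```python
import pandas as pd

# Hedged sketch of what convert_columns does, using plain pandas.
df = pd.DataFrame({"onset": ["0.5", "1.25"], "response": ["0.42", "0.61"]})

converted = df.copy()
for col in ["onset", "response"]:                    # column_names parameter
    converted[col] = converted[col].astype(float)    # convert_to: 'float'

# For convert_to 'fixed', round to decimal_places and format as strings:
fixed = converted["onset"].round(2).map(lambda v: f"{v:.2f}")
```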
FactorColumnOp¶
- class FactorColumnOp(parameters)[source]¶
Bases:
BaseOp
Append to tabular file columns of factors based on column values.
- Required remodeling parameters:
column_name (str): The name of a column in the DataFrame to compute factors from.
- Optional remodeling parameters:
factor_names (list): Names to use as the factor columns.
factor_values (list): Values in the column column_name to create factors for.
Notes
If no factor_values are provided, factors are computed for each unique value in the column_name column.
If factor_names are provided, then factor_values must also be provided, and the two lists must be the same length.
- NAME = 'factor_column'¶
- PARAMS = {'additionalProperties': False, 'dependentRequired': {'factor_names': ['factor_values']}, 'properties': {'column_name': {'description': 'Name of the column for which to create one-hot factors for unique values.', 'type': 'string'}, 'factor_names': {'description': 'Names of the resulting factor columns. If given must be same length as factor_values', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'factor_values': {'description': 'Specific unique column values to compute factors for (otherwise all unique values).', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['column_name'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create factor columns based on values in a specified column.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new DataFrame with the factor columns appended.
- Return type:
DataFrame
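The one-hot factoring this operation performs can be sketched in plain pandas. The column and factor names below are hypothetical:

```python
import pandas as pd

# Sketch of factor_column's one-hot factoring, not the library code.
df = pd.DataFrame({"trial_type": ["go", "stop", "go", "go"]})
factor_values = ["go", "stop"]                # values to factor
factor_names = ["go_factor", "stop_factor"]   # resulting column names
for value, name in zip(factor_values, factor_names):
    df[name] = (df["trial_type"] == value).astype(int)
```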
MergeConsecutiveOp¶
- class MergeConsecutiveOp(parameters)[source]¶
Bases:
BaseOp
Merge consecutive rows of a columnar file that have the same column value.
- Required remodeling parameters:
column_name (str): The name of the column whose consecutive values are to be compared (the merge column).
event_code (str or int or float): The particular value in the merge column to be merged.
set_durations (bool): If true, the duration of the merged row is set to the extent of the merged rows.
ignore_missing (bool): If true, missing match_columns are ignored.
- Optional remodeling parameters:
match_columns (list): A list of columns whose values have to be matched for two events to be the same.
Notes
This operation is meant for time-based tabular files that have an onset column.
- NAME = 'merge_consecutive'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_name': {'description': 'The name of the column to check for repeated consecutive codes.', 'type': 'string'}, 'event_code': {'description': 'The event code to match for duplicates.', 'type': ['string', 'number']}, 'ignore_missing': {'description': 'If true, missing match columns are ignored.', 'type': 'boolean'}, 'match_columns': {'description': 'List of columns whose values must also match to be considered a repeat.', 'items': {'type': 'string'}, 'type': 'array'}, 'set_durations': {'description': 'If true, then the duration should be computed based on start of first to end of last.', 'type': 'boolean'}}, 'required': ['column_name', 'event_code', 'set_durations', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Merge consecutive rows with the same column value.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
If the dataframe does not have the anchor column and ignore_missing is False.
If a match column is missing and ignore_missing is False.
If set_durations is true and the dataframe does not have an onset column.
If set_durations is true and the dataframe does not have a duration column.
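The merge with set_durations=True can be sketched in plain pandas: rows with the same consecutive code collapse into one row whose duration spans the run. This ignores match_columns and error handling, and the data are hypothetical:

```python
import pandas as pd

# Rough sketch of merging consecutive rows that share the same code,
# recomputing duration to span the merged rows (set_durations=True).
df = pd.DataFrame({"onset": [1.0, 2.0, 3.0, 5.0],
                   "duration": [0.5, 0.5, 0.5, 0.5],
                   "code": ["A", "A", "A", "B"]})
df["end"] = df["onset"] + df["duration"]
run_id = (df["code"] != df["code"].shift()).cumsum()   # label runs of equal codes
merged = df.groupby(run_id).agg(
    onset=("onset", "first"), end=("end", "max"),
    code=("code", "first")).reset_index(drop=True)
merged["duration"] = merged["end"] - merged["onset"]
merged = merged.drop(columns="end")
```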
NumberGroupsOp¶
- class NumberGroupsOp(parameters)[source]¶
Bases:
BaseOp
Implementation in progress.
- NAME = 'number_groups'¶
- PARAMS = {'additionalProperties': False, 'properties': {'number_column_name': {'type': 'string'}, 'overwrite': {'type': 'boolean'}, 'source_column': {'type': 'string'}, 'start': {'additionalProperties': False, 'properties': {'inclusion': {'enum': ['include', 'exclude'], 'type': 'string'}, 'values': {'type': 'array'}}, 'required': ['values', 'inclusion'], 'type': 'object'}, 'stop': {'additionalProperties': False, 'properties': {'inclusion': {'enum': ['include', 'exclude'], 'type': 'string'}, 'values': {'type': 'array'}}, 'required': ['values', 'inclusion'], 'type': 'object'}}, 'required': ['number_column_name', 'source_column', 'start', 'stop'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Add numbers to groups of events in dataframe.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
NumberRowsOp¶
- class NumberRowsOp(parameters)[source]¶
Bases:
BaseOp
Implementation in progress.
- NAME = 'number_rows'¶
- PARAMS = {'additionalProperties': False, 'properties': {'match_value': {'additionalProperties': False, 'properties': {'column': {'type': 'string'}, 'value': {'type': ['string', 'number']}}, 'required': ['column', 'value'], 'type': 'object'}, 'number_column_name': {'type': 'string'}, 'overwrite': {'type': 'boolean'}}, 'required': ['number_column_name'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Add numbers to events in the dataframe.
- Parameters:
dispatcher (Dispatcher) – Manages operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
RemapColumnsOp¶
- class RemapColumnsOp(parameters)[source]¶
Bases:
BaseOp
Map values in m columns of a columnar file into new combinations in n columns.
- Required remodeling parameters:
source_columns (list): The key columns to map (m key columns).
destination_columns (list): The destination columns to have the mapped values (n destination columns).
map_list (list): A list of lists with the mapping.
ignore_missing (bool): If True, entries whose key column values are not in map_list are ignored.
- Optional remodeling parameters:
integer_sources (list): Source columns that should be treated as integers rather than strings.
Notes
Each element of map_list is a list of length m + n, with the m key values followed by the n mapped values.
TODO: Allow wildcards
- NAME = 'remap_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'destination_columns': {'description': 'The columns to insert new values based on a key lookup of the source columns.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}, 'ignore_missing': {'description': 'If true, insert missing source columns in the result, filled with n/a, else error.', 'type': 'boolean'}, 'integer_sources': {'description': 'A list of source column names whose values are to be treated as integers.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'map_list': {'description': 'An array of k lists each with m+n entries corresponding to the k unique keys.', 'items': {'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'source_columns': {'description': 'The columns whose values are combined to provide the remap keys.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}}, 'required': ['source_columns', 'destination_columns', 'map_list', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Remap new columns from combinations of others.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
If ignore_missing is False and source values from the data are not in the map.
- static validate_input_data(parameters)[source]¶
Validate whether the operation parameters meet op-specific criteria beyond those captured in the JSON schema.
Example: A check to see whether two input arrays are the same length.
- Notes: The minimum implementation should return an empty list to indicate that no errors were found.
If additional validation is necessary, the method should perform the validation and return a list of user-friendly error strings.
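The key lookup this operation performs can be sketched in plain pandas: the m source columns form a key that selects values for the n destination columns. The data below are hypothetical and this is not the library code:

```python
import pandas as pd

# Sketch of the remap_columns lookup with m=2 source columns and
# n=1 destination column.
map_list = [["go", "1", "left"],     # m key values followed by n values
            ["stop", "2", "right"]]
mapping = {tuple(row[:2]): row[2:] for row in map_list}
df = pd.DataFrame({"type": ["go", "stop"], "code": ["1", "2"]})
df["side"] = [mapping[key][0] for key in zip(df["type"], df["code"])]
```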
RemoveColumnsOp¶
- class RemoveColumnsOp(parameters)[source]¶
Bases:
BaseOp
Remove columns from a columnar file.
- Required remodeling parameters:
column_names (list): The names of the columns to be removed.
ignore_missing (bool): If True, names in column_names that are not columns in df are ignored.
- NAME = 'remove_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_names': {'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'ignore_missing': {'type': 'boolean'}}, 'required': ['column_names', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Remove indicated columns from a dataframe.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
pd.DataFrame
- Raises:
KeyError –
If ignore_missing is False and a column not in the data is to be removed.
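This operation corresponds closely to pandas DataFrame.drop, with ignore_missing=True mapping to errors="ignore". A sketch with made-up columns, not the library code:

```python
import pandas as pd

# Sketch of remove_columns using DataFrame.drop; errors="ignore" plays
# the role of ignore_missing=True.
df = pd.DataFrame({"onset": [0.5], "sample": [10], "value": [3]})
trimmed = df.drop(columns=["sample", "not_there"], errors="ignore")
```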
RemoveRowsOp¶
- class RemoveRowsOp(parameters)[source]¶
Bases:
BaseOp
Remove rows from a columnar file based on the values in a specified column.
- Required remodeling parameters:
column_name (str): The name of column to be tested.
remove_values (list): The values to test for row removal.
- NAME = 'remove_rows'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_name': {'description': 'Name of the key column to determine which rows to remove.', 'type': 'string'}, 'remove_values': {'description': 'List of key values for rows to remove.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['column_name', 'remove_values'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Remove rows with the values indicated in the column.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
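The row filtering can be sketched in plain pandas: drop rows whose column_name value appears in remove_values. Column and values below are hypothetical:

```python
import pandas as pd

# Sketch of remove_rows as a boolean filter, not the library code.
df = pd.DataFrame({"trial_type": ["go", "n/a", "stop", "n/a"]})
remove_values = ["n/a"]
kept = df[~df["trial_type"].isin(remove_values)].reset_index(drop=True)
```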
RenameColumnsOp¶
- class RenameColumnsOp(parameters)[source]¶
Bases:
BaseOp
Rename columns in a tabular file.
- Required remodeling parameters:
column_mapping (dict): Mapping from current column names to their new names.
ignore_missing (bool): If true, names in column_mapping that are not columns in df are ignored.
- NAME = 'rename_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_mapping': {'description': 'Mapping between original column names and their respective new names.', 'minProperties': 1, 'patternProperties': {'.*': {'type': 'string'}}, 'type': 'object'}, 'ignore_missing': {'description': "If true ignore column_mapping keys that don't correspond to columns, otherwise error.", 'type': 'boolean'}}, 'required': ['column_mapping', 'ignore_missing'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Rename columns as specified in column_mapping dictionary.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
pd.DataFrame
- Raises:
KeyError – When ignore_missing is False and column_mapping has columns not in the data.
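This operation maps onto pandas DataFrame.rename; the ignore_missing=False case corresponds to checking the mapping keys first. A sketch with hypothetical column names:

```python
import pandas as pd

# Sketch of rename_columns, not the library implementation.
df = pd.DataFrame({"resp": [1], "rt": [0.42]})
column_mapping = {"resp": "response", "rt": "response_time"}
missing = [k for k in column_mapping if k not in df.columns]
if missing:                      # ignore_missing=False behavior
    raise KeyError(f"columns not in data: {missing}")
renamed = df.rename(columns=column_mapping)
```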
ReorderColumnsOp¶
- class ReorderColumnsOp(parameters)[source]¶
Bases:
BaseOp
Reorder columns in a columnar file.
- Required remodeling parameters:
column_order (list): The names of the columns in the desired order.
ignore_missing (bool): If True, names in column_order that are not in df are ignored; otherwise an error is raised.
keep_others (bool): If True, columns not in column_order are placed at the end.
- NAME = 'reorder_columns'¶
- PARAMS = {'additionalProperties': False, 'properties': {'column_order': {'description': 'A list of column names in the order you wish them to be.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'ignore_missing': {'description': "If true, ignore column_order columns that aren't in file, otherwise error.", 'type': 'boolean'}, 'keep_others': {'description': 'If true columns not in column_order are placed at end, otherwise ignored.', 'type': 'boolean'}}, 'required': ['column_order', 'ignore_missing', 'keep_others'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Reorder columns as specified in column_order.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
ValueError – When ignore_missing is False and column_order has columns not in the data.
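The reordering with ignore_missing=True and keep_others=True can be sketched in plain pandas: requested columns come first, and the remaining columns keep their original order. The column names are hypothetical:

```python
import pandas as pd

# Sketch of reorder_columns, not the library code.
df = pd.DataFrame({"duration": [0.5], "onset": [1.0], "extra": [7]})
column_order = ["onset", "duration", "not_present"]
ordered = [c for c in column_order if c in df.columns]  # ignore_missing=True
others = [c for c in df.columns if c not in ordered]    # keep_others=True
reordered = df[ordered + others]
```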
SplitRowsOp¶
- class SplitRowsOp(parameters)[source]¶
Bases:
BaseOp
Split rows in a columnar file with onset and duration columns into multiple rows, based on a specified column.
- Required remodeling parameters:
anchor_column (str): The column in which the names of new items are stored.
new_events (dict): Mapping of new values based on values in the original row.
remove_parent_row (bool): If true, the original row that was split is removed.
Notes
In specifying onset and duration for the new row, you can give values or the names of columns as strings.
- NAME = 'split_rows'¶
- PARAMS = {'additionalProperties': False, 'properties': {'anchor_column': {'description': 'The column containing the keys for the new rows. (Original rows will have own keys.)', 'type': 'string'}, 'new_events': {'description': 'A map describing how the rows for the new codes will be created.', 'minProperties': 1, 'patternProperties': {'.*': {'additionalProperties': False, 'properties': {'copy_columns': {'description': 'List of columns whose values to copy for the new row.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'duration': {'description': 'List of items to add to compute the duration of the new row.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'onset_source': {'description': 'List of items to add to compute the onset time of the new row.', 'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}}, 'required': ['onset_source', 'duration'], 'type': 'object'}}, 'type': 'object'}, 'remove_parent_row': {'description': 'If true, the row from which these rows were split is removed, otherwise it stays.', 'type': 'boolean'}}, 'required': ['anchor_column', 'new_events', 'remove_parent_row'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Split a row representing a particular event into multiple rows.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
TypeError – If the onset or duration is invalid.
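The splitting idea can be sketched in plain pandas: each parent row spawns new rows whose onsets are computed by summing column values and constants. The new_events spec below is hypothetical, and this is not the library implementation:

```python
import pandas as pd

# Sketch of split_rows: a "response" row is derived from each parent row,
# with onset_source = ["onset", "response_delay"] and duration 0.
df = pd.DataFrame({"onset": [1.0], "duration": [2.0], "response_delay": [0.5]})
new_rows = []
for _, row in df.iterrows():
    new_rows.append({"onset": row["onset"] + row["response_delay"],
                     "duration": 0.0, "event": "response"})
parent = df.assign(event="trial")[["onset", "duration", "event"]]
result = pd.concat([parent, pd.DataFrame(new_rows)], ignore_index=True)
result = result.sort_values("onset").reset_index(drop=True)
```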
HED-Specific Operations¶
Operations for working with HED-annotated data.
FactorHedTypeOp¶
- class FactorHedTypeOp(parameters)[source]¶
Bases:
BaseOp
Append to a columnar file the factors computed from HED type variables.
- Required remodeling parameters:
type_tag (str): HED tag used to find the factors (most commonly condition-variable).
- Optional remodeling parameters:
type_values (list): If provided, specifies which factor values to include.
- NAME = 'factor_hed_type'¶
- PARAMS = {'additionalProperties': False, 'properties': {'type_tag': {'description': 'Type tag to use for computing factor vectors (e.g., Condition-variable or Task).', 'type': 'string'}, 'type_values': {'description': 'If provided, only compute one-hot factors for these values of the type tag.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['type_tag'], 'type': 'object'}¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Factor columns based on HED type and append to tabular data.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A new DataFrame that includes the factors.
- Return type:
DataFrame
Notes
If column_name is not a column in df, df is just returned.
SummarizeDefinitionsOp¶
- class SummarizeDefinitionsOp(parameters)[source]¶
Bases:
BaseOp
Summarize the definitions used in the dataset, based on Def and Def-expand tags.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If False (default), the timecode is not appended to the summary filename.
The purpose is to produce a summary of the definitions used in a dataset.
- NAME = 'summarize_definitions'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'type_defs'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create summaries of definitions.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Only needed for HED operations.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
SummarizeHedTypeOp¶
- class SummarizeHedTypeOp(parameters)[source]¶
Bases:
BaseOp
Summarize a HED type tag in a collection of tabular files.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
type_tag (str): Type tag to summarize (e.g., condition-variable or task tags).
- Optional remodeling parameters:
append_timecode (bool): If true, the timecode is appended to the base filename when summary is saved.
The purpose of this op is to produce a summary of the occurrences of the specified tag. This summary is often used with condition-variable to produce a summary of the experimental design.
- NAME = 'summarize_hed_type'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'type_tag': {'description': 'Type tag (such as Condition-variable or Task to design summaries for..', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename', 'type_tag'], 'type': 'object'}¶
- SUMMARY_TYPE = 'hed_type_summary'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Summarize a specified HED type variable such as Condition-variable.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be summarized.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Usually required unless the events file has a HED column.
- Returns:
A copy of df
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
SummarizeHedValidationOp¶
- class SummarizeHedValidationOp(parameters)[source]¶
Bases:
BaseOp
Validate the HED tags in a dataset and report errors.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
check_for_warnings (bool): If true, include warnings as well as errors.
- Optional remodeling parameters:
append_timecode (bool): If true, the timecode is appended to the base filename when summary is saved.
The purpose of this op is to produce a summary of the HED validation errors in a file.
- NAME = 'summarize_hed_validation'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'check_for_warnings': {'description': 'If true warnings as well as errors are reported.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename', 'check_for_warnings'], 'type': 'object'}¶
- SUMMARY_TYPE = 'hed_validation'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Validate the dataframe with the accompanying sidecar, if any.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be validated.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Usually needed unless the only HED tags are in the HED column of the events file.
- Returns:
A copy of df
- Return type:
pd.DataFrame
- Side effect:
Updates the relevant summary.
SummarizeSidecarFromEventsOp¶
- class SummarizeSidecarFromEventsOp(parameters)[source]¶
Bases:
BaseOp
Create a JSON sidecar from column values in a collection of tabular files.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If true, the timecode is appended to the base filename when the summary is saved.
skip_columns (list): Names of columns to skip in the summary.
value_columns (list): Names of columns to treat as value columns rather than categorical columns.
The purpose is to produce a JSON sidecar template for annotating a dataset with HED tags.
- NAME = 'summarize_sidecar_from_events'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'type': 'boolean'}, 'skip_columns': {'description': 'List of columns to skip in generating the sidecar.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'value_columns': {'description': 'List of columns to provide a single annotation with placeholder for the values.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'events_to_sidecar'¶
- do_op(dispatcher, df, name, sidecar=None)[source]¶
Extract a sidecar from an events file.
- Parameters:
dispatcher (Dispatcher) – The dispatcher object for managing the operations.
df (DataFrame) – The tabular file to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the associated summary if applicable.
Analysis Operations¶
Operations for analyzing and summarizing tabular data.
SummarizeColumnNamesOp¶
- class SummarizeColumnNamesOp(parameters)[source]¶
Bases:
BaseOp
Summarize the column names in a collection of tabular files.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If True, append the timecode to the summary filename so that each run has a unique name (default False).
The purpose is to verify that all the tabular files have the same columns in the same order.
- NAME = 'summarize_column_names'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'column_names'¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create a column name summary for df.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
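The essence of this check can be sketched by grouping files by their exact (order-sensitive) column list; a dataset with consistent columns yields a single group. This is a minimal illustration, not the operation's actual internals.

```python
import pandas as pd

def summarize_column_names(named_frames):
    """Group files by their exact column tuple (sketch of the summary logic)."""
    patterns = {}
    for name, df in named_frames:
        key = tuple(df.columns)  # order matters
        patterns.setdefault(key, []).append(name)
    return patterns

frames = [("run1.tsv", pd.DataFrame(columns=["onset", "duration", "trial"])),
          ("run2.tsv", pd.DataFrame(columns=["onset", "duration", "trial"])),
          ("run3.tsv", pd.DataFrame(columns=["onset", "trial", "duration"]))]
patterns = summarize_column_names(frames)
```

Here run3.tsv forms its own group because its columns appear in a different order, which is exactly the inconsistency this summary is meant to surface.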
SummarizeColumnValuesOp¶
- class SummarizeColumnValuesOp(parameters)[source]¶
Bases:
BaseOp
Summarize the values in the columns of a columnar file.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If True, append the timecode to the summary filename so that each run has a unique name (default False).
max_categorical (int): Maximum number of unique values to include in summary for a categorical column.
skip_columns (list): Names of columns to skip in the summary.
value_columns (list): Names of columns to treat as value columns rather than categorical columns.
values_per_line (int): The number of values output per line in the summary.
The purpose is to produce a summary of the values in a tabular file.
- MAX_CATEGORICAL = 50¶
- NAME = 'summarize_column_values'¶
- PARAMS = {'additionalProperties': False, 'properties': {'append_timecode': {'description': 'If true, the timecode is appended to the base filename so each run has a unique name.', 'type': 'boolean'}, 'max_categorical': {'description': 'Maximum number of unique column values to show in text description.', 'type': 'integer'}, 'skip_columns': {'description': 'List of columns to skip when creating the summary.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'summary_filename': {'description': 'Name to use for the summary file name base.', 'type': 'string'}, 'summary_name': {'description': 'Name to use for the summary in titles.', 'type': 'string'}, 'value_columns': {'description': 'Columns to be annotated with a single HED annotation and placeholder.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'values_per_line': {'description': 'Number of items per line to display in the text file.', 'type': 'integer'}}, 'required': ['summary_name', 'summary_filename'], 'type': 'object'}¶
- SUMMARY_TYPE = 'column_values'¶
- VALUES_PER_LINE = 5¶
- do_op(dispatcher, df, name, sidecar=None) DataFrame[source]¶
Create a summary of the column values in df.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
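A rough sketch of what this summary computes: categorical columns are tallied with value counts (capped at max_categorical unique values), while value columns are reduced to a simple count. The exact statistics reported by the real operation may differ; this only illustrates the categorical/value split.

```python
import pandas as pd

def column_value_summary(df, skip_columns=(), value_columns=(), max_categorical=50):
    """Per-column value summary (sketch): counts of unique values for
    categorical columns, a simple count for value columns."""
    summary = {}
    for col in df.columns:
        if col in skip_columns:
            continue
        if col in value_columns:
            # Value columns: just report how many non-null entries exist.
            summary[col] = {"count": int(df[col].count())}
        else:
            # Categorical columns: tally values, capped at max_categorical.
            counts = df[col].value_counts()
            summary[col] = dict(counts.head(max_categorical))
    return summary

df = pd.DataFrame({"event_type": ["go", "stop", "go", "go"],
                   "rt": [0.4, 0.6, 0.5, 0.7]})
summary = column_value_summary(df, value_columns=("rt",))
```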
Operation registry¶
The valid_operations module maintains a registry of all available operations.
- valid_operations = {operation_name: OperationClass}¶
Dictionary mapping operation names to their implementation classes. Each key is a string operation name used in JSON specifications, and each value is the corresponding operation class.
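The registry enables a simple name-to-class dispatch pattern: look up the operation class by the "operation" key of a JSON remodel spec, check the spec's parameters against the required keys in the class's PARAMS JSON schema, then instantiate. The sketch below uses a hypothetical stand-in class and a minimal required-key check rather than full JSON-schema validation.

```python
class DummySummarizeColumnNamesOp:
    """Hypothetical stand-in for a real operation class."""
    NAME = "summarize_column_names"
    PARAMS = {"type": "object",
              "required": ["summary_name", "summary_filename"]}

    def __init__(self, parameters):
        self.parameters = parameters

# Registry mapping operation names to implementation classes.
valid_operations = {DummySummarizeColumnNamesOp.NAME: DummySummarizeColumnNamesOp}

def make_op(spec):
    """Instantiate the operation named in a JSON remodel spec (sketch)."""
    op_class = valid_operations[spec["operation"]]
    missing = [k for k in op_class.PARAMS["required"]
               if k not in spec["parameters"]]
    if missing:
        raise ValueError(f"{spec['operation']} missing parameters: {missing}")
    return op_class(spec["parameters"])

op = make_op({"operation": "summarize_column_names",
              "parameters": {"summary_name": "columns",
                             "summary_filename": "columns_summary"}})
```

In the real library, schema validation and the op-specific validate_input_data check run before instantiation; this sketch collapses both into one required-key test.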