Tools

Utility functions and data processing tools for HED operations.

Analysis tools

EventManager

class EventManager(input_data, hed_schema, extra_defs=None)[source]

Bases: object

Manager of events of temporal extent.

static compress_strings(list_to_compress)[source]

Compress each inner list of strings into a single str with comma-separated elements.

Parameters:

list_to_compress (list) – List of lists of HED str to turn into a list of single HED strings.

Returns:

List of same length as list_to_compress with each entry being a str.

Return type:

list
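
A minimal sketch of the behavior described above, not the library's implementation: each inner list is joined with commas, so the output has one string per input entry. The name `compress_strings_sketch` and the sample tags are illustrative, and the sketch assumes every inner element is already a non-empty string.

```python
def compress_strings_sketch(list_to_compress):
    """Join each inner list of strings into one comma-separated string."""
    return [",".join(inner) for inner in list_to_compress]


rows = [["Sensory-event", "Visual-presentation"], ["Agent-action"]]
compressed = compress_strings_sketch(rows)
print(compressed)
```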

get_type_defs(types)[source]

Return a list of definition names (lower case) that correspond to any of the specified types.

Parameters:

types (list or None) – List of tags that are treated as types such as ‘Condition-variable’

Returns:

List of definition names (lower-case) that correspond to the specified types

Return type:

list

str_list_to_hed(str_list)[source]

Create a HedString object from a list of strings.

Parameters:

str_list (list) – A list of strings to be concatenated with commas and then converted.

Returns:

The HedString created from the joined strings, or None.

Return type:

Union[HedString, None]

unfold_context(remove_types=None)[source]

Unfold the event information into a tuple based on context.

Parameters:

remove_types (list or None) – List of types to remove. If None, defaults to empty list.

Returns:

  • Union[list(str), HedString]: The information without the events of temporal extent.

  • Union[list(str), HedString, None]: The onsets of the events of temporal extent.

  • Union[list(str), HedString, None]: The ongoing context information.

Return type:

tuple[Union[list(str), HedString], Union[list(str), HedString, None], Union[list(str), HedString, None]]

EventChecker

class EventChecker(hed_obj, line_number, original_line_number=None, error_handler=None)[source]

Bases: object

Validates that HED-annotated events meet quality requirements such as having a top-level event tag.

ACTION_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Done-indication', 'Imagined-action', 'Inappropriate-action', 'Incorrect-action', 'Indeterminate-action', 'Miss', 'Near-miss', 'Omitted-action', 'Ready-indication'}
ALL_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Cue', 'Distractor', 'Done-indication', 'Expected', 'Experimental-stimulus', 'Extraneous', 'Feedback', 'Go-signal', 'Imagined-action', 'Inappropriate-action', 'Incidental', 'Incorrect-action', 'Indeterminate-action', 'Instructional', 'Meaningful', 'Mishap', 'Miss', 'Near-miss', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Omitted-action', 'Participant-response', 'Penalty', 'Planned', 'Priming', 'Query', 'Ready-indication', 'Reward', 'Stop-signal', 'Target', 'Task-activity', 'Threat', 'Timed', 'Unexpected', 'Unplanned', 'Warning'}
EVENT_TAGS = {'Agent-action', 'Data-feature', 'Event', 'Experiment-control', 'Experiment-structure', 'Measurement-event', 'Sensory-event'}
NON_TASK_EVENTS = {'Data-feature', 'Experiment-control', 'Experiment-structure', 'Measurement-event'}
STIMULUS_ROLES = {'Distractor', 'Expected', 'Extraneous', 'Go-signal', 'Meaningful', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Penalty', 'Planned', 'Priming', 'Query', 'Reward', 'Stop-signal', 'Target', 'Threat', 'Timed', 'Unexpected', 'Unplanned'}
TASK_ROLES = {'Cue', 'Experimental-stimulus', 'Feedback', 'Incidental', 'Instructional', 'Mishap', 'Participant-response', 'Task-activity', 'Warning'}
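
The constants above are related by simple set algebra, which can be checked from the listed values alone. The sketch below copies the sets by hand (it does not import the library) and shows that the three role sets are disjoint and together hold 43 names, matching ALL_ROLES, and that NON_TASK_EVENTS is a subset of EVENT_TAGS. How EventChecker uses these sets internally is not described here, so the variable names `all_roles` and `remaining_events` are purely illustrative.

```python
# Values copied from the EventChecker constants listed above.
ACTION_ROLES = {
    "Appropriate-action", "Correct-action", "Correction", "Done-indication",
    "Imagined-action", "Inappropriate-action", "Incorrect-action",
    "Indeterminate-action", "Miss", "Near-miss", "Omitted-action",
    "Ready-indication",
}
STIMULUS_ROLES = {
    "Distractor", "Expected", "Extraneous", "Go-signal", "Meaningful",
    "Newly-learned", "Non-informative", "Non-target", "Not-meaningful",
    "Novel", "Oddball", "Penalty", "Planned", "Priming", "Query", "Reward",
    "Stop-signal", "Target", "Threat", "Timed", "Unexpected", "Unplanned",
}
TASK_ROLES = {
    "Cue", "Experimental-stimulus", "Feedback", "Incidental", "Instructional",
    "Mishap", "Participant-response", "Task-activity", "Warning",
}
EVENT_TAGS = {
    "Agent-action", "Data-feature", "Event", "Experiment-control",
    "Experiment-structure", "Measurement-event", "Sensory-event",
}
NON_TASK_EVENTS = {
    "Data-feature", "Experiment-control", "Experiment-structure",
    "Measurement-event",
}

# Union of the three role sets; the sizes add up because the sets are disjoint.
all_roles = ACTION_ROLES | STIMULUS_ROLES | TASK_ROLES
sizes_add_up = len(all_roles) == (
    len(ACTION_ROLES) + len(STIMULUS_ROLES) + len(TASK_ROLES)
)

# Event tags that are not listed in NON_TASK_EVENTS.
remaining_events = EVENT_TAGS - NON_TASK_EVENTS
print(sizes_add_up, sorted(remaining_events))
```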

EventsChecker

class EventsChecker(hed_schema, input_data, name=None)[source]

Bases: object

Class to check for event tag quality errors in an event file.

REMOVE_TYPES = ['Condition-variable', 'Task']
static get_error_lines(issues)[source]

Get the lines grouped by code.

Parameters:

issues (list) – A list of issues to check.

Returns:

A dict with keys that are error codes and values that are lists of line numbers.

Return type:

dict
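
The grouping described above can be sketched as follows. This is an illustration only: the issue key holding the line number (`"line"` here) and the error codes `CODE_A` and `CODE_B` are hypothetical placeholders, since the text does not name them.

```python
from collections import defaultdict


def group_lines_by_code(issues, line_key="line"):
    """Group line numbers by error code.

    `line_key` is a hypothetical key name used only in this sketch.
    """
    lines_by_code = defaultdict(list)
    for issue in issues:
        lines_by_code[issue["code"]].append(issue[line_key])
    return dict(lines_by_code)


issues = [
    {"code": "CODE_A", "line": 3},
    {"code": "CODE_B", "line": 5},
    {"code": "CODE_A", "line": 8},
]
grouped = group_lines_by_code(issues)
print(grouped)
```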

static get_hed_source(hed_dict, value)[source]

Get the source of the HED string.

Parameters:

hed_dict (HedTag) – The HedTag object to get the source for.

Returns:

The source of the HED string.

Return type:

str

static get_issue_details(data_info, side_data)[source]

Get the source details for the issue.

Parameters:
  • data_info (pd.Series) – The row information from the original tsv.

  • side_data (pd.Series) – The sidecar data.

Returns:

The HED associated with the relevant columns.

Return type:

list

get_onset_lines(line)[source]

Get the lines in the input data with the same line numbers as the data_frame.

insert_issue_details(issues)[source]

Inserts issue details as part of the ‘message’ key for a list of issues.

Parameters:

issues (list) – List of issues to get details for.

validate_event_tags()[source]

Verify that the events in the HED strings validly represent events.

Returns:

A list of issues, each a dictionary with ‘code’ and ‘message’ keys.

Return type:

list

EventsSummary

class EventsSummary(hed_schema, file, sidecar=None, name=None)[source]

Bases: object

Summarizes HED event annotations for a tabular file, grouping tags by stimulus/response categories.

CUTOFF_TAGS = {'blue-color', 'brown-color', 'cyan-color', 'gray-color', 'green-color', 'orange-color', 'pink-color', 'purple-color', 'red-color', 'visual-presentation', 'white-color', 'yellow-color'}
EXCLUDED_PARENTS = {'data-marker', 'data-resolution', 'grayscale', 'hsv-color', 'informational-property', 'luminance', 'luminance-contrast', 'opacity', 'organizational-property', 'quantitative-value', 'relation', 'rgb-color', 'spatiotemporal-value', 'statistical-value', 'task-effect-evidence', 'task-relationship'}
FILTERED_TAGS = {'action', 'agent', 'agent-cognitive-state', 'agent-emotional-state', 'agent-physiological-state', 'agent-postural-state', 'agent-property', 'agent-state', 'agent-task-role', 'agent-trait', 'anatomical-item', 'auditory-attribute', 'auditory-device', 'biological-artifact', 'biological-item', 'body-part', 'categorical-class-value', 'categorical-judgment-value', 'categorical-level-value', 'categorical-location-value', 'categorical-orientation-value', 'categorical-value', 'computing-device', 'dara-source-type', 'data-property', 'data-value', 'data-variability-attribute', 'device', 'display-device', 'document', 'environmental-property', 'event', 'face-part', 'geometric-object', 'gustatory-attribute', 'head-part', 'input-device', 'io-device', 'item', 'language-item', 'lower-extremity-part', 'man-made-object', 'media', 'media-clip', 'move-body-part', 'natural-object', 'nonbiological-artifact', 'object', 'olfactory-attribute', 'output-device', 'physical-value', 'property', 'recording-device', 'sensory-attribute', 'sensory-presentation', 'sensory-property', 'spatial-property', 'spectral-property', 'tactile-attribute', 'task-action-type', 'task-attentional-demand', 'task-event-role', 'task-property', 'task-stimulus-role', 'temporal-property', 'torso-part', 'upper-extremity-part', 'visual-attribute', 'visualization'}
MATCH_TYPES = ['Experimental-stimulus', 'Participant-response', 'Cue', 'Feedback', 'Instructional', 'Sensory-event', 'Agent-action']
REMOVE_TYPES = ['Condition-variable', 'Task']
extract_tag_summary()[source]

Extract a summary of the tags in a given tabular input file.

Returns:

  • dict: A dictionary with the summary information - (str, list)

  • list: A list of tags that do not match any of the specified types but are not excluded.

Return type:

tuple[dict, list]

static match_tags(all_tags, key)[source]

Return True if any tag in all_tags has a short_base_tag matching key.

Parameters:
  • all_tags (list[HedTag]) – The tags to search.

  • key (str) – The short base tag name to look for.

Returns:

True if a match is found.

Return type:

bool
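
A rough sketch of the matching rule, using a stand-in class instead of the real HedTag. Only the attribute name `short_base_tag` comes from the description above. The `FakeTag` class is hypothetical, and exact string equality is an assumption because the text does not say whether the comparison ignores case.

```python
from dataclasses import dataclass


@dataclass
class FakeTag:
    """Stand-in for HedTag exposing only the attribute this sketch needs."""
    short_base_tag: str


def match_tags_sketch(all_tags, key):
    """Return True if any tag has a short_base_tag equal to key."""
    return any(tag.short_base_tag == key for tag in all_tags)


tags = [FakeTag("Sensory-event"), FakeTag("Cue")]
found_cue = match_tags_sketch(tags, "Cue")
found_feedback = match_tags_sketch(tags, "Feedback")
print(found_cue, found_feedback)
```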

update_tags(tag_set, all_tags)[source]

Add the most-specific ancestor tag names from all_tags into tag_set, respecting cutoff categories.

Parameters:
  • tag_set (set) – The running set of tag terms to update.

  • all_tags (list[HedTag]) – Tags to process.

Returns:

The updated tag_set.

Return type:

set

HedTagManager

class HedTagManager(event_manager, remove_types=None)[source]

Bases: object

Manager for the HED tags from a columnar file.

get_hed_obj(hed_str, remove_types=False, remove_group=False)[source]

Return a HED string object with the types removed.

Parameters:
  • hed_str (str) – Represents a HED string.

  • remove_types (bool) – If False (the default), do not remove the types managed by this manager.

  • remove_group (bool) – If False (the default), do not remove the group when removing a type tag, otherwise remove its enclosing group.

get_hed_objs(include_context=True, replace_defs=False)[source]

Return a list of HED string objects of same length as the tabular file.

Parameters:
  • include_context (bool) – If True (default), include the Event-context group in the HED string.

  • replace_defs (bool) – If True (default=False), replace the Def tags with Definition contents.

Returns:

list - List of HED strings of same length as tabular file.

HedTagCount

class HedTagCount(hed_tag, file_name)[source]

Bases: object

Counts for a particular HedTag in a particular file.

get_empty()[source]

Return a copy of this entry with counts reset to zero.

Returns:

A new instance with the same tag name but zeroed event/file counts.

Return type:

HedTagCount

get_info(verbose=False) dict[source]

Return counts for this tag.

Parameters:

verbose (bool) – If False (the default), only the number of files is included; otherwise a list of the files is included.

Returns:

Keys are ‘tag’, ‘events’, and ‘files’.

Return type:

dict

get_summary() dict[source]

Return a dictionary summary of the events and files for this tag.

Returns:

Dictionary summary of the events and files that contain this tag.

Return type:

dict

set_value(hed_tag)[source]

Update the tag term value counts for a HedTag.

Parameters:

hed_tag (HedTag or None) – Item to use to update the value counts.

HedTagCounts

class HedTagCounts(name, total_events=0)[source]

Bases: object

Counts of HED tags for a group of columnar files.

Parameters:
  • name (str) – An identifier for these counts (usually the filename of the tabular file).

  • total_events (int) – The total number of events in the columnar file.

static create_template(tags) dict[source]

Creates a dictionary with keys based on list of keys in tags dictionary.

Parameters:

tags (dict) – dictionary of tags and key lists.

Returns:

Dictionary with keys in key lists and values are empty lists.

Return type:

dict

Note: This class is used to organize the results of the tags based on a template for display.
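
As an illustration of the template shape described above, the sketch below flattens the key lists of a `tags` dictionary into a new dictionary whose values are empty lists. It is not the library's code. The sample titles and tags are made up, and whether the real method normalizes key case is not stated, so keys are kept as given.

```python
def create_template_sketch(tags):
    """Build a dict with one empty list per key found in the key lists."""
    template = {}
    for key_list in tags.values():
        for key in key_list:
            template[key] = []  # a separate list object for every key
    return template


tags = {
    "Sensory events": ["Sensory-event", "Sensory-presentation"],
    "Agent actions": ["Agent-action"],
}
template = create_template_sketch(tags)
print(template)
```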

get_summary() dict[source]

Return a summary object containing the tag count information of this summary.

Returns:

Keys are ‘name’, ‘files’, ‘total_events’, and ‘details’.

Return type:

dict

merge_tag_dicts(other_dict)[source]

Merge the information from another dictionary with this object’s tag dictionary.

Parameters:

other_dict (dict) – Dictionary of tag, HedTagCount to merge.

organize_tags(tag_template) tuple[source]

Organize tags into categories as specified by the tag_template.

Parameters:

tag_template (dict) – A dictionary whose keys are titles and values are lists of HED tags (str).

Returns:

A tuple containing two elements:

  • dict: Keys are tags (strings) and values are lists of HedTagCount for items fitting the template.

  • list: HedTagCount objects corresponding to tags that don’t fit the template.

Return type:

tuple[dict, list]

update_tag_counts(hed_string_obj, file_name)[source]

Update the tag counts based on a HedString object.

Parameters:
  • hed_string_obj (HedString) – The HED string whose tags should be counted.

  • file_name (str) – The name of the file corresponding to these counts.

HedTypeManager

class HedTypeManager(event_manager)[source]

Bases: object

Manager for type factors and type definitions.

add_type(type_name)[source]

Add a type variable to be managed by this manager.

Parameters:

type_name (str) – Type tag name of the type to be added.

get_factor_vectors(type_tag, type_values=None, factor_encoding='one-hot')[source]

Return a DataFrame of factor vectors for the indicated HED tag and values.

Parameters:
  • type_tag (str) – HED tag to retrieve factors for.

  • type_values (list or None) – The values of the tag to create factors for or None if all unique values.

  • factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).

Returns:

DataFrame containing the factor vectors as the columns.

Return type:

Union[pd.DataFrame, None]

get_type(type_tag)[source]

Returns the HedType variable associated with the type tag.

Parameters:

type_tag (str) – HED tag to retrieve the type for.

Returns:

The values associated with this type tag.

Return type:

Union[HedType, None]

get_type_def_names(type_var)[source]

Return the definitions associated with a particular type tag.

Parameters:

type_var (str) – The name of a type tag such as Condition-variable.

Returns:

Names of definitions that use this type.

Return type:

list

get_type_tag_factor(type_tag, type_value)[source]

Return the HedTypeFactors for a specified value and extension.

Parameters:
  • type_tag (str or None) – HED tag for the type.

  • type_value (str or None) – Value of this tag to return the factors for.

summarize_all(as_json=False)[source]

Return a dictionary containing the summaries for the types managed by this manager.

Parameters:

as_json (bool) – If False (the default), return as an object otherwise return as a JSON string.

Returns:

Dictionary with the summary.

Return type:

Union[dict, str]

property types

Return a list of types managed by this manager.

Returns:

Type tag names.

Return type:

list

HedType

class HedType(event_manager, name, type_tag='condition-variable')[source]

Bases: object

Manager of a type variable and its associated context.

get_summary()[source]

Return a summary dict mapping each type-value name to its factor summary.

Returns:

Keys are type-value name strings; values are factor summary dicts.

Return type:

dict

get_type_def_names()[source]

Return the type definition names.

get_type_factors(type_values=None, factor_encoding='one-hot')[source]

Create a dataframe with the indicated type tag values as factors.

Parameters:
  • type_values (list or None) – A list of values of type tags for which to generate factors.

  • factor_encoding (str) – Type of factor encoding (one-hot or categorical).

Returns:

Contains the specified factors associated with this type tag.

Return type:

pd.DataFrame

static get_type_list(type_tag, item)[source]

Find a list of the given type tag from a HedTag, HedGroup, or HedString.

Parameters:
  • type_tag (str) – a tag whose direct items you wish to remove

  • item (HedTag or HedGroup) – The item from which to extract condition type_variables.

Returns:

List of the items with this type_tag

Return type:

list

get_type_value_factors(type_value)[source]

Return the HedTypeFactors associated with type_value or None.

Parameters:

type_value (str) – The tag corresponding to the type’s value (such as the name of the condition variable).

Returns:

Union[HedTypeFactors, None]

get_type_value_level_info(type_value)[source]

Return type variable corresponding to type_value.

Parameters:

type_value (str)

Returns:

get_type_value_names()[source]

Return the list of type-value names defined in this HedType.

Returns:

Lowercased type-value name strings.

Return type:

list[str]

property total_events

Return the total number of events in the associated event list.

Returns:

Number of events.

Return type:

int

property type_variables

Return the set of type-value names (keys) found in this HedType.

Returns:

Set of lowercased type-value name strings.

Return type:

set[str]

HedTypeDefs

class HedTypeDefs(definitions, type_tag='condition-variable')[source]

Bases: object

Manager for definitions associated with a type such as condition-variable.

Properties:

def_map (dict): keys are definition names, values are dict {type_values, description, tags}.

Example: A definition ‘famous-face-cond’ with contents:

‘(Condition-variable/Face-type,Description/A face that should be recognized.,(Image,(Face,Famous)))’

would have type_values [‘face-type’]. All items are strings, not objects.

static extract_def_names(item, no_value=True)[source]

Return a list of Def values in item.

Parameters:
  • item (HedTag, HedGroup, or HedString) – An item containing a def tag.

  • no_value (bool) – If True, strip off extra values after the definition name.

Returns:

A list of definition names (as strings).

Return type:

list

get_type_values(item)[source]

Return a list of type_tag values in item.

Parameters:

item (HedTag, HedGroup, or HedString) – An item potentially containing def tags.

Returns:

A list of the unique values associated with this type

Return type:

list

static split_name(name, lowercase=True)[source]

Split a name/# or name/x into name, x.

Parameters:
  • name (str) – The extension or value portion of a tag.

  • lowercase (bool) – If True (default), return values are converted to lowercase.

Returns:

  • Name of the definition.

  • Value of the definition if it has one.

Return type:

tuple[str, str]
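
The splitting rule can be sketched with `str.partition`. This is an illustration rather than the library's code. Returning an empty string when no value part is present is an assumption, and the sample name `Face-cond` is made up.

```python
def split_name_sketch(name, lowercase=True):
    """Split 'name/x' into ('name', 'x'); the value is '' if there is no '/'."""
    def_name, _, def_value = name.partition("/")
    if lowercase:
        return def_name.lower(), def_value.lower()
    return def_name, def_value


with_value = split_name_sketch("Face-cond/3")
without_value = split_name_sketch("Face-cond")
keep_case = split_name_sketch("Face-cond/#", lowercase=False)
print(with_value, without_value, keep_case)
```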

property type_def_names

Return list of names of definitions that have this type-variable.

Returns:

definition names that have this type.

Return type:

list

property type_names

Return list of names of the type-variables associated with type definitions.

Returns:

type names associated with the type definitions

Return type:

list

HedTypeFactors

class HedTypeFactors(type_tag, type_value, number_elements)[source]

Bases: object

Holds index of positions for a variable type for a columnar file.

ALLOWED_ENCODINGS = ('categorical', 'one-hot')
get_factors(factor_encoding='one-hot')[source]

Return a DataFrame of factor vectors for this type factor.

Parameters:

factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).

Returns:

DataFrame containing the factor vectors as the columns.

Return type:

pd.DataFrame
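
To make the two allowed encodings concrete, here is a generic sketch of one-hot versus categorical factor vectors. It is not the HedTypeFactors implementation: it returns plain dicts and lists instead of a DataFrame, and the `levels` layout (level name mapped to row positions), the level names, and the ‘n/a’ filler for rows without a level are all assumptions made for illustration.

```python
def build_factors_sketch(levels, number_elements, factor_encoding="one-hot"):
    """Build factor vectors from a mapping of level name -> row positions."""
    if factor_encoding == "one-hot":
        # One 0/1 column per level.
        columns = {}
        for level, positions in levels.items():
            column = [0] * number_elements
            for pos in positions:
                column[pos] = 1
            columns[level] = column
        return columns
    if factor_encoding == "categorical":
        # A single column holding the level name (assumes one level per row).
        column = ["n/a"] * number_elements
        for level, positions in levels.items():
            for pos in positions:
                column[pos] = level
        return column
    raise ValueError(f"Unsupported factor encoding: {factor_encoding}")


levels = {"famous": [0, 3], "unfamiliar": [1]}
one_hot = build_factors_sketch(levels, 4, "one-hot")
categorical = build_factors_sketch(levels, 4, "categorical")
print(one_hot, categorical)
```

One-hot encoding keeps overlapping levels distinguishable because each level has its own column, whereas the categorical form in this sketch can hold only one level per row.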

get_summary()[source]

Return the summary of the type tag value as a dictionary.

Returns:

Contains the summary.

Return type:

dict

HedTypeCount

class HedTypeCount(type_value, type_tag, file_name=None)[source]

Bases: object

Manager of the counts of tags for one type tag such as Condition-variable or Task.

Parameters:
  • type_value (str) – The value of the variable to be counted.

  • type_tag (str) – The type of variable.

Examples

HedTypeCount(‘SymmetricCond’, ‘condition-variable’) keeps counts of Condition-variable/SymmetricCond.

get_summary()[source]

Return the summary of one value of one type tag.

Returns:

Count information for one tag of one type.

Return type:

dict

to_dict()[source]

Return count information as a dictionary.

update(type_sum, file_id)[source]

Update the counts from a HedTypeValues.

Parameters:
  • type_sum (dict) – Information about the contents for a particular data file.

  • file_id (str or None) – Name of the file associated with the counts.

HedTypeCounts

class HedTypeCounts(name, type_tag)[source]

Bases: object

Manager for summaries of tag counts for columnar files.

add_descriptions(type_defs)[source]

Update this summary based on the type variable map.

Parameters:

type_defs (HedTypeDefs) – Contains the information about the value of a type.

get_summary()[source]

Return the information in the manager as a dictionary.

Returns:

Dict with keys ‘name’, ‘type_tag’, ‘files’, ‘total_events’, and ‘details’.

Return type:

dict

update(counts)[source]

Update count information based on counts in another HedTypeCounts.

Parameters:

counts (HedTypeCounts) – Information to use in the update.

update_summary(type_sum, total_events=0, file_id=None)[source]

Update this summary based on the type variable map.

Parameters:
  • type_sum (dict) – Contains the information about the value of a type.

  • total_events (int) – Total number of events processed.

  • file_id (str) – Unique identifier for the associated file.

TabularSummary

class TabularSummary(value_cols=None, skip_cols=None, name='', categorical_limit=None)[source]

Bases: object

Summarize the contents of columnar files.

extract_sidecar_template() dict[source]

Extract a BIDS sidecar-compatible dictionary.

Returns:

A sidecar template that can be converted to JSON.

Return type:

dict

static extract_summary(summary_info) TabularSummary[source]

Create a TabularSummary object from a serialized summary.

Parameters:

summary_info (dict or str) – A JSON string or a dictionary containing contents of a TabularSummary.

Returns:

The information in summary_info as a TabularSummary object.

Return type:

TabularSummary

static get_columns_info(dataframe, skip_cols=None) dict[str, dict][source]

Extract unique value counts for columns.

Parameters:
  • dataframe (DataFrame) – The DataFrame to be analyzed.

  • skip_cols (list) – List of names of columns to be skipped in the extraction.

Returns:

A dictionary with keys that are column names (strings) and values that are dictionaries of unique value counts.

Return type:

dict[str, dict]
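
The shape of the returned dictionary can be sketched with the standard library alone. Here a plain dict of lists stands in for the DataFrame, and converting each value to a string before counting is an assumption. The column names and values are sample data.

```python
from collections import Counter


def columns_info_sketch(columns, skip_cols=None):
    """Count unique values per column, ignoring the columns in skip_cols."""
    skip = set(skip_cols or [])
    return {
        name: dict(Counter(str(value) for value in values))
        for name, values in columns.items()
        if name not in skip
    }


columns = {
    "onset": [0.5, 1.2, 2.0],
    "trial_type": ["go", "stop", "go"],
    "response": ["left", "n/a", "left"],
}
info = columns_info_sketch(columns, skip_cols=["onset"])
print(info)
```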

get_number_unique(column_names=None) dict[source]

Return the number of unique values in columns.

Parameters:

column_names (list, None) – A list of column names to analyze or all columns if None.

Returns:

Column names are the keys and the number of unique values in the column are the values.

Return type:

dict

get_summary(as_json=False) dict | str[source]

Return the summary in dictionary format.

Parameters:

as_json (bool) – If False, return as a Python dictionary, otherwise convert to a JSON string.

Returns:

A dictionary containing the summary information or a JSON string if as_json is True.

Return type:

Union[dict, str]

static make_combined_dicts(file_dictionary, skip_cols=None) tuple[TabularSummary, dict[str, TabularSummary]][source]

Return combined and individual summaries.

Parameters:
  • file_dictionary (FileDictionary) – Dictionary of file name keys and full path.

  • skip_cols (list) – List of names of columns to be skipped.

Returns:

  • A combined summary of all files in the dictionary.

  • A dictionary where keys are file names and values are individual TabularSummary objects.

Return type:

tuple[TabularSummary, dict[str, TabularSummary]]

update(data, name=None)[source]

Update the counts based on data (DataFrame, filename, or list of filenames).

Parameters:
  • data (DataFrame, str, or list) – DataFrame, filename, or list of filenames containing the data used for the update.

  • name (str) – Name of the summary.

update_summary(tab_sum)[source]

Add TabularSummary values to this object.

Parameters:

tab_sum (TabularSummary) – A TabularSummary to be combined.

Notes

  • The value_cols and skip_cols are updated as long as they are not contradictory.

  • A new skip column cannot be used.

ColumnNameSummary

class ColumnNameSummary(name='')[source]

Bases: object

Summarize the unique column names in a dataset.

get_summary(as_json=False)[source]

Return summary as an object or in JSON.

Parameters:

as_json (bool) – If False (the default), return the underlying summary object, otherwise transform to JSON.

update(name, columns)[source]

Update the summary based on columns associated with a file.

Parameters:
  • name (str) – File name associated with the columns.

  • columns (list) – List of column names.

update_headers(column_names)[source]

Update the unique combinations of column names.

Parameters:

column_names (list) – List of column names to update.

FileDictionary

class FileDictionary(collection_name, file_list, key_indices=(0, 2), separator='_')[source]

Bases: object

A file dictionary keyed by entity pair indices.

Notes

  • The entities are identified as 0, 1, … depending on order in the base filename.

  • The entity key-value pairs are assumed separated by ‘_’ unless a separator is provided.

create_file_dict(file_list, key_indices, separator)[source]

Create new dict based on key indices.

Parameters:
  • file_list (list) – Paths of the files to include.

  • key_indices (tuple) – A tuple of integers representing order of entities for key.

  • separator (str) – The separator used between entities to form the key.

property file_dict

Dictionary of path values in this dictionary.

property file_list

List of path values in this dictionary.

get_file_path(key)[source]

Return file path corresponding to key.

Parameters:

key (str) – Key used to retrieve the file path.

Returns:

File path.

Return type:

str

iter_files()[source]

Iterator over the files in this dictionary.

Yields:

  • str: Key into the dictionary.

  • file: File path.

key_diffs(other_dict)[source]

Return symmetric key difference with another dict.

Parameters:

other_dict (FileDictionary)

Returns:

The symmetric difference of the keys in this dictionary and the other one.

Return type:

list
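
A symmetric difference keeps the keys that appear in exactly one of the two dictionaries. The sketch below shows the idea on plain key lists. Sorting the result is an assumption made so the output is deterministic, and the sample keys are made up.

```python
def key_diffs_sketch(keys_a, keys_b):
    """Return the keys that are in exactly one of the two collections."""
    return sorted(set(keys_a) ^ set(keys_b))


keys_a = ["sub-01_run-1", "sub-01_run-2"]
keys_b = ["sub-01_run-2", "sub-02_run-1"]
diffs = key_diffs_sketch(keys_a, keys_b)
print(diffs)
```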

property key_list

Keys in this dictionary.

static make_file_dict(file_list, key_indices=(0, 2), separator='_')[source]

Return a dictionary of files using entity keys.

Parameters:
  • file_list (list) – Paths to files to use.

  • key_indices (tuple) – Positions of entities to use for key.

  • separator (str) – Separator character used to construct key.

Returns:

Key is based on key indices and value is a full path.

Return type:

dict

static make_key(key_string, indices=(0, 2), separator='_')[source]

Create a key from specified entities.

Parameters:
  • key_string (str) – The string from which to extract the key (usually a filename or path).

  • indices (tuple) – Positions of entity pairs to use as key.

  • separator (str) – Separator between entity pairs in the created key.

Returns:

The created key.

Return type:

str
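
A hedged sketch of how such a key might be built from a BIDS-style filename. It assumes the directory and extension are dropped first and that entity pairs in the filename are split on an underscore. The function name and sample path are illustrative, not taken from the library.

```python
import os


def make_key_sketch(key_string, indices=(0, 2), separator="_"):
    """Join the entity pairs at the given positions of the base filename."""
    base = os.path.splitext(os.path.basename(key_string))[0]
    entities = base.split("_")  # e.g. ['sub-01', 'ses-02', 'task-rest', 'events']
    return separator.join(entities[i] for i in indices)


path = "/data/sub-01_ses-02_task-rest_events.tsv"
default_key = make_key_sketch(path)
custom_key = make_key_sketch(path, indices=(0, 1), separator=".")
print(default_key, custom_key)
```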

property name

Name of this dictionary.

output_files(title=None)[source]

Return a string with the output of the list.

Parameters:

title (None, str) – Optional title.

Returns:

The dictionary in string form.

Return type:

str

KeyMap

class KeyMap(key_cols, target_cols=None, name='')[source]

Bases: object

A map of unique column values for remapping columns.

key_cols

A list of column names that will be hashed into the keys for the map.

Type:

list

target_cols

Optional list of column names that will be inserted into data and later remapped.

Type:

list or None

name

An optional name of this remap for identification purposes.

Type:

str

Notes: This mapping converts all columns in the mapping to strings. The remapping does not support other types of columns.

property columns

Return the column names of the columns managed by this map.

Returns:

Column names of the columns managed by this map.

Return type:

list

make_template(additional_cols=None, show_counts=True)[source]

Return a dataframe template.

Parameters:
  • additional_cols (list or None) – Optional list of additional columns to append to the returned dataframe.

  • show_counts (bool) – If True, the number of times each key combination appears is in the first column and values are sorted in descending order by count.

Returns:

A dataframe containing the template.

Return type:

DataFrame

Raises:

HedFileError – If additional columns are not disjoint from the key columns.

Notes

  • The template consists of the unique key columns in this map plus additional columns.

remap(data)[source]

Remap the columns of a dataframe or columnar file.

Parameters:

data (DataFrame, str) – Columnar data (either DataFrame or filename) whose columns are to be remapped.

Returns:

  • New dataframe with columns remapped.

  • List of row numbers that had no correspondence in the mapping.

Return type:

tuple[DataFrame, list]

Raises:

HedFileError – If data is missing some of the key columns.
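
The remapping idea can be sketched without pandas. A list of dicts stands in for the DataFrame, and the map is a plain dictionary keyed by tuples of key-column values converted to strings. Filling unmatched rows with ‘n/a’ and reporting 0-based row numbers are assumptions, and the column names and values are sample data.

```python
def remap_sketch(rows, key_cols, target_cols, key_map):
    """Add target columns to each row by looking up its key-column values."""
    new_rows, missing = [], []
    for index, row in enumerate(rows):
        key = tuple(str(row[col]) for col in key_cols)
        targets = key_map.get(key)
        if targets is None:
            # No entry in the map: record the row and fill with placeholders.
            missing.append(index)
            targets = {col: "n/a" for col in target_cols}
        new_rows.append({**row, **targets})
    return new_rows, missing


rows = [
    {"trial_type": "go", "resp": "1"},
    {"trial_type": "stop", "resp": "2"},
    {"trial_type": "go", "resp": "9"},
]
key_map = {
    ("go", "1"): {"event_type": "go-correct"},
    ("stop", "2"): {"event_type": "stop-fail"},
}
remapped, missing = remap_sketch(rows, ["trial_type", "resp"], ["event_type"], key_map)
print(remapped, missing)
```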

static remove_quotes(df, columns=None)[source]

Remove quotes from the specified columns and convert to string.

Parameters:
  • df (DataFrame) – DataFrame to process by removing quotes.

  • columns (list) – List of column names. If None, all columns are used.

Notes

  • Replacement is done in place.

resort()[source]

Sort the col_map in place by the key columns.

update(data, allow_missing=True)[source]

Update the existing map with information from data.

Parameters:
  • data (DataFrame or str) – DataFrame or filename of an events file or event map.

  • allow_missing (bool) – If True allow missing keys and add as n/a columns.

Raises:

HedFileError – If there are missing keys and allow_missing is False.

TemporalEvent

class TemporalEvent(contents, start_index, start_time)[source]

Bases: object

A single event process with starting and ending times.

Note: the contents have the Onset and duration removed.

set_end(end_index, end_time)[source]

Set end time information for an event process.

Parameters:
  • end_index (int) – Position of ending event marker corresponding to the end of this event process.

  • end_time (float) – Ending time of the event (usually in seconds).

Annotation utilities

Utilities to facilitate annotation of events in BIDS.

check_df_columns(df, required_cols=('column_name', 'column_value', 'description', 'HED')) list[str][source]

Return a list of the specified columns that are missing from a dataframe.

Parameters:
  • df (DataFrame) – Spreadsheet to check the columns of.

  • required_cols (tuple) – Tuple of column names that must be present.

Returns:

List of column names that are missing.

Return type:

list[str]
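
The check amounts to a membership test against the required names. A minimal sketch, taking a list of column names in place of a DataFrame (the function name is illustrative):

```python
REQUIRED = ("column_name", "column_value", "description", "HED")


def missing_columns_sketch(columns, required_cols=REQUIRED):
    """Return the required column names that are absent, in required order."""
    present = set(columns)
    return [col for col in required_cols if col not in present]


missing_cols = missing_columns_sketch(["column_name", "HED", "extra"])
print(missing_cols)
```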

df_to_hed(dataframe, description_tag=True) dict[source]

Create sidecar-like dictionary from a 4-column dataframe.

Parameters:
  • dataframe (DataFrame) – A four-column Pandas DataFrame with specific columns.

  • description_tag (bool) – If True description tag is included.

Returns:

A dictionary compatible with BIDS JSON tabular file that includes HED.

Return type:

dict

Notes

  • The DataFrame must have the columns with names: column_name, column_value, description, and HED.

extract_tags(hed_string, search_tag) tuple[str, list[str]][source]

Extract all instances of the specified tag from a HED string.

Parameters:
  • hed_string (str) – Tag string from which to extract tag.

  • search_tag (str) – HED tag to extract.

Returns:

  • Tag string without the tags.

  • A list of the tags that were extracted, for example descriptions.

Return type:

tuple[str, list[str]]

generate_sidecar_entry(column_name, column_values=None) dict[source]

Create a sidecar column dictionary for column.

Parameters:
  • column_name (str) – Name of the column.

  • column_values – List of column values.

hed_to_df(sidecar_dict, col_names=None) DataFrame[source]

Return a 4-column dataframe of HED portions of sidecar.

Parameters:
  • sidecar_dict (dict) – A dictionary conforming to BIDS JSON events sidecar format.

  • col_names (list, None) – A list of the cols to include in the flattened sidecar.

Returns:

Four-column spreadsheet representing HED portion of sidecar.

Return type:

DataFrame

Notes

  • The returned DataFrame has columns: column_name, column_value, description, and HED.
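
To illustrate the four-column layout, the sketch below flattens a small sidecar-like dictionary into rows with the column names given above. It is a simplified stand-in, not the library's algorithm. Several details are assumptions: ‘n/a’ is used as the column_value for value columns, descriptions come from the ‘Levels’ and ‘Description’ fields, and entries without a ‘HED’ key are skipped. The sidecar content is sample data.

```python
def flatten_sidecar_sketch(sidecar_dict):
    """Flatten the HED parts of a sidecar-like dict into four-column rows."""
    rows = []
    for column_name, entry in sidecar_dict.items():
        hed = entry.get("HED")
        if isinstance(hed, dict):
            # Categorical column: one row per annotated value.
            levels = entry.get("Levels", {})
            for value, hed_string in hed.items():
                rows.append({
                    "column_name": column_name,
                    "column_value": value,
                    "description": levels.get(value, "n/a"),
                    "HED": hed_string,
                })
        elif isinstance(hed, str):
            # Value column: a single row with a placeholder column_value.
            rows.append({
                "column_name": column_name,
                "column_value": "n/a",
                "description": entry.get("Description", "n/a"),
                "HED": hed,
            })
    return rows


sidecar = {
    "trial_type": {
        "Levels": {"go": "A go trial.", "stop": "A stop trial."},
        "HED": {"go": "Sensory-event, Cue", "stop": "Sensory-event, Warning"},
    },
    "response_time": {"Description": "Time to respond.", "HED": "Label/#"},
    "onset": {"Description": "Onset time."},
}
flat_rows = flatten_sidecar_sketch(sidecar)
for row in flat_rows:
    print(row)
```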

merge_hed_dict(sidecar_dict, hed_dict)[source]

Update a JSON sidecar based on the hed_dict values.

Parameters:
  • sidecar_dict (dict) – Dictionary representation of a BIDS JSON sidecar.

  • hed_dict (dict) – Dictionary derived from a dataframe representation of HED in sidecar.

series_to_factor(series) list[int][source]

Convert a series to an integer factor list.

Parameters:

series (pd.Series) – Series to be converted to a list.

Returns:

A list containing 0’s and 1’s. Empty, ‘n/a’, and np.nan values are converted to 0.

Return type:

list[int]

str_to_tabular(tsv_str, sidecar=None) TabularInput[source]

Return a TabularInput from a tsv string.

Parameters:
  • tsv_str (str) – A string representing a tabular input.

  • sidecar – An optional Sidecar object.

strs_to_hed_objs(hed_strings, hed_schema) list[HedString] | None[source]

Returns a list of HedString objects from a list of strings.

Parameters:
  • hed_strings (string or list) – String or strings representing HED annotations.

  • hed_schema (HedSchema or HedSchemaGroup) – Schema version for the strings.

Returns:

A list of HedString objects or None.

Return type:

Union[list[HedString], None]

strs_to_sidecar(sidecar_strings) Sidecar | None[source]

Return a Sidecar from a sidecar string or from a list of sidecar strings.

Parameters:

sidecar_strings (string or list) – String or strings representing sidecars.

Returns:

The merged sidecar from the list.

Return type:

Union[Sidecar, None]

to_factor(data, column=None) list[int][source]

Convert data to an integer factor list.

Parameters:
  • data (Series or DataFrame) – Series or DataFrame to be converted to a list.

  • column (str, optional) – Column name if DataFrame, otherwise column 0 is used.

Returns:

A list containing 0’s and 1’s. Empty, ‘n/a’, and np.nan values are converted to 0.

Return type:

list[int]
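
The conversion rule above can be illustrated without pandas. The following is only a sketch on plain Python lists; the helper name to_factor_sketch is hypothetical, and it assumes that every value other than an empty string, ‘n/a’, None, or NaN maps to 1, which the description does not spell out.

```python
import math


def to_factor_sketch(values):
    """Map each value to 0 or 1 (hypothetical helper, not the library function)."""
    factors = []
    for value in values:
        # Missing-style entries become 0: None, empty string, 'n/a', and NaN.
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is None or value == "" or value == "n/a" or is_nan:
            factors.append(0)
        else:
            # Assumption: any other value counts as present.
            factors.append(1)
    return factors


print(to_factor_sketch(["1", "", "n/a", float("nan"), "go"]))  # [1, 0, 0, 0, 1]
```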

to_strlist(obj_list) list[str][source]

Convert objects in a list to strings, representing None values as empty strings.

Parameters:

obj_list (list) – A list of objects that are None or have a str method.

Returns:

A list with the objects converted to strings. None values are preserved as empty strings.

Return type:

list[str]
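
Based on the return description above, None entries come back as empty strings, so the result is always a list of str. A minimal sketch of that behavior, using the hypothetical name to_strlist_sketch, might look like this:

```python
def to_strlist_sketch(obj_list):
    """Convert each object to str, using '' for None (hypothetical helper)."""
    return [str(item) if item is not None else "" for item in obj_list]


print(to_strlist_sketch([1, None, "Red"]))  # ['1', '', 'Red']
```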

BIDS tools

BidsDataset

class BidsDataset(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]

Bases: object

A BIDS dataset representation primarily focused on HED evaluation.

root_path

Real root path of the BIDS dataset.

Type:

str

schema

The schema used for evaluation.

Type:

HedSchema or HedSchemaGroup

file_groups

A dictionary of BidsFileGroup objects with a given file suffix.

Type:

dict

get_file_group(suffix)[source]

Return the file group of files with the specified suffix.

Parameters:

suffix (str) – Suffix of the BidsFileGroup to be returned.

Returns:

The requested tabular group.

Return type:

Union[BidsFileGroup, None]

get_summary()[source]

Return an abbreviated summary of the dataset.

validate(check_for_warnings=False, schema=None)[source]

Validate the dataset.

Parameters:
  • check_for_warnings (bool) – If True, check for warnings.

  • schema (HedSchema or HedSchemaGroup or None) – The schema used for validation.

Returns:

List of issues encountered during validation. Each issue is a dictionary.

Return type:

list

BidsFile

class BidsFile(file_path)[source]

Bases: object

A BIDS file with entity dictionary.

file_path

Real path of the file.

Type:

str

suffix

Suffix part of the filename.

Type:

str

ext

Extension (including the .).

Type:

str

entity_dict

Dictionary of entity-names (keys) and entity-values (values).

Type:

dict

Notes

  • This class may hold the merged sidecar giving metadata for this file as well as contents.

clear_contents()[source]

Set the contents attribute of this object to None.

property contents

Return the current contents of this object.

get_entity(entity_name)[source]

Return the entity value for the specified entity.

Parameters:

entity_name (str) – Name of the BIDS entity, for example task, run, or sub.

Returns:

Entity value if any, otherwise None.

Return type:

Union[str, None]

get_key(entities=None)[source]

Return a key for this BIDS file given a list of entities.

Parameters:

entities (tuple) – A tuple of strings representing entities.

Returns:

A key based on this object.

Return type:

str

Notes

If entities is None, then the file path is used as the key.

set_contents(content_info=None, overwrite=False)[source]

Set the contents of this object.

Parameters:
  • content_info (Any) – The contents appropriate for this object, for example a JSON dictionary.

  • overwrite (bool) – If False and the contents are not empty, do nothing.

Notes

  • Do not set if the contents are already set and overwrite is False.

BidsFileGroup

class BidsFileGroup(root_path, file_list, suffix='events')[source]

Bases: object

Container for BIDS files with a specified suffix.

suffix

The file suffix specifying the class of file represented in this group (e.g., events).

Type:

str

sidecar_dict

A dictionary of sidecars associated with this suffix.

Type:

dict

datafile_dict

A dictionary with values either BidsTabularFile or BidsTimeseriesFile.

Type:

dict

sidecar_dir_dict

Dictionary whose keys are directory paths and values are list of sidecars in the corresponding directory.

Type:

dict

static create_file_group(root_path, file_list, suffix)[source]

Construct a BidsFileGroup from a list of files sharing the given suffix.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • file_list (list[str]) – List of file paths belonging to this suffix group.

  • suffix (str) – BIDS file suffix identifying this group (e.g. events).

Returns:

The constructed group, or None if it contains no sidecars or data files.

Return type:

BidsFileGroup or None

get_task_names()[source]

Return a sorted list of unique task names found in the file group’s TSV and JSON filenames.

Returns:

Sorted list of unique task name strings (the xxxx portion of task-xxxx entities).

Return type:

list

Notes

  • Parses both sidecar_dict and datafile_dict file paths.

  • The BIDS task- entity is matched case-insensitively.

summarize(value_cols=None, skip_cols=None)[source]

Return a TabularSummary of group files.

Parameters:
  • value_cols (list) – Column names designated as value columns.

  • skip_cols (list) – Column names designated as columns to skip.

Returns:

A summary of the number of values in different columns if tabular group.

Return type:

Union[TabularSummary, None]

Notes

  • The columns that are not in value_cols or skip_cols are summarized by counting the number of times each unique value appears in that column.

validate(hed_schema, extra_def_dicts=None, check_for_warnings=False)[source]

Validate the sidecars and datafiles and return a list of issues.

Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • check_for_warnings (bool) – If True, include warnings in the check.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

validate_datafiles(hed_schema, extra_def_dicts=None, error_handler=None)[source]

Validate the datafiles and return an error list.

Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • error_handler (ErrorHandler) – Error handler to use.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

Notes: This will clear the contents of the datafiles if they were not previously set.

validate_sidecars(hed_schema, extra_def_dicts=None, error_handler=None)[source]

Validate merged sidecars.

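Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • error_handler (ErrorHandler) – Error handler to use.

Returns: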

A list of validation issues found. Each issue is a dictionary.

Return type:

list

BidsSidecarFile

class BidsSidecarFile(file_path)[source]

Bases: BidsFile

A BIDS sidecar file.

clear_contents()

Set the contents attribute of this object to None.

property contents

Return the current contents of this object.

get_entity(entity_name)

Return the entity value for the specified entity.

Parameters:

entity_name (str) – Name of the BIDS entity, for example task, run, or sub.

Returns:

Entity value if any, otherwise None.

Return type:

Union[str, None]

get_key(entities=None)

Return a key for this BIDS file given a list of entities.

Parameters:

entities (tuple) – A tuple of strings representing entities.

Returns:

A key based on this object.

Return type:

str

Notes

If entities is None, then the file path is used as the key.

static is_hed(json_dict)[source]

Return True if the json has HED.

Parameters:

json_dict (dict) – A dictionary representing a JSON file or merged file.

Returns:

True if the dictionary has HED or HED_assembled as a first or second-level key.

Return type:

bool
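
The check described above looks only at the first two levels of the dictionary. The sketch below illustrates that rule with a hypothetical helper, has_hed_key; it assumes an exact, case-sensitive match on the key names and is not the library's implementation.

```python
def has_hed_key(json_dict):
    """Return True if 'HED' or 'HED_assembled' is a first- or second-level key."""
    targets = {"HED", "HED_assembled"}
    for key, value in json_dict.items():
        # First-level key match.
        if key in targets:
            return True
        # Second-level key match inside a nested dictionary.
        if isinstance(value, dict) and any(sub_key in targets for sub_key in value):
            return True
    return False


sidecar = {"trial_type": {"HED": {"go": "Sensory-event"}}}
print(has_hed_key(sidecar))  # True
```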

is_sidecar_for(obj)[source]

Return True if this is a sidecar for obj.

Parameters:

obj (BidsFile) – A BidsFile object to check.

Returns:

True if this is a BIDS parent of obj and False otherwise.

Return type:

bool

Notes

  • A sidecar is a sidecar for itself.

static merge_sidecar_list(sidecar_list, name='merged_sidecar.json')[source]

Merge a list of sidecars into a single sidecar.

Parameters:
  • sidecar_list (list) – A list of Sidecar objects.

  • name (str) – The name of the merged sidecar.

Returns:

A sidecar constructed from the merged list.

Return type:

Union[Sidecar, None]

set_contents(content_info=None, name='unknown', overwrite=False)[source]

Set the contents of the sidecar.

Parameters:
  • content_info (dict, or None) – If None, create a Sidecar from the object’s file-path.

  • name (str) – The name of the sidecar.

  • overwrite (bool) – If True, overwrite contents if already set.

Notes

  • The handling of content_info is as follows:
    • None: This object’s file_path is used.

    • dict: This is interpreted as a JSON dictionary.

BidsTabularFile

class BidsTabularFile(file_path)[source]

Bases: BidsFile

A BIDS tabular file including its associated sidecar.

clear_contents()

Set the contents attribute of this object to None.

property contents

Return the current contents of this object.

get_entity(entity_name)

Return the entity value for the specified entity.

Parameters:

entity_name (str) – Name of the BIDS entity, for example task, run, or sub.

Returns:

Entity value if any, otherwise None.

Return type:

Union[str, None]

get_key(entities=None)

Return a key for this BIDS file given a list of entities.

Parameters:

entities (tuple) – A tuple of strings representing entities.

Returns:

A key based on this object.

Return type:

str

Notes

If entities is None, then the file path is used as the key.

set_contents(content_info=None, overwrite=False)[source]

Set the contents of this tabular file (a TabularInput object). Its sidecar should already be set.

Parameters:
  • content_info (None) – This always uses the internal file_path to create the contents.

  • overwrite (bool) – If False (the default), do not overwrite existing contents if any.

set_sidecar(sidecar)[source]

Set the sidecar for this tabular file.

Parameters:

sidecar (Sidecar) – The sidecar for this tabular file.

BIDS utilities

BIDS utility functions for schema loading, sidecar merging, and inheritance chain resolution.

get_candidates(source_dir, tsv_file_dict)[source]

Return sidecar JSON files in source_dir that are applicable to tsv_file_dict.

Parameters:
  • source_dir (str) – Directory to search for candidate sidecar files.

  • tsv_file_dict (dict) – Parsed BIDS filename dict for the target TSV file.

Returns:

Absolute paths to matching sidecar JSON files.

Return type:

list[str]

get_merged_sidecar(root_path, tsv_file)[source]

Return a merged sidecar dict following BIDS inheritance rules for a given TSV file.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • tsv_file (str) – Path to the TSV file whose inherited sidecars should be merged.

Returns:

Merged sidecar dictionary. Keys from closer (more specific) sidecar files take precedence.

Return type:

dict
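
The precedence rule in the return description can be sketched with plain dictionaries. The example assumes that merging happens at the level of top-level keys and that the sidecars are listed from nearest to farthest; the helper name merge_by_precedence and the sample sidecars are hypothetical.

```python
def merge_by_precedence(sidecars_nearest_first):
    """Merge dicts so that keys from nearer sidecars win (hypothetical helper)."""
    merged = {}
    # Apply the farthest sidecar first so that nearer ones overwrite its keys.
    for sidecar in reversed(sidecars_nearest_first):
        merged.update(sidecar)
    return merged


subject_level = {"trial_type": {"HED": {"go": "Sensory-event"}}}
dataset_level = {
    "trial_type": {"HED": {"go": "Event"}},
    "response_time": {"Units": "s"},
}
merged = merge_by_precedence([subject_level, dataset_level])
print(merged["trial_type"]["HED"]["go"])  # Sensory-event
```

In this sketch the dataset-level response_time entry survives because no nearer sidecar defines that key.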

get_schema_from_description(root_path)[source]

Load the HED schema version declared in the BIDS dataset_description.json.

Parameters:

root_path (str) – Root path of the BIDS dataset.

Returns:

The loaded schema, or None if loading fails.

Return type:

HedSchema or None

group_by_suffix(file_list)[source]

Group files by suffix.

Parameters:

file_list (list) – List of file paths.

Returns:

Dictionary with suffixes as keys and file lists as values.

Return type:

dict
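
As an illustration of the grouping, the sketch below treats the suffix as the last underscore-delimited part of the filename before the first period. That rule is an assumption about BIDS-style names, and group_by_suffix_sketch is a hypothetical name.

```python
import os
from collections import defaultdict


def group_by_suffix_sketch(file_list):
    """Group file paths by BIDS-style suffix (hypothetical helper)."""
    groups = defaultdict(list)
    for path in file_list:
        # Drop the directory and everything from the first period onward.
        stem = os.path.basename(path).split(".")[0]
        # Assumption: the suffix is the last underscore-delimited part.
        suffix = stem.rsplit("_", 1)[-1]
        groups[suffix].append(path)
    return dict(groups)


files = ["sub-01_task-rest_events.tsv", "task-rest_events.json", "sub-01_task-rest_eeg.json"]
print(sorted(group_by_suffix_sketch(files)))  # ['eeg', 'events']
```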

matches_criteria(json_file_dict, tsv_file_dict)[source]

Return True if a candidate sidecar JSON file applies to the given TSV file.

A sidecar applies when its extension is .json, its suffix matches the TSV, and all BIDS entities in the JSON filename have equal values in the TSV filename.

Parameters:
  • json_file_dict (dict) – Parsed BIDS filename dict for the candidate JSON file.

  • tsv_file_dict (dict) – Parsed BIDS filename dict for the target TSV file.

Returns:

True if the sidecar is applicable.

Return type:

bool
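
The three conditions above can be written out directly. This sketch assumes the parsed filename dictionaries carry the ‘ext’, ‘suffix’, and ‘entities’ keys listed for parse_bids_filename; the function name sidecar_applies is hypothetical.

```python
def sidecar_applies(json_file_dict, tsv_file_dict):
    """Return True if the parsed JSON name applies to the parsed TSV name."""
    # Condition 1: the candidate must be a JSON file.
    if json_file_dict["ext"] != ".json":
        return False
    # Condition 2: the suffixes must agree.
    if json_file_dict["suffix"] != tsv_file_dict["suffix"]:
        return False
    # Condition 3: every entity in the JSON name must have the same value in the TSV name.
    tsv_entities = tsv_file_dict["entities"]
    return all(
        tsv_entities.get(name) == value
        for name, value in json_file_dict["entities"].items()
    )


tsv = {"ext": ".tsv", "suffix": "events", "entities": {"sub": "01", "task": "rest"}}
candidate = {"ext": ".json", "suffix": "events", "entities": {"task": "rest"}}
print(sidecar_applies(candidate, tsv))  # True
```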

parse_bids_filename(file_path)[source]

Split a filename into BIDS-relevant components.

Parameters:

file_path (str) – Path to be parsed.

Returns:

Dictionary with keys ‘basename’, ‘suffix’, ‘prefix’, ‘ext’, ‘bad’, and ‘entities’.

Return type:

dict

Notes

  • Splits into BIDS suffix, extension, and a dictionary of entity name-value pairs.
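
To make the split concrete, here is a simplified sketch that recovers only the suffix, extension, and entities from a filename. It does not attempt the ‘basename’, ‘prefix’, or ‘bad’ entries of the real return value, and the name parse_name_sketch is hypothetical.

```python
import os


def parse_name_sketch(file_path):
    """Split a BIDS-style filename into suffix, ext, and entities (simplified)."""
    filename = os.path.basename(file_path)
    # Everything from the first period onward is treated as the extension.
    stem, _, ext = filename.partition(".")
    ext = "." + ext if ext else ""
    parts = stem.split("_")
    entities = {}
    suffix = ""
    for index, part in enumerate(parts):
        if "-" in part:
            # An entity has the form name-value.
            name, _, value = part.partition("-")
            entities[name] = value
        elif index == len(parts) - 1:
            # A trailing part without a hyphen is taken as the suffix.
            suffix = part
    return {"suffix": suffix, "ext": ext, "entities": entities}


print(parse_name_sketch("/data/sub-01/sub-01_task-rest_events.tsv"))
```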

update_entity(name_dict, entity)[source]

Update the dictionary with a new entity.

Parameters:
  • name_dict (dict) – Dictionary of entities.

  • entity (str) – Entity to be added.

walk_back(root_path, file_path)[source]

Yield inherited sidecar file paths from the directory of file_path back toward root_path.

Traverses parent directories from the file’s location up to root_path, yielding any sidecar JSON files that apply to the given TSV according to BIDS inheritance rules.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • file_path (str) – Path to the data file whose applicable sidecars should be found.

Yields:

str – Absolute paths of applicable sidecar JSON files, from nearest to farthest.
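
The traversal can be sketched as a generator over parent directories. The matching rule used here (every underscore-delimited part of the JSON name must appear in the TSV name) is a simplification of the BIDS inheritance criteria, and walk_back_sketch is a hypothetical name rather than the library function.

```python
import tempfile
from pathlib import Path


def walk_back_sketch(root_path, file_path):
    """Yield applicable sidecar paths from the file's directory up to root_path."""
    root = Path(root_path).resolve()
    target = Path(file_path).resolve()
    target_parts = set(target.name.split(".")[0].split("_"))
    current = target.parent
    while True:
        for candidate in sorted(current.glob("*.json")):
            # Simplified rule: every part of the JSON name must appear in the TSV name.
            if set(candidate.stem.split("_")) <= target_parts:
                yield str(candidate)
        # Stop at the dataset root (or the filesystem root as a safety net).
        if current == root or current == current.parent:
            break
        current = current.parent


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub-01").mkdir()
    (root / "task-rest_events.json").write_text("{}")
    (root / "task-other_events.json").write_text("{}")
    (root / "sub-01" / "sub-01_task-rest_events.json").write_text("{}")
    tsv = root / "sub-01" / "sub-01_task-rest_events.tsv"
    tsv.write_text("onset\tduration\n")
    found = [Path(p).name for p in walk_back_sketch(root, tsv)]

print(found)  # ['sub-01_task-rest_events.json', 'task-rest_events.json']
```

The nearest sidecar is yielded first, which matches the nearest-to-farthest order stated above.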

Utility functions

DataFrame utilities

Data handling utilities involving dataframes.

add_columns(df, column_list, value='n/a')[source]

Add specified columns to df if not there.

Parameters:
  • df (DataFrame) – Pandas dataframe.

  • column_list (list) – List of columns to append to the dataframe.

  • value (str) – Default fill value for the column.

check_match(ds1, ds2, numeric=False)[source]

Check whether two pandas Series have the same values.

Parameters:
  • ds1 (DataSeries) – Pandas data series to check.

  • ds2 (DataSeries) – Pandas data series to check.

  • numeric (bool) – If True, treat as numeric and do close-to comparison.

Returns:

Error messages indicating the mismatch or empty if the series match.

Return type:

list

delete_columns(df, column_list)[source]

Delete the specified columns from a dataframe.

Parameters:
  • df (DataFrame) – Pandas dataframe from which to delete columns.

  • column_list (list) – List of candidate column names for deletion.

Notes

  • The deletion of columns is done in place.

  • This does not raise an error if df does not have a column in the list.

delete_rows_by_column(df, value, column_list=None)[source]

Delete rows where columns have this value.

Parameters:
  • df (DataFrame) – Pandas dataframe from which to delete rows.

  • value (str) – Specified value to indicate row should be deleted.

  • column_list (list) – List of columns to search for value.

Notes

  • All values are converted to string before testing.

  • Deletion is done in place.

get_eligible_values(values, values_included)[source]

Return a list of the items from values that are in values_included or None if no values_included.

Parameters:
  • values (list) – List of strings against which to test.

  • values_included (list) – List of items to be selected from values if they are present.

Returns:

List of selected values, or None if values_included is empty or None.

Return type:

list
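
A minimal sketch of this selection follows. It assumes the result keeps the order of values, which the description does not state, and eligible_values_sketch is a hypothetical name.

```python
def eligible_values_sketch(values, values_included):
    """Return the items of values that are in values_included, or None."""
    # An empty or missing inclusion list means there is nothing to select.
    if not values_included:
        return None
    included = set(values_included)
    # Assumption: the order of the original values list is kept.
    return [value for value in values if value in included]


print(eligible_values_sketch(["go", "stop", "cue"], ["cue", "go"]))  # ['go', 'cue']
```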

get_key_hash(key_tuple)[source]

Calculate a hash key for tuple of values.

Parameters:

key_tuple (tuple, list) – The key values in the correct order for lookup.

Returns:

A hash key for the tuple.

Return type:

int

get_new_dataframe(data)[source]

Get a new dataframe representing a tsv file.

Parameters:

data (DataFrame or str) – DataFrame or filename representing a tsv file.

Returns:

A dataframe containing the contents of the tsv file or, if data was a DataFrame to start with, a new copy of the DataFrame.

Return type:

DataFrame

Raises:

HedFileError

  • A filename is given, and it cannot be read into a DataFrame.

get_row_hash(row, key_list)[source]

Get a hash key from key column values for row.

Parameters:
  • row (DataSeries)

  • key_list (list)

Returns:

Hash key constructed from the entries of row in the columns specified by key_list.

Return type:

str

Raises:

HedFileError

  • If row does not have all the columns in key_list.

get_value_dict(tsv_path, key_col='file_basename', value_col='sampling_rate')[source]

Get a dictionary of two columns of a dataframe.

Parameters:
  • tsv_path (str) – Path to a tsv file with a header row to be read into a DataFrame.

  • key_col (str) – Name of the column which should be the key.

  • value_col (str) – Name of the column which should be the value.

Returns:

Dictionary with key_col values as the keys and the corresponding value_col values as the values.

Return type:

dict

Raises:

HedFileError – When tsv_path does not correspond to a file that can be read into a DataFrame.

make_info_dataframe(col_info, selected_col)[source]

Get a dataframe from selected columns.

Parameters:
  • col_info (dict) – Dictionary of dictionaries of column values and counts.

  • selected_col (str) – Name of the column used as top level key for col_info.

Returns:

A two-column dataframe whose first column contains the values from the dictionary whose key is selected_col and whose second column contains the corresponding counts. The returned value is None if selected_col is not a top-level key in col_info.

Return type:

DataFrame

reorder_columns(data, col_order, skip_missing=True)[source]

Create a new dataframe with columns reordered.

Parameters:
  • data (DataFrame, str) – Dataframe or filename of dataframe whose columns are to be reordered.

  • col_order (list) – List of column names in desired order.

  • skip_missing (bool) – If True, col_order columns missing from data are skipped; otherwise an error is raised.

Returns:

A new reordered dataframe.

Return type:

DataFrame

Raises:
  • HedFileError – If col_order contains columns not in data and skip_missing is False.

  • HedFileError – If data corresponds to a filename from which a dataframe cannot be created.

replace_na(df)[source]

Replace n/a values with np.nan in place, taking care of categorical columns.

replace_values(df, values=None, replace_value='n/a', column_list=None)[source]

Replace string values in specified columns.

Parameters:
  • df (DataFrame) – Dataframe whose values will be replaced.

  • values (list, None) – List of strings to replace. If None, only empty strings are replaced.

  • replace_value (str) – String replacement value.

  • column_list (list, None) – List of columns in which to do replacement. If None all columns are processed.

Returns:

Number of values replaced.

Return type:

int

separate_values(values, target_values)[source]

Get target values from the target_values list.

Parameters:
  • values (list) – List of values to be tested.

  • target_values – List of desired values.

File/IO utilities

Utilities for generating and handling file names.

check_filename(test_file, name_prefix=None, name_suffix=None, extensions=None)[source]

Return True if the file has an acceptable extension, suffix, and prefix.

Parameters:
  • test_file (str) – Path of filename to test.

  • name_prefix (list, str, None) – An optional name_prefix or list of prefixes to accept for the base filename.

  • name_suffix (list, str, None) – An optional name_suffix or list of suffixes to accept for the base file name.

  • extensions (list, str, None) – An optional extension or list of extensions to accept for the extensions.

Returns:

True if file has the appropriate format.

Return type:

bool

Notes

  • Everything is converted to lower case prior to testing so this test should be case-insensitive.

  • None indicates that all are accepted.

clean_filename(filename)[source]

Replace invalid characters with under-bars.

Parameters:

filename (str) – source filename.

Returns:

The filename with anything other than alphanumeric characters, periods, hyphens, and under-bars replaced by under-bars.

Return type:

str
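
A regular expression can sketch the replacement named in the summary. The example assumes that each run of disallowed characters collapses into a single under-bar, which the description does not state; clean_filename_sketch is a hypothetical name.

```python
import re


def clean_filename_sketch(filename):
    """Replace runs of characters outside [A-Za-z0-9._-] with one under-bar."""
    return re.sub(r"[^a-zA-Z0-9._-]+", "_", filename)


print(clean_filename_sketch("sub 01/task:rest.tsv"))  # sub_01_task_rest.tsv
```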

extract_suffix_path(path, prefix_path)[source]

Return the suffix of path after prefix path has been removed.

Parameters:
  • path (str)

  • prefix_path (str)

Returns:

Suffix path.

Return type:

str

Notes

  • This function is useful for creating files within BIDS datasets.

get_allowed(value, allowed_values=None, starts_with=True)[source]

Return the portion of the value that matches a value in allowed_values or None if no match.

Parameters:
  • value (str) – value to be matched.

  • allowed_values (list, str, or None) – Values to match.

  • starts_with (bool) – If True match is done at beginning of string, otherwise the end.

Returns:

The portion of value that matches one of the allowed_values.

Return type:

Union[str,list]

Notes

  • The match is done in lower case.

get_alphanumeric_path(pathname, replace_char='_')[source]

Replace sequences of non-alphanumeric characters in string (usually a path) with specified character.

Parameters:
  • pathname (str) – A string usually representing a pathname, but could be any string.

  • replace_char (str) – Replacement character(s).

Returns:

New string with characters replaced.

Return type:

str
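
The replacement of character sequences described above maps naturally onto a single regular-expression substitution. The following is a sketch under that assumption, with the hypothetical name alphanumeric_path_sketch.

```python
import re


def alphanumeric_path_sketch(pathname, replace_char="_"):
    """Replace each run of non-alphanumeric characters with replace_char."""
    return re.sub(r"[^a-zA-Z0-9]+", replace_char, pathname)


print(alphanumeric_path_sketch("/data/sub-01/run 1"))  # _data_sub_01_run_1
```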

get_basename(file_path)[source]

Return the base filename (without extension) for the given path.

Parameters:

file_path (str) – Path to a file.

Returns:

The filename stem, e.g. sub-01_task-rest_events for sub-01_task-rest_events.tsv.

Return type:

str

get_file_list(root_path, name_prefix=None, name_suffix=None, extensions=None, exclude_dirs=None)[source]

Return paths satisfying various conditions.

Parameters:
  • root_path (str) – Full path of the directory tree to be traversed (no ending slash).

  • name_prefix (list, str, None) – An optional prefix for the base filename.

  • name_suffix (list, str, None) – An optional suffix for the base filename.

  • extensions (list, None) – A list of extensions to be selected.

  • exclude_dirs (list, None) – A list of paths to be excluded.

Returns:

The full paths.

Return type:

list

Notes: Exclude directories are paths relative to the root path.

get_filtered_by_element(file_list, elements)[source]

Filter a file list by whether the base names have a substring matching any of the members of elements.

Parameters:
  • file_list (list) – List of file paths to be filtered.

  • elements (list) – List of strings to use as filename filters.

Returns:

The list only containing file paths whose filenames match a filter.

Return type:

list
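
The substring filter can be sketched in a few lines. The helper name filter_by_element_sketch is hypothetical, and the sketch assumes a case-sensitive match against the final path component.

```python
import os


def filter_by_element_sketch(file_list, elements):
    """Keep paths whose base name contains any of the element strings."""
    return [
        path
        for path in file_list
        if any(element in os.path.basename(path) for element in elements)
    ]


files = ["/d/sub-01_task-rest_events.tsv", "/d/sub-02_task-go_events.tsv"]
print(filter_by_element_sketch(files, ["task-go"]))  # ['/d/sub-02_task-go_events.tsv']
```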

get_filtered_list(file_list, name_prefix=None, name_suffix=None, extensions=None)[source]

Get list of filenames satisfying the criteria.

Everything is converted to lower case prior to testing so this test should be case-insensitive.

Parameters:
  • file_list (list) – List of files to test.

  • name_prefix (str) – Optional name_prefix for the base filename.

  • name_suffix (str) – Optional name_suffix for the base filename.

  • extensions – Optional list of file extensions (allows two periods (.tsv.gz)).

get_full_extension(filename)[source]

Return the full extension of a file, including the period.

Parameters:

filename (str) – The filename to be parsed.

Returns:

  • File name without extension

  • Full extension

Return type:

Tuple[str, str]
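
For compound extensions such as .tsv.gz, one way to obtain the two-part result is to apply os.path.splitext repeatedly. This is a sketch of that idea with the hypothetical name split_full_extension, not a statement of how the library does it.

```python
import os


def split_full_extension(filename):
    """Return (name without extension, full extension including periods)."""
    base, ext = os.path.splitext(filename)
    full_ext = ext
    # Keep peeling extensions until none remain, e.g. '.gz' and then '.tsv'.
    while ext:
        base, ext = os.path.splitext(base)
        full_ext = ext + full_ext
    return base, full_ext


print(split_full_extension("sub-01_events.tsv.gz"))  # ('sub-01_events', '.tsv.gz')
```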

get_path_components(root_path, this_path)[source]

Get a list of the remaining components after root path.

Parameters:
  • root_path (str) – A path (no trailing separator).

  • this_path (str) – The path of a file or directory descendant of root_path.

Returns:

A list of the remaining directory components leading to the file.

Return type:

Union[list, None]

Notes: this_path must be a descendant of root_path.

get_task_dict(files)[source]

Return a dictionary of the tasks that appear in the file names of a list of files.

Parameters:

files (list) – List of filenames to be separated by task.

Returns:

Dictionary of filenames keyed by task name.

Return type:

dict

get_task_from_file(file_path)[source]

Returns the task name entity from a BIDS-type file path.

Parameters:

file_path (str) – File path.

Returns:

The task name or an empty string.

Return type:

str
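
Extracting the task entity amounts to finding the task-xxxx part of the filename. The sketch below assumes task labels are alphanumeric and that the entity starts the name or follows an under-bar; task_from_file_sketch is a hypothetical name.

```python
import os
import re


def task_from_file_sketch(file_path):
    """Return the label of the task- entity in the filename, or ''."""
    # Assumption: the entity starts the name or follows an under-bar.
    match = re.search(r"(?:^|_)task-([a-zA-Z0-9]+)", os.path.basename(file_path))
    return match.group(1) if match else ""


print(task_from_file_sketch("/data/sub-01/sub-01_task-rest_events.tsv"))  # rest
```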

get_timestamp()[source]

Return a timestamp string suitable for using in filenames.

Returns:

Represents the current time.

Return type:

str

get_unique_suffixes(file_paths, extensions=None)[source]

Get unique suffixes from file paths with specified extensions.

Parameters:
  • file_paths (list) – List of file paths to process.

  • extensions (list or None) – List of file extensions to filter. If None, defaults to [‘.json’, ‘.tsv’].

Returns:

Set of unique suffixes found.

Return type:

set

separate_by_ext(file_paths)[source]

Separate a list of files into tsv and json files.

Parameters:

file_paths (list) – A list of file paths.

Returns:

Dictionary whose keys are extensions and whose values are lists of the files with that extension.

Return type:

dict

Schema utilities

Utilities

flatten_schema(hed_schema, skip_non_tag=False)[source]

Returns a 3-column dataframe representing a schema.

Parameters:
  • hed_schema (HedSchema) – The schema to flatten.

  • skip_non_tag (bool) – If True, skip all sections except tag.

Returns:

Represents a HED schema in flattened form.

Return type:

DataFrame