Tools

Utility functions and data processing tools for HED operations.

Analysis tools

EventManager

class hed.tools.analysis.event_manager.EventManager(input_data, hed_schema, extra_defs=None)[source]

Bases: object

Manager of events of temporal extent.

__init__(input_data, hed_schema, extra_defs=None)[source]

Create an event manager for an events file. Manages events of temporal extent.

Parameters:
  • input_data (TabularInput) – Represents an events file with its sidecar.

  • hed_schema (HedSchema) – HED schema used.

  • extra_defs (DefinitionDict) – Extra definitions not included in the input_data information.

Raises:

HedFileError – If there are any unmatched offsets.

Notes: Keeps the events of temporal extent by their starting index in the events file. These events are separated from the rest of the annotations, which are contained in self.hed_strings.

unfold_context(remove_types=None)[source]

Unfold the event information into a tuple based on context.

Parameters:

remove_types (list or None) – List of types to remove. If None, defaults to empty list.

Returns:

  • Union[list(str), HedString]: The information without the events of temporal extent.

  • Union[list(str), HedString, None]: The onsets of the events of temporal extent.

  • Union[list(str), HedString, None]: The ongoing context information.

Return type:

tuple[Union[list(str), HedString], Union[list(str), HedString, None], Union[list(str), HedString, None]]

str_list_to_hed(str_list)[source]

Create a HedString object from a list of strings.

Parameters:

str_list (list) – A list of strings to be concatenated with commas and then converted.

Returns:

The converted list.

Return type:

Union[HedString, None]

get_type_defs(types)[source]

Return a list of definition names (lower case) that correspond to any of the specified types.

Parameters:

types (list or None) – List of tags that are treated as types such as ‘Condition-variable’

Returns:

List of definition names (lower-case) that correspond to the specified types

Return type:

list

static compress_strings(list_to_compress)[source]

Compress a list of lists of strings into a list in which each entry is a single str with comma-separated elements.

Parameters:

list_to_compress (list) – List of lists of HED str to turn into a list of single HED strings.

Returns:

List of same length as list_to_compress with each entry being a str.

Return type:

list
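As a rough illustration of the input and output shapes described for compress_strings, the sketch below joins each inner list into one comma-separated string. The name compress_strings_sketch is hypothetical, and dropping empty entries before joining is an assumption rather than the library's confirmed behavior.

```python
def compress_strings_sketch(list_to_compress):
    """Join each inner list of strings into one comma-separated string."""
    compressed = []
    for entry in list_to_compress:
        # Assumption: empty strings and None entries are dropped before joining.
        parts = [item for item in entry if item]
        compressed.append(",".join(parts))
    return compressed


rows = [["Sensory-event", "Visual-presentation"], [], ["Agent-action"]]
result = compress_strings_sketch(rows)
```

The result has one entry per input row, so it can be aligned with the rows of the original events file.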

EventChecker

class hed.tools.analysis.event_checker.EventChecker(hed_obj, line_number, original_line_number=None, error_handler=None)[source]

Bases: object

Validates that HED-annotated events meet quality requirements such as having a top-level event tag.

EVENT_TAGS = {'Agent-action', 'Data-feature', 'Event', 'Experiment-control', 'Experiment-structure', 'Measurement-event', 'Sensory-event'}
NON_TASK_EVENTS = {'Data-feature', 'Experiment-control', 'Experiment-structure', 'Measurement-event'}
TASK_ROLES = {'Cue', 'Experimental-stimulus', 'Feedback', 'Incidental', 'Instructional', 'Mishap', 'Participant-response', 'Task-activity', 'Warning'}
ACTION_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Done-indication', 'Imagined-action', 'Inappropriate-action', 'Incorrect-action', 'Indeterminate-action', 'Miss', 'Near-miss', 'Omitted-action', 'Ready-indication'}
STIMULUS_ROLES = {'Distractor', 'Expected', 'Extraneous', 'Go-signal', 'Meaningful', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Penalty', 'Planned', 'Priming', 'Query', 'Reward', 'Stop-signal', 'Target', 'Threat', 'Timed', 'Unexpected', 'Unplanned'}
ALL_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Cue', 'Distractor', 'Done-indication', 'Expected', 'Experimental-stimulus', 'Extraneous', 'Feedback', 'Go-signal', 'Imagined-action', 'Inappropriate-action', 'Incidental', 'Incorrect-action', 'Indeterminate-action', 'Instructional', 'Meaningful', 'Mishap', 'Miss', 'Near-miss', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Omitted-action', 'Participant-response', 'Penalty', 'Planned', 'Priming', 'Query', 'Ready-indication', 'Reward', 'Stop-signal', 'Target', 'Task-activity', 'Threat', 'Timed', 'Unexpected', 'Unplanned', 'Warning'}
__init__(hed_obj, line_number, original_line_number=None, error_handler=None)[source]

Constructor for the EventChecker class.

Parameters:
  • hed_obj (HedString) – The HED string to check.

  • line_number (int or None) – The index of the HED string in the file.

  • original_line_number (int or None) – The original line number in the file.

  • error_handler (ErrorHandler) – The ErrorHandler object to use for error handling.

EventsChecker

class hed.tools.analysis.event_checker.EventsChecker(hed_schema, input_data, name=None)[source]

Bases: object

Class to check for event tag quality errors in an event file.

REMOVE_TYPES = ['Condition-variable', 'Task']
__init__(hed_schema, input_data, name=None)[source]

Constructor for the EventsChecker class.

Parameters:
  • hed_schema (HedSchema) – The HedSchema object to check.

  • input_data (TabularInput) – The input data object to check.

  • name (str) – The name to display for this file for error purposes.

validate_event_tags()[source]

Verify that the events in the HED strings validly represent events.

Returns:

A list in which each element is a dictionary with ‘code’ and ‘message’ keys.

Return type:

list

insert_issue_details(issues)[source]

Inserts issue details as part of the ‘message’ key for a list of issues.

Parameters:

issues (list) – List of issues to get details for.

static get_issue_details(data_info, side_data)[source]

Get the source details for the issue.

Parameters:
  • data_info (pd.Series) – The row information from the original tsv.

  • side_data (pd.Series) – The sidecar data.

Returns:

The HED associated with the relevant columns.

Return type:

list

static get_hed_source(hed_dict, value)[source]

Get the source of the HED string.

Parameters:

hed_dict (HedTag) – The HedTag object to get the source for.

Returns:

The source of the HED string.

Return type:

str

get_onset_lines(line)[source]

Get the lines in the input data with the same line numbers as the data_frame.

static get_error_lines(issues)[source]

Get the lines grouped by code.

Parameters:

issues (list) – A list of issues to check.

Returns:

A dict with keys that are error codes and values that are lists of line numbers.

Return type:

dict
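The grouping that get_error_lines describes can be pictured with a small stand-alone sketch. Only the ‘code’ key is taken from the documentation above. The field holding the line number (here ‘row’), the code values, and the function name are hypothetical placeholders.

```python
from collections import defaultdict


def group_lines_by_code(issues, line_key="row"):
    """Return a dict mapping each error code to a list of line numbers."""
    grouped = defaultdict(list)
    for issue in issues:
        # 'code' is documented; line_key is a hypothetical field name.
        grouped[issue["code"]].append(issue.get(line_key))
    return dict(grouped)


# Placeholder issues; the codes are illustrative, not real HED error codes.
issues = [
    {"code": "CODE_A", "message": "first problem", "row": 3},
    {"code": "CODE_B", "message": "second problem", "row": 5},
    {"code": "CODE_A", "message": "third problem", "row": 9},
]
error_lines = group_lines_by_code(issues)
```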

EventsSummary

class hed.tools.analysis.events_summary.EventsSummary(hed_schema, file, sidecar=None, name=None)[source]

Bases: object

Summarizes HED event annotations for a tabular file, grouping tags by stimulus/response categories.

REMOVE_TYPES = ['Condition-variable', 'Task']
MATCH_TYPES = ['Experimental-stimulus', 'Participant-response', 'Cue', 'Feedback', 'Instructional', 'Sensory-event', 'Agent-action']
EXCLUDED_PARENTS = {'data-marker', 'data-resolution', 'grayscale', 'hsv-color', 'informational-property', 'luminance', 'luminance-contrast', 'opacity', 'organizational-property', 'quantitative-value', 'relation', 'rgb-color', 'spatiotemporal-value', 'statistical-value', 'task-effect-evidence', 'task-relationship'}
CUTOFF_TAGS = {'blue-color', 'brown-color', 'cyan-color', 'gray-color', 'green-color', 'orange-color', 'pink-color', 'purple-color', 'red-color', 'visual-presentation', 'white-color', 'yellow-color'}
FILTERED_TAGS = {'action', 'agent', 'agent-cognitive-state', 'agent-emotional-state', 'agent-physiological-state', 'agent-postural-state', 'agent-property', 'agent-state', 'agent-task-role', 'agent-trait', 'anatomical-item', 'auditory-attribute', 'auditory-device', 'biological-artifact', 'biological-item', 'body-part', 'categorical-class-value', 'categorical-judgment-value', 'categorical-level-value', 'categorical-location-value', 'categorical-orientation-value', 'categorical-value', 'computing-device', 'dara-source-type', 'data-property', 'data-value', 'data-variability-attribute', 'device', 'display-device', 'document', 'environmental-property', 'event', 'face-part', 'geometric-object', 'gustatory-attribute', 'head-part', 'input-device', 'io-device', 'item', 'language-item', 'lower-extremity-part', 'man-made-object', 'media', 'media-clip', 'move-body-part', 'natural-object', 'nonbiological-artifact', 'object', 'olfactory-attribute', 'output-device', 'physical-value', 'property', 'recording-device', 'sensory-attribute', 'sensory-presentation', 'sensory-property', 'spatial-property', 'spectral-property', 'tactile-attribute', 'task-action-type', 'task-attentional-demand', 'task-event-role', 'task-property', 'task-stimulus-role', 'temporal-property', 'torso-part', 'upper-extremity-part', 'visual-attribute', 'visualization'}
__init__(hed_schema, file, sidecar=None, name=None)[source]

Constructor for the EventsSummary class.

extract_tag_summary()[source]

Extract a summary of the tags in a given tabular input file.

Returns:

  • dict: A dictionary with the summary information - (str, list)

  • list: A set of tags that do not match any of the specified types but are not excluded.

Return type:

tuple[dict, list]

static match_tags(all_tags, key)[source]

Return True if any tag in all_tags has a short_base_tag matching key.

Parameters:
  • all_tags (list[HedTag]) – The tags to search.

  • key (str) – The short base tag name to look for.

Returns:

True if a match is found.

Return type:

bool

update_tags(tag_set, all_tags)[source]

Add the most-specific ancestor tag names from all_tags into tag_set, respecting cutoff categories.

Parameters:
  • tag_set (set) – The running set of tag terms to update.

  • all_tags (list[HedTag]) – Tags to process.

Returns:

The updated tag_set.

Return type:

set

HedTagManager

class hed.tools.analysis.hed_tag_manager.HedTagManager(event_manager, remove_types=None)[source]

Bases: object

Manager for the HED tags from a columnar file.

__init__(event_manager, remove_types=None)[source]

Create a tag manager for one tabular file.

Parameters:
  • event_manager (EventManager) – an event manager for the tabular file.

  • remove_types (list or None) – List of type tags (such as condition-variable) to remove. If None, defaults to empty list.

get_hed_objs(include_context=True, replace_defs=False)[source]

Return a list of HED string objects of same length as the tabular file.

Parameters:
  • include_context (bool) – If True (default), include the Event-context group in the HED string.

  • replace_defs (bool) – If True (default=False), replace the Def tags with Definition contents.

Returns:

list - List of HED strings of same length as tabular file.

get_hed_obj(hed_str, remove_types=False, remove_group=False)[source]

Return a HED string object with the types removed.

Parameters:
  • hed_str (str) – Represents a HED string.

  • remove_types (bool) – If False (the default), do not remove the types managed by this manager.

  • remove_group (bool) – If False (the default), do not remove the group when removing a type tag, otherwise remove its enclosing group.

HedTagCount

class hed.tools.analysis.hed_tag_counts.HedTagCount(hed_tag, file_name)[source]

Bases: object

Counts for a particular HedTag in a particular file.

__init__(hed_tag, file_name)[source]
Parameters:
  • hed_tag (HedTag) – The HedTag to keep track of.

  • file_name (str) – Name of the file associated with the tag.

set_value(hed_tag)[source]

Update the tag term value counts for a HedTag.

Parameters:

hed_tag (HedTag or None) – Item to use to update the value counts.

get_info(verbose=False) dict[source]

Return counts for this tag.

Parameters:

verbose (bool) – If False (the default) only number of files included, otherwise a list of files.

Returns:

Keys are ‘tag’, ‘events’, and ‘files’.

Return type:

dict

get_summary() dict[source]

Return a dictionary summary of the events and files for this tag.

Returns:

dictionary summary of events and files that contain this tag.

Return type:

dict

get_empty()[source]

Return a copy of this entry with counts reset to zero.

Returns:

A new instance with the same tag name but zeroed event/file counts.

Return type:

HedTagCount

HedTagCounts

class hed.tools.analysis.hed_tag_counts.HedTagCounts(name, total_events=0)[source]

Bases: object

Counts of HED tags for a group of columnar files.

Parameters:
  • name (str) – An identifier for these counts (usually the filename of the tabular file).

  • total_events (int) – The total number of events in the columnar file.

__init__(name, total_events=0)[source]
update_tag_counts(hed_string_obj, file_name)[source]

Update the tag counts based on a HedString object.

Parameters:
  • hed_string_obj (HedString) – The HED string whose tags should be counted.

  • file_name (str) – The name of the file corresponding to these counts.

organize_tags(tag_template) tuple[source]

Organize tags into categories as specified by the tag_template.

Parameters:

tag_template (dict) – A dictionary whose keys are titles and values are lists of HED tags (str).

Returns:

A tuple containing two elements:

  • dict: Keys are tags (strings) and values are lists of HedTagCount for items fitting the template.

  • list: HedTagCount objects corresponding to tags that don’t fit the template.

Return type:

tuple[dict, list]

merge_tag_dicts(other_dict)[source]

Merge the information from another dictionary with this object’s tag dictionary.

Parameters:

other_dict (dict) – Dictionary of tag, HedTagCount to merge.

get_summary() dict[source]

Return a summary object containing the tag count information of this summary.

Returns:

Keys are ‘name’, ‘files’, ‘total_events’, and ‘details’.

Return type:

dict

static create_template(tags) dict[source]

Creates a dictionary with keys based on list of keys in tags dictionary.

Parameters:

tags (dict) – dictionary of tags and key lists.

Returns:

Dictionary with keys in key lists and values are empty lists.

Return type:

dict
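A minimal sketch of the template shape described for create_template, assuming the input maps display titles to lists of tag names. The function name and the sample titles are hypothetical, and any case normalization the library may apply to the keys is not modeled here.

```python
def create_template_sketch(tags):
    """Return a dict with one empty list per key found in the key lists."""
    template = {}
    for key_list in tags.values():
        for key in key_list:
            # Each key from every key list becomes an empty bucket.
            template[key] = []
    return template


tags = {
    "Sensory events": ["Sensory-event", "Sensory-presentation"],
    "Agent actions": ["Agent-action"],
}
template = create_template_sketch(tags)
```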

Note: This class is used to organize the results of the tags based on a template for display.

HedTypeManager

class hed.tools.analysis.hed_type_manager.HedTypeManager(event_manager)[source]

Bases: object

Manager for type factors and type definitions.

__init__(event_manager)[source]

Create a variable manager for one tabular file for all type variables.

Parameters:

event_manager (EventManager) – An event manager for the tabular file.

Raises:

HedFileError – On errors such as unmatched onsets or missing definitions.

property types

Return a list of types managed by this manager.

Returns:

Type tags names.

Return type:

list

add_type(type_name)[source]

Add a type variable to be managed by this manager.

Parameters:

type_name (str) – Type tag name of the type to be added.

get_factor_vectors(type_tag, type_values=None, factor_encoding='one-hot')[source]

Return a DataFrame of factor vectors for the indicated HED tag and values.

Parameters:
  • type_tag (str) – HED tag to retrieve factors for.

  • type_values (list or None) – The values of the tag to create factors for or None if all unique values.

  • factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).

Returns:

DataFrame containing the factor vectors as the columns.

Return type:

Union[pd.DataFrame, None]

get_type(type_tag)[source]

Returns the HedType variable associated with the type tag.

Parameters:

type_tag (str) – HED tag to retrieve the type for.

Returns:

the values associated with this type tag.

Return type:

Union[HedType, None]

get_type_tag_factor(type_tag, type_value)[source]

Return the HedTypeFactors for a specified value and extension.

Parameters:
  • type_tag (str or None) – HED tag for the type.

  • type_value (str or None) – Value of this tag to return the factors for.

get_type_def_names(type_var)[source]

Return the definitions associated with a particular type tag.

Parameters:

type_var (str) – The name of a type tag such as Condition-variable.

Returns:

Names of definitions that use this type.

Return type:

list

summarize_all(as_json=False)[source]

Return a dictionary containing the summaries for the types managed by this manager.

Parameters:

as_json (bool) – If False (the default), return as an object otherwise return as a JSON string.

Returns:

Dictionary with the summary.

Return type:

Union[dict, str]

HedType

class hed.tools.analysis.hed_type.HedType(event_manager, name, type_tag='condition-variable')[source]

Bases: object

Manager of a type variable and its associated context.

__init__(event_manager, name, type_tag='condition-variable')[source]

Create a variable manager for one type-variable for one tabular file.

Parameters:
  • event_manager (EventManager) – Event manager instance

  • name (str) – Name of the tabular file as a unique identifier.

  • type_tag (str) – Lowercase short form of the tag to be managed.

Raises:

HedFileError – On errors such as unmatched onsets or missing definitions.

property total_events

Return the total number of events in the associated event list.

Returns:

Number of events.

Return type:

int

get_type_value_factors(type_value)[source]

Return the HedTypeFactors associated with type_value or None.

Parameters:

type_value (str) – The tag corresponding to the type’s value (such as the name of the condition variable).

Returns:

Union[HedTypeFactors, None]

get_type_value_level_info(type_value)[source]

Return type variable corresponding to type_value.

Parameters:

type_value (str)

Returns:

property type_variables

Return the set of type-value names (keys) found in this HedType.

Returns:

Set of lowercased type-value name strings.

Return type:

set[str]

get_type_def_names()[source]

Return the type definition names.

get_type_value_names()[source]

Return the list of type-value names defined in this HedType.

Returns:

Lowercased type-value name strings.

Return type:

list[str]

get_summary()[source]

Return a summary dict mapping each type-value name to its factor summary.

Returns:

Keys are type-value name strings; values are factor summary dicts.

Return type:

dict

get_type_factors(type_values=None, factor_encoding='one-hot')[source]

Create a dataframe with the indicated type tag values as factors.

Parameters:
  • type_values (list or None) – A list of values of type tags for which to generate factors.

  • factor_encoding (str) – Type of factor encoding (one-hot or categorical).

Returns:

Contains the specified factors associated with this type tag.

Return type:

pd.DataFrame

static get_type_list(type_tag, item)[source]

Find a list of the given type tag from a HedTag, HedGroup, or HedString.

Parameters:
  • type_tag (str) – a tag whose direct items you wish to remove

  • item (HedTag or HedGroup) – The item from which to extract condition type_variables.

Returns:

List of the items with this type_tag

Return type:

list

HedTypeDefs

class hed.tools.analysis.hed_type_defs.HedTypeDefs(definitions, type_tag='condition-variable')[source]

Bases: object

Manager for definitions associated with a type such as condition-variable.

Properties:

def_map (dict): keys are definition names, values are dict {type_values, description, tags}.

Example: A definition ‘famous-face-cond’ with contents:

‘(Condition-variable/Face-type,Description/A face that should be recognized.,(Image,(Face,Famous)))’

would have type_values [‘face_type’]. All items are strings not objects.

__init__(definitions, type_tag='condition-variable')[source]

Create a definition manager for a type of variable.

Parameters:
  • definitions (dict or DefinitionDict) – A dictionary of DefinitionEntry objects.

  • type_tag (str) – Lower-case HED tag string representing the type managed.

get_type_values(item)[source]

Return a list of type_tag values in item.

Parameters:

item (HedTag, HedGroup, or HedString) – An item potentially containing def tags.

Returns:

A list of the unique values associated with this type

Return type:

list

property type_def_names

Return list of names of definitions that have this type-variable.

Returns:

definition names that have this type.

Return type:

list

property type_names

Return list of names of the type-variables associated with type definitions.

Returns:

type names associated with the type definitions

Return type:

list

static extract_def_names(item, no_value=True)[source]

Return a list of Def values in item.

Parameters:
  • item (HedTag, HedGroup, or HedString) – An item containing a def tag.

  • no_value (bool) – If True, strip off extra values after the definition name.

Returns:

A list of definition names (as strings).

Return type:

list

static split_name(name, lowercase=True)[source]

Split a name/# or name/x into name, x.

Parameters:
  • name (str) – The extension or value portion of a tag.

  • lowercase (bool) – If True (default), return values are converted to lowercase.

Returns:

  • Name of the definition.

  • Value of the definition if it has one.

Return type:

tuple[str, str]
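The splitting rule described for split_name can be sketched as follows. The function name is hypothetical, and two details are assumptions rather than documented behavior: an empty input yields (None, None), and a name without a value yields an empty string as the second element.

```python
def split_name_sketch(name, lowercase=True):
    """Split 'name/value' into (name, value); value is '' when absent."""
    if not name:
        # Assumption: an empty or missing name produces no parts.
        return None, None
    def_name, _, def_value = name.partition("/")
    if lowercase:
        return def_name.lower(), def_value.lower()
    return def_name, def_value


with_value = split_name_sketch("MyDef/3")
placeholder = split_name_sketch("MyDef/#")
no_value = split_name_sketch("MyDef", lowercase=False)
```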

HedTypeFactors

class hed.tools.analysis.hed_type_factors.HedTypeFactors(type_tag, type_value, number_elements)[source]

Bases: object

Holds index of positions for a variable type for a columnar file.

ALLOWED_ENCODINGS = ('categorical', 'one-hot')
__init__(type_tag, type_value, number_elements)[source]

Constructor for HedTypeFactors.

Parameters:
  • type_tag (str) – Lowercase string corresponding to a HED tag which has a takes-value child.

  • type_value (str) – The value of the type summarized by this class.

  • number_elements (int) – Number of elements in the data column

get_factors(factor_encoding='one-hot')[source]

Return a DataFrame of factor vectors for this type factor.

Parameters:

factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).

Returns:

DataFrame containing the factor vectors as the columns.

Return type:

pd.DataFrame
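To make the difference between the two allowed encodings concrete, here is a simplified stand-alone sketch that uses plain dictionaries of lists instead of a DataFrame. It assumes at most one level per row. The function name, the single column name ‘factor’, and the ‘n/a’ filler are illustrative assumptions, not the library's output format.

```python
def encode_factors(levels_per_row, level_names, factor_encoding="one-hot"):
    """Build factor columns from the level (or None) observed in each row."""
    if factor_encoding == "one-hot":
        # One 0/1 column per level.
        return {
            name: [1 if level == name else 0 for level in levels_per_row]
            for name in level_names
        }
    if factor_encoding == "categorical":
        # A single column holding the level name, or 'n/a' when absent.
        return {
            "factor": [
                level if level in level_names else "n/a"
                for level in levels_per_row
            ]
        }
    raise ValueError(f"Unknown factor_encoding: {factor_encoding}")


observed = ["famous", None, "unfamiliar", "famous"]
levels = ["famous", "unfamiliar"]
one_hot = encode_factors(observed, levels, "one-hot")
categorical = encode_factors(observed, levels, "categorical")
```

One-hot encoding gives one indicator column per level, which suits regression designs, while categorical encoding keeps a single labeled column that is easier to read.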

get_summary()[source]

Return the summary of the type tag value as a dictionary.

Returns:

Contains the summary.

Return type:

dict

HedTypeCount

class hed.tools.analysis.hed_type_counts.HedTypeCount(type_value, type_tag, file_name=None)[source]

Bases: object

Manager of the counts of tags for one type tag such as Condition-variable or Task.

Parameters:
  • type_value (str) – The value of the variable to be counted.

  • type_tag (str) – The type of variable.

Examples

HedTypeCount(‘SymmetricCond’, ‘condition-variable’) keeps counts of Condition-variable/SymmetricCond.

__init__(type_value, type_tag, file_name=None)[source]
update(type_sum, file_id)[source]

Update the counts from a HedTypeValues.

Parameters:
  • type_sum (dict) – Information about the contents for a particular data file.

  • file_id (str or None) – Name of the file associated with the counts.

to_dict()[source]

Return count information as a dictionary.

get_summary()[source]

Return the summary of one value of one type tag.

Returns:

Count information for one tag of one type.

Return type:

dict

HedTypeCounts

class hed.tools.analysis.hed_type_counts.HedTypeCounts(name, type_tag)[source]

Bases: object

Manager for summaries of tag counts for columnar files.

__init__(name, type_tag)[source]
update_summary(type_sum, total_events=0, file_id=None)[source]

Update this summary based on the type variable map.

Parameters:
  • type_sum (dict) – Contains the information about the value of a type.

  • total_events (int) – Total number of events processed.

  • file_id (str) – Unique identifier for the associated file.

add_descriptions(type_defs)[source]

Update this summary based on the type variable map.

Parameters:

type_defs (HedTypeDefs) – Contains the information about the value of a type.

update(counts)[source]

Update count information based on counts in another HedTypeCounts.

Parameters:

counts (HedTypeCounts) – Information to use in the update.

get_summary()[source]

Return the information in the manager as a dictionary.

Returns:

Dict with keys ‘name’, ‘type_tag’, ‘files’, ‘total_events’, and ‘details’.

Return type:

dict

TabularSummary

class hed.tools.analysis.tabular_summary.TabularSummary(value_cols=None, skip_cols=None, name='', categorical_limit=None)[source]

Bases: object

Summarize the contents of columnar files.

__init__(value_cols=None, skip_cols=None, name='', categorical_limit=None)[source]

Constructor for a BIDS tabular file summary.

Parameters:
  • value_cols (list, None) – List of columns to be treated as value columns.

  • skip_cols (list, None) – List of columns to be skipped.

  • name (str) – Name associated with the dictionary.

  • categorical_limit (int, None) – Maximum number of unique values to store for a categorical column.

__str__()[source]

Return a str version of this summary.

extract_sidecar_template() dict[source]

Extract a BIDS sidecar-compatible dictionary.

Returns:

A sidecar template that can be converted to JSON.

Return type:

dict

get_summary(as_json=False) dict | str[source]

Return the summary in dictionary format.

Parameters:

as_json (bool) – If False, return as a Python dictionary, otherwise convert to a JSON string.

Returns:

A dictionary containing the summary information or a JSON string if as_json is True.

Return type:

Union[dict, str]

get_number_unique(column_names=None) dict[source]

Return the number of unique values in columns.

Parameters:

column_names (list, None) – A list of column names to analyze or all columns if None.

Returns:

Column names are the keys and the number of unique values in the column are the values.

Return type:

dict

update(data, name=None)[source]

Update the counts based on data (DataFrame, filename, or list of filenames).

Parameters:
  • data (DataFrame, str, or list) – DataFrame, filename, or list of filenames containing data to update.

  • name (str) – Name of the summary.

update_summary(tab_sum)[source]

Add TabularSummary values to this object.

Parameters:

tab_sum (TabularSummary) – A TabularSummary to be combined.

Notes

  • The value_cols and skip_cols are updated as long as they are not contradictory.

  • A new skip column cannot be used.

static extract_summary(summary_info) TabularSummary[source]

Create a TabularSummary object from a serialized summary.

Parameters:

summary_info (dict or str) – A JSON string or a dictionary containing contents of a TabularSummary.

Returns:

contains the information in summary_info as a TabularSummary object.

Return type:

TabularSummary

static get_columns_info(dataframe, skip_cols=None) dict[str, dict][source]

Extract unique value counts for columns.

Parameters:
  • dataframe (DataFrame) – The DataFrame to be analyzed.

  • skip_cols (list) – List of names of columns to be skipped in the extraction.

Returns:

A dictionary with keys that are column names (strings) and values that are dictionaries of unique value counts.

Return type:

dict[str, dict]
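The nested dictionary returned by get_columns_info can be pictured with a small sketch that works on a list of row dictionaries rather than a DataFrame. The function name and the conversion of values to strings are assumptions made for illustration.

```python
from collections import Counter


def columns_info_sketch(rows, skip_cols=None):
    """Count how often each value occurs in each column that is not skipped."""
    skip = set(skip_cols or [])
    info = {}
    for row in rows:
        for column, value in row.items():
            if column in skip:
                continue
            # Assumption: values are compared as strings.
            info.setdefault(column, Counter())[str(value)] += 1
    return {column: dict(counts) for column, counts in info.items()}


rows = [
    {"onset": "1.0", "trial_type": "go"},
    {"onset": "2.5", "trial_type": "stop"},
    {"onset": "4.0", "trial_type": "go"},
]
info = columns_info_sketch(rows, skip_cols=["onset"])
```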

static make_combined_dicts(file_dictionary, skip_cols=None) tuple[TabularSummary, dict[str, TabularSummary]][source]

Return combined and individual summaries.

Parameters:
  • file_dictionary (FileDictionary) – Dictionary of file name keys and full path.

  • skip_cols (list) – List of names of columns to be skipped.

Returns:

  • A combined summary of all files in the dictionary.

  • A dictionary where keys are file names and values are individual TabularSummary objects.

Return type:

tuple[TabularSummary, dict[str, TabularSummary]]

ColumnNameSummary

class hed.tools.analysis.column_name_summary.ColumnNameSummary(name='')[source]

Bases: object

Summarize the unique column names in a dataset.

__init__(name='')[source]
update(name, columns)[source]

Update the summary based on columns associated with a file.

Parameters:
  • name (str) – File name associated with the columns.

  • columns (list) – List of column names.

update_headers(column_names)[source]

Update the unique combinations of column names.

Parameters:

column_names (list) – List of column names to update.

get_summary(as_json=False)[source]

Return summary as an object or in JSON.

Parameters:

as_json (bool) – If False (the default), return the underlying summary object, otherwise transform to JSON.

FileDictionary

class hed.tools.analysis.file_dictionary.FileDictionary(collection_name, file_list, key_indices=(0, 2), separator='_')[source]

Bases: object

A file dictionary keyed by entity pair indices.

Notes

  • The entities are identified as 0, 1, … depending on order in the base filename.

  • The entity key-value pairs are assumed separated by ‘_’ unless a separator is provided.

__init__(collection_name, file_list, key_indices=(0, 2), separator='_')[source]

Create a dictionary with full paths as values.

Parameters:
  • collection_name (str) – Name of the file collection for reference.

  • file_list (list, None) – List containing full paths of files of interest.

  • key_indices (tuple, None) – List of order of key-value pieces to assemble for the key.

  • separator (str) – Character used to separate pieces of key name.

Notes

  • This dictionary is used for cross listing BIDS style files for different studies.

Examples

If key_indices is (0, 2), the key generated for /tmp/sub-001_task-FaceCheck_run-01_events.tsv is sub-001_run-01.

property name

Name of this dictionary.

property key_list

Keys in this dictionary.

property file_dict

Dictionary of path values in this dictionary.

property file_list

List of path values in this dictionary.

create_file_dict(file_list, key_indices, separator)[source]

Create new dict based on key indices.

Parameters:
  • file_list (list) – Paths of the files to include.

  • key_indices (tuple) – A tuple of integers representing order of entities for key.

  • separator (str) – The separator used between entities to form the key.

get_file_path(key)[source]

Return file path corresponding to key.

Parameters:

key (str) – Key used to retrieve the file path.

Returns:

File path.

Return type:

str

iter_files()[source]

Iterator over the files in this dictionary.

Yields:

  • str: Key into the dictionary.

  • file: File path.

key_diffs(other_dict)[source]

Return symmetric key difference with another dict.

Parameters:

other_dict (FileDictionary)

Returns:

The symmetric difference of the keys in this dictionary and the other one.

Return type:

list
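The symmetric difference that key_diffs reports contains the keys present in exactly one of the two dictionaries. A minimal sketch on plain key lists follows. The function name is hypothetical, and sorting the result is an assumption added to make the output deterministic.

```python
def key_diffs_sketch(keys_a, keys_b):
    """Return the keys that appear in exactly one of the two collections."""
    # Assumption: the result is sorted for a stable ordering.
    return sorted(set(keys_a).symmetric_difference(keys_b))


events_keys = ["sub-001_run-01", "sub-002_run-01"]
other_keys = ["sub-002_run-01", "sub-003_run-01"]
diffs = key_diffs_sketch(events_keys, other_keys)
```

An empty result indicates that the two file collections cover the same set of entity keys.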

output_files(title=None)[source]

Return a string with the output of the list.

Parameters:

title (None, str) – Optional title.

Returns:

The dictionary in string form.

Return type:

str

static make_file_dict(file_list, key_indices=(0, 2), separator='_')[source]

Return a dictionary of files using entity keys.

Parameters:
  • file_list (list) – Paths to files to use.

  • key_indices (tuple) – Positions of entities to use for key.

  • separator (str) – Separator character used to construct key.

Returns:

Key is based on key indices and value is a full path.

Return type:

dict

static make_key(key_string, indices=(0, 2), separator='_')[source]

Create a key from specified entities.

Parameters:
  • key_string (str) – The string from which to extract the key (usually a filename or path).

  • indices (tuple) – Positions of entity pairs to use as key.

  • separator (str) – Separator between entity pairs in the created key.

Returns:

The created key.

Return type:

str
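The key construction described for make_key can be sketched with the standard library alone. The function name is hypothetical. The sketch assumes that the extension is stripped, that the same separator is used both to split the base filename into entity pairs and to join the selected pairs, and that empty indices return the whole base name.

```python
import os


def make_key_sketch(key_string, indices=(0, 2), separator="_"):
    """Build a key from the entity pairs at the given positions."""
    base = os.path.splitext(os.path.basename(key_string))[0]
    if not indices:
        # Assumption: with no indices the whole base name is the key.
        return base
    # Assumption: the same separator splits the name and joins the key.
    entities = base.split(separator)
    return separator.join(entities[i] for i in indices)


key = make_key_sketch("/tmp/sub-001_task-FaceCheck_run-01_events.tsv")
```

With the default indices (0, 2) this picks the subject and run entities and skips the task entity.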

KeyMap

class hed.tools.analysis.key_map.KeyMap(key_cols, target_cols=None, name='')[source]

Bases: object

A map of unique column values for remapping columns.

key_cols

A list of column names that will be hashed into the keys for the map.

Type:

list

target_cols

Optional list of column names that will be inserted into data and later remapped.

Type:

list or None

name

An optional name of this remap for identification purposes.

Type:

str

Notes: This mapping converts all columns in the mapping to strings. The remapping does not support other types of columns.

__init__(key_cols, target_cols=None, name='')[source]

Information for remapping columns of tabular files.

Parameters:
  • key_cols (list) – List of columns to be replaced (assumed in the DataFrame).

  • target_cols (list) – List of replacement columns (assumed to not be in the DataFrame).

  • name (str) – Name associated with this remap (usually a pathname of the events file).

property columns

Return the column names of the columns managed by this map.

Returns:

Column names of the columns managed by this map.

Return type:

list

make_template(additional_cols=None, show_counts=True)[source]

Return a dataframe template.

Parameters:
  • additional_cols (list or None) – Optional list of additional columns to append to the returned dataframe.

  • show_counts (bool) – If True, the number of times each key combination appears is in the first column and rows are sorted in descending order by count.

Returns:

A dataframe containing the template.

Return type:

DataFrame

Raises:

HedFileError – If additional columns are not disjoint from the key columns.

Notes

  • The template consists of the unique key columns in this map plus additional columns.

remap(data)[source]

Remap the columns of a dataframe or columnar file.

Parameters:

data (DataFrame, str) – Columnar data (either DataFrame or filename) whose columns are to be remapped.

Returns:

  • New dataframe with columns remapped.

  • List of row numbers that had no correspondence in the mapping.

Return type:

tuple [DataFrame, list]

Raises:

HedFileError – If data is missing some of the key columns.
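The idea of a keyed remap can be sketched without pandas. In the sketch below, rows are plain dictionaries, sketch_remap is a hypothetical helper, and filling unmatched rows with 'n/a' is an assumption made for illustration. The library itself operates on DataFrames and may behave differently in detail.

```python
def sketch_remap(rows, key_cols, target_cols, mapping):
    """Hypothetical stand-in for a keyed remap over a list of row dictionaries."""
    remapped, missing = [], []
    for index, row in enumerate(rows):
        # Key values are compared as strings, in line with the note on string columns.
        key = tuple(str(row[col]) for col in key_cols)
        targets = mapping.get(key)
        if targets is None:
            # Assumption: unmatched rows are reported and filled with 'n/a'.
            missing.append(index)
            targets = {col: "n/a" for col in target_cols}
        remapped.append({**row, **targets})
    return remapped, missing


mapping = {
    ("1", "go"): {"event_type": "show_go"},
    ("2", "stop"): {"event_type": "show_stop"},
}
rows = [{"code": 1, "cue": "go"}, {"code": 3, "cue": "go"}]
new_rows, missing_rows = sketch_remap(rows, ["code", "cue"], ["event_type"], mapping)
print(new_rows, missing_rows)
```

Returning the unmatched row numbers alongside the remapped data lets a caller decide whether missing key combinations are an error or simply need new entries in the map.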

resort()[source]

Sort the col_map in place by the key columns.

update(data, allow_missing=True)[source]

Update the existing map with information from data.

Parameters:
  • data (DataFrame or str) – DataFrame or filename of an events file or event map.

  • allow_missing (bool) – If True, allow missing keys and add them as n/a columns.

Raises:

HedFileError – If there are missing keys and allow_missing is False.

static remove_quotes(df, columns=None)[source]

Remove quotes from the specified columns and convert to string.

Parameters:
  • df (Dataframe) – Dataframe to process by removing quotes.

  • columns (list) – List of column names. If None, all columns are used.

Notes

  • Replacement is done in place.

TemporalEvent

class hed.tools.analysis.temporal_event.TemporalEvent(contents, start_index, start_time)[source]

Bases: object

A single event process with starting and ending times.

Note: the contents have the Onset and duration removed.

__init__(contents, start_index, start_time)[source]

set_end(end_index, end_time)[source]

Set end time information for an event process.

Parameters:
  • end_index (int) – Position of ending event marker corresponding to the end of this event process.

  • end_time (float) – Ending time of the event (usually in seconds).

__str__()[source]

Return a string representation of this event process.

Returns:

A string representation of this event process.

Return type:

str

Annotation utilities

Utilities to facilitate annotation of events in BIDS.

hed.tools.analysis.annotation_util.check_df_columns(df, required_cols=('column_name', 'column_value', 'description', 'HED')) list[str][source]

Return a list of the specified columns that are missing from a dataframe.

Parameters:
  • df (DataFrame) – Spreadsheet to check the columns of.

  • required_cols (tuple) – List of column names that must be present.

Returns:

List of column names that are missing.

Return type:

list[str]

hed.tools.analysis.annotation_util.df_to_hed(dataframe, description_tag=True) dict[source]

Create sidecar-like dictionary from a 4-column dataframe.

Parameters:
  • dataframe (DataFrame) – A four-column Pandas DataFrame with specific columns.

  • description_tag (bool) – If True description tag is included.

Returns:

A dictionary compatible with BIDS JSON tabular file that includes HED.

Return type:

dict

Notes

  • The DataFrame must have the columns with names: column_name, column_value, description, and HED.

hed.tools.analysis.annotation_util.extract_tags(hed_string, search_tag) tuple[str, list[str]][source]

Extract all instances of specified tag from a tag_string.

Parameters:
  • hed_string (str) – Tag string from which to extract tag.

  • search_tag (str) – HED tag to extract.

Returns:

tuple[str, list[str]]
  • Tag string without the tags.

  • A list of the tags that were extracted, for example descriptions.
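The following is a simplified sketch of tag extraction, assuming a flat, comma-separated tag string with no parenthesized groups and no commas inside tag values. The helper name sketch_extract_tags is hypothetical, and the substring match on search_tag is an assumption. The library function may treat grouping and whitespace differently.

```python
def sketch_extract_tags(hed_string, search_tag):
    """Hypothetical stand-in: split a flat tag string into a remainder and extracted tags."""
    tags = [tag.strip() for tag in hed_string.split(",") if tag.strip()]
    # Assumption: a tag is extracted when the search tag occurs anywhere in it.
    extracted = [tag for tag in tags if search_tag in tag]
    remainder = ", ".join(tag for tag in tags if search_tag not in tag)
    return remainder, extracted


remainder, extracted = sketch_extract_tags(
    "Sensory-event, Description/A red square, Red", "Description/"
)
print(remainder)
print(extracted)
```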

hed.tools.analysis.annotation_util.generate_sidecar_entry(column_name, column_values=None) dict[source]

Create a sidecar column dictionary for column.

Parameters:
  • column_name (str) – Name of the column.

  • column_values – List of column values.

hed.tools.analysis.annotation_util.hed_to_df(sidecar_dict, col_names=None) DataFrame[source]

Return a 4-column dataframe of HED portions of sidecar.

Parameters:
  • sidecar_dict (dict) – A dictionary conforming to BIDS JSON events sidecar format.

  • col_names (list, None) – A list of the cols to include in the flattened sidecar.

Returns:

Four-column spreadsheet representing HED portion of sidecar.

Return type:

DataFrame

Notes

  • The returned DataFrame has columns: column_name, column_value, description, and HED.

hed.tools.analysis.annotation_util.merge_hed_dict(sidecar_dict, hed_dict)[source]

Update a JSON sidecar based on the hed_dict values.

Parameters:
  • sidecar_dict (dict) – Dictionary representation of a BIDS JSON sidecar.

  • hed_dict (dict) – Dictionary derived from a dataframe representation of HED in sidecar.

hed.tools.analysis.annotation_util.series_to_factor(series) list[int][source]

Convert a series to an integer factor list.

Parameters:

series (pd.Series) – Series to be converted to a list.

Returns:

A list containing 0’s and 1’s. Empty, ‘n/a’, and np.nan values are converted to 0.

Return type:

list[int]

hed.tools.analysis.annotation_util.str_to_tabular(tsv_str, sidecar=None) TabularInput[source]

Return a TabularInput from a tsv string.

Parameters:
  • tsv_str (str) – A string representing a tabular input.

  • sidecar – An optional Sidecar object.

hed.tools.analysis.annotation_util.strs_to_hed_objs(hed_strings, hed_schema) list[HedString] | None[source]

Returns a list of HedString objects from a list of strings.

Parameters:
  • hed_strings (string or list) – String or strings representing HED annotations.

  • hed_schema (HedSchema or HedSchemaGroup) – Schema version for the strings.

Returns:

A list of HedString objects or None.

Return type:

Union[list[HedString], None]

hed.tools.analysis.annotation_util.strs_to_sidecar(sidecar_strings) Sidecar | None[source]

Return a Sidecar from a sidecar as string or as a list of sidecars as strings.

Parameters:

sidecar_strings (string or list) – String or strings representing sidecars.

Returns:

The merged sidecar from the list.

Return type:

Union[Sidecar, None]

hed.tools.analysis.annotation_util.to_factor(data, column=None) list[int][source]

Convert data to an integer factor list.

Parameters:
  • data (Series or DataFrame) – Series or DataFrame to be converted to a list.

  • column (str, optional) – Column name if DataFrame, otherwise column 0 is used.

Returns:

A list containing 0’s and 1’s. Empty, ‘n/a’, and np.nan values are converted to 0.

Return type:

list[int]
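The conversion rule stated above can be sketched on a plain Python list. The helper sketch_to_factor is hypothetical and covers only the cases named in the description (empty, 'n/a', and NaN, plus None as an added assumption). The library works on pandas objects and may treat other values differently.

```python
import math


def sketch_to_factor(values):
    """Hypothetical stand-in: 0 for empty, 'n/a', None, or NaN entries and 1 otherwise."""

    def is_missing(value):
        if value is None or value == "" or value == "n/a":
            return True
        # NaN is the only float that is treated as missing here.
        return isinstance(value, float) and math.isnan(value)

    return [0 if is_missing(value) else 1 for value in values]


factor = sketch_to_factor(["face", "n/a", "", float("nan"), "house"])
print(factor)
```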

hed.tools.analysis.annotation_util.to_strlist(obj_list) list[str][source]

Convert objects in a list to strings, converting None values to empty strings.

Parameters:

obj_list (list) – A list of objects that are None or have a str method.

Returns:

A list with the objects converted to strings. None values are preserved as empty strings.

Return type:

list[str]
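A minimal sketch of this conversion, assuming that None entries become empty strings as the Returns description states. The helper name sketch_to_strlist is hypothetical.

```python
def sketch_to_strlist(obj_list):
    """Hypothetical stand-in: stringify each item, mapping None to an empty string."""
    return [str(item) if item is not None else "" for item in obj_list]


result = sketch_to_strlist([1, None, "Red", 2.5])
print(result)
```

Keeping one output entry per input entry means the result stays aligned with the rows it came from, which matters when the strings are written back into a column.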

BIDS tools

BidsDataset

class hed.tools.bids.bids_dataset.BidsDataset(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]

Bases: object

A BIDS dataset representation primarily focused on HED evaluation.

root_path

Real root path of the BIDS dataset.

Type:

str

schema

The schema used for evaluation.

Type:

HedSchema or HedSchemaGroup

file_groups

A dictionary of BidsFileGroup objects with a given file suffix.

Type:

dict

__init__(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]

Constructor for a BIDS dataset.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • schema (HedSchema or HedSchemaGroup) – A schema that overrides the one specified in dataset.

  • suffixes (list or None) – File name suffixes of items to include. If not provided, defaults to [‘events’, ‘participants’]. If None or empty list, includes all files.

  • exclude_dirs (list or None) – Directory names to exclude from traversal. If not provided, defaults to [‘sourcedata’, ‘derivatives’, ‘code’, ‘stimuli’]. If None or empty list, no directories are excluded.

get_file_group(suffix)[source]

Return the file group of files with the specified suffix.

Parameters:

suffix (str) – Suffix of the BidsFileGroup to be returned.

Returns:

The requested tabular group.

Return type:

Union[BidsFileGroup, None]

validate(check_for_warnings=False, schema=None)[source]

Validate the dataset.

Parameters:
  • check_for_warnings (bool) – If True, check for warnings.

  • schema (HedSchema or HedSchemaGroup or None) – The schema used for validation.

Returns:

List of issues encountered during validation. Each issue is a dictionary.

Return type:

list

get_summary()[source]

Return an abbreviated summary of the dataset.

BidsFile

class hed.tools.bids.bids_file.BidsFile(file_path)[source]

Bases: object

A BIDS file with entity dictionary.

file_path

Real path of the file.

Type:

str

suffix

Suffix part of the filename.

Type:

str

ext

Extension (including the .).

Type:

str

entity_dict

Dictionary of entity-names (keys) and entity-values (values).

Type:

dict

Notes

  • This class may hold the merged sidecar giving metadata for this file as well as contents.

__init__(file_path)[source]

Constructor for a file path.

Parameters:

file_path (str) – Full path of the file.

property contents

Return the current contents of this object.

clear_contents()[source]

Set the contents attribute of this object to None.

get_entity(entity_name)[source]

Return the entity value for the specified entity.

Parameters:

entity_name (str) – Name of the BIDS entity, for example task, run, or sub.

Returns:

Entity value if any, otherwise None.

Return type:

Union[str, None]

get_key(entities=None)[source]

Return a key for this BIDS file given a list of entities.

Parameters:

entities (tuple) – A tuple of strings representing entities.

Returns:

A key based on this object.

Return type:

str

Notes

If entities is None, then the file path is used as the key.

set_contents(content_info=None, overwrite=False)[source]

Set the contents of this object.

Parameters:
  • content_info (Any) – The contents appropriate for this object, for example a JSON dictionary.

  • overwrite (bool) – If False and the contents are not empty, do nothing.

Notes

  • The contents are not changed if they are already set and overwrite is False.

__str__()[source]

Return a string representation of this object.

BidsFileGroup

class hed.tools.bids.bids_file_group.BidsFileGroup(root_path, file_list, suffix='events')[source]

Bases: object

Container for BIDS files with a specified suffix.

suffix

The file suffix specifying the class of file represented in this group (e.g., events).

Type:

str

sidecar_dict

A dictionary of sidecars associated with this suffix.

Type:

dict

datafile_dict

A dictionary with values either BidsTabularFile or BidsTimeseriesFile.

Type:

dict

sidecar_dir_dict

Dictionary whose keys are directory paths and values are list of sidecars in the corresponding directory.

Type:

dict

__init__(root_path, file_list, suffix='events')[source]

Constructor for a BidsFileGroup.

Parameters:
  • file_list (list) – List of paths to the relevant tsv and json files.

  • suffix (str) – Suffix indicating the type this group represents (e.g. events, or channels, etc.).

summarize(value_cols=None, skip_cols=None)[source]

Return a TabularSummary of group files.

Parameters:
  • value_cols (list) – Column names designated as value columns.

  • skip_cols (list) – Column names designated as columns to skip.

Returns:

A summary of the number of values in different columns if tabular group.

Return type:

Union[TabularSummary, None]

Notes

  • The columns that are not in value_cols or skip_cols are summarized by counting the number of times each unique value appears in that column.

get_task_names()[source]

Return a sorted list of unique task names found in the file group’s TSV and JSON filenames.

Returns:

Sorted list of unique task name strings (the xxxx portion of task-xxxx entities).

Return type:

list

Notes

  • Parses both sidecar_dict and datafile_dict file paths.

  • The BIDS task- entity is matched case-insensitively.

validate(hed_schema, extra_def_dicts=None, check_for_warnings=False)[source]

Validate the sidecars and datafiles and return a list of issues.

Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • check_for_warnings (bool) – If True, include warnings in the check.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

validate_sidecars(hed_schema, extra_def_dicts=None, error_handler=None)[source]

Validate merged sidecars.

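Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • error_handler (ErrorHandler) – Error handler to use.

Returns: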

A list of validation issues found. Each issue is a dictionary.

Return type:

list

validate_datafiles(hed_schema, extra_def_dicts=None, error_handler=None)[source]

Validate the datafiles and return an error list.

Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • error_handler (ErrorHandler) – Error handler to use.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

Notes: This will clear the contents of the datafiles if they were not previously set.

static create_file_group(root_path, file_list, suffix)[source]

Construct a BidsFileGroup from a list of files sharing the given suffix.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • file_list (list[str]) – List of file paths belonging to this suffix group.

  • suffix (str) – BIDS file suffix identifying this group (e.g. events).

Returns:

The constructed group, or None if it contains no sidecars or data files.

Return type:

BidsFileGroup or None

BidsSidecarFile

class hed.tools.bids.bids_sidecar_file.BidsSidecarFile(file_path)[source]

Bases: BidsFile

A BIDS sidecar file.

__init__(file_path)[source]

Construct a BIDS sidecar from a file.

Parameters:

file_path (str) – The real path of the sidecar.

is_sidecar_for(obj)[source]

Return True if this is a sidecar for obj.

Parameters:

obj (BidsFile) – A BidsFile object to check.

Returns:

True if this is a BIDS parent of obj and False otherwise.

Return type:

bool

Notes

  • A sidecar is a sidecar for itself.

set_contents(content_info=None, name='unknown', overwrite=False)[source]

Set the contents of the sidecar.

Parameters:
  • content_info (dict, or None) – If None, create a Sidecar from the object’s file-path.

  • name (str) – The name of the sidecar.

  • overwrite (bool) – If True, overwrite contents if already set.

Notes

  • The handling of content_info is as follows:
    • None: This object’s file_path is used.

    • dict: This is interpreted as a JSON dictionary.

static is_hed(json_dict)[source]

Return True if the json has HED.

Parameters:

json_dict (dict) – A dictionary representing a JSON file or merged file.

Returns:

True if the dictionary has HED or HED_assembled as a first or second-level key.

Return type:

bool
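The first-or-second-level rule can be sketched with plain dictionaries. The helper sketch_is_hed is hypothetical and follows only the description above. It is not the library's implementation.

```python
HED_KEYS = {"HED", "HED_assembled"}


def sketch_is_hed(json_dict):
    """Hypothetical stand-in: True if a HED key appears at the first or second level."""
    if not isinstance(json_dict, dict):
        return False
    if not HED_KEYS.isdisjoint(json_dict):
        return True
    # Look one level down, but no further.
    return any(
        isinstance(value, dict) and not HED_KEYS.isdisjoint(value)
        for value in json_dict.values()
    )


print(sketch_is_hed({"trial_type": {"HED": {"go": "Green"}}}))
print(sketch_is_hed({"trial_type": {"Levels": {"go": "A go trial"}}}))
```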

static merge_sidecar_list(sidecar_list, name='merged_sidecar.json')[source]

Merge a list of sidecars into a single sidecar.

Parameters:
  • sidecar_list (list) – A list of Sidecar objects.

  • name (str) – The name of the merged sidecar.

Returns:

A sidecar constructed from the merged list.

Return type:

Union[Sidecar, None]

BidsTabularFile

class hed.tools.bids.bids_tabular_file.BidsTabularFile(file_path)[source]

Bases: BidsFile

A BIDS tabular file including its associated sidecar.

__init__(file_path)[source]

Constructor for a BIDS tabular file.

Parameters:

file_path (str) – Path of the tabular file.

set_contents(content_info=None, overwrite=False)[source]

Set the contents of this tabular file (a TabularInput object). Its sidecar should already be set.

Parameters:
  • content_info (None) – This always uses the internal file_path to create the contents.

  • overwrite (bool) – If False (the default), do not overwrite existing contents if any.

set_sidecar(sidecar)[source]

Set the sidecar for this tabular file.

Parameters:

sidecar (Sidecar) – The sidecar for this tabular file.

BIDS utilities

BIDS utility functions for schema loading, sidecar merging, and inheritance chain resolution.

hed.tools.bids.bids_util.get_schema_from_description(root_path)[source]

Load the HED schema version declared in the BIDS dataset_description.json.

Parameters:

root_path (str) – Root path of the BIDS dataset.

Returns:

The loaded schema, or None if loading fails.

Return type:

HedSchema or None

hed.tools.bids.bids_util.group_by_suffix(file_list)[source]

Group files by suffix.

Parameters:

file_list (list) – List of file paths.

Returns:

Dictionary with suffixes as keys and file lists as values.

Return type:

dict
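A minimal sketch of grouping by suffix, assuming the suffix is the text after the last underscore of the base name and before the first period. The helper sketch_group_by_suffix is hypothetical. The library may determine the suffix through its own filename parser.

```python
import os
from collections import defaultdict


def sketch_group_by_suffix(file_list):
    """Hypothetical stand-in: collect file paths under their BIDS suffix."""
    groups = defaultdict(list)
    for path in file_list:
        # Drop everything from the first period so '.tsv.gz' style extensions are ignored.
        stem = os.path.basename(path).split(".", 1)[0]
        groups[stem.rsplit("_", 1)[-1]].append(path)
    return dict(groups)


grouped = sketch_group_by_suffix(
    [
        "/ds/sub-01/sub-01_task-rest_events.tsv",
        "/ds/task-rest_events.json",
        "/ds/participants.tsv",
    ]
)
print(grouped)
```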

hed.tools.bids.bids_util.parse_bids_filename(file_path)[source]

Split a filename into BIDS-relevant components.

Parameters:

file_path (str) – Path to be parsed.

Returns:

Dictionary with keys ‘basename’, ‘suffix’, ‘prefix’, ‘ext’, ‘bad’, and ‘entities’.

Return type:

dict

Notes

  • Splits into BIDS suffix, extension, and a dictionary of entity name-value pairs.
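The split described above can be sketched as follows. The helper sketch_parse_bids_filename is hypothetical and returns only the 'basename', 'suffix', 'ext', and 'entities' keys. The 'prefix' and 'bad' keys of the real return value are omitted because their exact rules are not described here.

```python
import os


def sketch_parse_bids_filename(file_path):
    """Hypothetical stand-in: split a BIDS-style filename into suffix, ext, and entities."""
    name = os.path.basename(file_path)
    basename, dot, rest = name.partition(".")
    *entity_parts, last = basename.split("_")
    suffix = None
    if "-" in last:
        # Assumption: a final piece containing a hyphen is an entity, not a suffix.
        entity_parts.append(last)
    else:
        suffix = last
    entities = {}
    for part in entity_parts:
        key, _, value = part.partition("-")
        entities[key] = value
    return {"basename": basename, "suffix": suffix, "ext": dot + rest, "entities": entities}


parsed = sketch_parse_bids_filename("/ds/sub-01/sub-01_task-rest_run-1_events.tsv")
print(parsed)
```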

hed.tools.bids.bids_util.update_entity(name_dict, entity)[source]

Update the dictionary with a new entity.

Parameters:
  • name_dict (dict) – Dictionary of entities.

  • entity (str) – Entity to be added.

hed.tools.bids.bids_util.get_merged_sidecar(root_path, tsv_file)[source]

Return a merged sidecar dict following BIDS inheritance rules for a given TSV file.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • tsv_file (str) – Path to the TSV file whose inherited sidecars should be merged.

Returns:

Merged sidecar dictionary. Keys from closer (more specific) sidecar files take precedence.

Return type:

dict
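The precedence rule can be sketched with plain dictionaries. The helper sketch_merge_sidecars is hypothetical, and replacing whole top-level keys (rather than merging nested dictionaries) is an assumption about how the merge is applied. The example sidecar contents are illustrative only.

```python
def sketch_merge_sidecars(sidecars_nearest_first):
    """Hypothetical stand-in: merge sidecar dicts so that nearer files win."""
    merged = {}
    # Apply the farthest sidecar first so that nearer ones overwrite its top-level keys.
    for sidecar in reversed(sidecars_nearest_first):
        merged.update(sidecar)
    return merged


root_level = {
    "trial_type": {"HED": {"go": "Green"}},
    "response_time": {"HED": "Delay/# ms"},
}
subject_level = {"trial_type": {"HED": {"go": "Blue"}}}
merged = sketch_merge_sidecars([subject_level, root_level])
print(merged)
```

Here the subject-level trial_type entry replaces the root-level one, while response_time is inherited unchanged from the root.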

hed.tools.bids.bids_util.walk_back(root_path, file_path)[source]

Yield inherited sidecar file paths from the directory of file_path back toward root_path.

Traverses parent directories from the file’s location up to root_path, yielding any sidecar JSON files that apply to the given TSV according to BIDS inheritance rules.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • file_path (str) – Path to the data file whose applicable sidecars should be found.

Yields:

str – Absolute paths of applicable sidecar JSON files, from nearest to farthest.

hed.tools.bids.bids_util.get_candidates(source_dir, tsv_file_dict)[source]

Return sidecar JSON files in source_dir that are applicable to tsv_file_dict.

Parameters:
  • source_dir (str) – Directory to search for candidate sidecar files.

  • tsv_file_dict (dict) – Parsed BIDS filename dict for the target TSV file.

Returns:

Absolute paths to matching sidecar JSON files.

Return type:

list[str]

hed.tools.bids.bids_util.matches_criteria(json_file_dict, tsv_file_dict)[source]

Return True if a candidate sidecar JSON file applies to the given TSV file.

A sidecar applies when its extension is .json, its suffix matches the TSV, and all BIDS entities in the JSON filename have equal values in the TSV filename.

Parameters:
  • json_file_dict (dict) – Parsed BIDS filename dict for the candidate JSON file.

  • tsv_file_dict (dict) – Parsed BIDS filename dict for the target TSV file.

Returns:

True if the sidecar is applicable.

Return type:

bool
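The three conditions above translate directly into a short check. The sketch below uses a hypothetical helper, sketch_matches_criteria, and assumes parsed filename dictionaries that carry 'ext', 'suffix', and 'entities' keys as described for parse_bids_filename.

```python
def sketch_matches_criteria(json_file_dict, tsv_file_dict):
    """Hypothetical stand-in: does the candidate sidecar apply to the TSV file?"""
    if json_file_dict["ext"].lower() != ".json":
        return False
    if json_file_dict["suffix"] != tsv_file_dict["suffix"]:
        return False
    tsv_entities = tsv_file_dict["entities"]
    # Every entity named by the sidecar must be present in the TSV with the same value.
    return all(
        name in tsv_entities and tsv_entities[name] == value
        for name, value in json_file_dict["entities"].items()
    )


tsv = {"suffix": "events", "ext": ".tsv", "entities": {"sub": "01", "task": "rest"}}
task_json = {"suffix": "events", "ext": ".json", "entities": {"task": "rest"}}
other_json = {"suffix": "events", "ext": ".json", "entities": {"task": "memory"}}
print(sketch_matches_criteria(task_json, tsv), sketch_matches_criteria(other_json, tsv))
```

A sidecar with no entities at all (for example a dataset-level events.json) passes the entity test trivially, which is what lets root-level sidecars apply to every file with the same suffix.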

Utility functions

DataFrame utilities

Data handling utilities involving dataframes.

hed.tools.util.data_util.add_columns(df, column_list, value='n/a')[source]

Add specified columns to df if not there.

Parameters:
  • df (DataFrame) – Pandas dataframe.

  • column_list (list) – List of columns to append to the dataframe.

  • value (str) – Default fill value for the column.

hed.tools.util.data_util.check_match(ds1, ds2, numeric=False)[source]

Check two Pandas data series have the same values.

Parameters:
  • ds1 (DataSeries) – Pandas data series to check.

  • ds2 (DataSeries) – Pandas data series to check.

  • numeric (bool) – If True, treat as numeric and do close-to comparison.

Returns:

Error messages indicating the mismatch or empty if the series match.

Return type:

list

hed.tools.util.data_util.delete_columns(df, column_list)[source]

Delete the specified columns from a dataframe.

Parameters:
  • df (DataFrame) – Pandas dataframe from which to delete columns.

  • column_list (list) – List of candidate column names for deletion.

Notes

  • The deletion of columns is done in place.

  • This does not raise an error if df does not have a column in the list.

hed.tools.util.data_util.delete_rows_by_column(df, value, column_list=None)[source]

Delete rows where columns have this value.

Parameters:
  • df (DataFrame) – Pandas dataframe from which to delete rows.

  • value (str) – Specified value to indicate row should be deleted.

  • column_list (list) – List of columns to search for value.

Notes

  • All values are converted to string before testing.

  • Deletion is done in place.

hed.tools.util.data_util.get_eligible_values(values, values_included)[source]

Return a list of the items from values that are in values_included or None if no values_included.

Parameters:
  • values (list) – List of strings against which to test.

  • values_included (list) – List of items to be selected from values if they are present.

Returns:

List of selected values or None if values_included is empty or None.

Return type:

list

hed.tools.util.data_util.get_key_hash(key_tuple)[source]

Calculate a hash key for tuple of values.

Parameters:

key_tuple (tuple, list) – The key values in the correct order for lookup.

Returns:

A hash key for the tuple.

Return type:

int
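One plausible way to compute such a key is Python's built-in hash applied to a tuple of the values. The sketch below assumes that approach. The helper sketch_get_key_hash is hypothetical and not taken from the library.

```python
def sketch_get_key_hash(key_tuple):
    """Hypothetical stand-in: hash the key values in order."""
    # Converting to a tuple lets lists and tuples with the same values share a key.
    return hash(tuple(key_tuple))


# Use the hash as a dictionary key for fast lookup of a key combination.
lookup = {sketch_get_key_hash(("1", "go")): "show_go"}
found = lookup.get(sketch_get_key_hash(["1", "go"]))
print(found)
```

Because Python randomizes string hashes per process, keys built this way are only meaningful within a single run and should not be written to disk.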

hed.tools.util.data_util.get_new_dataframe(data)[source]

Get a new dataframe representing a tsv file.

Parameters:

data (DataFrame or str) – DataFrame or filename representing a tsv file.

Returns:

A dataframe containing the contents of the tsv file or, if data was a DataFrame to start with, a new copy of the DataFrame.

Return type:

DataFrame

Raises:

HedFileError – If a filename is given and it cannot be read into a DataFrame.

hed.tools.util.data_util.get_row_hash(row, key_list)[source]

Get a hash key from key column values for row.

Parameters:
  • row (DataSeries)

  • key_list (list)

Returns:

Hash key constructed from the entries of row in the columns specified by key_list.

Return type:

str

Raises:

HedFileError – If row doesn’t have all the columns in key_list.

hed.tools.util.data_util.get_value_dict(tsv_path, key_col='file_basename', value_col='sampling_rate')[source]

Get a dictionary of two columns of a dataframe.

Parameters:
  • tsv_path (str) – Path to a tsv file with a header row to be read into a DataFrame.

  • key_col (str) – Name of the column which should be the key.

  • value_col (str) – Name of the column which should be the value.

Returns:

Dictionary with key_col values as the keys and the corresponding value_col values as the values.

Return type:

dict

Raises:

HedFileError – When tsv_path does not correspond to a file that can be read into a DataFrame.

hed.tools.util.data_util.make_info_dataframe(col_info, selected_col)[source]

Get a dataframe from selected columns.

Parameters:
  • col_info (dict) – Dictionary of dictionaries of column values and counts.

  • selected_col (str) – Name of the column used as top level key for col_info.

Returns:

A two-column dataframe whose first column contains the values from the dictionary keyed by selected_col and whose second column contains the corresponding counts. The returned value is None if selected_col is not a top-level key in col_info.

Return type:

DataFrame or None

hed.tools.util.data_util.replace_na(df)[source]

Replace (in place) the n/a with np.nan taking care of categorical columns.

hed.tools.util.data_util.replace_values(df, values=None, replace_value='n/a', column_list=None)[source]

Replace string values in specified columns.

Parameters:
  • df (DataFrame) – Dataframe whose values will be replaced.

  • values (list, None) – List of strings to replace. If None, only empty strings are replaced.

  • replace_value (str) – String replacement value.

  • column_list (list, None) – List of columns in which to do replacement. If None all columns are processed.

Returns:

Number of values replaced.

Return type:

int

hed.tools.util.data_util.reorder_columns(data, col_order, skip_missing=True)[source]

Create a new dataframe with columns reordered.

Parameters:
  • data (DataFrame, str) – Dataframe or filename of dataframe whose columns are to be reordered.

  • col_order (list) – List of column names in desired order.

  • skip_missing (bool) – If true, col_order columns missing from data are skipped, otherwise error.

Returns:

A new reordered dataframe.

Return type:

DataFrame

Raises:
  • HedFileError – If col_order contains columns not in data and skip_missing is False.

  • HedFileError – If data corresponds to a filename from which a dataframe cannot be created.

hed.tools.util.data_util.separate_values(values, target_values)[source]

Get target values from the target_values list.

Parameters:
  • values (list) – List of values to be tested.

  • target_values – List of desired values.

File/IO utilities

Utilities for generating and handling file names.

hed.tools.util.io_util.check_filename(test_file, name_prefix=None, name_suffix=None, extensions=None)[source]

Return True if correct extension, suffix, and prefix.

Parameters:
  • test_file (str) – Path of filename to test.

  • name_prefix (list, str, None) – An optional name_prefix or list of prefixes to accept for the base filename.

  • name_suffix (list, str, None) – An optional name_suffix or list of suffixes to accept for the base file name.

  • extensions (list, str, None) – An optional extension or list of extensions to accept for the extensions.

Returns:

True if file has the appropriate format.

Return type:

bool

Notes

  • Everything is converted to lower case prior to testing so this test should be case-insensitive.

  • None indicates that all are accepted.
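A sketch of the case-insensitive test described above, using only the standard library. The helper names are hypothetical. The treatment of the base name (the extension is stripped before the prefix and suffix are tested) is an assumption, and the library may differ in edge cases.

```python
import os


def _lowered(value):
    """Normalize None, a string, or a list of strings to None or a tuple of lower-case strings."""
    if value is None:
        return None
    if isinstance(value, str):
        return (value.lower(),)
    return tuple(item.lower() for item in value)


def sketch_check_filename(test_file, name_prefix=None, name_suffix=None, extensions=None):
    """Hypothetical stand-in: case-insensitive test of prefix, suffix, and extension."""
    name = os.path.basename(test_file).lower()
    prefixes, suffixes, exts = _lowered(name_prefix), _lowered(name_suffix), _lowered(extensions)
    if exts is None:
        # Assumption: with no extension filter, everything from the first period on is ignored.
        base = name.split(".", 1)[0]
    else:
        matched = [ext for ext in exts if ext and name.endswith(ext)]
        if not matched:
            return False
        # Strip the longest matching extension so '.tsv.gz' wins over '.gz'.
        base = name[: -len(max(matched, key=len))]
    if prefixes is not None and not base.startswith(prefixes):
        return False
    if suffixes is not None and not base.endswith(suffixes):
        return False
    return True


print(sketch_check_filename("/d/Sub-01_task-rest_EVENTS.tsv", "sub-", "_events", [".tsv"]))
```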

hed.tools.util.io_util.get_allowed(value, allowed_values=None, starts_with=True)[source]

Return the portion of the value that matches a value in allowed_values or None if no match.

Parameters:
  • value (str) – Value to be matched.

  • allowed_values (list, str, or None) – Values to match.

  • starts_with (bool) – If True match is done at beginning of string, otherwise the end.

Returns:

Portion of value that matches one of the allowed_values.

Return type:

Union[str,list]

Notes

  • The match is done in lower case.

hed.tools.util.io_util.get_alphanumeric_path(pathname, replace_char='_')[source]

Replace sequences of non-alphanumeric characters in string (usually a path) with specified character.

Parameters:
  • pathname (str) – A string usually representing a pathname, but could be any string.

  • replace_char (str) – Replacement character(s).

Returns:

New string with characters replaced.

Return type:

str
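The replacement can be sketched with a single regular expression. The helper sketch_get_alphanumeric_path is hypothetical, and treating only ASCII letters and digits as alphanumeric is an assumption.

```python
import re


def sketch_get_alphanumeric_path(pathname, replace_char="_"):
    """Hypothetical stand-in: collapse each run of non-alphanumeric characters."""
    # The '+' makes a run such as '//' or ' - ' produce a single replacement.
    return re.sub(r"[^a-zA-Z0-9]+", replace_char, pathname)


cleaned = sketch_get_alphanumeric_path("/data/sub-01/my file.tsv")
print(cleaned)
```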

hed.tools.util.io_util.get_full_extension(filename)[source]

Return the full extension of a file, including the period.

Parameters:

filename (str) – The filename to be parsed.

Returns:

  • File name without extension

  • Full extension

Return type:

Tuple[str, str]
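A minimal sketch of splitting off a multi-part extension, assuming the extension starts at the first period of the file name. The helper sketch_get_full_extension is hypothetical. Names that begin with a period or contain periods inside entity values would need extra handling.

```python
from pathlib import PurePath


def sketch_get_full_extension(filename):
    """Hypothetical stand-in: return (name without extension, full extension)."""
    name = PurePath(filename).name
    stem, dot, rest = name.partition(".")
    return stem, dot + rest


base, ext = sketch_get_full_extension("/ds/sub-01_task-rest_events.tsv.gz")
print(base, ext)
```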

hed.tools.util.io_util.get_unique_suffixes(file_paths, extensions=None)[source]

Get unique suffixes from file paths with specified extensions.

Parameters:
  • file_paths (list) – List of file paths to process.

  • extensions (list or None) – List of file extensions to filter. If None, defaults to [‘.json’, ‘.tsv’].

Returns:

Set of unique suffixes found.

Return type:

set

hed.tools.util.io_util.extract_suffix_path(path, prefix_path)[source]

Return the suffix of path after prefix path has been removed.

Parameters:
  • path (str)

  • prefix_path (str)

Returns:

Suffix path.

Return type:

str

Notes

  • This function is useful for creating files within BIDS datasets.

hed.tools.util.io_util.clean_filename(filename)[source]

Replace invalid characters with under-bars.

Parameters:

filename (str) – Source filename.

Returns:

The filename with anything but alphanumeric, period, hyphens, and under-bars removed.

Return type:

str

hed.tools.util.io_util.get_basename(file_path)[source]

Return the base filename (without extension) for the given path.

Parameters:

file_path (str) – Path to a file.

Returns:

The filename stem, e.g. sub-01_task-rest_events for sub-01_task-rest_events.tsv.

Return type:

str

hed.tools.util.io_util.get_filtered_by_element(file_list, elements)[source]

Filter a file list by whether the base names have a substring matching any of the members of elements.

Parameters:
  • file_list (list) – List of file paths to be filtered.

  • elements (list) – List of strings to use as filename filters.

Returns:

The list only containing file paths whose filenames match a filter.

Return type:

list

hed.tools.util.io_util.get_filtered_list(file_list, name_prefix=None, name_suffix=None, extensions=None)[source]

Get list of filenames satisfying the criteria.

Everything is converted to lower case prior to testing so this test should be case-insensitive.

Parameters:
  • file_list (list) – List of files to test.

  • name_prefix (str) – Optional name_prefix for the base filename.

  • name_suffix (str) – Optional name_suffix for the base filename.

  • extensions – Optional list of file extensions (allows two periods (.tsv.gz)).

hed.tools.util.io_util.get_file_list(root_path, name_prefix=None, name_suffix=None, extensions=None, exclude_dirs=None)[source]

Return paths satisfying various conditions.

Parameters:
  • root_path (str) – Full path of the directory tree to be traversed (no ending slash).

  • name_prefix (list, str, None) – An optional prefix for the base filename.

  • name_suffix (list, str, None) – An optional suffix for the base filename.

  • extensions (list, None) – A list of extensions to be selected.

  • exclude_dirs (list, None) – A list of paths to be excluded.

Returns:

The full paths.

Return type:

list

Notes: Exclude directories are paths relative to the root path.

hed.tools.util.io_util.get_path_components(root_path, this_path)[source]

Get a list of the remaining components after root path.

Parameters:
  • root_path (str) – A path (no trailing separator).

  • this_path (str) – The path of a file or directory descendant of root_path.

Returns:

A list of the remaining directory components leading to the file.

Return type:

Union[list, None]

Notes: this_path must be a descendant of root_path.

hed.tools.util.io_util.get_timestamp()[source]

Return a timestamp string suitable for using in filenames.

Returns:

Represents the current time.

Return type:

str

hed.tools.util.io_util.get_task_from_file(file_path)[source]

Returns the task name entity from a BIDS-type file path.

Parameters:

file_path (str) – File path.

Returns:

The task name or an empty string.

Return type:

str
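Extracting the task entity can be sketched with a regular expression over the base filename. The helper sketch_get_task_from_file is hypothetical, and the case-insensitive match on task- is an assumption carried over from the get_task_names notes.

```python
import os
import re


def sketch_get_task_from_file(file_path):
    """Hypothetical stand-in: return the value of the task- entity or an empty string."""
    name = os.path.basename(file_path)
    # The entity must start the name or follow an underscore, and ends at '_' or '.'.
    match = re.search(r"(?:^|_)task-([^_.]+)", name, flags=re.IGNORECASE)
    return match.group(1) if match else ""


task = sketch_get_task_from_file("/ds/sub-01/sub-01_task-rest_run-1_events.tsv")
print(task)
```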

hed.tools.util.io_util.get_task_dict(files)[source]

Return a dictionary of the tasks that appear in the file names of a list of files.

Parameters:

files (list) – List of filenames to be separated by task.

Returns:

Dictionary of filenames keyed by task name.

Return type:

dict

hed.tools.util.io_util.separate_by_ext(file_paths)[source]

Separate a list of files into tsv and json files.

Parameters:

file_paths (list) – A list of file paths.

Returns:

Dictionary in which each key is an extension and each value is a list of files with that extension.

Return type:

dict

Schema utilities

Utilities

hed.tools.util.schema_util.flatten_schema(hed_schema, skip_non_tag=False)[source]

Returns a 3-column dataframe representing a schema.

Parameters:
  • hed_schema (HedSchema) – The schema to flatten.

  • skip_non_tag (bool) – If True, skip all sections except the tag section.

Returns:

Represents a HED schema in flattened form.

Return type:

DataFrame