Tools

Utility functions and data processing tools for HED operations.

Analysis Tools

EventManager

class hed.tools.analysis.event_manager.EventManager(input_data, hed_schema, extra_defs=None)[source]

Bases: object

Manager of events of temporal extent.

__init__(input_data, hed_schema, extra_defs=None)[source]

Create an event manager for an events file. Manages events of temporal extent.

Parameters:
  • input_data (TabularInput) – Represents an events file with its sidecar.

  • hed_schema (HedSchema) – HED schema used.

  • extra_defs (DefinitionDict) – Extra definitions not included in the input_data information.

Raises:

HedFileError – If there are any unmatched offsets.

Notes: Keeps the events of temporal extent by their starting index in the events file. These events are separated from the rest of the annotations, which are contained in self.hed_strings.

unfold_context(remove_types=[])[source]

Unfold the event information into a tuple based on context.

Parameters:

remove_types (list) – List of types to remove.

Returns:

  • Union[list(str), HedString]: The information without the events of temporal extent.

  • Union[list(str), HedString, None]: The onsets of the events of temporal extent.

  • Union[list(str), HedString, None]: The ongoing context information.

Return type:

tuple[Union[list(str), HedString], Union[list(str), HedString, None], Union[list(str), HedString, None]]

str_list_to_hed(str_list)[source]

Create a HedString object from a list of strings.

Parameters:

str_list (list) – A list of strings to be concatenated with commas and then converted.

Returns:

The converted list.

Return type:

Union[HedString, None]

get_type_defs(types)[source]

Return a list of definition names (lower case) that correspond to any of the specified types.

Parameters:

types (list or None) – List of tags that are treated as types such as ‘Condition-variable’

Returns:

List of definition names (lower-case) that correspond to the specified types

Return type:

list

static compress_strings(list_to_compress)[source]

Compress each inner list of a list of lists of strings into a single str with comma-separated elements.

Parameters:

list_to_compress (list) – List of lists of HED str to turn into a list of single HED strings.

Returns:

List of same length as list_to_compress with each entry being a str.

Return type:

list
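
The compression described above can be pictured with a minimal standalone sketch. The helper name compress_strings_sketch is hypothetical, and mapping an empty inner list to an empty string is an assumption; this is an illustration, not the library's implementation.

```python
def compress_strings_sketch(list_to_compress):
    """Join each inner list of HED strings into one comma-separated string."""
    result = []
    for items in list_to_compress:
        # Assumption: an empty (or None) inner list becomes an empty string.
        result.append(",".join(items) if items else "")
    return result


# One output string per inner list, so the lengths always match.
compressed = compress_strings_sketch([["Red", "Blue"], [], ["(Green, Square)"]])
```

The result has the same length as the input, which keeps each compressed string aligned with its row in the tabular file.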

HedTagManager

class hed.tools.analysis.hed_tag_manager.HedTagManager(event_manager, remove_types=[], extra_defs=None)[source]

Bases: object

Manager for the HED tags from a columnar file.

__init__(event_manager, remove_types=[], extra_defs=None)[source]

Create a tag manager for one tabular file.

Parameters:
  • event_manager (EventManager) – An event manager for the tabular file.

  • remove_types (list or None) – List of type tags (such as condition-variable) to remove.

get_hed_objs(include_context=True, replace_defs=False)[source]

Return a list of HED string objects of same length as the tabular file.

Parameters:
  • include_context (bool) – If True (default), include the Event-context group in the HED string.

  • replace_defs (bool) – If True (default=False), replace the Def tags with Definition contents.

Returns:

list - List of HED strings of same length as tabular file.

get_hed_obj(hed_str, remove_types=False, remove_group=False)[source]

Return a HED string object with the types removed.

Parameters:
  • hed_str (str) – Represents a HED string.

  • remove_types (bool) – If False (the default), do not remove the types managed by this manager.

  • remove_group (bool) – If False (the default), do not remove the group when removing a type tag, otherwise remove its enclosing group.

HedTypeManager

class hed.tools.analysis.hed_type_manager.HedTypeManager(event_manager)[source]

Bases: object

Manager for type factors and type definitions.

__init__(event_manager)[source]

Create a variable manager for one tabular file for all type variables.

Parameters:

event_manager (EventManager) – An event manager for the tabular file.

Raises:

HedFileError – On errors such as unmatched onsets or missing definitions.

property types

Return a list of types managed by this manager.

Returns:

Type tag names.

Return type:

list

add_type(type_name)[source]

Add a type variable to be managed by this manager.

Parameters:

type_name (str) – Type tag name of the type to be added.

get_factor_vectors(type_tag, type_values=None, factor_encoding='one-hot')[source]

Return a DataFrame of factor vectors for the indicated HED tag and values.

Parameters:
  • type_tag (str) – HED tag to retrieve factors for.

  • type_values (list or None) – The values of the tag to create factors for or None if all unique values.

  • factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).

Returns:

DataFrame containing the factor vectors as the columns.

Return type:

Union[pd.DataFrame, None]
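
To make the two factor_encoding options concrete, here is a rough sketch of what one-hot and categorical encodings generally look like. It uses plain dictionaries of lists in place of a DataFrame. The function name encode_factors_sketch, the single column name "factor", and the "n/a" placeholder are all assumptions for illustration; the column layout the library actually produces may differ.

```python
def encode_factors_sketch(labels, levels, factor_encoding="one-hot"):
    """Return a dict mapping column name -> list of values (one per event).

    labels: the level observed at each event (None if no level applies).
    levels: the levels for which factors are requested.
    """
    if factor_encoding == "one-hot":
        # One 0/1 column per requested level.
        return {level: [1 if label == level else 0 for label in labels]
                for level in levels}
    if factor_encoding == "categorical":
        # A single column holding the level name, or "n/a" when none applies.
        wanted = set(levels)
        return {"factor": [label if label in wanted else "n/a"
                           for label in labels]}
    raise ValueError(f"Unknown factor_encoding: {factor_encoding}")


labels = ["face", "house", None, "face"]
one_hot = encode_factors_sketch(labels, ["face", "house"])
categorical = encode_factors_sketch(labels, ["face", "house"], "categorical")
```

One-hot encoding yields one column per value, which suits regression design matrices; categorical encoding keeps a single column, which is more compact when the values are mutually exclusive.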

get_type(type_tag)[source]

Returns the HedType variable associated with the type tag.

Parameters:

type_tag (str) – HED tag to retrieve the type for.

Returns:

The values associated with this type tag.

Return type:

Union[HedType, None]

get_type_tag_factor(type_tag, type_value)[source]

Return the HedTypeFactors for a specified value and extension.

Parameters:
  • type_tag (str or None) – HED tag for the type.

  • type_value (str or None) – Value of this tag to return the factors for.

get_type_def_names(type_var)[source]

Return the definitions associated with a particular type tag.

Parameters:

type_var (str) – The name of a type tag such as Condition-variable.

Returns:

Names of definitions that use this type.

Return type:

list

summarize_all(as_json=False)[source]

Return a dictionary containing the summaries for the types managed by this manager.

Parameters:

as_json (bool) – If False (the default), return as an object; otherwise, return as a JSON string.

Returns:

Dictionary with the summary.

Return type:

Union[dict, str]

TabularSummary

class hed.tools.analysis.tabular_summary.TabularSummary(value_cols=None, skip_cols=None, name='')[source]

Bases: object

Summarize the contents of columnar files.

__init__(value_cols=None, skip_cols=None, name='')[source]

Constructor for a BIDS tabular file summary.

Parameters:
  • value_cols (list, None) – List of columns to be treated as value columns.

  • skip_cols (list, None) – List of columns to be skipped.

  • name (str) – Name associated with the dictionary.

__str__()[source]

Return a str version of this summary.

extract_sidecar_template() dict[source]

Extract a BIDS sidecar-compatible dictionary.

Returns:

A sidecar template that can be converted to JSON.

Return type:

dict

get_summary(as_json=False) dict | str[source]

Return the summary in dictionary format.

Parameters:

as_json (bool) – If False, return as a Python dictionary; otherwise, convert to a JSON string.

Returns:

A dictionary containing the summary information or a JSON string if as_json is True.

Return type:

Union[dict, str]

get_number_unique(column_names=None) dict[source]

Return the number of unique values in columns.

Parameters:

column_names (list, None) – A list of column names to analyze or all columns if None.

Returns:

Column names are the keys and the number of unique values in the column are the values.

Return type:

dict

update(data, name=None)[source]

Update the counts based on data.

Parameters:
  • data (DataFrame, str, or list) – DataFrame containing data to update.

  • name (str) – Name of the summary.

update_summary(tab_sum)[source]

Add TabularSummary values to this object.

Parameters:

tab_sum (TabularSummary) – A TabularSummary to be combined.

Notes

  • The value_cols and skip_cols are updated as long as they are not contradictory.

  • A new skip column cannot be used.

static extract_summary(summary_info) TabularSummary[source]

Create a TabularSummary object from a serialized summary.

Parameters:

summary_info (dict or str) – A JSON string or a dictionary containing contents of a TabularSummary.

Returns:

A TabularSummary object containing the information in summary_info.

Return type:

TabularSummary

static get_columns_info(dataframe, skip_cols=None) dict[str, dict][source]

Extract unique value counts for columns.

Parameters:
  • dataframe (DataFrame) – The DataFrame to be analyzed.

  • skip_cols (list) – List of names of columns to be skipped in the extraction.

Returns:

A dictionary with keys that are column names (strings) and values that are dictionaries of unique value counts.

Return type:

dict[str, dict]
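
The shape of the returned dictionary can be illustrated with a small standalone sketch. It takes a list of row dictionaries as a stand-in for a DataFrame, and the name columns_info_sketch is hypothetical; converting values to strings before counting is also an assumption.

```python
from collections import Counter


def columns_info_sketch(rows, skip_cols=None):
    """Count how often each unique value appears in each column.

    rows is a list of dicts (one per table row), standing in for a DataFrame.
    """
    skip = set(skip_cols or [])
    info = {}
    for row in rows:
        for column, value in row.items():
            if column in skip:
                continue
            # Assumption: values are counted by their string form.
            info.setdefault(column, Counter())[str(value)] += 1
    return {column: dict(counts) for column, counts in info.items()}


rows = [
    {"onset": 0.5, "trial_type": "go", "response": "left"},
    {"onset": 1.5, "trial_type": "stop", "response": "left"},
    {"onset": 2.5, "trial_type": "go", "response": "right"},
]
info = columns_info_sketch(rows, skip_cols=["onset"])

# The number of unique values per column follows directly from the counts.
number_unique = {column: len(counts) for column, counts in info.items()}
```

Skipping columns such as onset avoids building a count table for values that are nearly all distinct.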

static make_combined_dicts(file_dictionary, skip_cols=None) tuple[TabularSummary, dict[str, TabularSummary]][source]

Return combined and individual summaries.

Parameters:
  • file_dictionary (FileDictionary) – Dictionary of file name keys and full path.

  • skip_cols (list) – List of names of columns to be skipped.

Returns:

  • A combined summary of all files in the dictionary.

  • A dictionary where keys are file names and values are individual TabularSummary objects.

Return type:

tuple[TabularSummary, dict[str, TabularSummary]]

HedType

class hed.tools.analysis.hed_type.HedType(event_manager, name, type_tag='condition-variable')[source]

Bases: object

Manager of a type variable and its associated context.

__init__(event_manager, name, type_tag='condition-variable')[source]

Create a variable manager for one type-variable for one tabular file.

Parameters:
  • event_manager (EventManager) – Event manager instance

  • name (str) – Name of the tabular file as a unique identifier.

  • type_tag (str) – Lowercase short form of the tag to be managed.

Raises:

HedFileError – On errors such as unmatched onsets or missing definitions.

property total_events
get_type_value_factors(type_value)[source]

Return the HedTypeFactors associated with type_value or None.

Parameters:

type_value (str) – The tag corresponding to the type’s value (such as the name of the condition variable).

Returns:

Union[HedTypeFactors, None]

get_type_value_level_info(type_value)[source]

Return type variable corresponding to type_value.

Parameters:

type_value (str)

Returns:

property type_variables
get_type_def_names()[source]

Return the type definition names.

get_type_value_names()[source]
get_summary()[source]
get_type_factors(type_values=None, factor_encoding='one-hot')[source]

Create a dataframe with the indicated type tag values as factors.

Parameters:
  • type_values (list or None) – A list of values of type tags for which to generate factors.

  • factor_encoding (str) – Type of factor encoding (one-hot or categorical).

Returns:

Contains the specified factors associated with this type tag.

Return type:

pd.DataFrame

static get_type_list(type_tag, item)[source]

Find a list of the given type tag from a HedTag, HedGroup, or HedString.

Parameters:
  • type_tag (str) – A tag whose direct items you wish to remove.

  • item (HedTag or HedGroup) – The item from which to extract condition type_variables.

Returns:

List of the items with this type_tag

Return type:

list

FileDictionary

class hed.tools.analysis.file_dictionary.FileDictionary(collection_name, file_list, key_indices=(0, 2), separator='_')[source]

Bases: object

A file dictionary keyed by entity pair indices.

Notes

  • The entities are identified as 0, 1, … depending on order in the base filename.

  • The entity key-value pairs are assumed separated by ‘_’ unless a separator is provided.

__init__(collection_name, file_list, key_indices=(0, 2), separator='_')[source]

Create a dictionary with full paths as values.

Parameters:
  • collection_name (str) – Name of the file collection for reference.

  • file_list (list, None) – List containing full paths of files of interest.

  • key_indices (tuple, None) – Positions of the key-value pieces to assemble for the key, in order.

  • separator (str) – Character used to separate pieces of key name.

Notes

  • This dictionary is used for cross listing BIDS style files for different studies.

Examples

If key_indices is (0, 2), the key generated for /tmp/sub-001_task-FaceCheck_run-01_events.tsv is sub-001_run-01.

property name

Name of this dictionary.

property key_list

Keys in this dictionary.

property file_dict

Dictionary of path values in this dictionary.

property file_list

List of path values in this dictionary.

create_file_dict(file_list, key_indices, separator)[source]

Create new dict based on key indices.

Parameters:
  • file_list (list) – Paths of the files to include.

  • key_indices (tuple) – A tuple of integers representing order of entities for key.

  • separator (str) – The separator used between entities to form the key.

get_file_path(key)[source]

Return file path corresponding to key.

Parameters:

key (str) – Key used to retrieve the file path.

Returns:

File path.

Return type:

str

iter_files()[source]

Iterator over the files in this dictionary.

Yields:

  • str – Key into the dictionary.

  • file – File path.

key_diffs(other_dict)[source]

Return symmetric key difference with another dict.

Parameters:

other_dict (FileDictionary)

Returns:

The symmetric difference of the keys in this dictionary and the other one.

Return type:

list
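
A symmetric difference keeps the keys that appear in exactly one of the two dictionaries. The sketch below shows the idea on plain key lists; the name key_diffs_sketch is hypothetical and sorting the result is an assumption made only to get a stable order.

```python
def key_diffs_sketch(these_keys, other_keys):
    """Return the keys that occur in exactly one of the two collections."""
    # Assumption: the result is sorted so that the order is deterministic.
    return sorted(set(these_keys).symmetric_difference(other_keys))


diffs = key_diffs_sketch(
    ["sub-001_run-01", "sub-002_run-01"],
    ["sub-002_run-01", "sub-003_run-01"],
)
```

An empty result means the two file collections cover the same set of keys, which is a quick consistency check when cross-listing files.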

output_files(title=None, logger=None)[source]

Return a string with the output of the list.

Parameters:
  • title (None, str) – Optional title.

  • logger (HedLogger) – Optional HED logger for recording.

Returns:

The dictionary in string form.

Return type:

str

Notes

  • The logger is updated if available.

static make_file_dict(file_list, key_indices=(0, 2), separator='_')[source]

Return a dictionary of files using entity keys.

Parameters:
  • file_list (list) – Paths to files to use.

  • key_indices (tuple) – Positions of entities to use for key.

  • separator (str) – Separator character used to construct key.

Returns:

Key is based on key indices and value is a full path.

Return type:

dict

static make_key(key_string, indices=(0, 2), separator='_')[source]

Create a key from specified entities.

Parameters:
  • key_string (str) – The string from which to extract the key (usually a filename or path).

  • indices (tuple) – Positions of entity pairs to use as key.

  • separator (str) – Separator between entity pairs in the created key.

Returns:

The created key.

Return type:

str
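
Key construction from entity positions can be sketched as follows. This is a minimal illustration under stated assumptions: the directory and extension are dropped, the base name is split on the separator, and the selected pieces are rejoined with the same separator. The name make_key_sketch is hypothetical and the library's own handling of edge cases may differ.

```python
import os


def make_key_sketch(key_string, indices=(0, 2), separator="_"):
    """Build a key from the entity pieces at the given positions."""
    # Drop the directory and extension, then split into entity pieces.
    base = os.path.splitext(os.path.basename(key_string))[0]
    pieces = base.split(separator)
    return separator.join(pieces[i] for i in indices)


key = make_key_sketch("/tmp/sub-001_task-FaceCheck_run-01_events.tsv")
```

With the default indices (0, 2), this sketch selects the sub and run pieces and skips the task piece, giving sub-001_run-01 for the path above.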

BIDS Tools

BidsDataset

class hed.tools.bids.bids_dataset.BidsDataset(root_path, schema=None, suffixes=['events', 'participants'], exclude_dirs=['sourcedata', 'derivatives', 'code', 'stimuli'])[source]

Bases: object

A BIDS dataset representation primarily focused on HED evaluation.

root_path

Real root path of the BIDS dataset.

Type:

str

schema

The schema used for evaluation.

Type:

HedSchema or HedSchemaGroup

file_groups

A dictionary of BidsFileGroup objects with a given file suffix.

Type:

dict

__init__(root_path, schema=None, suffixes=['events', 'participants'], exclude_dirs=['sourcedata', 'derivatives', 'code', 'stimuli'])[source]

Constructor for a BIDS dataset.

Parameters:
  • root_path (str) – Root path of the BIDS dataset.

  • schema (HedSchema or HedSchemaGroup) – A schema that overrides the one specified in dataset.

  • suffixes (list or None) – File name suffixes of items to include. If None or empty, then [‘events’, ‘participants’] is assumed.

  • exclude_dirs (list) – Names of directories to exclude from the search (the signature default is ['sourcedata', 'derivatives', 'code', 'stimuli']).

get_file_group(suffix)[source]

Return the file group of files with the specified suffix.

Parameters:

suffix (str) – Suffix of the BidsFileGroup to be returned.

Returns:

The requested tabular group.

Return type:

Union[BidsFileGroup, None]

validate(check_for_warnings=False, schema=None)[source]

Validate the dataset.

Parameters:
  • check_for_warnings (bool) – If True, check for warnings.

  • schema (HedSchema or HedSchemaGroup or None) – The schema used for validation.

Returns:

List of issues encountered during validation. Each issue is a dictionary.

Return type:

list

get_summary()[source]

Return an abbreviated summary of the dataset.

BidsFile

class hed.tools.bids.bids_file.BidsFile(file_path)[source]

Bases: object

A BIDS file with entity dictionary.

file_path

Real path of the file.

Type:

str

suffix

Suffix part of the filename.

Type:

str

ext

Extension (including the .).

Type:

str

entity_dict

Dictionary of entity-names (keys) and entity-values (values).

Type:

dict

Notes

  • This class may hold the merged sidecar giving metadata for this file as well as contents.

__init__(file_path)[source]

Constructor for a file path.

Parameters:

file_path (str) – Full path of the file.

property contents

Return the current contents of this object.

clear_contents()[source]

Set the contents attribute of this object to None.

get_entity(entity_name)[source]

Return the entity value for the specified entity.

Parameters:

entity_name (str) – Name of the BIDS entity, for example task, run, or sub.

Returns:

Entity value if any, otherwise None.

Return type:

Union[str, None]

get_key(entities=None)[source]

Return a key for this BIDS file given a list of entities.

Parameters:

entities (tuple) – A tuple of strings representing entities.

Returns:

A key based on this object.

Return type:

str

Notes

If entities is None, then the file path is used as the key.

set_contents(content_info=None, overwrite=False)[source]

Set the contents of this object.

Parameters:
  • content_info (Any) – The contents appropriate for this object (for example, a JSON dictionary).

  • overwrite (bool) – If False and the contents are not empty, do nothing.

Notes

  • Do not set if the contents are already set and overwrite is False.

__str__()[source]

Return a string representation of this object.

BidsFileGroup

class hed.tools.bids.bids_file_group.BidsFileGroup(root_path, file_list, suffix='events')[source]

Bases: object

Container for BIDS files with a specified suffix.

suffix

The file suffix specifying the class of file represented in this group (e.g., events).

Type:

str

sidecar_dict

A dictionary of sidecars associated with this suffix.

Type:

dict

datafile_dict

A dictionary with values either BidsTabularFile or BidsTimeseriesFile.

Type:

dict

sidecar_dir_dict

Dictionary whose keys are directory paths and values are lists of sidecars in the corresponding directory.

Type:

dict

__init__(root_path, file_list, suffix='events')[source]

Constructor for a BidsFileGroup.

Parameters:
  • file_list (list) – List of paths to the relevant tsv and json files.

  • suffix (str) – Suffix indicating the type this group represents (e.g., events or channels).

summarize(value_cols=None, skip_cols=None)[source]

Return a TabularSummary of group files.

Parameters:
  • value_cols (list) – Column names designated as value columns.

  • skip_cols (list) – Column names designated as columns to skip.

Returns:

A summary of the number of values in different columns if tabular group.

Return type:

Union[TabularSummary, None]

Notes

  • The columns that are not value_cols or skip_cols are summarized by counting the number of times each unique value appears in that column.

validate(hed_schema, extra_def_dicts=None, check_for_warnings=False)[source]

Validate the sidecars and datafiles and return a list of issues.

Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • check_for_warnings (bool) – If True, include warnings in the check.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

validate_sidecars(hed_schema, extra_def_dicts=None, error_handler=None)[source]

Validate merged sidecars.

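Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • error_handler (ErrorHandler) – Error handler to use.

Returns: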

A list of validation issues found. Each issue is a dictionary.

Return type:

list

validate_datafiles(hed_schema, extra_def_dicts=None, error_handler=None)[source]

Validate the datafiles and return an error list.

Parameters:
  • hed_schema (HedSchema) – Schema to apply to the validation.

  • extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.

  • error_handler (ErrorHandler) – Error handler to use.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

Notes: This will clear the contents of the datafiles if they were not previously set.

static create_file_group(root_path, file_list, suffix)[source]

BidsSidecarFile

class hed.tools.bids.bids_sidecar_file.BidsSidecarFile(file_path)[source]

Bases: BidsFile

A BIDS sidecar file.

__init__(file_path)[source]

Construct a BIDS sidecar from a file.

Parameters:

file_path (str) – The real path of the sidecar.

is_sidecar_for(obj)[source]

Return True if this is a sidecar for obj.

Parameters:

obj (BidsFile) – A BidsFile object to check.

Returns:

True if this is a BIDS parent of obj and False otherwise.

Return type:

bool

Notes

  • A sidecar is a sidecar for itself.

set_contents(content_info=None, name='unknown', overwrite=False)[source]

Set the contents of the sidecar.

Parameters:
  • content_info (dict or None) – If None, create a Sidecar from the object’s file path.

  • name (str) – The name of the sidecar.

  • overwrite (bool) – If True, overwrite contents if already set.

Notes

  • The handling of content_info is as follows:
    • None: This object’s file_path is used.

    • dict: This is interpreted as a JSON dictionary.

static is_hed(json_dict)[source]

Return True if the JSON has HED.

Parameters:

json_dict (dict) – A dictionary representing a JSON file or merged file.

Returns:

True if the dictionary has HED or HED_assembled as a first or second-level key.

Return type:

bool
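
The first-or-second-level key check described above can be sketched on a plain dictionary. The name has_hed_sketch is hypothetical, and exact-case matching of the key names is an assumption; the library may compare keys differently.

```python
def has_hed_sketch(json_dict):
    """Return True if HED or HED_assembled is a first- or second-level key."""
    hed_keys = {"HED", "HED_assembled"}  # Assumption: exact-case match.
    if not isinstance(json_dict, dict):
        return False
    for key, value in json_dict.items():
        if key in hed_keys:
            return True  # First-level key.
        if isinstance(value, dict) and hed_keys.intersection(value):
            return True  # Second-level key.
    return False


with_hed = has_hed_sketch({"trial_type": {"HED": {"go": "Sensory-event"}}})
without_hed = has_hed_sketch({"onset": {"Description": "Event onset time"}})
```

In a typical BIDS sidecar the HED key sits one level down, inside a column entry, which is why the second level is checked as well as the first.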

static merge_sidecar_list(sidecar_list, name='merged_sidecar.json')[source]

Merge a list of sidecars into a single sidecar.

Parameters:
  • sidecar_list (list) – A list of Sidecar objects.

  • name (str) – The name of the merged sidecar.

Returns:

A sidecar constructed from the merged list.

Return type:

Union[Sidecar, None]

BidsTabularFile

class hed.tools.bids.bids_tabular_file.BidsTabularFile(file_path)[source]

Bases: BidsFile

A BIDS tabular file including its associated sidecar.

__init__(file_path)[source]

Constructor for a BIDS tabular file.

Parameters:

file_path (str) – Path of the tabular file.

set_contents(content_info=None, overwrite=False)[source]

Set the contents of this tabular file (a TabularInput object). Its sidecar should already be set.

Parameters:
  • content_info (None) – This always uses the internal file_path to create the contents.

  • overwrite (bool) – If False (the default), do not overwrite existing contents if any.

set_sidecar(sidecar)[source]

Set the sidecar for this tabular file.

Parameters:

sidecar (Sidecar) – The sidecar for this tabular file.

BIDS Utilities

hed.tools.bids.bids_util.get_schema_from_description(root_path)[source]
hed.tools.bids.bids_util.group_by_suffix(file_list)[source]

Group files by suffix.

Parameters:

file_list (list) – List of file paths.

Returns:

Dictionary with suffixes as keys and file lists as values.

Return type:

dict
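
Grouping by suffix can be sketched with the standard library alone. The sketch assumes that the suffix is the part of the base name after the last underscore (or the whole base name when there is no underscore); the name group_by_suffix_sketch is hypothetical.

```python
import os
from collections import defaultdict


def group_by_suffix_sketch(file_list):
    """Map each suffix to the list of file paths that carry it."""
    groups = defaultdict(list)
    for path in file_list:
        stem = os.path.splitext(os.path.basename(path))[0]
        # Assumption: the suffix is the text after the last underscore.
        suffix = stem.rsplit("_", 1)[-1]
        groups[suffix].append(path)
    return dict(groups)


files = [
    "/data/sub-01/sub-01_task-go_events.tsv",
    "/data/task-go_events.json",
    "/data/participants.tsv",
]
grouped = group_by_suffix_sketch(files)
```

Data files and their JSON sidecars end up in the same group because they share a suffix and differ only in extension.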

hed.tools.bids.bids_util.parse_bids_filename(file_path)[source]

Split a filename into BIDS-relevant components.

Parameters:

file_path (str) – Path to be parsed.

Returns:

Dictionary with keys ‘basename’, ‘suffix’, ‘prefix’, ‘ext’, ‘bad’, and ‘entities’.

Return type:

dict

Notes

  • Splits into BIDS suffix, extension, and a dictionary of entity name-value pairs.
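
The split into suffix, extension, and entity pairs can be illustrated with a simplified sketch. It is not the library's parser: the name parse_bids_filename_sketch is hypothetical, the 'prefix' and 'bad' keys are deliberately omitted because their semantics are not described here, and compound extensions such as .nii.gz are not handled.

```python
import os


def parse_bids_filename_sketch(file_path):
    """Split a BIDS-style filename into suffix, extension, and entities."""
    basename, ext = os.path.splitext(os.path.basename(file_path))
    entities = {}
    suffix = None
    for piece in basename.split("_"):
        if "-" in piece:
            # An entity is a name-value pair joined by a hyphen.
            name, value = piece.split("-", 1)
            entities[name] = value
        else:
            # Assumption: a piece without a hyphen is the suffix.
            suffix = piece
    return {"basename": basename, "suffix": suffix, "ext": ext,
            "entities": entities}


parsed = parse_bids_filename_sketch(
    "/tmp/sub-001_task-FaceCheck_run-01_events.tsv"
)
```

For the path above, the sketch yields the suffix events, the extension .tsv, and the entities sub, task, and run.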

hed.tools.bids.bids_util.update_entity(name_dict, entity)[source]

Update the dictionary with a new entity.

Parameters:
  • name_dict (dict) – Dictionary of entities.

  • entity (str) – Entity to be added.

hed.tools.bids.bids_util.get_merged_sidecar(root_path, tsv_file)[source]
hed.tools.bids.bids_util.walk_back(root_path, file_path)[source]
hed.tools.bids.bids_util.get_candidates(source_dir, tsv_file_dict)[source]
hed.tools.bids.bids_util.matches_criteria(json_file_dict, tsv_file_dict)[source]