Tools¶

Utility functions and data processing tools for HED operations.

Analysis Tools¶

EventManager¶

class hed.tools.analysis.event_manager.EventManager(input_data, hed_schema, extra_defs=None)[source]¶

Bases: object

Manager of events of temporal extent.

__init__(input_data, hed_schema, extra_defs=None)[source]¶

Create an event manager for an events file. Manages events of temporal extent.

Parameters:

input_data (TabularInput) – Represents an events file with its sidecar.
hed_schema (HedSchema) – HED schema used.
extra_defs (DefinitionDict) – Extra definitions not included in the input_data information.

Raises:

HedFileError – If there are any unmatched offsets.

Notes: Keeps the events of temporal extent by their starting index in the events file. These events are separated from the rest of the annotations, which are contained in self.hed_strings.

unfold_context(remove_types=None)[source]¶

Unfold the event information into a tuple based on context.

Parameters:: remove_types (list or None) – List of types to remove. If None, defaults to empty list.
Returns:

Union[list(str), HedString]: The information without the events of temporal extent.
Union[list(str), HedString, None]: The onsets of the events of temporal extent.
Union[list(str), HedString, None]: The ongoing context information.

Return type:: tuple[Union[list(str), HedString], Union[list(str), HedString, None], Union[list(str), HedString, None]]

str_list_to_hed(str_list)[source]¶

Create a HedString object from a list of strings.

Parameters:: str_list (list) – A list of strings to be concatenated with commas and then converted.
Returns:: The converted list.
Return type:: Union[HedString, None]

get_type_defs(types)[source]¶

Return a list of definition names (lower case) that correspond to any of the specified types.

Parameters:: types (list or None) – List of tags that are treated as types such as ‘Condition-variable’
Returns:: List of definition names (lower-case) that correspond to the specified types
Return type:: list

static compress_strings(list_to_compress)[source]¶

Compress a list of lists of strings into a list of strings, each with comma-separated elements.

Parameters:: list_to_compress (list) – List of lists of HED str to turn into a list of single HED strings.
Returns:: List of same length as list_to_compress with each entry being a str.
Return type:: list
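
The compression described above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: the function name compress_strings_sketch is hypothetical, and mapping an empty inner list to an empty string is an assumption of this sketch.

```python
def compress_strings_sketch(list_to_compress):
    """Join each inner list of strings into one comma-separated string."""
    result = []
    for items in list_to_compress:
        # Assumption of this sketch: an empty or missing inner list becomes "".
        result.append(",".join(items) if items else "")
    return result


rows = [["Sensory-event", "Visual-presentation"], [], ["Agent-action"]]
compressed = compress_strings_sketch(rows)
print(compressed)
```

The output has the same length as the input, one string per row.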

HedTagManager¶

class hed.tools.analysis.hed_tag_manager.HedTagManager(event_manager, remove_types=None)[source]¶

Bases: object

Manager for the HED tags from a columnar file.

__init__(event_manager, remove_types=None)[source]¶

Create a tag manager for one tabular file.

Parameters:

event_manager (EventManager) – An event manager for the tabular file.
remove_types (list or None) – List of type tags (such as condition-variable) to remove. If None, defaults to empty list.

get_hed_objs(include_context=True, replace_defs=False)[source]¶

Return a list of HED string objects of same length as the tabular file.

Parameters:

include_context (bool) – If True (default), include the Event-context group in the HED string.
replace_defs (bool) – If True, replace the Def tags with Definition contents. The default is False.

Returns:

list - List of HED strings of same length as tabular file.

get_hed_obj(hed_str, remove_types=False, remove_group=False)[source]¶

Return a HED string object with the types removed.

Parameters:

hed_str (str) – Represents a HED string.
remove_types (bool) – If False (the default), do not remove the types managed by this manager.
remove_group (bool) – If False (the default), do not remove the group when removing a type tag, otherwise remove its enclosing group.

HedTypeManager¶

class hed.tools.analysis.hed_type_manager.HedTypeManager(event_manager)[source]¶

Bases: object

Manager for type factors and type definitions.

__init__(event_manager)[source]¶

Create a variable manager for one tabular file for all type variables.

Parameters:: event_manager (EventManager) – An event manager for the tabular file.
Raises:: HedFileError – On errors such as unmatched onsets or missing definitions.

property types¶

Return a list of types managed by this manager.

Returns:: Type tags names.
Return type:: list

add_type(type_name)[source]¶

Add a type variable to be managed by this manager.

Parameters:: type_name (str) – Type tag name of the type to be added.

get_factor_vectors(type_tag, type_values=None, factor_encoding='one-hot')[source]¶

Return a DataFrame of factor vectors for the indicated HED tag and values.

Parameters:

type_tag (str) – HED tag to retrieve factors for.
type_values (list or None) – The values of the tag to create factors for or None if all unique values.
factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).

Returns:

DataFrame containing the factor vectors as the columns.

Return type:

Union[pd.DataFrame, None]
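
The difference between the two factor_encoding options can be illustrated without the library. The sketch below uses plain lists and dictionaries instead of a DataFrame; the helper names, the level names, and the use of an empty string for rows with no level are all assumptions made for illustration, and the column naming used by get_factor_vectors is not specified here.

```python
def one_hot_sketch(level_per_row, levels):
    """One column per level: 1 where the row has that level, otherwise 0."""
    return {
        level: [1 if row == level else 0 for row in level_per_row]
        for level in levels
    }


def categorical_sketch(level_per_row):
    """A single column holding the level name for each row ('' if none)."""
    return [row if row is not None else "" for row in level_per_row]


# Hypothetical levels of one condition variable across four event rows.
rows = ["easy", None, "hard", "easy"]
one_hot_factors = one_hot_sketch(rows, ["easy", "hard"])
categorical_factor = categorical_sketch(rows)
print(one_hot_factors)
print(categorical_factor)
```

One-hot encoding yields one 0/1 column per value, while categorical encoding yields a single column whose entries name the value.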

get_type(type_tag)[source]¶

Returns the HedType variable associated with the type tag.

Parameters:: type_tag (str) – HED tag to retrieve the type for.
Returns:: The values associated with this type tag.
Return type:: Union[HedType, None]

get_type_tag_factor(type_tag, type_value)[source]¶

Return the HedTypeFactors for a specified value and extension.

Parameters:

type_tag (str or None) – HED tag for the type.
type_value (str or None) – Value of this tag to return the factors for.

get_type_def_names(type_var)[source]¶

Return the definitions associated with a particular type tag.

Parameters:: type_var (str) – The name of a type tag such as Condition-variable.
Returns:: Names of definitions that use this type.
Return type:: list

summarize_all(as_json=False)[source]¶

Return a dictionary containing the summaries for the types managed by this manager.

Parameters:: as_json (bool) – If False (the default), return as an object otherwise return as a JSON string.
Returns:: Dictionary with the summary.
Return type:: Union[dict, str]

TabularSummary¶

class hed.tools.analysis.tabular_summary.TabularSummary(value_cols=None, skip_cols=None, name='')[source]¶

Bases: object

Summarize the contents of columnar files.

__init__(value_cols=None, skip_cols=None, name='')[source]¶

Constructor for a BIDS tabular file summary.

Parameters:

value_cols (list, None) – List of columns to be treated as value columns.
skip_cols (list, None) – List of columns to be skipped.
name (str) – Name associated with the dictionary.

__str__()[source]¶: Return a str version of this summary.

extract_sidecar_template() → dict[source]¶

Extract a BIDS sidecar-compatible dictionary.

Returns:: A sidecar template that can be converted to JSON.
Return type:: dict

get_summary(as_json=False) → dict | str[source]¶

Return the summary in dictionary format.

Parameters:: as_json (bool) – If False, return as a Python dictionary, otherwise convert to a JSON dictionary.
Returns:: A dictionary containing the summary information or a JSON string if as_json is True.
Return type:: Union[dict, str]

get_number_unique(column_names=None) → dict[source]¶

Return the number of unique values in columns.

Parameters:: column_names (list, None) – A list of column names to analyze or all columns if None.
Returns:: Column names are the keys and the number of unique values in the column are the values.
Return type:: dict

update(data, name=None)[source]¶

Update the counts based on data.

Parameters:

data (DataFrame, str, or list) – DataFrame containing data to update.
name (str) – Name of the summary.

update_summary(tab_sum)[source]¶

Add TabularSummary values to this object.

Parameters:: tab_sum (TabularSummary) – A TabularSummary to be combined.

Notes

The value_cols and skip_cols are updated as long as they are not contradictory.
A new skip column cannot be used.

static extract_summary(summary_info) → TabularSummary[source]¶

Create a TabularSummary object from a serialized summary.

Parameters:: summary_info (dict or str) – A JSON string or a dictionary containing contents of a TabularSummary.
Returns:: A TabularSummary object containing the information in summary_info.
Return type:: TabularSummary

static get_columns_info(dataframe, skip_cols=None) → dict[str, dict][source]¶

Extract unique value counts for columns.

Parameters:

dataframe (DataFrame) – The DataFrame to be analyzed.
skip_cols (list) – List of names of columns to be skipped in the extraction.

Returns:

A dictionary with keys that are column names (strings) and values that are dictionaries of unique value counts.

Return type:

dict[str, dict]
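
The shape of the returned dictionary can be sketched with the standard library. This is a minimal illustration that assumes rows are given as a list of dictionaries rather than a DataFrame; the function name column_value_counts and the sample column names are hypothetical.

```python
from collections import Counter


def column_value_counts(rows, skip_cols=None):
    """Count how often each unique value appears in each non-skipped column."""
    skip = set(skip_cols or [])
    counts = {}
    for row in rows:
        for column, value in row.items():
            if column in skip:
                continue
            counts.setdefault(column, Counter())[value] += 1
    return {column: dict(counter) for column, counter in counts.items()}


rows = [
    {"onset": "0.5", "trial_type": "go", "response": "left"},
    {"onset": "1.5", "trial_type": "stop", "response": "left"},
    {"onset": "2.5", "trial_type": "go", "response": "right"},
]
info = column_value_counts(rows, skip_cols=["onset"])
# The number of unique values per column follows directly from the counts.
unique_counts = {column: len(values) for column, values in info.items()}
print(info)
print(unique_counts)
```

The second dictionary mirrors what get_number_unique is described as returning: column names as keys and the number of unique values as values.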

static make_combined_dicts(file_dictionary, skip_cols=None) → tuple[TabularSummary, dict[str, TabularSummary]][source]¶

Return combined and individual summaries.

Parameters:

file_dictionary (FileDictionary) – Dictionary of file name keys and full path.
skip_cols (list) – Names of the columns to be skipped.

Returns:

A combined summary of all files in the dictionary.
A dictionary where keys are file names and values are individual TabularSummary objects.

Return type:

tuple[TabularSummary, dict[str, TabularSummary]]

HedType¶

class hed.tools.analysis.hed_type.HedType(event_manager, name, type_tag='condition-variable')[source]¶

Bases: object

Manager of a type variable and its associated context.

__init__(event_manager, name, type_tag='condition-variable')[source]¶

Create a variable manager for one type-variable for one tabular file.

Parameters:

event_manager (EventManager) – Event manager instance
name (str) – Name of the tabular file as a unique identifier.
type_tag (str) – Lowercase short form of the tag to be managed.

Raises:

HedFileError – On errors such as unmatched onsets or missing definitions.

property total_events¶

get_type_value_factors(type_value)[source]¶

Return the HedTypeFactors associated with type_value or None.

Parameters:: type_value (str) – The tag corresponding to the type’s value (such as the name of the condition variable).
Returns:: Union[HedTypeFactors, None]

get_type_value_level_info(type_value)[source]¶

Return type variable corresponding to type_value.

Parameters:: type_value (str)

Returns:

property type_variables¶

get_type_def_names()[source]¶: Return the type definition names.

get_type_value_names()[source]¶

get_summary()[source]¶

get_type_factors(type_values=None, factor_encoding='one-hot')[source]¶

Create a dataframe with the indicated type tag values as factors.

Parameters:

type_values (list or None) – A list of values of type tags for which to generate factors.
factor_encoding (str) – Type of factor encoding (one-hot or categorical).

Returns:

Contains the specified factors associated with this type tag.

Return type:

pd.DataFrame

static get_type_list(type_tag, item)[source]¶

Find a list of the given type tag from a HedTag, HedGroup, or HedString.

Parameters:

type_tag (str) – a tag whose direct items you wish to remove
item (HedTag or HedGroup) – The item from which to extract condition type_variables.

Returns:

List of the items with this type_tag

Return type:

list

FileDictionary¶

class hed.tools.analysis.file_dictionary.FileDictionary(collection_name, file_list, key_indices=(0, 2), separator='_')[source]¶

Bases: object

A file dictionary keyed by entity pair indices.

Notes

The entities are identified as 0, 1, … depending on order in the base filename.
The entity key-value pairs are assumed separated by ‘_’ unless a separator is provided.

__init__(collection_name, file_list, key_indices=(0, 2), separator='_')[source]¶

Create a dictionary with full paths as values.

Parameters:

collection_name (str) – Name of the file collection for reference.
file_list (list, None) – List containing full paths of files of interest.
key_indices (tuple, None) – List of order of key-value pieces to assemble for the key.
separator (str) – Character used to separate pieces of key name.

Notes

This dictionary is used for cross listing BIDS style files for different studies.

Examples

If key_indices is (0, 2), the key generated for /tmp/sub-001_task-FaceCheck_run-01_events.tsv is sub-001_run-01.

property name¶: Name of this dictionary.

property key_list¶: Keys in this dictionary.

property file_dict¶: Dictionary of path values in this dictionary.

property file_list¶: List of path values in this dictionary.

create_file_dict(file_list, key_indices, separator)[source]¶

Create new dict based on key indices.

Parameters:

file_list (list) – Paths of the files to include.
key_indices (tuple) – A tuple of integers representing order of entities for key.
separator (str) – The separator used between entities to form the key.

get_file_path(key)[source]¶

Return file path corresponding to key.

Parameters:: key (str) – Key used to retrieve the file path.
Returns:: File path.
Return type:: str

iter_files()[source]¶

Iterator over the files in this dictionary.

Yields:

str – Key into the dictionary.
file – File path.

key_diffs(other_dict)[source]¶

Return symmetric key difference with another dict.

Parameters:: other_dict (FileDictionary)
Returns:: The symmetric difference of the keys in this dictionary and the other one.
Return type:: list
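
A symmetric key difference contains the keys that appear in exactly one of the two dictionaries. The sketch below shows the idea on plain lists of keys; the function name is hypothetical, and sorting the result is an assumption of this sketch, since the ordering used by key_diffs is not stated here.

```python
def key_diffs_sketch(keys_a, keys_b):
    """Return the keys present in exactly one of the two key collections."""
    return sorted(set(keys_a) ^ set(keys_b))


diffs = key_diffs_sketch(
    ["sub-001_run-01", "sub-002_run-01"],
    ["sub-002_run-01", "sub-003_run-01"],
)
print(diffs)
```

An empty result means the two collections have exactly the same keys, which is a quick check before cross listing files.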

output_files(title=None, logger=None)[source]¶

Return a string with the output of the list.

Parameters:

title (None, str) – Optional title.
logger (HedLogger) – Optional HED logger for recording.

Returns:

The dictionary in string form.

Return type:

str

Notes

The logger is updated if available.

static make_file_dict(file_list, key_indices=(0, 2), separator='_')[source]¶

Return a dictionary of files using entity keys.

Parameters:

file_list (list) – Paths to files to use.
key_indices (tuple) – Positions of entities to use for key.
separator (str) – Separator character used to construct key.

Returns:

Key is based on key indices and value is a full path.

Return type:

dict

static make_key(key_string, indices=(0, 2), separator='_')[source]¶

Create a key from specified entities.

Parameters:

key_string (str) – The string from which to extract the key (usually a filename or path).
indices (tuple) – Positions of entity pairs to use as key.
separator (str) – Separator between entity pairs in the created key.

Returns:

The created key.

Return type:

str
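
Key construction from entity positions can be sketched as follows. This is not the library's code: the name make_key_sketch is hypothetical, and the sketch assumes that the entity pairs in the file name are split on underscores (the BIDS convention) and then rejoined with the separator.

```python
import os


def make_key_sketch(key_string, indices=(0, 2), separator="_"):
    """Build a key from the entity pairs at the given positions in a file name."""
    filename = os.path.basename(key_string)
    name, _ext = os.path.splitext(filename)
    # Assumption: entity pairs in the file name are underscore-separated.
    pieces = name.split("_")
    return separator.join(pieces[i] for i in indices)


key = make_key_sketch("/tmp/sub-001_task-FaceCheck_run-01_events.tsv")
print(key)
```

With indices (0, 2), the subject and run pairs are kept and the task pair is dropped, so files from different tasks with the same subject and run map to the same key.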

BIDS Tools¶

BidsDataset¶

class hed.tools.bids.bids_dataset.BidsDataset(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]¶

Bases: object

A BIDS dataset representation primarily focused on HED evaluation.

root_path¶

Real root path of the BIDS dataset.

Type:: str

schema¶

The schema used for evaluation.

Type:: HedSchema or HedSchemaGroup

file_groups¶

A dictionary of BidsFileGroup objects with a given file suffix.

Type:: dict

__init__(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]¶

Constructor for a BIDS dataset.

Parameters:

root_path (str) – Root path of the BIDS dataset.
schema (HedSchema or HedSchemaGroup) – A schema that overrides the one specified in dataset.
suffixes (list or None) – File name suffixes of items to include. If not provided, defaults to [‘events’, ‘participants’]. If None or empty list, includes all files.
exclude_dirs (list or None) – Directory names to exclude from traversal. If not provided, defaults to [‘sourcedata’, ‘derivatives’, ‘code’, ‘stimuli’]. If None or empty list, no directories are excluded.
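
The `<object object>` defaults in the signature suggest a sentinel value that lets the constructor tell "argument not provided" apart from an explicit None. A minimal sketch of that pattern is shown below, assuming this is how the defaults are resolved; the names _NOT_PROVIDED and resolve_suffixes are hypothetical, and returning None to mean "include all files" is a convention of this sketch only.

```python
_NOT_PROVIDED = object()  # hypothetical sentinel, distinct from None

DEFAULT_SUFFIXES = ["events", "participants"]


def resolve_suffixes(suffixes=_NOT_PROVIDED):
    """Resolve the suffix filter following the rules in the parameter description."""
    if suffixes is _NOT_PROVIDED:
        # Argument omitted: fall back to the documented defaults.
        return list(DEFAULT_SUFFIXES)
    if not suffixes:
        # Explicit None or empty list: no filtering (all files included).
        return None
    return list(suffixes)


print(resolve_suffixes())
print(resolve_suffixes(None))
print(resolve_suffixes(["channels"]))
```

The same three-way distinction applies to exclude_dirs, with the documented default list of directories in place of the suffixes.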

get_file_group(suffix)[source]¶

Return the file group of files with the specified suffix.

Parameters:: suffix (str) – Suffix of the BidsFileGroup to be returned.
Returns:: The requested tabular group.
Return type:: Union[BidsFileGroup, None]

validate(check_for_warnings=False, schema=None)[source]¶

Validate the dataset.

Parameters:

check_for_warnings (bool) – If True, check for warnings.
schema (HedSchema or HedSchemaGroup or None) – The schema used for validation.

Returns:

List of issues encountered during validation. Each issue is a dictionary.

Return type:

list

get_summary()[source]¶: Return an abbreviated summary of the dataset.
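
A typical validation call might look like the sketch below. It uses only the constructor and validate signatures listed above; the wrapper function name is hypothetical, and the comment about the schema coming from the dataset is an inference from the schema parameter description rather than a confirmed behavior.

```python
def validate_bids_dataset(root_path, check_for_warnings=False):
    """Validate a BIDS dataset and return the list of issues (each a dictionary)."""
    # Imported inside the function so that defining it does not require
    # the HED tools package to be installed.
    from hed.tools.bids.bids_dataset import BidsDataset

    # No schema is passed, so presumably the one specified in the dataset is used.
    dataset = BidsDataset(root_path)
    return dataset.validate(check_for_warnings=check_for_warnings)
```

Calling validate_bids_dataset with the root directory of a dataset returns the issue list; an empty list would indicate that no issues were found.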

BidsFile¶

class hed.tools.bids.bids_file.BidsFile(file_path)[source]¶

Bases: object

A BIDS file with entity dictionary.

file_path¶

Real path of the file.

Type:: str

suffix¶

Suffix part of the filename.

Type:: str

ext¶

Extension (including the .).

Type:: str

entity_dict¶

Dictionary of entity-names (keys) and entity-values (values).

Type:: dict

Notes

This class may hold the merged sidecar giving metadata for this file as well as contents.

__init__(file_path)[source]¶

Constructor for a file path.

Parameters:: file_path (str) – Full path of the file.

property contents¶: Return the current contents of this object.

clear_contents()[source]¶: Set the contents attribute of this object to None.

get_entity(entity_name)[source]¶

Return the entity value for the specified entity.

Parameters:: entity_name (str) – Name of the BIDS entity, for example task, run, or sub.
Returns:: Entity value if any, otherwise None.
Return type:: Union[str, None]

get_key(entities=None)[source]¶

Return a key for this BIDS file given a list of entities.

Parameters:: entities (tuple) – A tuple of strings representing entities.
Returns:: A key based on this object.
Return type:: str

Notes

If entities is None, then the file path is used as the key.

set_contents(content_info=None, overwrite=False)[source]¶

Set the contents of this object.

Parameters:

content_info (Any) – The contents appropriate for this object, such as a JSON dictionary.
overwrite (bool) – If False and the contents are not empty, do nothing.

Notes

The contents are not set if they are already set and overwrite is False.

__str__()[source]¶: Return a string representation of this object.

BidsFileGroup¶

class hed.tools.bids.bids_file_group.BidsFileGroup(root_path, file_list, suffix='events')[source]¶

Bases: object

Container for BIDS files with a specified suffix.

suffix¶

The file suffix specifying the class of file represented in this group (e.g., events).

Type:: str

sidecar_dict¶

A dictionary of sidecars associated with this suffix.

Type:: dict

datafile_dict¶

A dictionary with values either BidsTabularFile or BidsTimeseriesFile.

Type:: dict

sidecar_dir_dict¶

Dictionary whose keys are directory paths and values are list of sidecars in the corresponding directory.

Type:: dict

__init__(root_path, file_list, suffix='events')[source]¶

Constructor for a BidsFileGroup.

Parameters:

file_list (list) – List of paths to the relevant tsv and json files.
suffix (str) – Suffix indicating the type this group represents (e.g. events, or channels, etc.).

summarize(value_cols=None, skip_cols=None)[source]¶

Return a TabularSummary of group files.

Parameters:

value_cols (list) – Column names designated as value columns.
skip_cols (list) – Column names designated as columns to skip.

Returns:

A summary of the number of values in different columns if tabular group.

Return type:

Union[TabularSummary, None]

Notes

The columns that are not value_cols or skip_cols are summarized by counting the number of times each unique value appears in that column.

validate(hed_schema, extra_def_dicts=None, check_for_warnings=False)[source]¶

Validate the sidecars and datafiles and return a list of issues.

Parameters:

hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
check_for_warnings (bool) – If True, include warnings in the check.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

validate_sidecars(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶

Validate merged sidecars.

Parameters:

hed_schema (HedSchema) – HED schema for validation.
extra_def_dicts (DefinitionDict) – Extra definitions.
error_handler (ErrorHandler) – Error handler to use.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

validate_datafiles(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶

Validate the datafiles and return an error list.

Parameters:

hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
error_handler (ErrorHandler) – Error handler to use.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list

Notes: This will clear the contents of the datafiles if they were not previously set.

static create_file_group(root_path, file_list, suffix)[source]¶

BidsSidecarFile¶

class hed.tools.bids.bids_sidecar_file.BidsSidecarFile(file_path)[source]¶

Bases: BidsFile

A BIDS sidecar file.

__init__(file_path)[source]¶

Construct a BIDS sidecar from a file.

Parameters:: file_path (str) – The real path of the sidecar.

is_sidecar_for(obj)[source]¶

Return True if this is a sidecar for obj.

Parameters:: obj (BidsFile) – A BidsFile object to check.
Returns:: True if this is a BIDS parent of obj and False otherwise.
Return type:: bool

Notes

A sidecar is a sidecar for itself.

set_contents(content_info=None, name='unknown', overwrite=False)[source]¶

Set the contents of the sidecar.

Parameters:

content_info (dict, or None) – If None, create a Sidecar from the object’s file-path.
name (str) – The name of the sidecar.
overwrite (bool) – If True, overwrite contents if already set.

Notes

The handling of content_info is as follows:
- None: This object’s file_path is used.
- dict: This is interpreted as a JSON dictionary.

static is_hed(json_dict)[source]¶

Return True if the json has HED.

Parameters:: json_dict (dict) – A dictionary representing a JSON file or merged file.
Returns:: True if the dictionary has HED or HED_assembled as a first or second-level key.
Return type:: bool
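
The first-or-second-level check described above can be sketched in plain Python. This is an illustrative stand-in rather than the library's implementation, and the function name has_hed_sketch and the sample sidecar contents are hypothetical.

```python
def has_hed_sketch(json_dict):
    """Return True if HED or HED_assembled is a first- or second-level key."""
    hed_keys = {"HED", "HED_assembled"}
    if not isinstance(json_dict, dict):
        return False
    # First level: the key appears directly in the dictionary.
    if not hed_keys.isdisjoint(json_dict):
        return True
    # Second level: the key appears in one of the column dictionaries.
    return any(
        isinstance(value, dict) and not hed_keys.isdisjoint(value)
        for value in json_dict.values()
    )


sidecar = {"trial_type": {"HED": {"go": "Sensory-event"}}}
print(has_hed_sketch(sidecar))
```

In a BIDS sidecar, HED annotations normally sit under a column name, which is why the second level matters.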

static merge_sidecar_list(sidecar_list, name='merged_sidecar.json')[source]¶

Merge a list of sidecars into a single sidecar.

Parameters:

sidecar_list (list) – A list of Sidecar objects.
name (str) – The name of the merged sidecar.

Returns:

A sidecar constructed from the merged list.

Return type:

Union[Sidecar, None]

BidsTabularFile¶

class hed.tools.bids.bids_tabular_file.BidsTabularFile(file_path)[source]¶

Bases: BidsFile

A BIDS tabular file including its associated sidecar.

__init__(file_path)[source]¶

Constructor for a BIDS tabular file.

Parameters:: file_path (str) – Path of the tabular file.

set_contents(content_info=None, overwrite=False)[source]¶

Set the contents of this tabular file (a TabularInput object). Its sidecar should already be set.

Parameters:

content_info (None) – This always uses the internal file_path to create the contents.
overwrite (bool) – If False (the default), do not overwrite existing contents if any.

set_sidecar(sidecar)[source]¶

Set the sidecar for this tabular file.

Parameters:: sidecar (Sidecar) – The sidecar for this tabular file.

BIDS Utilities¶

hed.tools.bids.bids_util.get_schema_from_description(root_path)[source]¶

hed.tools.bids.bids_util.group_by_suffix(file_list)[source]¶

Group files by suffix.

Parameters:: file_list (list) – List of file paths.
Returns:: Dictionary with suffixes as keys and file lists as values.
Return type:: dict
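
Grouping by suffix can be sketched as below. This is a simplified illustration, not the library's code: it assumes the suffix is the last underscore-separated piece of the file name before the extension, and the function name and sample paths are hypothetical.

```python
import os
from collections import defaultdict


def group_by_suffix_sketch(file_list):
    """Group file paths by the BIDS suffix at the end of the file name."""
    groups = defaultdict(list)
    for path in file_list:
        # Drop the directory and everything from the first dot onward.
        name = os.path.basename(path).split(".", 1)[0]
        # Assumption: the suffix is the last underscore-separated piece.
        suffix = name.rsplit("_", 1)[-1]
        groups[suffix].append(path)
    return dict(groups)


files = [
    "/data/sub-01/sub-01_task-go_events.tsv",
    "/data/task-go_events.json",
    "/data/participants.tsv",
]
grouped = group_by_suffix_sketch(files)
print(grouped)
```

Both the tabular file and its sidecar land in the events group because they share a suffix.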

hed.tools.bids.bids_util.parse_bids_filename(file_path)[source]¶

Split a filename into BIDS-relevant components.

Parameters:: file_path (str) – Path to be parsed.
Returns:: Dictionary with keys ‘basename’, ‘suffix’, ‘prefix’, ‘ext’, ‘bad’, and ‘entities’.
Return type:: dict

Notes

Splits into BIDS suffix, extension, and a dictionary of entity name-value pairs.
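
The split into suffix, extension, and entities can be sketched as follows. This simplified version returns only three of the keys listed above and does not reproduce the 'basename', 'prefix', or 'bad' handling of the library function; the function name parse_bids_name_sketch is hypothetical.

```python
import os


def parse_bids_name_sketch(file_path):
    """Split a BIDS-style file name into suffix, extension, and entities."""
    filename = os.path.basename(file_path)
    stem, dot, ext = filename.partition(".")
    pieces = stem.split("_")
    # The suffix is the final piece if it is not a name-value pair.
    suffix = pieces.pop() if "-" not in pieces[-1] else None
    # Each remaining piece of the form name-value becomes an entity.
    entities = dict(piece.split("-", 1) for piece in pieces if "-" in piece)
    return {"suffix": suffix, "ext": dot + ext, "entities": entities}


parsed = parse_bids_name_sketch("/tmp/sub-001_task-FaceCheck_run-01_events.tsv")
print(parsed)
```

The extension includes the leading dot, matching the description of the ext attribute of BidsFile.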

hed.tools.bids.bids_util.update_entity(name_dict, entity)[source]¶

Update the dictionary with a new entity.

Parameters:

name_dict (dict) – Dictionary of entities.
entity (str) – Entity to be added.

hed.tools.bids.bids_util.get_merged_sidecar(root_path, tsv_file)[source]¶

hed.tools.bids.bids_util.walk_back(root_path, file_path)[source]¶

hed.tools.bids.bids_util.get_candidates(source_dir, tsv_file_dict)[source]¶

hed.tools.bids.bids_util.matches_criteria(json_file_dict, tsv_file_dict)[source]¶