Tools

get_onset_lines(line): Get the lines in the input data with the same line numbers as the data_frame.

insert_issue_details(issues)

Inserts issue details as part of the ‘message’ key for a list of issues.

Parameters:: issues (list) – List of issues to get details for.

validate_event_tags()

Verify that the events in the HED strings validly represent events.

Returns:: A list of issues, where each element is a dictionary with ‘code’ and ‘message’ keys.
Return type:: list

EventsSummary

class EventsSummary(hed_schema, file, sidecar=None, name=None)

Bases: object

Summarizes HED event annotations for a tabular file, grouping tags by stimulus/response categories.

CUTOFF_TAGS = {'blue-color', 'brown-color', 'cyan-color', 'gray-color', 'green-color', 'orange-color', 'pink-color', 'purple-color', 'red-color', 'visual-presentation', 'white-color', 'yellow-color'}

EXCLUDED_PARENTS = {'data-marker', 'data-resolution', 'grayscale', 'hsv-color', 'informational-property', 'luminance', 'luminance-contrast', 'opacity', 'organizational-property', 'quantitative-value', 'relation', 'rgb-color', 'spatiotemporal-value', 'statistical-value', 'task-effect-evidence', 'task-relationship'}

FILTERED_TAGS = {'action', 'agent', 'agent-cognitive-state', 'agent-emotional-state', 'agent-physiological-state', 'agent-postural-state', 'agent-property', 'agent-state', 'agent-task-role', 'agent-trait', 'anatomical-item', 'auditory-attribute', 'auditory-device', 'biological-artifact', 'biological-item', 'body-part', 'categorical-class-value', 'categorical-judgment-value', 'categorical-level-value', 'categorical-location-value', 'categorical-orientation-value', 'categorical-value', 'computing-device', 'dara-source-type', 'data-property', 'data-value', 'data-variability-attribute', 'device', 'display-device', 'document', 'environmental-property', 'event', 'face-part', 'geometric-object', 'gustatory-attribute', 'head-part', 'input-device', 'io-device', 'item', 'language-item', 'lower-extremity-part', 'man-made-object', 'media', 'media-clip', 'move-body-part', 'natural-object', 'nonbiological-artifact', 'object', 'olfactory-attribute', 'output-device', 'physical-value', 'property', 'recording-device', 'sensory-attribute', 'sensory-presentation', 'sensory-property', 'spatial-property', 'spectral-property', 'tactile-attribute', 'task-action-type', 'task-attentional-demand', 'task-event-role', 'task-property', 'task-stimulus-role', 'temporal-property', 'torso-part', 'upper-extremity-part', 'visual-attribute', 'visualization'}

MATCH_TYPES = ['Experimental-stimulus', 'Participant-response', 'Cue', 'Feedback', 'Instructional', 'Sensory-event', 'Agent-action']

REMOVE_TYPES = ['Condition-variable', 'Task']

extract_tag_summary()

Extract a summary of the tags in a given tabular input file.

Returns:

dict: A dictionary with the summary information - (str, list)
list: The tags that do not match any of the specified types but are not excluded.

Return type:

tuple[dict, list]

static match_tags(all_tags, key)

Return True if any tag in all_tags has a short_base_tag matching key.

Parameters:

all_tags (list[HedTag]) – The tags to search.
key (str) – The short base tag name to look for.

Returns:

True if a match is found.

Return type:

bool
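A minimal sketch of this check follows. It uses a hypothetical `FakeTag` stand-in that carries only a `short_base_tag` attribute. The case-insensitive comparison is an assumption; this reference does not state it.

```python
from dataclasses import dataclass


@dataclass
class FakeTag:
    """Hypothetical stand-in for HedTag exposing only short_base_tag."""
    short_base_tag: str


def match_tags(all_tags, key):
    """Return True if any tag's short base tag equals key (case-insensitive, assumed)."""
    key = key.lower()
    return any(tag.short_base_tag.lower() == key for tag in all_tags)


tags = [FakeTag("Sensory-event"), FakeTag("Visual-presentation")]
print(match_tags(tags, "sensory-event"))  # True
print(match_tags(tags, "Agent-action"))   # False
```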

update_tags(tag_set, all_tags)

Add the most-specific ancestor tag names from all_tags into tag_set, respecting cutoff categories.

Parameters:

tag_set (set) – The running set of tag terms to update.
all_tags (list[HedTag]) – Tags to process.

Returns:

The updated tag_set.

Return type:

set

HedTagManager

class HedTagManager(event_manager, remove_types=None)

Bases: object

Manager for the HED tags from a columnar file.

get_hed_obj(hed_str, remove_types=False, remove_group=False)

Return a HED string object, optionally with the types removed.

Parameters:

hed_str (str) – Represents a HED string.
remove_types (bool) – If False (the default), do not remove the types managed by this manager.
remove_group (bool) – If False (the default), do not remove the group when removing a type tag, otherwise remove its enclosing group.

get_hed_objs(include_context=True, replace_defs=False)

Return a list of HED string objects of the same length as the tabular file.

Parameters:

include_context (bool) – If True (default), include the Event-context group in the HED string.
replace_defs (bool) – If True (default=False), replace the Def tags with Definition contents.

Returns:

list - List of HED string objects of the same length as the tabular file.

HedTagCount

class HedTagCount(hed_tag, file_name)

Bases: object

Counts for a particular HedTag in a particular file.

get_empty()

Return a copy of this entry with counts reset to zero.

Returns:: A new instance with the same tag name but zeroed event/file counts.
Return type:: HedTagCount

get_info(verbose=False) → dict

Return counts for this tag.

Parameters:: verbose (bool) – If False (the default), include only the number of files; otherwise include a list of files.
Returns:: Keys are ‘tag’, ‘events’, and ‘files’.
Return type:: dict

get_summary() → dict

Return a dictionary summary of the events and files for this tag.

Returns:: dictionary summary of events and files that contain this tag.
Return type:: dict

set_value(hed_tag)

Update the tag term value counts for a HedTag.

Parameters:: hed_tag (HedTag or None) – Item to use to update the value counts.

HedTagCounts

class HedTagCounts(name, total_events=0)

Bases: object

Counts of HED tags for a group of columnar files.

Parameters:

name (str) – An identifier for these counts (usually the filename of the tabular file).
total_events (int) – The total number of events in the columnar file.

static create_template(tags) → dict

Create a dictionary whose keys are the entries of the key lists in the tags dictionary.

Parameters:: tags (dict) – Dictionary of tags and key lists.
Returns:: Dictionary whose keys are the entries of the key lists and whose values are empty lists.
Return type:: dict

Note: This class is used to organize the results of the tags based on a template for display.
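The template construction can be sketched as follows. It assumes that each value of `tags` is a list of tag names and that names are kept as given; the real method may normalize case.

```python
def create_template(tags):
    """Map every entry of the key lists in tags to a fresh empty list."""
    template = {}
    for key_list in tags.values():
        for element in key_list:
            template[element] = []
    return template


tags = {
    "Sensory events": ["Sensory-event", "Visual-presentation"],
    "Agent actions": ["Agent-action"],
}
print(create_template(tags))
# {'Sensory-event': [], 'Visual-presentation': [], 'Agent-action': []}
```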

get_summary() → dict

Return a summary object containing the tag count information of this summary.

Returns:: Keys are ‘name’, ‘files’, ‘total_events’, and ‘details’.
Return type:: dict

merge_tag_dicts(other_dict)

Merge the information from another dictionary with this object’s tag dictionary.

Parameters:: other_dict (dict) – Dictionary mapping tags to the HedTagCount objects to merge.

organize_tags(tag_template) → tuple

Organize tags into categories as specified by the tag_template.

Parameters:: tag_template (dict) – A dictionary whose keys are titles and values are lists of HED tags (str).
Returns:: A tuple containing two elements:
dict: Keys are tags (strings) and values are lists of HedTagCount objects for items fitting the template.
list: HedTagCount objects corresponding to tags that don’t fit the template.
Return type:: tuple[dict, list]
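One plausible reading of this grouping is sketched below. It replaces HedTagCount objects with a hypothetical mapping from each counted tag to its ancestor terms (ordered root to leaf). It also assumes that the most specific term found in the template wins. The tag paths are illustrative only.

```python
def organize_tags(tag_terms, tag_template):
    """Group counted tags under template tags by matching their ancestor terms.

    tag_terms: hypothetical stand-in for HedTagCount objects, mapping each
    counted tag to its ancestor terms ordered from root to leaf.
    """
    # Same shape as create_template: every listed tag maps to an empty list.
    template = {tag: [] for tags in tag_template.values() for tag in tags}
    unmatched = []
    for name, terms in tag_terms.items():
        # Try the most specific term first, then walk up toward the root.
        for term in reversed(terms):
            if term in template:
                template[term].append(name)
                break
        else:
            unmatched.append(name)
    return template, unmatched


tag_template = {"Stimuli": ["sensory-event"], "Other events": ["event"]}
counted = {
    "sensory-event": ("event", "sensory-event"),
    "agent-action": ("event", "agent-action"),
    "label": ("property", "informational-property", "label"),
}
organized, leftover = organize_tags(counted, tag_template)
print(organized)  # {'sensory-event': ['sensory-event'], 'event': ['agent-action']}
print(leftover)   # ['label']
```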

update_tag_counts(hed_string_obj, file_name)

Update the tag counts based on a HedString object.

Parameters:

hed_string_obj (HedString) – The HED string whose tags should be counted.
file_name (str) – The name of the file corresponding to these counts.

HedTypeManager

class HedTypeManager(event_manager)

Bases: object

Manager for type factors and type definitions.

add_type(type_name)

Add a type variable to be managed by this manager.

Parameters:: type_name (str) – Type tag name of the type to be added.

get_factor_vectors(type_tag, type_values=None, factor_encoding='one-hot')

Return a DataFrame of factor vectors for the indicated HED tag and values.

Parameters:

type_tag (str) – HED tag to retrieve factors for.
type_values (list or None) – The values of the tag to create factors for or None if all unique values.
factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).

Returns:

DataFrame containing the factor vectors as the columns.

Return type:

Union[pd.DataFrame, None]

get_type(type_tag)

Return the HedType variable associated with the type tag.

Parameters:: type_tag (str) – HED tag to retrieve the type for.
Returns:: The HedType associated with this type tag, or None.
Return type:: Union[HedType, None]

get_type_def_names(type_var)

Return the definitions associated with a particular type tag.

Parameters:: type_var (str) – The name of a type tag such as Condition-variable.
Returns:: Names of definitions that use this type.
Return type:: list

get_type_tag_factor(type_tag, type_value)

Return the HedTypeFactors for a specified type tag and value.

Parameters:

type_tag (str or None) – HED tag for the type.
type_value (str or None) – Value of this tag to return the factors for.

summarize_all(as_json=False)

Return a dictionary containing the summaries for the types managed by this manager.

Parameters:: as_json (bool) – If False (the default), return as an object otherwise return as a JSON string.
Returns:: Dictionary with the summary.
Return type:: Union[dict, str]

property types

Return a list of types managed by this manager.

Returns:: Type tag names.
Return type:: list

HedType

class HedType(event_manager, name, type_tag='condition-variable')

Bases: object

Manager of a type variable and its associated context.

get_summary()

Return a summary dict mapping each type-value name to its factor summary.

Returns:: Keys are type-value name strings; values are factor summary dicts.
Return type:: dict

get_type_def_names(): Return the names of the definitions associated with this type.

get_type_factors(type_values=None, factor_encoding='one-hot')

Create a dataframe with the indicated type tag values as factors.

Parameters:

type_values (list or None) – A list of values of type tags for which to generate factors.
factor_encoding (str) – Type of factor encoding (one-hot or categorical).

Returns:

Contains the specified factors associated with this type tag.

Return type:

pd.DataFrame

static get_type_list(type_tag, item)

Find a list of the given type tag from a HedTag, HedGroup, or HedString.

Parameters:

type_tag (str) – The type tag whose direct items are to be found.
item (HedTag or HedGroup) – The item from which to extract condition type_variables.

Returns:

List of the items with this type_tag

Return type:

list

get_type_value_factors(type_value)

Return the HedTypeFactors associated with type_value, or None.

Parameters:: type_value (str) – The tag corresponding to the type’s value (such as the name of the condition variable).
Returns:: The factors associated with type_value, or None.
Return type:: Union[HedTypeFactors, None]

get_type_value_level_info(type_value)

Return type variable corresponding to type_value.

Parameters:: type_value (str)

Returns:

get_type_value_names()

Return the list of type-value names defined in this HedType.

Returns:: Lowercased type-value name strings.
Return type:: list[str]

property total_events

Return the total number of events in the associated event list.

Returns:: Number of events.
Return type:: int

property type_variables

Return the set of type-value names (keys) found in this HedType.

Returns:: Set of lowercased type-value name strings.
Return type:: set[str]

HedTypeDefs

class HedTypeDefs(definitions, type_tag='condition-variable')

Bases: object

Manager for definitions associated with a type such as condition-variable.

Properties:: def_map (dict): keys are definition names, values are dict {type_values, description, tags}.

Example: A definition ‘famous-face-cond’ with contents:

‘(Condition-variable/Face-type,Description/A face that should be recognized.,(Image,(Face,Famous)))’

would have type_values [‘face-type’]. All items are strings, not objects.
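A rough sketch of how the type value in this example could be pulled out of the definition contents string. The function name `type_values_from_definition` is hypothetical. The regular-expression approach is a simplification, since the documented methods operate on HedTag, HedGroup, or HedString objects.

```python
import re


def type_values_from_definition(contents, type_tag="condition-variable"):
    """Return lowercased values of type_tag found in a definition's contents string."""
    # Capture everything after "<type_tag>/" up to the next comma or parenthesis.
    pattern = re.escape(type_tag) + r"/([^,()]+)"
    values = re.findall(pattern, contents, flags=re.IGNORECASE)
    return [value.strip().lower() for value in values]


contents = ("(Condition-variable/Face-type,"
            "Description/A face that should be recognized.,(Image,(Face,Famous)))")
print(type_values_from_definition(contents))  # ['face-type']
```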

static extract_def_names(item, no_value=True)

Return a list of Def values in item.

Parameters:

item (HedTag, HedGroup, or HedString) – An item containing a def tag.
no_value (bool) – If True, strip off extra values after the definition name.

Returns:

A list of definition names (as strings).

Return type:

list

get_type_values(item)

Return a list of type_tag values in item.

Parameters:: item (HedTag, HedGroup, or HedString) – An item potentially containing def tags.
Returns:: A list of the unique values associated with this type.
Return type:: list

static split_name(name, lowercase=True)

Split a name/# or name/x into name, x.

Parameters:

name (str) – The extension or value portion of a tag.
lowercase (bool) – If True (default), return values are converted to lowercase.

Returns:

Name of the definition.
Value of the definition if it has one.

Return type:

tuple[str, str]
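The splitting rule can be sketched with `str.partition`. Returning an empty string when there is no value is an assumption; the description above does not say what happens in that case.

```python
def split_name(name, lowercase=True):
    """Split "name/value" at the first '/'; value is "" when absent (assumed)."""
    def_name, _, def_value = name.partition("/")
    if lowercase:
        return def_name.lower(), def_value.lower()
    return def_name, def_value


print(split_name("MyDef/3"))                   # ('mydef', '3')
print(split_name("MyDef/#", lowercase=False))  # ('MyDef', '#')
```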

property type_def_names

Return a list of the names of definitions that have this type variable.

Returns:: definition names that have this type.
Return type:: list

property type_names

Return a list of the names of the type variables associated with type definitions.

Returns:: Type names associated with the type definitions.
Return type:: list

HedTypeFactors

class HedTypeFactors(type_tag, type_value, number_elements)

Bases: object

Holds the indices of positions for a variable type in a columnar file.

ALLOWED_ENCODINGS = ('categorical', 'one-hot')

get_factors(factor_encoding='one-hot')

Return a DataFrame of factor vectors for this type factor.

Parameters:: factor_encoding (str) – Specifies type of factor encoding (one-hot or categorical).
Returns:: DataFrame containing the factor vectors as the columns.
Return type:: pd.DataFrame
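The two encodings named in ALLOWED_ENCODINGS can be illustrated with plain lists in place of a DataFrame. This is a sketch built on several assumptions: `levels` (a mapping from each type value to its event indices) and the single categorical column name `factor` are hypothetical, and unmarked events are filled with 'n/a'.

```python
ALLOWED_ENCODINGS = ("categorical", "one-hot")


def get_factors(levels, number_elements, factor_encoding="one-hot"):
    """Encode {type value: event indices} as factor columns (dict of lists)."""
    if factor_encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"factor_encoding must be one of {ALLOWED_ENCODINGS}")
    if factor_encoding == "one-hot":
        # One 0/1 column per type value, with 1 where that value applies.
        columns = {}
        for level, indices in levels.items():
            column = [0] * number_elements
            for i in indices:
                column[i] = 1
            columns[level] = column
        return columns
    # Categorical: one column naming the value at each event (assumes no overlap).
    column = ["n/a"] * number_elements
    for level, indices in levels.items():
        for i in indices:
            column[i] = level
    return {"factor": column}


levels = {"famous": [0, 2], "unfamiliar": [1]}
print(get_factors(levels, 4))
# {'famous': [1, 0, 1, 0], 'unfamiliar': [0, 1, 0, 0]}
print(get_factors(levels, 4, "categorical"))
# {'factor': ['famous', 'unfamiliar', 'famous', 'n/a']}
```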

get_summary()

Return the summary of the type tag value as a dictionary.

Returns:: Contains the summary.
Return type:: dict

HedTypeCount

class HedTypeCount(type_value, type_tag, file_name=None)

Bases: object

Manager of the counts of tags for one type tag such as Condition-variable or Task.

Parameters:

type_value (str) – The value of the variable to be counted.
type_tag (str) – The type of variable.

Examples

HedTypeCount(‘SymmetricCond’, ‘condition-variable’) keeps counts of Condition-variable/SymmetricCond.

get_summary()

Return the summary of one value of one type tag.

Returns:: Count information for one tag of one type.
Return type:: dict

to_dict(): Return count information as a dictionary.

update(type_sum, file_id)

Update the counts from a HedTypeValues.

Parameters:

type_sum (dict) – Information about the contents for a particular data file.
file_id (str or None) – Name of the file associated with the counts.

HedTypeCounts

class HedTypeCounts(name, type_tag)

Bases: object

Manager for summaries of tag counts for columnar files.

add_descriptions(type_defs)

Update this summary based on the type variable map.

Parameters:: type_defs (HedTypeDefs) – Contains the information about the value of a type.

get_summary()

Return the information in the manager as a dictionary.

Returns:: Dict with keys ‘name’, ‘type_tag’, ‘files’, ‘total_events’, and ‘details’.
Return type:: dict

update(counts)

Update count information based on counts in another HedTypeCounts.

Parameters:: counts (HedTypeCounts) – Information to use in the update.

update_summary(type_sum, total_events=0, file_id=None)

Update this summary based on the type variable map.

Parameters:

type_sum (dict) – Contains the information about the value of a type.
total_events (int) – Total number of events processed.
file_id (str) – Unique identifier for the associated file.

TabularSummary

class TabularSummary(value_cols=None, skip_cols=None, name='', categorical_limit=None)

Bases: object

Summarize the contents of columnar files.

extract_sidecar_template() → dict

Extract a BIDS sidecar-compatible dictionary.

Returns:: A sidecar template that can be converted to JSON.
Return type:: dict

static extract_summary(summary_info) → TabularSummary

Create a TabularSummary object from a serialized summary.

Parameters:: summary_info (dict or str) – A JSON string or a dictionary containing contents of a TabularSummary.
Returns:: The information in summary_info as a TabularSummary object.
Return type:: TabularSummary

static get_columns_info(dataframe, skip_cols=None) → dict[str, dict]

Extract unique value counts for columns.

Parameters:

dataframe (DataFrame) – The DataFrame to be analyzed.
skip_cols (list) – List of names of columns to be skipped in the extraction.

Returns:

A dictionary with keys that are column names (strings) and values that are dictionaries of unique value counts.

Return type:

dict[str, dict]
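A dependency-free sketch of this extraction. It uses a list of row dictionaries in place of the pandas DataFrame the real method takes.

```python
from collections import Counter


def get_columns_info(rows, skip_cols=None):
    """Return {column: {value: count}} for every column not in skip_cols."""
    skip = set(skip_cols or [])
    info = {}
    for row in rows:
        for col, value in row.items():
            if col in skip:
                continue
            info.setdefault(col, Counter())[value] += 1
    return {col: dict(counts) for col, counts in info.items()}


rows = [
    {"onset": "0.5", "trial_type": "go"},
    {"onset": "1.5", "trial_type": "stop"},
    {"onset": "2.5", "trial_type": "go"},
]
print(get_columns_info(rows, skip_cols=["onset"]))  # {'trial_type': {'go': 2, 'stop': 1}}
```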

get_number_unique(column_names=None) → dict

Return the number of unique values in columns.

Parameters:: column_names (list, None) – A list of column names to analyze or all columns if None.
Returns:: Column names are the keys and the number of unique values in the column are the values.
Return type:: dict

get_summary(as_json=False) → dict | str

Return the summary in dictionary format.

Parameters:: as_json (bool) – If False, return as a Python dictionary; otherwise convert to a JSON string.
Returns:: A dictionary containing the summary information or a JSON string if as_json is True.
Return type:: Union[dict, str]

static make_combined_dicts(file_dictionary, skip_cols=None) → tuple[TabularSummary, dict[str, TabularSummary]]

Return combined and individual summaries.

Parameters:

file_dictionary (FileDictionary) – Dictionary whose keys are file names and whose values are full paths.
skip_cols (list) – Names of the columns to skip.

Returns:

A combined summary of all files in the dictionary.
A dictionary where keys are file names and values are individual TabularSummary objects.

Return type:

tuple[TabularSummary, dict[str, TabularSummary]]

update(data, name=None)

Update the counts based on data (DataFrame, filename, or list of filenames).

Parameters:

data (DataFrame, str, or list) – DataFrame containing data to update.
name (str) – Name of the summary.

update_summary(tab_sum)

Add TabularSummary values to this object.

Parameters:: tab_sum (TabularSummary) – A TabularSummary to be combined.

Notes

The value_cols and skip_cols are updated as long as they are not contradictory.
A new skip column cannot be used.

ColumnNameSummary

class ColumnNameSummary(name='')

Bases: object

Summarize the unique column names in a dataset.

get_summary(as_json=False)

Return summary as an object or in JSON.

Parameters:: as_json (bool) – If False (the default), return the underlying summary object, otherwise transform to JSON.

update(name, columns)

Update the summary based on columns associated with a file.

Parameters:

name (str) – File name associated with the columns.
columns (list) – List of column names.

update_headers(column_names)

Update the unique combinations of column names.

Parameters:: column_names (list) – List of column names to update.
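The bookkeeping behind `update` and `update_headers` can be sketched as below. The class name and its attributes are hypothetical. Treating column order as significant is also an assumption.

```python
class ColumnNameSummarySketch:
    """Track unique column-name combinations and the files that use each."""

    def __init__(self):
        self.unique_headers = []  # each entry is one list of column names
        self.file_dict = {}       # file name -> index into unique_headers

    def update_headers(self, column_names):
        """Return the index of this combination, adding it if it is new."""
        columns = list(column_names)
        if columns in self.unique_headers:
            return self.unique_headers.index(columns)
        self.unique_headers.append(columns)
        return len(self.unique_headers) - 1

    def update(self, name, columns):
        """Record which column combination the named file uses."""
        self.file_dict[name] = self.update_headers(columns)


summary = ColumnNameSummarySketch()
summary.update("sub-01_events.tsv", ["onset", "duration", "trial_type"])
summary.update("sub-02_events.tsv", ["onset", "duration", "trial_type"])
summary.update("sub-03_events.tsv", ["onset", "duration"])
print(summary.file_dict)
# {'sub-01_events.tsv': 0, 'sub-02_events.tsv': 0, 'sub-03_events.tsv': 1}
```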

FileDictionary

class FileDictionary(collection_name, file_list, key_indices=(0, 2), separator='_')

Bases: object

A file dictionary keyed by entity pair indices.

Notes

The entities are identified as 0, 1, … depending on order in the base filename.
The entity key-value pairs are assumed separated by ‘_’ unless a separator is provided.
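The following sketch shows how entity positions might become a key. It is a standalone re-creation, not the library code. It assumes that entities are always split on ‘_’ in the base filename and that `separator` only joins the selected pieces in the key.

```python
import os


def make_key(key_string, indices=(0, 2), separator="_"):
    """Join the entity pairs at the given positions of a filename into a key."""
    # Drop the directories and the final extension, then split into entity pairs.
    stem = os.path.splitext(os.path.basename(key_string))[0]
    pieces = stem.split("_")
    return separator.join(pieces[i] for i in indices)


path = "/data/sub-01/func/sub-01_ses-1_task-stop_run-1_events.tsv"
print(make_key(path))               # sub-01_task-stop
print(make_key(path, (0, 3), "-"))  # sub-01-run-1
```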

create_file_dict(file_list, key_indices, separator)

Create a new dictionary based on the key indices.

Parameters:

file_list (list) – Paths of the files to include.
key_indices (tuple) – A tuple of integers representing order of entities for key.
separator (str) – The separator used between entities to form the key.

property file_dict: Dictionary of path values in this dictionary.

property file_list: List of path values in this dictionary.

get_file_path(key)

Return file path corresponding to key.

Parameters:: key (str) – Key used to retrieve the file path.
Returns:: File path.
Return type:: str

iter_files()

Iterator over the files in this dictionary.

Yields:: A (key, file path) pair for each file:
str – Key into the dictionary.
str – File path.

key_diffs(other_dict)

Return the symmetric key difference with another dict.

Parameters:: other_dict (FileDictionary) – The dictionary to compare with.
Returns:: The symmetric difference of the keys in this dictionary and the other one.
Return type:: list

property key_list: Keys in this dictionary.

static make_file_dict(file_list, key_indices=(0, 2), separator='_')

Return a dictionary of files using entity keys.

Parameters:

file_list (list) – Paths to files to use.
key_indices (tuple) – Positions of entities to use for key.
separator (str) – Separator character used to construct key.

Returns:

Key is based on key indices and value is a full path.

Return type:

dict

static make_key(key_string, indices=(0, 2), separator='_')

Create a key from specified entities.

Parameters:

key_string (str) – The string from which to extract the key (usually a filename or path).
indices (tuple) – Positions of entity pairs to use as key.
separator (str) – Separator between entity pairs in the created key.

Returns:

The created key.

Return type:

str

property name: Name of this dictionary.

output_files(title=None)

Return a string with the output of the list.

Parameters:: title (None, str) – Optional title.
Returns:: The dictionary in string form.
Return type:: str

KeyMap

class KeyMap(key_cols, target_cols=None, name='')

Bases: object

A map of unique column values for remapping columns.

key_cols

A list of column names that will be hashed into the keys for the map.

Type:: list

target_cols

Optional list of column names that will be inserted into data and later remapped.

Type:: list or None

name

An optional name of this remap for identification purposes.

Type:: str

Notes: This mapping converts all columns in the mapping to strings. The remapping does not support other types of columns.

property columns

Return the column names of the columns managed by this map.

Returns:: Column names of the columns managed by this map.
Return type:: list

make_template(additional_cols=None, show_counts=True)

Return a dataframe template.

Parameters:

additional_cols (list or None) – Optional list of additional columns to append to the returned dataframe.
show_counts (bool) – If True, the first column holds the number of times each key combination appears, and rows are sorted by that count in descending order.

Returns:

A dataframe containing the template.

Return type:

DataFrame

Raises:

HedFileError – If additional columns are not disjoint from the key columns.

Notes

The template consists of the unique key columns in this map plus additional columns.
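A sketch of the template with `show_counts=True`, using a list of row dictionaries in place of a DataFrame. The count column name `key_counts` is hypothetical, and `additional_cols` is left out for brevity.

```python
from collections import Counter


def make_template(rows, key_cols, show_counts=True):
    """Return unique key-column combinations as row dicts, optionally with counts."""
    # KeyMap treats all mapped columns as strings, so convert before counting.
    counts = Counter(tuple(str(row[col]) for col in key_cols) for row in rows)
    if not show_counts:
        return [dict(zip(key_cols, key)) for key in counts]
    # Most frequent combinations first (ties keep first-seen order).
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [{"key_counts": n, **dict(zip(key_cols, key))} for key, n in ordered]


rows = [
    {"event_type": "show", "letter": "A"},
    {"event_type": "show", "letter": "B"},
    {"event_type": "show", "letter": "A"},
]
for row in make_template(rows, ["event_type", "letter"]):
    print(row)
# {'key_counts': 2, 'event_type': 'show', 'letter': 'A'}
# {'key_counts': 1, 'event_type': 'show', 'letter': 'B'}
```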

remap(data)

Remap the columns of a dataframe or columnar file.

Parameters:

data (DataFrame, str) – Columnar data (either DataFrame or filename) whose columns are to be remapped.

Returns:

New dataframe with columns remapped.
List of row numbers that had no correspondence in the mapping.

Return type:

tuple[DataFrame, list]

Raises:

HedFileError – If data is missing some of the key columns.
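The remapping logic can be sketched with plain dictionaries. Here `key_map` (a mapping from tuples of key-column values to tuples of target values) is hypothetical. `KeyError` stands in for HedFileError, and filling unmatched rows with 'n/a' is an assumption.

```python
def remap(rows, key_cols, target_cols, key_map):
    """Return (remapped rows, indices of rows whose key is not in key_map)."""
    # KeyError stands in for the HedFileError raised by the real method.
    if rows and any(col not in rows[0] for col in key_cols):
        raise KeyError(f"data is missing key columns from {key_cols}")
    new_rows, missing = [], []
    for index, row in enumerate(rows):
        key = tuple(str(row[col]) for col in key_cols)
        targets = key_map.get(key)
        if targets is None:
            missing.append(index)
            targets = ("n/a",) * len(target_cols)  # assumed fill value
        new_rows.append({**row, **dict(zip(target_cols, targets))})
    return new_rows, missing


key_map = {("11",): ("go", "left"), ("12",): ("stop", "none")}
rows = [{"onset": 0.5, "code": 11}, {"onset": 1.0, "code": 12}, {"onset": 1.5, "code": 99}]
new_rows, missing = remap(rows, ["code"], ["trial_type", "response"], key_map)
print(new_rows[0])  # {'onset': 0.5, 'code': 11, 'trial_type': 'go', 'response': 'left'}
print(missing)      # [2]
```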

static remove_quotes(df, columns=None)

Remove quotes from the specified columns and convert to string.

Parameters:

df (DataFrame) – DataFrame to process by removing quotes.
columns (list) – List of column names. If None, all columns are used.

Notes

Replacement is done in place.

resort(): Sort the col_map in place by the key columns.

update(data, allow_missing=True)

Update the existing map with information from data.

Parameters:

data (DataFrame or str) – DataFrame or filename of an events file or event map.
allow_missing (bool) – If True, allow missing keys and add them as n/a columns.

Raises:

HedFileError – If there are missing keys and allow_missing is False.

TemporalEvent

class TemporalEvent(contents, start_index, start_time)

Bases: object

A single event process with starting and ending times.

Note: the contents have the Onset and duration removed.

set_end(end_index, end_time)

Set end time information for an event process.

Parameters:

end_index (int) – Position of ending event marker corresponding to the end of this event process.
end_time (float) – Ending time of the event (usually in seconds).

Annotation utilities

Utilities to facilitate annotation of events in BIDS.

check_df_columns(df, required_cols=('column_name', 'column_value', 'description', 'HED')) → list[str]

Return a list of the specified columns that are missing from a dataframe.

Parameters:

df (DataFrame) – Spreadsheet to check the columns of.
required_cols (tuple) – List of column names that must be present.

Returns:

List of column names that are missing.

Return type:

list[str]
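A minimal sketch that takes a list of column names (a DataFrame’s `columns` attribute would also work) and preserves the order of required_cols:

```python
def check_df_columns(columns, required_cols=("column_name", "column_value", "description", "HED")):
    """Return the entries of required_cols that are absent from columns."""
    present = set(columns)
    return [col for col in required_cols if col not in present]


print(check_df_columns(["column_name", "HED"]))  # ['column_value', 'description']
```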

df_to_hed(dataframe, description_tag=True) → dict

Create a sidecar-like dictionary from a 4-column dataframe.

Parameters:

dataframe (DataFrame) – A four-column Pandas DataFrame with specific columns.
description_tag (bool) – If True description tag is included.

Returns:

A dictionary compatible with BIDS JSON tabular file that includes HED.

Return type:

dict

Notes

The DataFrame must have the columns with names: column_name, column_value, description, and HED.

extract_tags(hed_string, search_tag) → tuple[str, list[str]]

Extract all instances of the specified tag from hed_string.

Parameters:

hed_string (str) – Tag string from which to extract tag.
search_tag (str) – HED tag to extract.

Returns:

Tag string without the tags.
A list of the tags that were extracted, for example descriptions.

Return type:

tuple[str, list[str]]
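A simplified sketch for flat HED strings. It splits on commas, so it ignores parenthesized groups and would break on descriptions that contain commas. It also rejoins the remaining tags with ‘, ’.

```python
def extract_tags(hed_string, search_tag):
    """Return (remaining string, extracted tags) for a flat, comma-separated HED string."""
    kept, extracted = [], []
    for piece in hed_string.split(","):
        tag = piece.strip()
        if tag.lower().startswith(search_tag.lower()):
            extracted.append(tag)
        elif tag:
            kept.append(tag)
    return ", ".join(kept), extracted


remainder, found = extract_tags(
    "Sensory-event, Description/Go trial, Visual-presentation", "Description/")
print(remainder)  # Sensory-event, Visual-presentation
print(found)      # ['Description/Go trial']
```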

generate_sidecar_entry(column_name, column_values=None) → dict

Create a sidecar column dictionary for a column.

Parameters:

column_name (str) – Name of the column.
column_values – List of column values.

hed_to_df(sidecar_dict, col_names=None) → DataFrame

Return a 4-column dataframe of HED portions of sidecar.

Parameters:

sidecar_dict (dict) – A dictionary conforming to BIDS JSON events sidecar format.
col_names (list, None) – A list of the columns to include in the flattened sidecar.

Returns:

Four-column spreadsheet representing HED portion of sidecar.

Return type:

DataFrame

Notes

The returned DataFrame has columns: column_name, column_value, description, and HED.
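A sketch of the flattening that returns tuples in the column order above rather than a DataFrame. Two choices here are assumptions: putting ‘n/a’ in column_value for value columns (whose HED entry is a single string containing ‘#’), and skipping columns that have no HED key.

```python
def hed_to_rows(sidecar_dict, col_names=None):
    """Flatten a sidecar's HED entries into 4-tuples in the column order above."""
    rows = []
    for col, entry in sidecar_dict.items():
        if col_names is not None and col not in col_names:
            continue
        hed = entry.get("HED") if isinstance(entry, dict) else None
        if isinstance(hed, dict):
            # Categorical column: one row per level, described by "Levels".
            levels = entry.get("Levels", {})
            for value, hed_str in hed.items():
                rows.append((col, value, levels.get(value, "n/a"), hed_str))
        elif isinstance(hed, str):
            # Value column: a single annotation containing '#'.
            rows.append((col, "n/a", entry.get("Description", "n/a"), hed))
    return rows


sidecar = {
    "trial_type": {"Levels": {"go": "A go trial"},
                   "HED": {"go": "Sensory-event, Visual-presentation"}},
    "response_time": {"Description": "Time from stimulus to response",
                      "HED": "Duration/# s"},
    "onset": {"Description": "Onset of the event"},
}
for row in hed_to_rows(sidecar):
    print(row)
```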

merge_hed_dict(sidecar_dict, hed_dict)

Update a JSON sidecar based on the hed_dict values.

Parameters:

sidecar_dict (dict) – Dictionary representation of a BIDS JSON sidecar.
hed_dict (dict) – Dictionary derived from a dataframe representation of HED in sidecar.

series_to_factor(series) → list[int]

Convert a series to an integer factor list.

Parameters:: series (pd.Series) – Series to be converted to a list.
Returns:: A list containing 0’s and 1’s. Empty, ‘n/a’, and np.nan values are converted to 0.
Return type:: list[int]

str_to_tabular(tsv_str, sidecar=None) → TabularInput

Return a TabularInput from a TSV string.

Parameters:

tsv_str (str) – A string representing a tabular input.
sidecar – An optional Sidecar object.

strs_to_hed_objs(hed_strings, hed_schema) → list[HedString] | None

Return a list of HedString objects from a list of strings.

Parameters:

hed_strings (string or list) – String or strings representing HED annotations.
hed_schema (HedSchema or HedSchemaGroup) – Schema version for the strings.

Returns:

A list of HedString objects or None.

Return type:

Union[list[HedString], None]

strs_to_sidecar(sidecar_strings) → Sidecar | None

Return a Sidecar from a sidecar as string or as a list of sidecars as strings.

Parameters:: sidecar_strings (string or list) – String or strings representing sidecars.
Returns:: The merged sidecar from the list.
Return type:: Union[Sidecar, None]

to_factor(data, column=None) → list[int]

Convert data to an integer factor list.

Parameters:

data (Series or DataFrame) – Series or DataFrame to be converted to a list.
column (str, optional) – Column name if DataFrame, otherwise column 0 is used.

Returns:

A list containing 0’s and 1’s. Empty, ‘n/a’, and np.nan values are converted to 0.

Return type:

list[int]
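A list-based sketch of the conversion. Handling every other value by its Python truthiness (so 0 stays 0 and any non-empty string becomes 1) is an assumption that goes beyond what the description states.

```python
import math


def to_factor(values):
    """Map None, NaN, '' and 'n/a' to 0; other values use truthiness (assumed)."""
    factors = []
    for value in values:
        if value is None or value in ("", "n/a"):
            factors.append(0)
        elif isinstance(value, float) and math.isnan(value):
            factors.append(0)
        else:
            factors.append(int(bool(value)))
    return factors


print(to_factor([1, 0, "n/a", "", None, float("nan"), "yes"]))  # [1, 0, 0, 0, 0, 0, 1]
```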

to_strlist(obj_list) → list[str]

Convert objects in a list to strings, preserving None values.

Parameters:: obj_list (list) – A list of objects that are None or have a str method.
Returns:: A list with the objects converted to strings. None values are preserved as empty strings.
Return type:: list[str]

BIDS tools

BidsDataset

class BidsDataset(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)

Bases: object

A BIDS dataset representation primarily focused on HED evaluation.

root_path

Real root path of the BIDS dataset.

Type:: str

schema

The schema used for evaluation.

Type:: HedSchema or HedSchemaGroup

file_groups

A dictionary of BidsFileGroup objects with a given file suffix.

Type:: dict

get_file_group(suffix)

Return the file group of files with the specified suffix.

Parameters:: suffix (str) – Suffix of the BidsFileGroup to be returned.
Returns:: The requested tabular group.
Return type:: Union[BidsFileGroup, None]

get_summary(): Return an abbreviated summary of the dataset.

validate(check_for_warnings=False, schema=None)

Validate the dataset.

Parameters:

check_for_warnings (bool) – If True, check for warnings.
schema (HedSchema or HedSchemaGroup or None) – The schema used for validation.

Returns:

List of issues encountered during validation. Each issue is a dictionary.

Return type:

list

BidsFile

class BidsFile(file_path)

Bases: object

A BIDS file with entity dictionary.

file_path

Real path of the file.

Type:: str

suffix

Suffix part of the filename.

Type:: str

ext

Extension (including the .).

Type:: str

entity_dict

Dictionary of entity-names (keys) and entity-values (values).

Type:: dict

Notes

This class may hold the merged sidecar giving metadata for this file as well as contents.
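One way a path could be broken into the suffix, ext, and entity_dict attributes is sketched below. The helper name `parse_bids_name` is hypothetical. Keeping multi-part extensions such as .nii.gz whole is an assumption.

```python
import os


def parse_bids_name(file_path):
    """Split a BIDS file path into (entity_dict, suffix, ext)."""
    name = os.path.basename(file_path)
    stem, dot, rest = name.partition(".")
    ext = dot + rest  # everything from the first '.' on, e.g. '.tsv' or '.nii.gz'
    *pairs, suffix = stem.split("_")
    entity_dict = {}
    for pair in pairs:
        key, _, value = pair.partition("-")
        entity_dict[key] = value
    return entity_dict, suffix, ext


entities, suffix, ext = parse_bids_name("/ds/sub-01/eeg/sub-01_task-stop_run-2_events.tsv")
print(entities)              # {'sub': '01', 'task': 'stop', 'run': '2'}
print(suffix, ext)           # events .tsv
print(entities.get("task"))  # stop
```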

clear_contents(): Set the contents attribute of this object to None.

property contents: Return the current contents of this object.

get_entity(entity_name)

Return the entity value for the specified entity.

Parameters:: entity_name (str) – Name of the BIDS entity, for example task, run, or sub.
Returns:: Entity value if any, otherwise None.
Return type:: Union[str, None]

get_key(entities=None)

Return a key for this BIDS file given a list of entities.

Parameters:: entities (tuple) – A tuple of strings representing entities.
Returns:: A key based on this object.
Return type:: str

Notes

If entities is None, then the file path is used as the key.

set_contents(content_info=None, overwrite=False)

Set the contents of this object.

Parameters:

content_info (Any) – The contents appropriate for this object, for example a JSON dictionary.
overwrite (bool) – If False and the contents are not empty, do nothing.

Notes

Do not set if the contents are already set and overwrite is False.

BidsFileGroup

class BidsFileGroup(root_path, file_list, suffix='events')

Bases: object

Container for BIDS files with a specified suffix.

suffix

The file suffix specifying the class of file represented in this group (e.g., events).

Type:: str

sidecar_dict

A dictionary of sidecars associated with this suffix.

Type:: dict

datafile_dict

A dictionary with values either BidsTabularFile or BidsTimeseriesFile.

Type:: dict

sidecar_dir_dict

Dictionary whose keys are directory paths and whose values are lists of sidecars in the corresponding directory.

Type:: dict

static create_file_group(root_path, file_list, suffix)

Construct a BidsFileGroup from a list of files sharing the given suffix.

Parameters:

root_path (str) – Root path of the BIDS dataset.
file_list (list[str]) – List of file paths belonging to this suffix group.
suffix (str) – BIDS file suffix identifying this group (e.g. events).

Returns:

The constructed group, or None if it contains no sidecars or data files.

Return type:

BidsFileGroup or None

get_task_names()[source]¶

Return a sorted list of unique task names found in the file group’s TSV and JSON filenames.

Returns:: Sorted list of unique task name strings (the xxxx portion of task-xxxx entities).
Return type:: list

Notes

Parses both sidecar_dict and datafile_dict file paths.
The BIDS task- entity is matched case-insensitively.
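A minimal sketch of the task-name extraction described in these notes, assuming task labels are alphanumeric as BIDS requires. The helper `task_names` and the regular expression are illustrative assumptions, not the library's code.

```python
import re

# Matches a task-xxxx entity at the start of the name or after an underscore,
# ignoring case as described in the notes
TASK_PATTERN = re.compile(r"(?:^|_)task-([a-z0-9]+)", re.IGNORECASE)


def task_names(paths):
    """Hypothetical sketch: collect sorted unique task labels from file paths."""
    names = set()
    for path in paths:
        basename = path.replace("\\", "/").rsplit("/", 1)[-1]
        match = TASK_PATTERN.search(basename)
        if match:
            names.add(match.group(1))
    return sorted(names)


files = [
    "sub-01/func/sub-01_task-rest_events.tsv",
    "sub-02/func/sub-02_Task-stop_events.tsv",
    "task-rest_events.json",
]
print(task_names(files))  # ['rest', 'stop']
```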

summarize(value_cols=None, skip_cols=None)[source]¶

Return a TabularSummary of the group's files.

Parameters:

value_cols (list) – Column names designated as value columns.
skip_cols (list) – Column names designated as columns to skip.

Returns:

A summary of the number of values in each column if this is a tabular group, otherwise None.

Return type:

Union[TabularSummary, None]

Notes

Columns that are not in value_cols or skip_cols are summarized by counting the number of times each unique value appears in the column.
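The counting rule above can be sketched with `collections.Counter`, assuming the rows of the TSV file are available as a list of dictionaries. The helper `count_categorical` is hypothetical, and it deliberately leaves out whatever the real summary does with value columns.

```python
from collections import Counter


def count_categorical(rows, value_cols=None, skip_cols=None):
    """Hypothetical sketch: count each unique value per categorical column.

    rows is a list of dicts, one per TSV row. Columns listed in value_cols
    or skip_cols are left out of the per-value counts.
    """
    excluded = set(value_cols or []) | set(skip_cols or [])
    counts = {}
    for row in rows:
        for col, val in row.items():
            if col not in excluded:
                counts.setdefault(col, Counter())[val] += 1
    return counts


rows = [
    {"onset": "0.5", "trial_type": "go", "response": "left"},
    {"onset": "1.7", "trial_type": "stop", "response": "n/a"},
    {"onset": "2.9", "trial_type": "go", "response": "left"},
]
summary = count_categorical(rows, value_cols=["onset"])
print(dict(summary["trial_type"]))  # {'go': 2, 'stop': 1}
```

Treating a numeric column such as onset as a value column avoids producing one count per distinct time stamp.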

validate(hed_schema, extra_def_dicts=None, check_for_warnings=False)[source]¶

Validate the sidecars and datafiles and return a list of issues.

Parameters:

hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
check_for_warnings (bool) – If True, include warnings in the check.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list[dict]

validate_datafiles(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶

Validate the datafiles and return an error list.

Parameters:

hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
error_handler (ErrorHandler) – Error handler to use.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list[dict]

Notes: This will clear the contents of the datafiles if they were not previously set.

validate_sidecars(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶

Validate merged sidecars.

Parameters:

hed_schema (HedSchema) – HED schema for validation.
extra_def_dicts (DefinitionDict) – Extra definitions.
error_handler (ErrorHandler) – Error handler to use.

Returns:

A list of validation issues found. Each issue is a dictionary.

Return type:

list[dict]

BidsSidecarFile¶

class BidsSidecarFile(file_path)[source]¶

Bases: BidsFile

A BIDS sidecar file.

clear_contents()¶: Set the contents attribute of this object to None.

property contents¶: Return the current contents of this object.

get_entity(entity_name)¶

Return the entity value for the specified entity.

Parameters:: entity_name (str) – Name of the BIDS entity, for example task, run, or sub.
Returns:: Entity value if any, otherwise None.
Return type:: Union[str, None]

get_key(entities=None)¶

Return a key for this BIDS file given a list of entities.

Parameters:: entities (tuple) – A tuple of strings representing entities.
Returns:: A key based on this object.
Return type:: str

Notes

If entities is None, then the file path is used as the key.

static is_hed(json_dict)[source]¶

Return True if the JSON dictionary contains HED annotations.

Parameters:: json_dict (dict) – A dictionary representing a JSON file or merged file.
Returns:: True if the dictionary has HED or HED_assembled as a first or second-level key.
Return type:: bool
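The documented first- or second-level key test can be sketched directly on a plain dictionary. The name `has_hed` is hypothetical; only the key names HED and HED_assembled come from the description above.

```python
def has_hed(json_dict):
    """Sketch: True if 'HED' or 'HED_assembled' is a first- or second-level key."""
    hed_keys = ("HED", "HED_assembled")
    for key, value in json_dict.items():
        if key in hed_keys:
            return True
        # Second level: keys inside column entries such as {"trial_type": {"HED": ...}}
        if isinstance(value, dict) and any(k in value for k in hed_keys):
            return True
    return False


sidecar = {"trial_type": {"HED": {"go": "Sensory-event"}}}
print(has_hed(sidecar))  # True
```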

is_sidecar_for(obj)[source]¶

Return True if this is a sidecar for obj.

Parameters:: obj (BidsFile) – A BidsFile object to check.
Returns:: True if this is a BIDS parent of obj and False otherwise.
Return type:: bool

Notes

A sidecar is a sidecar for itself.

static merge_sidecar_list(sidecar_list, name='merged_sidecar.json')[source]¶

Merge a list of sidecars into a single sidecar.

Parameters:

sidecar_list (list) – A list of Sidecar objects.
name (str) – The name of the merged sidecar.

Returns:

A sidecar constructed from the merged list.

Return type:

Union[Sidecar, None]

set_contents(content_info=None, name='unknown', overwrite=False)[source]¶

Set the contents of the sidecar.

Parameters:

content_info (dict or None) – If None, create a Sidecar from the object’s file_path.
name (str) – The name of the sidecar.
overwrite (bool) – If True, overwrite contents if already set.

Notes

The handling of content_info is as follows:
- None: This object’s file_path is used.
- dict: This is interpreted as a JSON dictionary.

BidsTabularFile¶

class BidsTabularFile(file_path)[source]¶

Bases: BidsFile

A BIDS tabular file including its associated sidecar.

clear_contents()¶: Set the contents attribute of this object to None.

property contents¶: Return the current contents of this object.

get_entity(entity_name)¶

Return the entity value for the specified entity.

Parameters:: entity_name (str) – Name of the BIDS entity, for example task, run, or sub.
Returns:: Entity value if any, otherwise None.
Return type:: Union[str, None]

get_key(entities=None)¶

Return a key for this BIDS file given a list of entities.

Parameters:: entities (tuple) – A tuple of strings representing entities.
Returns:: A key based on this object.
Return type:: str

Notes

If entities is None, then the file path is used as the key.

set_contents(content_info=None, overwrite=False)[source]¶

Set the contents of this tabular file (a TabularInput object). Its sidecar should already be set.

Parameters:

content_info (None) – This always uses the internal file_path to create the contents.
overwrite (bool) – If False (the default), do not overwrite existing contents, if any.

set_sidecar(sidecar)[source]¶

Set the sidecar for this tabular file.

Parameters:: sidecar (Sidecar) – The sidecar for this tabular file.

BIDS utilities¶

BIDS utility functions for schema loading, sidecar merging, and inheritance chain resolution.

get_candidates(source_dir, tsv_file_dict)[source]¶

Return sidecar JSON files in source_dir that are applicable to tsv_file_dict.

Parameters:

source_dir (str) – Directory to search for candidate sidecar files.
tsv_file_dict (dict) – Parsed BIDS filename dict for the target TSV file.

Returns:

Absolute paths to matching sidecar JSON files.

Return type:

list[str]

get_merged_sidecar(root_path, tsv_file)[source]¶

Return a merged sidecar dict following BIDS inheritance rules for a given TSV file.

Parameters:

root_path (str) – Root path of the BIDS dataset.
tsv_file (str) – Path to the TSV file whose inherited sidecars should be merged.

Returns:

Merged sidecar dictionary. Keys from closer (more specific) sidecar files take precedence.

Return type:

dict

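The precedence rule above (closer sidecars win) can be sketched by applying sidecars from farthest to nearest, assuming they arrive in the nearest-first order that walk_back yields. The helper `merge_sidecars` is hypothetical and performs only a shallow, top-level merge. Whether the library also merges nested keys is not stated here.

```python
def merge_sidecars(sidecar_dicts):
    """Hypothetical sketch: merge sidecar dicts given in nearest-first order.

    Farther sidecars are applied first so that keys from closer (more
    specific) sidecars overwrite them. Only top-level keys are merged.
    """
    merged = {}
    for sidecar in reversed(sidecar_dicts):
        merged.update(sidecar)
    return merged


nearest = {"trial_type": {"HED": {"go": "Sensory-event"}}}
root_level = {"trial_type": {"Description": "Trial kind"}, "response": {"HED": "Agent-action"}}
merged = merge_sidecars([nearest, root_level])
```

Here the whole trial_type entry from the nearest sidecar replaces the root-level entry, while response, which only the root-level sidecar defines, is inherited unchanged.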
get_schema_from_description(root_path)[source]¶

Load the HED schema version declared in the BIDS dataset_description.json.

Parameters:: root_path (str) – Root path of the BIDS dataset.
Returns:: The loaded schema, or None if loading fails.
Return type:: HedSchema or None

group_by_suffix(file_list)[source]¶

Group files by suffix.

Parameters:: file_list (list) – List of file paths.
Returns:: Dictionary with suffixes as keys and file lists as values.
Return type:: dict

matches_criteria(json_file_dict, tsv_file_dict)[source]¶

Return True if a candidate sidecar JSON file applies to the given TSV file.

A sidecar applies when its extension is .json, its suffix matches the TSV, and all BIDS entities in the JSON filename have equal values in the TSV filename.

Parameters:

json_file_dict (dict) – Parsed BIDS filename dict for the candidate JSON file.
tsv_file_dict (dict) – Parsed BIDS filename dict for the target TSV file.

Returns:

True if the sidecar is applicable.

Return type:

bool
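The three conditions above can be sketched over parsed filename dictionaries. This assumes each dictionary carries 'ext' (with a leading dot), 'suffix', and 'entities' keys; the dot convention and the name `sidecar_applies` are assumptions for illustration.

```python
def sidecar_applies(json_file_dict, tsv_file_dict):
    """Hypothetical sketch of the documented applicability rule.

    Assumes dicts with 'ext' (including the leading dot), 'suffix', and
    'entities' keys.
    """
    if json_file_dict["ext"] != ".json":
        return False
    if json_file_dict["suffix"] != tsv_file_dict["suffix"]:
        return False
    tsv_entities = tsv_file_dict["entities"]
    # Every entity named in the JSON filename must have the same value in the TSV filename
    return all(tsv_entities.get(name) == value
               for name, value in json_file_dict["entities"].items())


tsv = {"ext": ".tsv", "suffix": "events", "entities": {"sub": "01", "task": "rest"}}
json_top = {"ext": ".json", "suffix": "events", "entities": {"task": "rest"}}
print(sidecar_applies(json_top, tsv))  # True
```

A top-level sidecar such as task-rest_events.json names fewer entities than the TSV, so it applies to every rest-task events file beneath it.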

parse_bids_filename(file_path)[source]¶

Split a filename into BIDS-relevant components.

Parameters:: file_path (str) – Path to be parsed.
Returns:: Dictionary with keys ‘basename’, ‘suffix’, ‘prefix’, ‘ext’, ‘bad’, and ‘entities’.
Return type:: dict

Notes

Splits into BIDS suffix, extension, and a dictionary of entity name-value pairs.
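A simplified sketch of this kind of split follows. It assumes that 'basename' means the name without its extension and that the extension is everything after the first period. It omits the 'prefix' and 'bad' keys because their exact semantics are not described here. The name `parse_name` is hypothetical.

```python
import os


def parse_name(file_path):
    """Hypothetical, simplified sketch of splitting a BIDS filename.

    Returns the name without extension ('basename'), the suffix, the
    extension, and a dict of entity name-value pairs.
    """
    name = os.path.basename(file_path)
    stem, dot, rest = name.partition(".")
    ext = dot + rest  # keeps multi-part extensions such as .nii.gz whole
    pieces = stem.split("_")
    # The last underscore-separated piece is the suffix unless it is a name-value pair
    suffix = "" if "-" in pieces[-1] else pieces[-1]
    entity_pieces = pieces[:-1] if suffix else pieces
    entities = {}
    for piece in entity_pieces:
        key, _, value = piece.partition("-")
        entities[key] = value
    return {"basename": stem, "suffix": suffix, "ext": ext, "entities": entities}


result = parse_name("/data/sub-01/func/sub-01_task-rest_run-2_events.tsv")
print(result["entities"])  # {'sub': '01', 'task': 'rest', 'run': '2'}
```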

update_entity(name_dict, entity)[source]¶

Update the dictionary with a new entity.

Parameters:

name_dict (dict) – Dictionary of entities.
entity (str) – Entity to be added.

walk_back(root_path, file_path)[source]¶

Yield inherited sidecar file paths from the directory of file_path back toward root_path.

Traverses parent directories from the file’s location up to root_path, yielding any sidecar JSON files that apply to the given TSV according to BIDS inheritance rules.

Parameters:

root_path (str) – Root path of the BIDS dataset.
file_path (str) – Path to the data file whose applicable sidecars should be found.

Yields:

str – Absolute paths of applicable sidecar JSON files, from nearest to farthest.
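The upward traversal can be sketched as a generator. The applicability test is passed in as a caller-supplied predicate that stands in for the BIDS matching rules (for example, matches_criteria). The name `walk_back_sketch` and the `applies` parameter are hypothetical.

```python
import os
import tempfile


def walk_back_sketch(root_path, file_path, applies):
    """Hypothetical sketch: yield applicable sidecars from nearest to farthest.

    applies(json_path, file_path) decides whether a JSON file applies.
    """
    root = os.path.abspath(root_path)
    current = os.path.dirname(os.path.abspath(file_path))
    while True:
        for name in sorted(os.listdir(current)):
            candidate = os.path.join(current, name)
            if name.endswith(".json") and os.path.isfile(candidate) and applies(candidate, file_path):
                yield candidate
        if current == root:
            break
        parent = os.path.dirname(current)
        if parent == current:  # hit the filesystem root without meeting root_path
            break
        current = parent


# Build a tiny dataset: one events file with a local and a top-level sidecar
with tempfile.TemporaryDirectory() as demo_root:
    func_dir = os.path.join(demo_root, "sub-01", "func")
    os.makedirs(func_dir)
    tsv_path = os.path.join(func_dir, "sub-01_task-rest_events.tsv")
    for path in (tsv_path,
                 os.path.join(func_dir, "sub-01_task-rest_events.json"),
                 os.path.join(demo_root, "task-rest_events.json")):
        open(path, "w").close()
    found = [os.path.basename(p) for p in
             walk_back_sketch(demo_root, tsv_path, lambda j, f: j.endswith("_events.json"))]
print(found)  # ['sub-01_task-rest_events.json', 'task-rest_events.json']
```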

Utility functions¶

DataFrame utilities¶

Data handling utilities involving dataframes.

add_columns(df, column_list, value='n/a')[source]¶

Add the specified columns to df if they are not already present.

Parameters:

df (DataFrame) – Pandas dataframe.
column_list (list) – List of columns to append to the dataframe.
value (str) – Default fill value for the column.

check_match(ds1, ds2, numeric=False)[source]¶

Check whether two Pandas data series have the same values.

Parameters:

ds1 (DataSeries) – Pandas data series to check.
ds2 (DataSeries) – Pandas data series to check.
numeric (bool) – If True, treat as numeric and do close-to comparison.

Returns:

Error messages indicating the mismatch or empty if the series match.

Return type:

delete_columns(df, column_list)[source]¶

Delete the specified columns from a dataframe.

Parameters:

df (DataFrame) – Pandas dataframe from which to delete columns.
column_list (list) – List of candidate column names for deletion.

Notes

The deletion of columns is done in place.
This does not raise an error if df does not have a column in the list.

delete_rows_by_column(df, value, column_list=None)[source]¶

Delete rows in which the specified columns contain this value.

Parameters:

df (DataFrame) – Pandas dataframe from which to delete rows.
value (str) – Specified value to indicate row should be deleted.
column_list (list) – List of columns to search for value.

Notes

All values are converted to string before testing.
Deletion is done in place.

get_eligible_values(values, values_included)[source]¶

Return the items from values that are also in values_included, or None if values_included is empty or None.

Parameters:

values (list) – List of strings against which to test.
values_included (list) – List of items to be selected from values if they are present.

Returns:

List of selected values, or None if values_included is empty or None.

Return type:

Union[list, None]

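The selection rule can be sketched in a few lines, preserving the order of values. The name `eligible` is hypothetical.

```python
def eligible(values, values_included):
    """Sketch: items of values that also appear in values_included.

    Returns None when values_included is empty or None, as documented.
    """
    if not values_included:
        return None
    wanted = set(values_included)  # set lookup keeps this linear in len(values)
    return [v for v in values if v in wanted]


print(eligible(["onset", "duration", "trial_type"], ["trial_type", "onset"]))  # ['onset', 'trial_type']
print(eligible(["onset"], []))  # None
```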
get_key_hash(key_tuple)[source]¶

Calculate a hash key for a tuple of values.

Parameters:: key_tuple (tuple, list) – The key values in the correct order for lookup.
Returns:: A hash key for the tuple.
Return type:: int
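A minimal sketch of such a hash key, assuming Python's built-in `hash` over a tuple. Whether the library uses `hash` is not stated, and `key_hash` is a hypothetical name. Because Python randomizes string hashes per process (see PYTHONHASHSEED), keys like these are only comparable within a single run.

```python
def key_hash(key_tuple):
    """Sketch: hash a sequence of key values in lookup order."""
    # Lists are unhashable, so normalize to a tuple first
    return hash(tuple(key_tuple))


print(key_hash(["go", "left"]) == key_hash(("go", "left")))  # True
```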

get_new_dataframe(data)[source]¶

Get a new dataframe representing a tsv file.

Parameters:

data (DataFrame or str) – DataFrame or filename representing a tsv file.

Returns:

A dataframe containing the contents of the tsv file, or a new copy of the DataFrame if data was already a DataFrame.

Return type:

DataFrame

Raises:

HedFileError – If a filename is given and it cannot be read into a DataFrame.

get_row_hash(row, key_list)[source]¶

Get a hash key from key column values for row.

Parameters:

row (DataSeries) – A row of a dataframe.
key_list (list) – Names of the columns whose values form the key.

Returns:

Hash key constructed from the entries of row in the columns specified by key_list.

Return type:

int

Raises:

HedFileError – If row does not have all the columns in key_list.

get_value_dict(tsv_path, key_col='file_basename', value_col='sampling_rate')[source]¶

Get a dictionary of two columns of a dataframe.

Parameters:

tsv_path (str) – Path to a tsv file with a header row to be read into a DataFrame.
key_col (str) – Name of the column which should be the key.
value_col (str) – Name of the column which should be the value.

Returns:

Dictionary with key_col values as the keys and the corresponding value_col values as the values.

Return type:

dict

Raises:

HedFileError – When tsv_path does not correspond to a file that can be read into a DataFrame.

make_info_dataframe(col_info, selected_col)[source]¶

Get a dataframe from selected columns.

Parameters:

col_info (dict) – Dictionary of dictionaries of column values and counts.
selected_col (str) – Name of the column used as top level key for col_info.

Returns:

A two-column dataframe whose first column contains the values from the dictionary whose key is selected_col and whose second column contains the corresponding counts. The returned value is None if selected_col is not a top-level key in col_info.

Return type:

DataFrame

reorder_columns(data, col_order, skip_missing=True)[source]¶

Create a new dataframe with columns reordered.

Parameters:

data (DataFrame, str) – Dataframe or filename of dataframe whose columns are to be reordered.
col_order (list) – List of column names in desired order.
skip_missing (bool) – If True, col_order columns missing from data are skipped; otherwise an error is raised.

Returns:

A new reordered dataframe.

Return type:

DataFrame

Raises:

HedFileError – If col_order contains columns not in data and skip_missing is False, or if data is a filename from which a dataframe cannot be created.

replace_na(df)[source]¶: Replace n/a values with np.nan in place, taking care of categorical columns.

replace_values(df, values=None, replace_value='n/a', column_list=None)[source]¶

Replace string values in specified columns.

Parameters:

df (DataFrame) – Dataframe whose values will be replaced.
values (list, None) – List of strings to replace. If None, only empty strings are replaced.
replace_value (str) – String replacement value.
column_list (list, None) – List of columns in which to do replacement. If None all columns are processed.

Returns:

Number of values replaced.

Return type:

int

separate_values(values, target_values)[source]¶

Get target values from the target_values list.

Parameters:

values (list) – List of values to be tested.
target_values (list) – List of desired values.

File/IO utilities¶

Utilities for generating and handling file names.

check_filename(test_file, name_prefix=None, name_suffix=None, extensions=None)[source]¶

Return True if the file has an accepted extension, suffix, and prefix.

Parameters:

test_file (str) – Path of filename to test.
name_prefix (list, str, None) – An optional name_prefix or list of prefixes to accept for the base filename.
name_suffix (list, str, None) – An optional name_suffix or list of suffixes to accept for the base file name.
extensions (list, str, None) – An optional extension or list of extensions to accept.

Returns:

True if file has the appropriate format.

Return type:

bool

Notes

Everything is converted to lower case before testing, so the test is case-insensitive.
A value of None for any criterion means that all values are accepted.
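These rules can be sketched as follows. The sketch assumes the extension is everything after the first period of the base name and that the prefix and suffix are tested against the name without its extension. The names `check_name` and `_as_list` are hypothetical.

```python
import os


def _as_list(item):
    """Normalize None, a single string, or a list into a lowercase list (or None)."""
    if item is None:
        return None
    if isinstance(item, str):
        item = [item]
    return [x.lower() for x in item]


def check_name(test_file, name_prefix=None, name_suffix=None, extensions=None):
    """Hypothetical sketch of a case-insensitive prefix/suffix/extension check."""
    basename = os.path.basename(test_file).lower()
    stem, dot, rest = basename.partition(".")
    ext = dot + rest
    prefixes, suffixes, exts = _as_list(name_prefix), _as_list(name_suffix), _as_list(extensions)
    # None means "accept anything" for that criterion
    if prefixes is not None and not any(stem.startswith(p) for p in prefixes):
        return False
    if suffixes is not None and not any(stem.endswith(s) for s in suffixes):
        return False
    if exts is not None and ext not in exts:
        return False
    return True


print(check_name("sub-01_task-rest_EVENTS.TSV", name_suffix="_events", extensions=[".tsv"]))  # True
```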

clean_filename(filename)[source]¶

Replace invalid characters with under-bars.

Parameters:: filename (str) – Source filename.
Returns:: The filename with characters other than alphanumerics, periods, hyphens, and under-bars replaced by under-bars.
Return type:: str
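A one-line regular-expression sketch of this cleaning step. It assumes that a run of several invalid characters becomes a single under-bar, which the description above does not specify. The name `clean_name` is hypothetical.

```python
import re


def clean_name(filename):
    """Sketch: replace runs of characters other than letters, digits, '.', '_' or '-' with '_'."""
    if not filename:
        return ""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", filename)


print(clean_name("my data (v2).tsv"))  # my_data_v2_.tsv
```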

extract_suffix_path(path, prefix_path)[source]¶

Return the suffix of path after prefix path has been removed.

Parameters:

path (str) – Full path from which to remove the prefix.
prefix_path (str) – Leading portion of path to remove.

Returns:

Suffix path.

Return type:

str

Notes

This function is useful for creating files within BIDS datasets.

get_allowed(value, allowed_values=None, starts_with=True)[source]¶

Return the portion of the value that matches a value in allowed_values or None if no match.

Parameters:

value (str) – Value to be matched.
allowed_values (list, str, or None) – Values to match.
starts_with (bool) – If True match is done at beginning of string, otherwise the end.

Returns:

Portion of value that matches one of the allowed_values.

Return type:

Union[str,list]

Notes

Matching is done in lower case.
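A minimal sketch of prefix or suffix matching in lower case that returns the matched portion with the original casing of value. Treating an empty or None allowed_values as "no restriction" is an assumption, as is the name `allowed_portion`.

```python
def allowed_portion(value, allowed_values=None, starts_with=True):
    """Hypothetical sketch: return the part of value matching an allowed value.

    Matching ignores case; the returned portion keeps the casing of value.
    Returns None if nothing matches.
    """
    if not allowed_values:
        return value  # assumption: no allowed values means no restriction
    if isinstance(allowed_values, str):
        allowed_values = [allowed_values]
    lower_value = value.lower()
    for allowed in allowed_values:
        allowed_lower = allowed.lower()
        if starts_with and lower_value.startswith(allowed_lower):
            return value[:len(allowed)]
        if not starts_with and lower_value.endswith(allowed_lower):
            return value[len(value) - len(allowed):]
    return None


print(allowed_portion("data_EVENTS", ["_events"], starts_with=False))  # _EVENTS
```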

get_alphanumeric_path(pathname, replace_char='_')[source]¶

Replace sequences of non-alphanumeric characters in string (usually a path) with specified character.

Parameters:

pathname (str) – A string usually representing a pathname, but could be any string.
replace_char (str) – Replacement character(s).

Returns:

New string with characters replaced.

Return type:

str

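A sketch of the replacement using `re.sub`. The name `alphanumeric_path` is hypothetical. Passing the replacement through a function keeps characters such as a backslash in replace_char from being interpreted as regular-expression escapes.

```python
import re


def alphanumeric_path(pathname, replace_char="_"):
    """Sketch: collapse each run of non-alphanumeric characters into replace_char."""
    # A callable replacement is inserted literally, unlike a replacement string
    return re.sub(r"[^A-Za-z0-9]+", lambda _match: replace_char, pathname)


print(alphanumeric_path("/data/sub-01/events.tsv"))  # _data_sub_01_events_tsv
```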
get_basename(file_path)[source]¶

Return the base filename (without extension) for the given path.

Parameters:: file_path (str) – Path to a file.
Returns:: The filename stem, e.g. sub-01_task-rest_events for sub-01_task-rest_events.tsv.
Return type:: str
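A standard-library sketch of the same idea. The name `basename_stem` is hypothetical. Note that `os.path.splitext` removes only the last extension, so a compound extension such as .nii.gz leaves .nii behind; the description above does not say how the library handles that case.

```python
import os


def basename_stem(file_path):
    """Sketch: file name without directory and without its last extension."""
    return os.path.splitext(os.path.basename(file_path))[0]


print(basename_stem("/data/sub-01/func/sub-01_task-rest_events.tsv"))  # sub-01_task-rest_events
```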

get_file_list(root_path, name_prefix=None, name_suffix=None, extensions=None, exclude_dirs=None)[source]¶

Return paths satisfying various conditions.

Parameters:

root_path (str) – Full path of the directory tree to be traversed (no ending slash).
name_prefix (list, str, None) – An optional prefix for the base filename.
name_suffix (list, str, None) – An optional suffix for the base filename.
extensions (list, None) – A list of extensions to be selected.
exclude_dirs (list, None) – A list of paths to be excluded.

Returns:

The full paths.

Return type:

list

Notes: Exclude directories are paths relative to the root path.
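A sketch of this kind of traversal with `os.walk`, pruning excluded directories (given relative to root_path) so they are never entered. For brevity it accepts a single suffix string and no prefix filter. The name `find_files` is hypothetical.

```python
import os
import tempfile


def find_files(root_path, name_suffix=None, extensions=None, exclude_dirs=None):
    """Hypothetical sketch: walk root_path and collect matching file paths."""
    excluded = {os.path.normpath(os.path.join(root_path, d)) for d in (exclude_dirs or [])}
    matches = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Prune excluded directories in place so os.walk does not descend into them
        dirnames[:] = [d for d in dirnames
                       if os.path.normpath(os.path.join(dirpath, d)) not in excluded]
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if extensions is not None and ext not in extensions:
                continue
            if name_suffix is not None and not stem.endswith(name_suffix):
                continue
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Build a tiny tree with one file inside an excluded derivatives directory
with tempfile.TemporaryDirectory() as demo_root:
    for rel in ["sub-01/func/sub-01_task-rest_events.tsv",
                "sub-01/func/sub-01_task-rest_bold.json",
                "derivatives/sub-01_task-rest_events.tsv"]:
        full = os.path.join(demo_root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    found = [os.path.relpath(p, demo_root).replace(os.sep, "/")
             for p in find_files(demo_root, name_suffix="_events",
                                 extensions=[".tsv"], exclude_dirs=["derivatives"])]
print(found)  # ['sub-01/func/sub-01_task-rest_events.tsv']
```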

get_filtered_by_element(file_list, elements)[source]¶

Filter a file list by whether the base names have a substring matching any of the members of elements.

Parameters:

file_list (list) – List of file paths to be filtered.
elements (list) – List of strings to use as filename filters.

Returns:

A list containing only the file paths whose filenames match a filter.

Return type: