Models

Core data models for working with HED data structures.

Core models

The fundamental data structures for HED annotations and tags.

HedString

class HedString(hed_string, hed_schema, def_dict=None, _contents=None)

Bases: HedGroup

A HED string with its schema and definitions.

CLOSING_GROUP_CHARACTER = ')'
OPENING_GROUP_CHARACTER = '('
append(tag_or_group)

Add a tag or group to this group.

Parameters:

tag_or_group (HedTag or HedGroup) – The new object to add to this group.

casefold()

Convenience function, equivalent to str(self).casefold().

check_if_in_original(tag_or_group) -> bool

Check whether the tag or group is in the original string.

Parameters:

tag_or_group (HedTag or HedGroup) – The HedTag or HedGroup to be looked for in this group.

Returns:

True if in this group.

Return type:

bool

copy() -> HedString

Return a deep copy of this string.

Returns:

The copied group.

Return type:

HedString

expand_defs() -> HedString

Replace def tags with def-expand tags.

This does very minimal validation.

Returns:

self

Return type:

HedString

find_def_tags(recursive=False, include_groups=3) -> list

Find def and def-expand tags.

Parameters:
  • recursive (bool) – If true, also check subgroups.

  • include_groups (int, 0, 1, 2, 3) – Options for return values. If 0: return only Def and Def-expand tags. If 1: return only Def tags and Def-expand groups. If 2: return only groups containing Defs, or Def-expand groups. If 3 or any other value: return all three as a tuple.

Returns:

A list whose contents depend on the value of include_groups.

Return type:

list

find_exact_tags(exact_tags, recursive=False, include_groups=1) -> list

Find tags that match exactly, including any extension or value.

Comparison property: HedTag.__eq__ which compares short_tag.casefold() (falling back to org_tag.casefold() for unrecognised tags). Rationale: callers pass a slash-path string such as "def/mydef" and need an exact full-path match — the extension/value is part of the identity ("Def/Foo" must not match "Def/Bar"). Because HedTag.__str__ returns short_tag when the tag is schema-identified, a tag written in long form in the source HED string (e.g. "Event/Sensory-event") will still be found by a short-form query ("Sensory-event"); the schema normalises them to the same short_tag. Unrecognised tags fall back to a case-insensitive comparison of the original text.

Parameters:
  • exact_tags (list of str or HedTag) – Tags to locate; each is compared via HedTag.__eq__, which accepts both str and HedTag operands.

  • recursive (bool) – If true, also check subgroups.

  • include_groups (0, 1 or 2) – If 0: return only tags. If 1: return only groups. If 2 or any other value: return both.

Returns:

A list whose contents depend on the value of include_groups.

Return type:

list
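The comparison rule described above for find_exact_tags can be illustrated with a small, self-contained model. This is a hypothetical sketch, not the hedtools implementation: ToyTag and find_exact are illustrative names, and only the include_groups=0 case (tags only, no recursion into subgroups) is modeled. It shows that the extension is part of the match, that a long-form source tag is found by its short form, and that unidentified tags fall back to their original text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTag:
    """Hypothetical stand-in for HedTag (not the hedtools class)."""
    org_tag: str                     # text as written in the source string
    short_tag: Optional[str] = None  # None models a tag the schema did not identify

    def normalized(self) -> str:
        # Prefer the schema short form; fall back to the original text.
        text = self.short_tag if self.short_tag is not None else self.org_tag
        return text.casefold()


def find_exact(tags, exact_tags):
    """Return tags whose full normalized form (extension included) is requested."""
    wanted = {t.casefold() for t in exact_tags}
    return [tag for tag in tags if tag.normalized() in wanted]


tags = [
    ToyTag("Event/Sensory-event", short_tag="Sensory-event"),  # long form in the source
    ToyTag("Def/Foo", short_tag="Def/Foo"),
    ToyTag("Def/Bar", short_tag="Def/Bar"),
    ToyTag("Made-up-tag"),  # unidentified, so compared by its original text
]

found = find_exact(tags, ["sensory-event", "def/foo", "MADE-UP-TAG"])
print([t.org_tag for t in found])
```

Def/Bar is not returned because its extension differs from the query, and a bare "Def" query matches nothing, since exact matching never strips the extension.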

find_placeholder_tag() -> HedTag | None

Return a placeholder tag, if present in this group.

Returns:

The placeholder tag if found.

Return type:

Union[HedTag, None]

Notes

  • Assumes a valid HedString with no erroneous “#” characters.

find_tags(search_tags, recursive=False, include_groups=2) -> list

Find the base tags and their containing groups.

Comparison property: short_base_tag (schema short name without any extension or value). Rationale: callers pass bare tag names such as "Event" or "Def" and must match regardless of any extension or value the tag carries in the source string. Using short_base_tag strips the extension/value so "Def/MyDef" is found by searching for "Def".

Parameters:
  • search_tags (container) – A container of short_base_tags to locate.

  • recursive (bool) – If true, also check subgroups.

  • include_groups (0, 1 or 2) – Specify return values. If 0: return a list of the HedTags. If 1: return a list of the HedGroups containing the HedTags. If 2: return a list of tuples (HedTag, HedGroup) for the found tags.

Returns:

The contents of the list depend on the value of include_groups.

Return type:

list
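To contrast find_tags with exact matching, the sketch below models matching on short_base_tag alone. It is a hypothetical illustration: ToyTag and find_by_base are invented names, only the tags-only result is modeled, and treating the search as case-insensitive is an assumption rather than something stated above.

```python
from dataclasses import dataclass


@dataclass
class ToyTag:
    """Hypothetical stand-in for HedTag with only the fields used here."""
    short_tag: str       # e.g. "Def/MyDef"
    short_base_tag: str  # e.g. "Def" (extension or value stripped)


def find_by_base(tags, search_tags):
    """Match on short_base_tag only, so any extension or value is ignored."""
    wanted = {s.casefold() for s in search_tags}  # case-insensitivity is an assumption
    return [tag for tag in tags if tag.short_base_tag.casefold() in wanted]


tags = [
    ToyTag("Def/MyDef", "Def"),
    ToyTag("Duration/3 s", "Duration"),
    ToyTag("Event", "Event"),
]

print([t.short_tag for t in find_by_base(tags, ["Def", "Duration"])])
print(find_by_base(tags, ["Def/MyDef"]))  # a full path is not a base tag
```

Searching for a full path such as "Def/MyDef" finds nothing under this rule; that query belongs to exact matching instead.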

find_tags_with_term(term, recursive=False, include_groups=2) -> list

Find tags whose schema ancestry includes the given term.

Comparison property: tag_terms — a tuple of all path components in the tag’s long-form schema path, all casefolded (e.g. ("event", "sensory-event") for the Sensory-event tag). Rationale: this implements HED’s ancestor search — a bare query term such as "Event" must match not only the Event tag itself but also every descendant (Sensory-event, Agent-action, etc.) because those descendants inherit the Event parent. tag_terms encodes the full ancestry, so membership testing (term in tag.tag_terms) handles all descendants in O(k) time where k is the schema depth. This requires a schema-identified tag; unidentified tags have an empty tag_terms tuple and will not be found.

Parameters:
  • term (str) – A single term to search for (compared case-insensitively).

  • recursive (bool) – If true, recursively check subgroups.

  • include_groups (0, 1 or 2) – Controls return values. If 0: return only tags. If 1: return only groups. If 2 or any other value: return both.

Returns:

A list whose contents depend on the value of include_groups.

Return type:

list
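The ancestor search that find_tags_with_term performs via tag_terms can be sketched as a plain membership test. This is a hypothetical model: ToyTag and find_with_term are illustrative names, the tag_terms tuples are hand-written rather than taken from a real schema, and only the tags-only result is shown.

```python
from dataclasses import dataclass


@dataclass
class ToyTag:
    """Hypothetical stand-in: tag_terms holds the casefolded long-form path parts."""
    short_tag: str
    tag_terms: tuple = ()  # an empty tuple models a tag the schema did not identify


def find_with_term(tags, term):
    """Return tags whose own name or any ancestor equals term."""
    term = term.casefold()
    return [tag for tag in tags if term in tag.tag_terms]


tags = [
    ToyTag("Sensory-event", ("event", "sensory-event")),
    ToyTag("Agent-action", ("event", "agent-action")),
    ToyTag("Item", ("item",)),
    ToyTag("Event-like-but-unknown"),  # unidentified: never found
]

print([t.short_tag for t in find_with_term(tags, "Event")])
```

Both Event descendants are found by the single term "Event", while the unidentified tag is skipped even though its text resembles the query.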

find_top_level_tags(anchor_tags, include_groups=2) -> list

Find top level groups with an anchor tag.

At most one tag is located per top-level group.

Parameters:
  • anchor_tags (container) – A list/set/etc. of short_base_tags to find groups by.

  • include_groups (TopTagReturnType or int) – Controls what is returned. Use TopTagReturnType constants for clarity. TAGS (0): return only anchor tags. GROUPS (1): return only groups. BOTH (2, default): return (tag, group) pairs.

Returns:

The returned result depends on include_groups.

Return type:

list
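The "at most one tag per top-level group" rule for find_top_level_tags can be modeled with nested lists. This is a hypothetical sketch, not the library's code: find_top_level_sketch is an invented name, groups are plain lists whose string items stand for base tags, and the assumption that only the direct tags of each top-level group are searched is made here for illustration.

```python
def find_top_level_sketch(top_level_groups, anchor_tags, include_groups=2):
    """Report at most one anchor tag per top-level group.

    Hypothetical model: each group is a list whose str items are base tags and
    whose list items are nested groups (nested groups are not searched).
    """
    anchors = {a.casefold() for a in anchor_tags}
    found = []
    for group in top_level_groups:
        for item in group:
            if isinstance(item, str) and item.casefold() in anchors:
                found.append((item, group))
                break  # only the first anchor in each group is reported
    if include_groups == 0:
        return [tag for tag, _ in found]
    if include_groups == 1:
        return [group for _, group in found]
    return found  # 2: (tag, group) pairs


groups = [["Def", "Onset"], ["Event", ["Onset"]], ["Offset", "Onset", "Def"]]
print(find_top_level_sketch(groups, ["Onset", "Offset"], include_groups=0))
```

The third group contains both anchors, yet only its first one (Offset) is reported; the second group is skipped because its Onset sits in a nested group.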

find_wildcard_tags(search_tags, recursive=False, include_groups=2) -> list

Find tags whose short form starts with a given prefix (implicit trailing wildcard).

Comparison property: short_tag (schema short name including any extension or value). Rationale: the query is a prefix such as "Def/" or "Eve"; the match must cover the extension/value as well so that "Def/MyDef" is found by "Def/" but not by an unrelated tag that merely shares the same base. short_tag is used (not short_base_tag) so that value-bearing tags like "Duration/3 s" can be matched by a prefix query such as "Duration/". Note: prefix matching is anchored to the start of short_tag only, so "Eve" finds "Event" but not "Sensory-event".

Parameters:
  • search_tags (container) – A container of the starts of short tags to search.

  • recursive (bool) – If True, also check subgroups.

  • include_groups (0, 1 or 2) – Specify return values. If 0: return a list of the HedTags. If 1: return a list of the HedGroups containing the HedTags. If 2: return a list of tuples (HedTag, HedGroup) for the found tags.

Returns:

The contents of the list depend on the value of include_groups.

Return type:

list
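The prefix rule for find_wildcard_tags reduces to str.startswith on short_tag. The sketch below is hypothetical: find_wildcard_sketch is an invented name, it works on plain short-tag strings, and casefolding both sides is an assumption not stated in the documentation above.

```python
def find_wildcard_sketch(short_tags, prefixes):
    """Return short tags that start with any prefix (implicit trailing wildcard)."""
    starts = tuple(p.casefold() for p in prefixes)  # casefolding is an assumption
    return [tag for tag in short_tags if tag.casefold().startswith(starts)]


short_tags = ["Event", "Sensory-event", "Def/MyDef", "Definition/Foo", "Duration/3 s"]

print(find_wildcard_sketch(short_tags, ["Eve"]))        # anchored at the start
print(find_wildcard_sketch(short_tags, ["Def/"]))       # the slash excludes Definition/...
print(find_wildcard_sketch(short_tags, ["Duration/"]))  # value-bearing tag matched
```

Including the slash in "Def/" is what keeps Definition/Foo out of the result, and "Eve" does not reach Sensory-event because matching is anchored to the start.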

static from_hed_strings(hed_strings) -> HedString

Create a new HedString from a list of HedStrings.

Parameters:

hed_strings (list or None) – A list of HedString objects to combine. This takes ownership of their children.

Returns:

The newly combined HedString.

Return type:

HedString

get_all_groups(also_return_depth=False) -> list

Return HedGroups, including descendants and self.

Parameters:

also_return_depth (bool) – If True, return tuples (group, depth) rather than just groups.

Returns:

The list of all HedGroups in this group, including descendants and self.

Return type:

list

get_all_tags() -> list

Return HedTags, including descendants.

Returns:

A list of all the tags in this group including descendants.

Return type:

list

get_as_form(tag_attribute) -> str

Get the string corresponding to the specified form.

Parameters:

tag_attribute (str) – The hed_tag property to use to construct the string (usually short_tag or long_tag).

Returns:

The constructed string after transformation.

Return type:

str

get_as_indented(tag_attribute='short_tag') -> str

Return the string as a multiline indented format.

Parameters:

tag_attribute (str) – The hed_tag property to use to construct the string (usually short_tag or long_tag).

Returns:

The indented string.

Return type:

str

get_as_long() -> str

Return this HedGroup as a long tag string.

Returns:

The group as a string with all tags as long tags.

Return type:

str

get_as_original() -> str

Return the original form of this string.

Returns:

The string with all the tags in their original form.

Return type:

str

Notes

The returned string may have some extraneous spaces removed.

get_as_short() -> str

Return this HedGroup as a short tag string.

Returns:

The group as a string with all tags as short tags.

Return type:

str

get_first_group() -> HedGroup

Return the first group in this HED string or group.

Useful for cases such as Def-expand, which contains only a single group.

Returns:

The first group.

Return type:

HedGroup

Raises:

ValueError – If there are no groups.

get_original_hed_string() -> str

Get the original HED string.

Returns:

The original string with no modification.

Return type:

str

groups() -> list

Return the direct child groups of this group.

Returns:

All groups directly in this group, filtering out HedTag children.

Return type:

list

property is_group

Always False since the underlying string is not a group with parentheses.

lower()

Convenience function, equivalent to str(self).lower().

remove(items_to_remove: Iterable[HedTag | HedGroup])

Remove any tags/groups in items_to_remove.

Parameters:

items_to_remove (list) – List of HedGroups and/or HedTags to remove by identity.

Notes

  • Any groups that become empty will also be pruned.

  • If you pass a child and parent group, the child will also be removed from the parent.
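The two notes on remove (removal by identity, and pruning of groups left empty) can be illustrated on nested Python lists. This is a hypothetical sketch rather than the HedGroup code: ToyTag and remove_sketch are invented names, lists stand in for groups, and any other object stands in for a tag.

```python
class ToyTag:
    """Hypothetical tag object; default equality is identity."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"ToyTag({self.name!r})"


def remove_sketch(group, items_to_remove):
    """Remove children by identity from nested lists, pruning empty groups."""
    doomed = {id(item) for item in items_to_remove}

    def prune(node):
        kept = []
        for child in node:
            if id(child) in doomed:
                continue  # matched by identity, not by equality
            if isinstance(child, list):
                prune(child)
                if not child:
                    continue  # empty groups are dropped as well
            kept.append(child)
        node[:] = kept  # modify the group in place
        return node

    return prune(group)


a, b, c = ToyTag("A"), ToyTag("B"), ToyTag("A")
root = [a, [b], c]
remove_sketch(root, [a, b])
print(root)  # [ToyTag('A')], and the survivor is c, not a
```

Although a and c have the same name, only a is removed because matching uses object identity; the group that held b disappears once it is empty.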

remove_definitions()

Remove definition tags and groups from this string.

This does not validate definitions and will blindly remove invalid ones as well.

remove_refs()

Remove any refs (tags contained entirely inside curly braces) from the string.

This does NOT validate the contents of the curly braces. This is only relevant when directly editing sidecar strings. Tools will naturally ignore these.

static replace(item_to_replace, new_contents)

Replace an existing tag or group.

Note: This is a static method that relies on the parent attribute of item_to_replace.

Parameters:
  • item_to_replace (HedTag or HedGroup) – The item to replace must exist or this will raise an error.

  • new_contents (HedTag or HedGroup) – Replacement contents.

Raises:

shrink_defs() -> HedString

Replace def-expand tags with def tags.

This does not validate them and will blindly shrink invalid ones as well.

Returns:

self

Return type:

HedString

sort()

Sort the tags and groups in this HedString in a consistent order.

sorted() -> HedGroup

Return a sorted copy of this HED group.

Returns:

The sorted copy.

Return type:

HedGroup

property span: tuple[int, int]

Return the source span.

Returns:

Start and end index of the group (including parentheses) in the source string.

Return type:

tuple[int, int]

static split_hed_string(hed_string) -> list[tuple[bool, tuple[int, int]]]

Split a HED string into delimiters and tags.

Parameters:

hed_string (str) – The HED string to split.

Returns:

A list of tuples where each tuple is (is_hed_tag, (start_pos, end_pos)).

Return type:

list[tuple[bool, tuple[int, int]]]

Notes

  • The tuple format is as follows:
    • is_hed_tag (bool): A (possible) HED tag if True, delimiter if not.

    • start_pos (int): Index of start of string in hed_string.

    • end_pos (int): Index of end of string in hed_string.

  • This function does not validate tags or delimiters in any form.
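The (is_hed_tag, (start_pos, end_pos)) format can be illustrated with a simplified tokenizer. This is a hypothetical sketch, not split_hed_string itself: split_hed_string_sketch and DELIMITERS are invented names, end positions are assumed to be exclusive (Python slice style), each of ",()" is reported as its own delimiter span, and whitespace around tags is trimmed away rather than reported, which the library may handle differently. Like the original, it performs no validation.

```python
DELIMITERS = ",()"


def split_hed_string_sketch(hed_string):
    """Return (is_hed_tag, (start, end)) tuples; end is exclusive."""
    result = []
    i, n = 0, len(hed_string)
    while i < n:
        if hed_string[i] in DELIMITERS:
            result.append((False, (i, i + 1)))  # one span per delimiter character
            i += 1
            continue
        # Consume a run of non-delimiter characters.
        start = i
        while i < n and hed_string[i] not in DELIMITERS:
            i += 1
        run = hed_string[start:i]
        stripped = run.strip()
        if stripped:  # whitespace-only runs are skipped
            tag_start = start + (len(run) - len(run.lstrip()))
            result.append((True, (tag_start, tag_start + len(stripped))))
    return result


s = "Red, (Blue, Def/X)"
spans = split_hed_string_sketch(s)
print([s[a:b] for is_tag, (a, b) in spans if is_tag])  # ['Red', 'Blue', 'Def/X']
```

Because each tuple stores only indices, callers recover tag text by slicing the original string, which keeps the original spelling available for error reporting.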

static split_into_groups(hed_string, hed_schema, def_dict=None) -> list

Split the HED string into a parse tree.

Parameters:
  • hed_string (str) – A HED string consisting of tags and tag groups to be processed.

  • hed_schema (HedSchema) – HED schema to use to identify tags.

  • def_dict (DefinitionDict) – The definitions to identify.

Returns:

A list of HedTag and/or HedGroup.

Return type:

list

Raises:

ValueError – If the string is significantly malformed, such as mismatched parentheses.

Notes

  • The parse tree consists of tag groups, tags, and delimiters.

tags() -> list

Return the direct child tags of this group.

Returns:

All tags directly in this group, filtering out HedGroup children.

Return type:

list

validate(allow_placeholders=True, error_handler=None) -> list[dict]

Validate the string using the schema.

Parameters:
  • allow_placeholders (bool) – Allow placeholders in the string.

  • error_handler (ErrorHandler or None) – The error handler to use; a default one is created if None is passed.

Returns:

A list of issues for the HED string.

Return type:

list[dict]

HedTag

class HedTag(hed_string, hed_schema, span=None, def_dict=None)

Bases: object

A single HED tag.

Notes

  • HedTag is a smart class in that it keeps track of its original value and position, as well as pointers to the relevant HED schema information, if any.

property attributes: dict

Return a dict of all the attributes this tag has or empty dict if this is not a value tag.

Returns:

A dict of attributes this tag has.

Return type:

dict

Notes

  • Returns empty dict if this is not a unit class tag.

  • The dictionary has unit name as the key and HedSchemaEntry as value.

property base_tag: str

Long form without value or extension.

Returns:

The long form of the tag, without value or extension.

Return type:

str

base_tag_has_attribute(tag_attribute) -> bool

Check to see if the tag has a specific attribute.

This is primarily used to check for things like TopLevelTag on Definitions and similar.

Parameters:

tag_attribute (str) – A tag attribute.

Returns:

True if the tag has the specified attribute, False otherwise.

Return type:

bool

casefold() -> str

Convenience function, equivalent to str(self).casefold().

copy() -> HedTag

Return a deep copy of this tag.

Returns:

The copied tag.

Return type:

HedTag

property default_unit

Get the default unit class unit for this tag.

Only a tag with a single unit class can have default units.

Returns:

The default unit entry for this tag, or None.

Return type:

UnitEntry or None

property expandable: 'HedGroup' | 'HedTag' | None

Return what this expands to.

This is primarily used for Def/Def-expand tags at present.

Lazily set the first time it’s called.

Returns:

The expanded form of this tag.

Return type:

Union[HedGroup, HedTag, None]

property expanded: bool

Return whether this tag is currently expanded.

This is always False unless expandable is set. It is primarily used for Def/Def-expand tags at present.

Returns:

True if this is currently expanded.

Return type:

bool

property extension: str

Get the extension or value of the tag.

Generally this is just the portion after the last slash. Returns an empty string if there is no extension or value.

Returns:

The extension or value of the tag.

Return type:

str

Notes

  • This tag must have been computed first.

get_normalized_str()

Return a case-folded, canonical string used for hashing and equality comparison.

Uses the schema short tag name when available; falls back to the raw tag text.

Returns:

Lowercase canonical form of the tag including any extension or value.

Return type:

str
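Using one normalized string for both equality and hashing keeps the two consistent, which matters when tags are placed in sets or dict keys. The sketch below is a hypothetical model of that design, not HedTag itself: ToyTag is an invented class that borrows the get_normalized_str name, and its __eq__ accepting plain strings mirrors the behaviour the find_exact_tags description attributes to HedTag.__eq__.

```python
from typing import Optional


class ToyTag:
    """Hypothetical HedTag stand-in whose equality uses a normalized string."""

    def __init__(self, org_tag: str, short_tag: Optional[str] = None):
        self.org_tag = org_tag
        self.short_tag = short_tag  # None models a tag the schema did not identify

    def get_normalized_str(self) -> str:
        # Schema short form when available, otherwise the raw text; casefolded.
        text = self.short_tag if self.short_tag is not None else self.org_tag
        return text.casefold()

    def __eq__(self, other):
        if isinstance(other, ToyTag):
            return self.get_normalized_str() == other.get_normalized_str()
        if isinstance(other, str):
            return self.get_normalized_str() == other.casefold()
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equally, so hash the same normalized string.
        return hash(self.get_normalized_str())


long_form = ToyTag("Event/Sensory-event", short_tag="Sensory-event")
short_form = ToyTag("sensory-event", short_tag="Sensory-event")
print(long_form == short_form, long_form == "SENSORY-EVENT", len({long_form, short_form}))
```

The two spellings compare equal and collapse to a single set element, while an unidentified tag still compares by its raw text.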

get_stripped_unit_value(extension_text) -> tuple[str | None, str | None]

Return the extension divided into value and units, if the units are valid.

Parameters:

extension_text (str) – The text to split, in case it’s a portion of a tag.

Returns:

A tuple (value, units). value is the extension portion with the units removed, or None if the units are invalid. units is the unit string, or None if no units of the right unit class are found.

Return type:

tuple[Union[str, None], Union[str, None]]

Examples

‘Duration/3 ms’ will return (‘3’, ‘ms’)
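The (value, units) split can be illustrated with a simplified splitter. This is a hypothetical sketch, not get_stripped_unit_value: strip_unit_sketch is an invented name, and it assumes the unit is a single whitespace-separated suffix compared case-sensitively against a caller-supplied set of allowed units. Real HED units can be more involved, for example symbols written before the value.

```python
def strip_unit_sketch(extension_text, allowed_units):
    """Split e.g. '3 ms' into ('3', 'ms') when the trailing unit is allowed."""
    parts = extension_text.rsplit(maxsplit=1)
    if not parts:
        return None, None  # nothing to split
    if len(parts) == 1:
        return parts[0], None  # no unit present
    value, unit = parts
    if unit in allowed_units:
        return value, unit
    return None, None  # a unit is present but not valid for this tag


print(strip_unit_sketch("3 ms", {"s", "ms"}))  # ('3', 'ms')
```

A bare number comes back with units None, and an unrecognized unit yields (None, None), matching the two "or None" cases in the Returns description above.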

get_tag_unit_class_units() -> list

Get the unit class units associated with a particular tag.

Returns:

A list containing the unit class units associated with a particular tag or an empty list.

Return type:

list

has_attribute(attribute) -> bool

Return True if this tag has the given attribute.

Parameters:

attribute (str) – Name of the attribute.

Returns:

True if this tag has the attribute.

Return type:

bool

is_basic_tag() -> bool

Return True if this is a known tag with no extension or value.

Returns:

True if this is a known tag without extension or value.

Return type:

bool

is_column_ref() -> bool

Return True if this tag is a column reference from a sidecar.

You should only see these if you are directly accessing sidecar strings; tools should otherwise remove them.

Returns:

True if this is a column ref.

Return type:

bool

is_placeholder() -> bool

Return True if this tag contains a placeholder.

Returns:

True if it has a placeholder.

Return type:

bool

is_takes_value_tag() -> bool

Return True if this is a takes value tag.

Returns:

True if this is a takes value tag.

Return type:

bool

is_unit_class_tag() -> bool

Return True if this is a unit class tag.

Returns:

True if this is a unit class tag.

Return type:

bool

is_value_class_tag() -> bool

Return True if this is a value class tag.

Returns:

True if this is a tag with a value class.

Return type:

bool

property long_tag: str

Long form including value or extension.

Returns:

The long form of this tag.

Return type:

str

lower() -> str

Convenience function, equivalent to str(self).lower().

property org_base_tag: str

Original form without value or extension.

Returns:

The original form of the tag, without value or extension.

Return type:

str

Notes

  • Warning: This could be empty if the original tag had a name_prefix prepended, e.g. a column where “Label/” is prepended, so the column value has no base portion.

property org_tag: str

Return the original unmodified tag.

Returns:

The original unmodified tag.

Return type:

str

replace_placeholder(placeholder_value)

If the tag has a placeholder character (#), replace it with the given value.

Parameters:

placeholder_value (str) – Value to replace placeholder with.

property schema_namespace: str

Library namespace for this tag if one exists.

Returns:

The library namespace, including the colon.

Return type:

str

property short_base_tag: str

Short form without value or extension.

Returns:

The short form of the tag, without the extension or value.

Return type:

str

Notes

  • ParentNodes/Def/DefName would return just “Def”.

property short_tag: str

Short form including value or extension.

Returns:

The short form of the tag, including value or extension.

Return type:

str

property tag: str

Return the tag, or the original tag if no user form is set.

Returns:

The user-set form of the tag.

Return type:

str

tag_exists_in_schema() -> bool

Return whether the schema entry for this tag exists.

Returns:

True if this tag exists.

Return type:

bool

Notes

  • This does NOT guarantee that this is a valid tag.

tag_modified() -> bool

Return True if tag has been modified from original.

Returns:

True if the tag is modified.

Return type:

bool

Notes

  • Modifications can include adding a column name_prefix.

property unit_classes: dict

Return a dict of all the unit classes this tag accepts.

Returns:

A dict of unit classes this tag accepts.

Return type:

dict

Notes

  • Returns empty dict if this is not a unit class tag.

  • The dictionary has unit name as the key and HedSchemaEntry as value.

value_as_default_unit() -> float | None

Return the value converted to default units if possible, or None if invalid.

Returns:

The extension value in default units. If the tag has no default units, the extension value is assumed to already be in default units.

Return type:

Union[float, None]

Examples

‘Duration/300 ms’ will return 0.3.
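The Duration example is a unit conversion: 300 ms multiplied by 0.001 s/ms gives 0.3 s. The sketch below shows that arithmetic under stated assumptions; it is not value_as_default_unit. TO_SECONDS and to_default_unit_sketch are invented names, the conversion table is hand-written for two time units, and the "no unit means already in default units" branch follows the Returns description above.

```python
import math

# Hypothetical conversion factors into the default time unit (seconds).
TO_SECONDS = {"s": 1.0, "ms": 1e-3}


def to_default_unit_sketch(value, unit=None):
    """Return value (a string) converted to seconds, or None if it cannot be."""
    try:
        number = float(value)
    except ValueError:
        return None  # not numeric
    if unit is None:
        return number  # no unit given: assume it is already in the default unit
    factor = TO_SECONDS.get(unit)
    if factor is None:
        return None  # unknown unit
    return number * factor


print(math.isclose(to_default_unit_sketch("300", "ms"), 0.3))  # True
```

The comparison uses math.isclose because multiplying by a binary floating-point factor need not reproduce 0.3 bit for bit in every case.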

property value_classes: dict

Return a dict of all the value classes this tag accepts.

Returns:

A dictionary of HedSchemaEntry value classes this tag accepts.

Return type:

dict

Notes

  • Returns empty dict if this is not a value class tag.

  • The dictionary has unit name as the key and HedSchemaEntry as value.

HedGroup

class HedGroup(hed_string='', startpos=None, endpos=None, contents=None)

Bases: object

A single parenthesized HED string.

append(tag_or_group)

Add a tag or group to this group.

Parameters:

tag_or_group (HedTag or HedGroup) – The new object to add to this group.

casefold()

Convenience function, equivalent to str(self).casefold().

check_if_in_original(tag_or_group) -> bool

Check whether the tag or group is in the original string.

Parameters:

tag_or_group (HedTag or HedGroup) – The HedTag or HedGroup to be looked for in this group.

Returns:

True if in this group.

Return type:

bool

copy() -> HedGroup

Return a deep copy of this group.

Returns:

The copied group.

Return type:

HedGroup

find_def_tags(recursive=False, include_groups=3) -> list

Find def and def-expand tags.

Parameters:
  • recursive (bool) – If true, also check subgroups.

  • include_groups (int, 0, 1, 2, 3) – Options for return values. If 0: return only Def and Def-expand tags. If 1: return only Def tags and Def-expand groups. If 2: return only groups containing Defs, or Def-expand groups. If 3 or any other value: return all three as a tuple.

Returns:

A list whose contents depend on the value of include_groups.

Return type:

list

find_exact_tags(exact_tags, recursive=False, include_groups=1) -> list

Find tags that match exactly, including any extension or value.

Comparison property: HedTag.__eq__ which compares short_tag.casefold() (falling back to org_tag.casefold() for unrecognised tags). Rationale: callers pass a slash-path string such as "def/mydef" and need an exact full-path match — the extension/value is part of the identity ("Def/Foo" must not match "Def/Bar"). Because HedTag.__str__ returns short_tag when the tag is schema-identified, a tag written in long form in the source HED string (e.g. "Event/Sensory-event") will still be found by a short-form query ("Sensory-event"); the schema normalises them to the same short_tag. Unrecognised tags fall back to a case-insensitive comparison of the original text.

Parameters:
  • exact_tags (list of str or HedTag) – Tags to locate; each is compared via HedTag.__eq__, which accepts both str and HedTag operands.

  • recursive (bool) – If true, also check subgroups.

  • include_groups (0, 1 or 2) – If 0: return only tags. If 1: return only groups. If 2 or any other value: return both.

Returns:

A list whose contents depend on the value of include_groups.

Return type:

list

find_placeholder_tag() -> HedTag | None

Return a placeholder tag, if present in this group.

Returns:

The placeholder tag if found.

Return type:

Union[HedTag, None]

Notes

  • Assumes a valid HedString with no erroneous “#” characters.

find_tags(search_tags, recursive=False, include_groups=2) -> list

Find the base tags and their containing groups.

Comparison property: short_base_tag (schema short name without any extension or value). Rationale: callers pass bare tag names such as "Event" or "Def" and must match regardless of any extension or value the tag carries in the source string. Using short_base_tag strips the extension/value so "Def/MyDef" is found by searching for "Def".

Parameters:
  • search_tags (container) – A container of short_base_tags to locate.

  • recursive (bool) – If true, also check subgroups.

  • include_groups (0, 1 or 2) – Specify return values. If 0: return a list of the HedTags. If 1: return a list of the HedGroups containing the HedTags. If 2: return a list of tuples (HedTag, HedGroup) for the found tags.

Returns:

The contents of the list depend on the value of include_groups.

Return type:

list

find_tags_with_term(term, recursive=False, include_groups=2) -> list

Find tags whose schema ancestry includes the given term.

Comparison property: tag_terms — a tuple of all path components in the tag’s long-form schema path, all casefolded (e.g. ("event", "sensory-event") for the Sensory-event tag). Rationale: this implements HED’s ancestor search — a bare query term such as "Event" must match not only the Event tag itself but also every descendant (Sensory-event, Agent-action, etc.) because those descendants inherit the Event parent. tag_terms encodes the full ancestry, so membership testing (term in tag.tag_terms) handles all descendants in O(k) time where k is the schema depth. This requires a schema-identified tag; unidentified tags have an empty tag_terms tuple and will not be found.

Parameters:
  • term (str) – A single term to search for (compared case-insensitively).

  • recursive (bool) – If true, recursively check subgroups.

  • include_groups (0, 1 or 2) – Controls return values. If 0: return only tags. If 1: return only groups. If 2 or any other value: return both.

Returns:

A list whose contents depend on the value of include_groups.

Return type:

list

find_wildcard_tags(search_tags, recursive=False, include_groups=2) -> list

Find tags whose short form starts with a given prefix (implicit trailing wildcard).

Comparison property: short_tag (schema short name including any extension or value). Rationale: the query is a prefix such as "Def/" or "Eve"; the match must cover the extension/value as well so that "Def/MyDef" is found by "Def/" but not by an unrelated tag that merely shares the same base. short_tag is used (not short_base_tag) so that value-bearing tags like "Duration/3 s" can be matched by a prefix query such as "Duration/". Note: prefix matching is anchored to the start of short_tag only, so "Eve" finds "Event" but not "Sensory-event".

Parameters:
  • search_tags (container) – A container of the starts of short tags to search.

  • recursive (bool) – If True, also check subgroups.

  • include_groups (0, 1 or 2) – Specify return values. If 0: return a list of the HedTags. If 1: return a list of the HedGroups containing the HedTags. If 2: return a list of tuples (HedTag, HedGroup) for the found tags.

Returns:

The contents of the list depend on the value of include_groups.

Return type:

list

get_all_groups(also_return_depth=False) -> list

Return HedGroups, including descendants and self.

Parameters:

also_return_depth (bool) – If True, return tuples (group, depth) rather than just groups.

Returns:

The list of all HedGroups in this group, including descendants and self.

Return type:

list

get_all_tags() -> list

Return HedTags, including descendants.

Returns:

A list of all the tags in this group including descendants.

Return type:

list

get_as_form(tag_attribute) -> str

Get the string corresponding to the specified form.

Parameters:

tag_attribute (str) – The hed_tag property to use to construct the string (usually short_tag or long_tag).

Returns:

The constructed string after transformation.

Return type:

str

get_as_indented(tag_attribute='short_tag') -> str

Return the string as a multiline indented format.

Parameters:

tag_attribute (str) – The hed_tag property to use to construct the string (usually short_tag or long_tag).

Returns:

The indented string.

Return type:

str

get_as_long() -> str

Return this HedGroup as a long tag string.

Returns:

The group as a string with all tags as long tags.

Return type:

str

get_as_short() -> str

Return this HedGroup as a short tag string.

Returns:

The group as a string with all tags as short tags.

Return type:

str

get_first_group() -> HedGroup

Return the first group in this HED string or group.

Useful for cases such as Def-expand, which contains only a single group.

Returns:

The first group.

Return type:

HedGroup

Raises:

ValueError – If there are no groups.

get_original_hed_string() -> str

Get the original HED string.

Returns:

The original string with no modification.

Return type:

str

groups() -> list

Return the direct child groups of this group.

Returns:

All groups directly in this group, filtering out HedTag children.

Return type:

list

property is_group

True if this is a parenthesized group.

lower()

Convenience function, equivalent to str(self).lower().

remove(items_to_remove: Iterable[HedTag | HedGroup])

Remove any tags/groups in items_to_remove.

Parameters:

items_to_remove (list) – List of HedGroups and/or HedTags to remove by identity.

Notes

  • Any groups that become empty will also be pruned.

  • If you pass a child and parent group, the child will also be removed from the parent.

static replace(item_to_replace, new_contents)

Replace an existing tag or group.

Note: This is a static method that relies on the parent attribute of item_to_replace.

Parameters:
  • item_to_replace (HedTag or HedGroup) – The item to replace must exist or this will raise an error.

  • new_contents (HedTag or HedGroup) – Replacement contents.

Raises:

sort()

Sort the tags and groups in this group in a consistent order.

sorted() -> HedGroup

Return a sorted copy of this HED group.

Returns:

The sorted copy.

Return type:

HedGroup

property span: tuple[int, int]

Return the source span.

Returns:

Start and end index of the group (including parentheses) in the source string.

Return type:

tuple[int, int]

tags() -> list

Return the direct child tags of this group.

Returns:

All tags directly in this group, filtering out HedGroup children.

Return type:

list

DefinitionDict

class DefinitionDict(def_dicts=None, hed_schema=None)

Bases: object

Gathers definitions from a single source.

add_definitions(defs, hed_schema=None)

Add definitions from dict(s) or string(s) to this dict.

Parameters:
  • defs (list, DefinitionDict, dict, or str) – DefinitionDict or list of DefinitionDicts/strings/dicts whose definitions should be added.

  • hed_schema (HedSchema or None) – Required if passing strings or lists of strings, unused otherwise.

Note - dict form expects DefinitionEntries in the same form as a DefinitionDict.

Note - str or list of strings will parse the strings using the hed_schema.

Note - You can mix and match types, e.g. [DefinitionDict, str, list of str] would be valid input.

Raises:

TypeError – Bad type passed as defs.

check_for_definitions(hed_string_obj, error_handler=None) -> list[dict]

Check string for definition tags, adding them to self.

Parameters:
  • hed_string_obj (HedString) – A single HED string to gather definitions from.

  • error_handler (ErrorHandler or None) – Error context used to identify where definitions are found.

Returns:

List of issues encountered in checking for definitions. Each issue is a dictionary.

Return type:

list[dict]

get(def_name) -> DefinitionEntry | None

Get the definition entry for the definition name.

Not case-sensitive.

Parameters:

def_name (str) – Name of the definition to retrieve.

Returns:

Definition entry for the requested definition, or None if it is not found.

Return type:

Union[DefinitionEntry, None]
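A case-insensitive get that returns None for unknown names can be modeled by normalizing keys on both insertion and lookup. This is a hypothetical sketch, not DefinitionDict: CaseInsensitiveDefs and its add method are invented names, and lower-casing keys is chosen to match the lower-cased DefinitionEntry.name described below.

```python
class CaseInsensitiveDefs:
    """Hypothetical sketch of a case-insensitive name -> entry mapping."""

    def __init__(self):
        self._defs = {}

    def add(self, name, entry):
        # Keys are stored lower-cased so lookups ignore case.
        self._defs[name.lower()] = entry

    def get(self, def_name):
        # dict.get returns None for unknown names, matching the Union[..., None] return.
        return self._defs.get(def_name.lower())


defs = CaseInsensitiveDefs()
defs.add("MyDef", "entry for MyDef")
print(defs.get("MYDEF"), defs.get("other"))  # entry for MyDef None
```

Normalizing once at insertion time keeps each lookup a single dictionary access.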

static get_as_strings(def_dict) -> dict[str, str]

Convert the entries to strings of their contents.

Parameters:

def_dict (dict) – A dict of definitions.

Returns:

A dict mapping each definition name to its contents as a string.

Return type:

dict[str, str]

get_definition_entry(def_tag)

Get the entry for a given def tag.

Does not validate at all.

Parameters:

def_tag (HedTag) – Source HED tag that may be a Def or Def-expand tag.

Returns:

The definition entry if it exists, otherwise None.

Return type:

DefinitionEntry or None

property issues

Return issues about duplicate definitions.

items()

Return the definition items.

Alias for .defs.items().

Returns:

The definitions as (name, DefinitionEntry) pairs.

Return type:

Items view of {str: DefinitionEntry}

DefinitionEntry

class DefinitionEntry(name, contents, takes_value, source_context)

Bases: object

Stores the resolved contents of a single HED Definition.

A DefinitionEntry is created when a Definition/ tag group is parsed and stored in a DefinitionDict. It captures:

  • name — the lower-cased label portion (without Definition/).

  • contents — the inner HedGroup of the definition (None if the definition body is empty).

  • takes_value — whether exactly one tag inside contains a # placeholder (i.e. the definition expects a run-time value via Def/name/value).

  • source_context — the error-context stack captured at parse time, used to produce precise error messages when the definition is later expanded.

Use this class directly when you need to:

  • Iterate over a DefinitionDict and inspect individual definition bodies or their placeholder status.

  • Build tooling that expands, serialises, or analyses HED definitions programmatically.

Most users never need this class; get_def_entry() and expand_def_tag() handle the common workflows.

get_definition(replace_tag, placeholder_value=None, return_copy_of_tag=False) HedGroup | None[source]

Return a copy of the definition with the tag expanded and the placeholder plugged in.

Returns None if placeholder_value is passed when the definition does not take a value, or if it is missing when the definition requires one.

Parameters:
  • replace_tag (HedTag) – The def HED tag to replace with an expanded version.

  • placeholder_value (str or None) – If present and required, will replace any pound signs in the definition contents.

  • return_copy_of_tag (bool) – Set to True for validation.

Returns:

The contents of this definition (including the def tag itself).

Return type:

Union[HedGroup, None]

Raises:

ValueError – Something internally went wrong with finding the placeholder tag. This should not be possible.
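The placeholder rules above can be modeled at the string level. This is a hypothetical sketch (expand_definition() is not a library function); the real method works on HedTag and HedGroup objects, but the None-on-mismatch rule and the # substitution are the same ideas.

```python
def expand_definition(name, contents, takes_value, placeholder_value=None):
    """Hypothetical string-based model of DefinitionEntry.get_definition().

    Returns None when a value is supplied for a definition that takes none,
    or when a value is missing for one that requires it.
    """
    if takes_value != (placeholder_value is not None):
        return None
    tag = f"Def-expand/{name}"
    if takes_value:
        tag = f"{tag}/{placeholder_value}"
        contents = contents.replace("#", placeholder_value)
    if contents is None:  # empty definition body
        return f"({tag})"
    return f"({tag}, ({contents}))"
```

For example, a definition with body "Label/#, Red" and value "3" expands to "(Def-expand/MyDef/3, (Label/3, Red))".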

DefExpandGatherer

class DefExpandGatherer(hed_schema, known_defs=None, ambiguous_defs=None, errors=None)[source]

Bases: object

Gather definitions from a series of def-expands, including possibly ambiguous ones.

Notes: The def-dict contains the known definitions. After validation, it also contains resolved definitions. The errors contain the definition contents that are known to be in error. The ambiguous_defs contain the definitions that cannot be resolved based on the data.

process_def_expands(hed_strings, known_defs=None) tuple[DefinitionDict, dict, dict][source]

Process the HED strings containing def-expand tags.

Parameters:
  • hed_strings (pd.Series or list) – A Pandas Series or list of HED strings to be processed.

  • known_defs (dict, optional) – A dictionary of known definitions to be added.

Returns:

A tuple containing the DefinitionDict, ambiguous definitions, and a dictionary of error lists keyed by definition name.

Return type:

tuple[DefinitionDict, dict, dict]

Constants

Enumerations and named constants used across the models layer.

DefTagNames

class DefTagNames[source]

Bases: object

Source names for definitions, def labels, and expanded labels.

ALL_TIME_KEYS = {'Delay', 'Duration', 'Inset', 'Offset', 'Onset'}
DEFINITION_KEY = 'Definition'
DEF_EXPAND_KEY = 'Def-expand'
DEF_KEY = 'Def'
DELAY_KEY = 'Delay'
DURATION_KEY = 'Duration'
DURATION_KEYS = {'Delay', 'Duration'}
INSET_KEY = 'Inset'
OFFSET_KEY = 'Offset'
ONSET_KEY = 'Onset'
TEMPORAL_KEYS = {'Inset', 'Offset', 'Onset'}
TIMELINE_KEYS = {'Delay', 'Inset', 'Offset', 'Onset'}
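The key sets above are related: the listed values show that ALL_TIME_KEYS is the union of TEMPORAL_KEYS and DURATION_KEYS, and that TIMELINE_KEYS is everything except Duration. The sketch below checks this with plain Python sets copied from the listing; time_role() is a hypothetical helper, not part of the library.

```python
# Values copied from the DefTagNames listing above.
ALL_TIME_KEYS = {'Delay', 'Duration', 'Inset', 'Offset', 'Onset'}
DURATION_KEYS = {'Delay', 'Duration'}
TEMPORAL_KEYS = {'Inset', 'Offset', 'Onset'}
TIMELINE_KEYS = {'Delay', 'Inset', 'Offset', 'Onset'}

# Relationships that hold for the listed values.
assert ALL_TIME_KEYS == TEMPORAL_KEYS | DURATION_KEYS
assert TEMPORAL_KEYS.isdisjoint(DURATION_KEYS)
assert TIMELINE_KEYS == ALL_TIME_KEYS - {'Duration'}


def time_role(tag):
    """Hypothetical helper: classify a short-form tag such as 'Duration/2 s'."""
    base = tag.split("/", 1)[0].capitalize()
    if base in TEMPORAL_KEYS:
        return "temporal"
    if base in DURATION_KEYS:
        return "duration"
    return None
```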

TopTagReturnType

class TopTagReturnType(*values)[source]

Bases: IntEnum

Return-type selector for find_top_level_tags().

Pass one of these constants as the include_groups argument to control whether the method returns anchor tags, containing groups, or (tag, group) pairs.

TAGS

Return only the anchor HedTag objects.

GROUPS

Return only the HedGroup objects that contain each anchor tag.

BOTH

Return (tag, group) tuples pairing each anchor tag with its containing group.

BOTH = 2
GROUPS = 1
TAGS = 0
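Because TopTagReturnType is an IntEnum, its members compare equal to the plain integers 0, 1, and 2. The sketch below replicates the enum and shows how an include_groups selector might shape results; shape_results() and the string stand-ins for HedTag/HedGroup objects are hypothetical.

```python
from enum import IntEnum


class TopTagReturnType(IntEnum):
    """Local replica of the enum values listed above."""
    TAGS = 0
    GROUPS = 1
    BOTH = 2


def shape_results(pairs, include_groups):
    """Hypothetical helper: shape (tag, group) pairs as include_groups selects."""
    if include_groups == TopTagReturnType.TAGS:
        return [tag for tag, _ in pairs]
    if include_groups == TopTagReturnType.GROUPS:
        return [group for _, group in pairs]
    return list(pairs)  # BOTH: keep the (tag, group) tuples
```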
as_integer_ratio()

Return a pair of integers, whose ratio is equal to the original int.

The ratio is in lowest terms and has a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)
bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3
bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6
conjugate()

Returns self, the complex conjugate of any int.

denominator

the denominator of a rational number in lowest terms

classmethod from_bytes(bytes, byteorder='big', *, signed=False)

Return the integer represented by the given array of bytes.

bytes

Holds the array of bytes to convert. The argument must either support the buffer protocol or be an iterable object producing bytes. Bytes and bytearray are examples of built-in objects that support the buffer protocol.

byteorder

The byte order used to represent the integer. If byteorder is ‘big’, the most significant byte is at the beginning of the byte array. If byteorder is ‘little’, the most significant byte is at the end of the byte array. To request the native byte order of the host system, use sys.byteorder as the byte order value. Default is to use ‘big’.

signed

Indicates whether two’s complement is used to represent the integer.

imag

the imaginary part of a complex number

is_integer()

Returns True. Exists for duck type compatibility with float.is_integer.

numerator

the numerator of a rational number in lowest terms

real

the real part of a complex number

to_bytes(length=1, byteorder='big', *, signed=False)

Return an array of bytes representing an integer.

length

Length of bytes object to use. An OverflowError is raised if the integer is not representable with the given number of bytes. Default is length 1.

byteorder

The byte order used to represent the integer. If byteorder is ‘big’, the most significant byte is at the beginning of the byte array. If byteorder is ‘little’, the most significant byte is at the end of the byte array. To request the native byte order of the host system, use sys.byteorder as the byte order value. Default is to use ‘big’.

signed

Determines whether two’s complement is used to represent the integer. If signed is False and a negative integer is given, an OverflowError is raised.

Input models

Models for handling different types of input data.

BaseInput

class BaseInput(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, name=None, allow_blank_names=True)[source]

Bases: object

Superclass representing a basic columnar file.

EXCEL_EXTENSION = ['.xlsx']
TEXT_EXTENSION = ['.tsv', '.txt']
assemble(mapper=None, skip_curly_braces=False) DataFrame[source]

Assembles the HED strings.

Parameters:
  • mapper (ColumnMapper or None) – Generally pass None here unless you want special behavior.

  • skip_curly_braces (bool) – If True, don’t plug in curly brace values into columns.

Returns:

The assembled dataframe.

Return type:

pd.DataFrame
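skip_curly_braces controls whether {column} references are filled in. A stdlib-only sketch of that substitution for a single row follows; assemble_row() is hypothetical, and the handling of referenced columns and of the skip case are assumptions rather than the library's documented behavior.

```python
import re

BRACE_REF = re.compile(r"\{([^{}]+)\}")


def assemble_row(row_annotations, skip_curly_braces=False):
    """Hypothetical sketch: join one row's per-column HED annotations.

    row_annotations maps column name -> annotation already looked up for this row.
    """
    if skip_curly_braces:
        # Assumption: references are left as written and every column is kept.
        return ", ".join(text for text in row_annotations.values() if text)
    referenced = set()

    def substitute(match):
        referenced.add(match.group(1))
        return row_annotations.get(match.group(1), "")

    filled = {col: BRACE_REF.sub(substitute, text) for col, text in row_annotations.items()}
    # Assumption: a column consumed by a {reference} is not emitted on its own.
    return ", ".join(text for col, text in filled.items() if text and col not in referenced)
```

With {"event_type": "Sensory-event, ({stim})", "stim": "Red, Circle"}, the substituted row reads "Sensory-event, (Red, Circle)".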

column_metadata() dict[int, ColumnMetadata][source]

Return the metadata for each column.

Returns:

Number/ColumnMetadata pairs.

Return type:

dict[int, ColumnMetadata]

property columns: list[str]

Returns a list of the column names.

Empty if no column names.

Returns:

The column names.

Return type:

list

static combine_dataframe(dataframe) Series[source]

Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.

Parameters:

dataframe (pd.DataFrame) – The dataframe to combine.

Returns:

The assembled series.

Return type:

pd.Series
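The library works on a pandas DataFrame and returns a Series; the sketch below models the same row-wise joining with plain lists so that it runs without pandas. combine_rows() is hypothetical, and the ", " separator is an assumption based on HED's comma-separated tag syntax.

```python
def combine_rows(rows):
    """Hypothetical sketch of combine_dataframe() on a list of rows.

    Each row is a list of cell values; None and empty or whitespace-only cells are skipped.
    """
    combined = []
    for row in rows:
        cells = [str(cell).strip() for cell in row if cell is not None]
        combined.append(", ".join(cell for cell in cells if cell))
    return combined
```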

convert_to_form(hed_schema, tag_form)[source]

Convert all tags in underlying dataframe to the specified form.

Parameters:
  • hed_schema (HedSchema) – The schema to use to convert tags.

  • tag_form (str) – HedTag property to convert tags to. Most cases should use convert_to_short or convert_to_long below.

convert_to_long(hed_schema)[source]

Convert all tags in underlying dataframe to long form.

Parameters:

hed_schema (HedSchema or None) – The schema to use to convert tags.

convert_to_short(hed_schema)[source]

Convert all tags in underlying dataframe to short form.

Parameters:

hed_schema (HedSchema) – The schema to use to convert tags.

property dataframe

The underlying dataframe.

property dataframe_a: DataFrame

Return the assembled dataframe. (The property name is probably a placeholder.)

Returns:

the assembled dataframe

Return type:

pd.DataFrame

expand_defs(hed_schema, def_dict)[source]

Expands any Def tags found in the underlying dataframe into Def-expand groups.

Parameters:
  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • def_dict (DefinitionDict) – The definitions to expand.
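At the string level, expand_defs replaces Def tags with Def-expand groups using a definition dict, and shrink_defs collapses Def-expand groups back to Def tags. The sketch below models both directions with regular expressions; both helper functions are hypothetical simplifications that ignore placeholder values and nested groups, whereas the library works on parsed HedString objects.

```python
import re

# Hypothetical simplification: names without values, contents without nested groups.
DEF_TAG = re.compile(r"\bDef/([\w-]+)")
DEF_EXPAND_GROUP = re.compile(r"\(Def-expand/([\w-]+)(?:, \([^()]*\))?\)")


def expand_def_tags(hed_string, definitions):
    """Replace Def/Name with (Def-expand/Name, (contents)); unknown names are left alone."""
    def replace(match):
        name = match.group(1)
        contents = definitions.get(name.lower())
        if contents is None:
            return match.group(0)
        return f"(Def-expand/{name}, ({contents}))"
    return DEF_TAG.sub(replace, hed_string)


def shrink_def_expands(hed_string):
    """Collapse (Def-expand/Name, (...)) groups back to Def/Name."""
    return DEF_EXPAND_GROUP.sub(lambda m: f"Def/{m.group(1)}", hed_string)
```

Round-tripping "Sensory-event, Def/MyDef" through both functions returns the original string.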

get_column_refs() list[source]

Return a list of column refs for this file.

Default implementation returns empty list.

Returns:

A list of unique column refs found.

Return type:

list

get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict[source]

Return the definition dict for this file.

Note: Base class implementation returns just extra_def_dicts.

Parameters:
  • hed_schema (HedSchema) – Identifies tags to find definitions (if needed).

  • extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

get_worksheet(worksheet_name=None) Workbook | None[source]

Get the requested worksheet.

Parameters:

worksheet_name (str or None) – The name of the requested worksheet, or None to get the first worksheet.

Returns:

The requested worksheet.

Return type:

Union[openpyxl.workbook.Workbook, None]

Notes

If None, returns the first worksheet.

Raises:

KeyError – If the specified worksheet name does not exist.

property has_column_names: bool

True if dataframe has column names.

property loaded_workbook

The underlying loaded workbooks.

property name: str

Name of the data.

property needs_sorting: bool

Return True if this has an onset column and it needs sorting.

property onsets

Return the onset column if it exists.

reset_mapper(new_mapper)[source]

Set mapper to a different view of the file.

Parameters:

new_mapper (ColumnMapper) – A column mapper to be associated with this base input.

property series_a: Series

Return the assembled dataframe as a series.

Returns:

the assembled dataframe with columns merged.

Return type:

pd.Series

property series_filtered: Series | None

Return the assembled dataframe as a series, with rows that have the same onset combined.

Returns:

the assembled dataframe with columns merged, and the rows filtered together.

Return type:

Union[pd.Series, None]
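needs_sorting and series_filtered both revolve around onsets. The sketch below sorts (onset, HED string) pairs and merges those with equal onsets; merge_same_onset() is hypothetical, and returning one entry per distinct onset (rather than a Series aligned with the original rows) is a simplifying assumption.

```python
from itertools import groupby


def merge_same_onset(rows):
    """Hypothetical sketch: merge HED strings of rows that share an onset.

    rows is a list of (onset, hed_string) pairs; the result has one pair per onset.
    """
    # sorted() is stable, so rows sharing an onset keep their original order.
    ordered = sorted(rows, key=lambda row: row[0])
    merged = []
    for onset, group in groupby(ordered, key=lambda row: row[0]):
        merged.append((onset, ", ".join(hed for _, hed in group if hed)))
    return merged
```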

set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')[source]

Replace the specified cell with transformed text.

Parameters:
  • row_number (int) – The row number of the spreadsheet to set.

  • column_number (int) – The column number of the spreadsheet to set.

  • new_string_obj (HedString) – Object with text to put in the given cell.

  • tag_form (str) – Version of the tags (short_tag, long_tag, base_tag, etc.)

Notes

Any attribute of a HedTag that returns a string is a valid value of tag_form.

Raises:
  • ValueError – If there is not a loaded dataframe.

  • KeyError – If the indicated row/column does not exist.

  • AttributeError – If the indicated tag_form is not an attribute of HedTag.
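The note that any string-valued HedTag attribute is a valid tag_form, together with the documented AttributeError, suggests an attribute lookup by name. A minimal sketch of that idea, with a hypothetical ToyTag class standing in for HedTag and a hypothetical render() helper:

```python
from dataclasses import dataclass


@dataclass
class ToyTag:
    """Hypothetical stand-in for HedTag with two of its string forms."""
    short_tag: str
    long_tag: str


def render(tags, tag_form="short_tag"):
    """Join tags using the named string attribute; unknown names raise AttributeError."""
    return ", ".join(getattr(tag, tag_form) for tag in tags)
```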

shrink_defs(hed_schema)[source]

Shrinks any def-expand found in the underlying dataframe.

Parameters:

hed_schema (HedSchema or None) – The schema to use to identify defs.

to_csv(file=None) str | None[source]

Write to file or return as a string.

Parameters:

file (str, file-like, or None) – Location to save this file. If None, return as string.

Returns:

None if file is given or the contents as a str if file is None.

Return type:

Union[str, None]

Raises:

OSError – If the file cannot be opened.

to_excel(file)[source]

Output to an Excel file.

Parameters:

file (str or file-like) – Location to save this base input.

Raises:
  • ValueError – If an empty file object was passed.

  • OSError – If the file cannot be opened.

validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None) list[dict][source]

Creates a SpreadsheetValidator and returns all issues with this file.

Parameters:
  • hed_schema (HedSchema) – The schema to use for validation.

  • extra_def_dicts (list of DefinitionDict, or DefinitionDict) – All definitions to use for validation.

  • name (str) – The name to report errors from this file as.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in this file.

Return type:

list[dict]

property worksheet_name

The worksheet name.

Sidecar

class Sidecar(files, name=None)[source]

Bases: object

Contents of a JSON file or JSON files.

property all_hed_columns: list[str]

Return all columns that are HED compatible.

Returns:

A list of all valid HED columns by name.

Return type:

list
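In a BIDS JSON sidecar, a column is annotated with HED through a "HED" key, holding either a dict of categorical levels or a single string with a # placeholder for value columns. The sketch below treats "HED compatible" as "has a HED key", which is a simplifying assumption; hed_columns() is hypothetical.

```python
import json


def hed_columns(sidecar_text):
    """Hypothetical sketch: names of sidecar entries that carry a "HED" key."""
    sidecar = json.loads(sidecar_text)
    return [name for name, entry in sidecar.items()
            if isinstance(entry, dict) and "HED" in entry]
```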

property column_data

Generate the ColumnMetadata for this sidecar.

Returns:

The column metadata defined by this sidecar.

Return type:

dict({str: ColumnMetadata})

property def_dict: DefinitionDict

Definitions from this sidecar.

Generally you should instead call get_def_dict to get the relevant definitions.

Returns:

The definitions for this sidecar.

Return type:

DefinitionDict

get_as_json_string() str[source]

Return this sidecar’s column metadata as a string.

Returns:

The json string representing this sidecar.

Return type:

str

get_column_refs() list[str][source]

Returns a list of column refs found in this sidecar.

This does not validate.

Returns:

A list of unique column refs found.

Return type:

list[str]
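Column references are written as {column_name} inside sidecar HED annotations. A stdlib-only sketch of collecting them without validation; find_column_refs() is hypothetical, and first-seen ordering is a choice made here, not documented behavior.

```python
import re

COLUMN_REF = re.compile(r"\{([^{}]+)\}")


def find_column_refs(sidecar):
    """Hypothetical sketch: unique {column} references in a parsed sidecar, no validation."""
    refs = []
    for entry in sidecar.values():
        hed = entry.get("HED") if isinstance(entry, dict) else None
        if isinstance(hed, dict):
            texts = hed.values()  # categorical column: one string per level
        elif isinstance(hed, str):
            texts = [hed]  # value column
        else:
            continue
        for text in texts:
            for ref in COLUMN_REF.findall(text):
                if ref not in refs:
                    refs.append(ref)
    return refs
```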

get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict[source]

Return the definition dict for this sidecar.

Parameters:
  • hed_schema (HedSchema) – Identifies tags to find definitions.

  • extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

load_sidecar_file(file)[source]

Load column metadata from a given json file.

Parameters:

file (str or FileLike) – If a string, this is a filename. Otherwise, it will be parsed as a file-like.

Raises:

HedFileError – If the file was not found or could not be parsed into JSON.

load_sidecar_files(files)[source]

Load json from a given file or list.

Parameters:

files (str or FileLike or list) – A string or file-like object representing a JSON file, or a list of such.

Raises:

HedFileError – If the file was not found or could not be parsed into JSON.

save_as_json(save_filename)[source]

Save column metadata to a JSON file.

Parameters:

save_filename (str) – Path to save file.

validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None) list[dict][source]

Create a SidecarValidator and validate this sidecar with the schema.

Parameters:
  • hed_schema (HedSchema) – The schema to use for validation.

  • extra_def_dicts (list or DefinitionDict) – Extra def dicts in addition to sidecar.

  • name (str) – The name to report this sidecar as.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues associated with each level in the HED string.

Return type:

list[dict]

TabularInput

class TabularInput(file=None, sidecar=None, name=None)[source]

Bases: BaseInput

A BIDS tabular file with sidecar.
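A BIDS tabular file is annotated by combining sidecar entries with each row's values, plus any raw strings in the column named by HED_COLUMN_NAME. The sketch below models that for one row using plain dicts; annotate_row() is hypothetical, and treating "n/a" and empty cells as unannotated follows the usual BIDS convention but is an assumption here.

```python
HED_COLUMN_NAME = "HED"  # as listed for TabularInput


def annotate_row(row, sidecar):
    """Hypothetical sketch: build one HED string for a row of a BIDS tabular file.

    row maps column name -> cell text; sidecar is a parsed JSON sidecar.
    """
    parts = []
    for column, value in row.items():
        if value in ("", "n/a"):
            continue
        if column == HED_COLUMN_NAME:  # raw HED strings stored directly in the file
            parts.append(value)
            continue
        hed = sidecar.get(column, {}).get("HED")
        if isinstance(hed, dict):  # categorical column: look up the level
            text = hed.get(value)
        elif isinstance(hed, str):  # value column: fill the # placeholder
            text = hed.replace("#", value)
        else:
            text = None
        if text:
            parts.append(text)
    return ", ".join(parts)
```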

EXCEL_EXTENSION = ['.xlsx']
HED_COLUMN_NAME = 'HED'
TEXT_EXTENSION = ['.tsv', '.txt']
assemble(mapper=None, skip_curly_braces=False) DataFrame

Assembles the HED strings.

Parameters:
  • mapper (ColumnMapper or None) – Generally pass None here unless you want special behavior.

  • skip_curly_braces (bool) – If True, don’t plug in curly brace values into columns.

Returns:

The assembled dataframe.

Return type:

pd.DataFrame

column_metadata() dict[int, ColumnMetadata]

Return the metadata for each column.

Returns:

Number/ColumnMetadata pairs.

Return type:

dict[int, ColumnMetadata]

property columns: list[str]

Returns a list of the column names.

Empty if no column names.

Returns:

The column names.

Return type:

list

static combine_dataframe(dataframe) Series

Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.

Parameters:

dataframe (pd.DataFrame) – The dataframe to combine.

Returns:

The assembled series.

Return type:

pd.Series

convert_to_form(hed_schema, tag_form)

Convert all tags in underlying dataframe to the specified form.

Parameters:
  • hed_schema (HedSchema) – The schema to use to convert tags.

  • tag_form (str) – HedTag property to convert tags to. Most cases should use convert_to_short or convert_to_long below.

convert_to_long(hed_schema)

Convert all tags in underlying dataframe to long form.

Parameters:

hed_schema (HedSchema or None) – The schema to use to convert tags.

convert_to_short(hed_schema)

Convert all tags in underlying dataframe to short form.

Parameters:

hed_schema (HedSchema) – The schema to use to convert tags.

property dataframe

The underlying dataframe.

property dataframe_a: DataFrame

Return the assembled dataframe. (The property name is probably a placeholder.)

Returns:

the assembled dataframe

Return type:

pd.DataFrame

expand_defs(hed_schema, def_dict)

Expands any Def tags found in the underlying dataframe into Def-expand groups.

Parameters:
  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • def_dict (DefinitionDict) – The definitions to expand.

get_column_refs() list[str][source]

Return a list of column refs for this file.

Returns the column refs from the associated sidecar, or an empty list if there is no sidecar.

Returns:

A list of unique column refs found.

Return type:

list[str]

get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict[source]

Return the definition dict for this sidecar.

Parameters:
  • hed_schema (HedSchema) – Used to identify tags to find definitions.

  • extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

get_sidecar() Sidecar | None[source]

Return the sidecar associated with this TabularInput.

get_worksheet(worksheet_name=None) Workbook | None

Get the requested worksheet.

Parameters:

worksheet_name (str or None) – The name of the requested worksheet, or None to get the first worksheet.

Returns:

The requested worksheet.

Return type:

Union[openpyxl.workbook.Workbook, None]

Notes

If None, returns the first worksheet.

Raises:

KeyError – If the specified worksheet name does not exist.

property has_column_names: bool

True if dataframe has column names.

property loaded_workbook

The underlying loaded workbooks.

property name: str

Name of the data.

property needs_sorting: bool

Return True if this has an onset column and it needs sorting.

property onsets

Return the onset column if it exists.

reset_column_mapper(sidecar=None)[source]

Change the sidecars and settings.

Parameters:

sidecar (str or [str] or Sidecar or [Sidecar]) – The sidecar(s), or JSON filename(s), to pull sidecar info from.

reset_mapper(new_mapper)

Set mapper to a different view of the file.

Parameters:

new_mapper (ColumnMapper) – A column mapper to be associated with this base input.

property series_a: Series

Return the assembled dataframe as a series.

Returns:

the assembled dataframe with columns merged.

Return type:

pd.Series

property series_filtered: Series | None

Return the assembled dataframe as a series, with rows that have the same onset combined.

Returns:

the assembled dataframe with columns merged, and the rows filtered together.

Return type:

Union[pd.Series, None]

set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')

Replace the specified cell with transformed text.

Parameters:
  • row_number (int) – The row number of the spreadsheet to set.

  • column_number (int) – The column number of the spreadsheet to set.

  • new_string_obj (HedString) – Object with text to put in the given cell.

  • tag_form (str) – Version of the tags (short_tag, long_tag, base_tag, etc.)

Notes

Any attribute of a HedTag that returns a string is a valid value of tag_form.

Raises:
  • ValueError – If there is not a loaded dataframe.

  • KeyError – If the indicated row/column does not exist.

  • AttributeError – If the indicated tag_form is not an attribute of HedTag.

shrink_defs(hed_schema)

Shrinks any def-expand found in the underlying dataframe.

Parameters:

hed_schema (HedSchema or None) – The schema to use to identify defs.

to_csv(file=None) str | None

Write to file or return as a string.

Parameters:

file (str, file-like, or None) – Location to save this file. If None, return as string.

Returns:

None if file is given or the contents as a str if file is None.

Return type:

Union[str, None]

Raises:

OSError – If the file cannot be opened.

to_excel(file)

Output to an Excel file.

Parameters:

file (str or file-like) – Location to save this base input.

Raises:
  • ValueError – If an empty file object was passed.

  • OSError – If the file cannot be opened.

validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None) list[dict]

Creates a SpreadsheetValidator and returns all issues with this file.

Parameters:
  • hed_schema (HedSchema) – The schema to use for validation.

  • extra_def_dicts (list of DefinitionDict, or DefinitionDict) – All definitions to use for validation.

  • name (str) – The name to report errors from this file as.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in this file.

Return type:

list[dict]

property worksheet_name

The worksheet name.

SpreadsheetInput

class SpreadsheetInput(file=None, file_type=None, worksheet_name=None, tag_columns=None, has_column_names=True, column_prefix_dictionary=None, name=None)[source]

Bases: BaseInput

A spreadsheet of HED tags.
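The constructor's tag_columns and column_prefix_dictionary arguments are not described in this section. Assuming that tag_columns names the columns holding HED tags and that column_prefix_dictionary maps a column to a prefix (such as "Description/") prepended to its cells, a row could be assembled as in this hypothetical sketch. The real class may identify columns by number and may handle multi-tag cells differently.

```python
def assemble_spreadsheet_row(row, tag_columns=(), column_prefix_dictionary=None):
    """Hypothetical sketch: collect HED tags from selected spreadsheet columns.

    Cells in tag_columns are used as-is; cells in column_prefix_dictionary get the
    mapped prefix prepended. Other columns are ignored.
    """
    prefixes = column_prefix_dictionary or {}
    parts = []
    for column, cell in row.items():
        cell = cell.strip()
        if not cell:
            continue
        if column in prefixes:
            parts.append(prefixes[column] + cell)
        elif column in tag_columns:
            parts.append(cell)
    return ", ".join(parts)
```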

EXCEL_EXTENSION = ['.xlsx']
TEXT_EXTENSION = ['.tsv', '.txt']
assemble(mapper=None, skip_curly_braces=False) DataFrame

Assembles the HED strings.

Parameters:
  • mapper (ColumnMapper or None) – Generally pass None here unless you want special behavior.

  • skip_curly_braces (bool) – If True, don’t plug in curly brace values into columns.

Returns:

The assembled dataframe.

Return type:

pd.DataFrame

column_metadata() dict[int, ColumnMetadata]

Return the metadata for each column.

Returns:

Number/ColumnMetadata pairs.

Return type:

dict[int, ColumnMetadata]

property columns: list[str]

Returns a list of the column names.

Empty if no column names.

Returns:

The column names.

Return type:

list

static combine_dataframe(dataframe) Series

Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.

Parameters:

dataframe (pd.DataFrame) – The dataframe to combine.

Returns:

The assembled series.

Return type:

pd.Series

convert_to_form(hed_schema, tag_form)

Convert all tags in underlying dataframe to the specified form.

Parameters:
  • hed_schema (HedSchema) – The schema to use to convert tags.

  • tag_form (str) – HedTag property to convert tags to. Most cases should use convert_to_short or convert_to_long below.

convert_to_long(hed_schema)

Convert all tags in underlying dataframe to long form.

Parameters:

hed_schema (HedSchema or None) – The schema to use to convert tags.

convert_to_short(hed_schema)

Convert all tags in underlying dataframe to short form.

Parameters:

hed_schema (HedSchema) – The schema to use to convert tags.

property dataframe

The underlying dataframe.

property dataframe_a: DataFrame

Return the assembled dataframe. (The property name is probably a placeholder.)

Returns:

the assembled dataframe

Return type:

pd.DataFrame

expand_defs(hed_schema, def_dict)

Expands any Def tags found in the underlying dataframe into Def-expand groups.

Parameters:
  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • def_dict (DefinitionDict) – The definitions to expand.

get_column_refs() list

Return a list of column refs for this file.

Default implementation returns empty list.

Returns:

A list of unique column refs found.

Return type:

list

get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict

Return the definition dict for this file.

Note: Base class implementation returns just extra_def_dicts.

Parameters:
  • hed_schema (HedSchema) – Identifies tags to find definitions (if needed).

  • extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

get_worksheet(worksheet_name=None) Workbook | None

Get the requested worksheet.

Parameters:

worksheet_name (str or None) – The name of the requested worksheet, or None to get the first worksheet.

Returns:

The requested worksheet.

Return type:

Union[openpyxl.workbook.Workbook, None]

Notes

If None, returns the first worksheet.

Raises:

KeyError – If the specified worksheet name does not exist.

property has_column_names: bool

True if dataframe has column names.

property loaded_workbook

The underlying loaded workbooks.

property name: str

Name of the data.

property needs_sorting: bool

Return True if this has an onset column and it needs sorting.

property onsets

Return the onset column if it exists.

reset_mapper(new_mapper)

Set mapper to a different view of the file.

Parameters:

new_mapper (ColumnMapper) – A column mapper to be associated with this base input.

property series_a: Series

Return the assembled dataframe as a series.

Returns:

the assembled dataframe with columns merged.

Return type:

pd.Series

property series_filtered: Series | None

Return the assembled dataframe as a series, with rows that have the same onset combined.

Returns:

the assembled dataframe with columns merged, and the rows filtered together.

Return type:

Union[pd.Series, None]

set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')

Replace the specified cell with transformed text.

Parameters:
  • row_number (int) – The row number of the spreadsheet to set.

  • column_number (int) – The column number of the spreadsheet to set.

  • new_string_obj (HedString) – Object with text to put in the given cell.

  • tag_form (str) – Version of the tags (short_tag, long_tag, base_tag, etc.)

Notes

Any attribute of a HedTag that returns a string is a valid value of tag_form.

Raises:
  • ValueError – If there is not a loaded dataframe.

  • KeyError – If the indicated row/column does not exist.

  • AttributeError – If the indicated tag_form is not an attribute of HedTag.

shrink_defs(hed_schema)

Shrinks any def-expand found in the underlying dataframe.

Parameters:

hed_schema (HedSchema or None) – The schema to use to identify defs.

to_csv(file=None) str | None

Write to file or return as a string.

Parameters:

file (str, file-like, or None) – Location to save this file. If None, return as string.

Returns:

None if file is given or the contents as a str if file is None.

Return type:

Union[str, None]

Raises:

OSError – If the file cannot be opened.

to_excel(file)

Output to an Excel file.

Parameters:

file (str or file-like) – Location to save this base input.

Raises:
  • ValueError – If an empty file object was passed.

  • OSError – If the file cannot be opened.

validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None) list[dict]

Creates a SpreadsheetValidator and returns all issues with this file.

Parameters:
  • hed_schema (HedSchema) – The schema to use for validation.

  • extra_def_dicts (list of DefinitionDict, or DefinitionDict) – All definitions to use for validation.

  • name (str) – The name to report errors from this file as.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in this file.

Return type:

list[dict]

property worksheet_name

The worksheet name.

TimeseriesInput

class TimeseriesInput(file=None, sidecar=None, extra_def_dicts=None, name=None)[source]

Bases: BaseInput

A BIDS time series tabular file.

EXCEL_EXTENSION = ['.xlsx']
HED_COLUMN_NAME = 'HED'
TEXT_EXTENSION = ['.tsv', '.txt']
assemble(mapper=None, skip_curly_braces=False) DataFrame

Assembles the HED strings.

Parameters:
  • mapper (ColumnMapper or None) – Generally pass None here unless you want special behavior.

  • skip_curly_braces (bool) – If True, don’t plug in curly brace values into columns.

Returns:

The assembled dataframe.

Return type:

pd.DataFrame

column_metadata() dict[int, ColumnMetadata]

Return the metadata for each column.

Returns:

Number/ColumnMetadata pairs.

Return type:

dict[int, ColumnMetadata]

property columns: list[str]

Returns a list of the column names.

Empty if no column names.

Returns:

The column names.

Return type:

list

static combine_dataframe(dataframe) Series

Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.

Parameters:

dataframe (pd.DataFrame) – The dataframe to combine.

Returns:

The assembled series.

Return type:

pd.Series

convert_to_form(hed_schema, tag_form)

Convert all tags in underlying dataframe to the specified form.

Parameters:
  • hed_schema (HedSchema) – The schema to use to convert tags.

  • tag_form (str) – HedTag property to convert tags to. Most cases should use convert_to_short or convert_to_long below.

convert_to_long(hed_schema)

Convert all tags in underlying dataframe to long form.

Parameters:

hed_schema (HedSchema or None) – The schema to use to convert tags.

convert_to_short(hed_schema)

Convert all tags in underlying dataframe to short form.

Parameters:

hed_schema (HedSchema) – The schema to use to convert tags.

property dataframe

The underlying dataframe.

property dataframe_a: DataFrame

Return the assembled dataframe. (The property name is probably a placeholder.)

Returns:

the assembled dataframe

Return type:

pd.DataFrame

expand_defs(hed_schema, def_dict)

Expands any Def tags found in the underlying dataframe into Def-expand groups.

Parameters:
  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • def_dict (DefinitionDict) – The definitions to expand.

get_column_refs() list

Return a list of column refs for this file.

Default implementation returns empty list.

Returns:

A list of unique column refs found.

Return type:

list

get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict

Return the definition dict for this file.

Note: Base class implementation returns just extra_def_dicts.

Parameters:
  • hed_schema (HedSchema) – Identifies tags to find definitions (if needed).

  • extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

get_worksheet(worksheet_name=None) Workbook | None

Get the requested worksheet.

Parameters:

worksheet_name (str or None) – The name of the requested worksheet, or None to get the first worksheet.

Returns:

The requested worksheet.

Return type:

Union[openpyxl.workbook.Workbook, None]

Notes

If None, returns the first worksheet.

Raises:

KeyError – If the specified worksheet name does not exist.

property has_column_names: bool

True if dataframe has column names.

property loaded_workbook

The underlying loaded workbooks.

property name: str

Name of the data.

property needs_sorting: bool

Return True if this has an onset column and it needs sorting.

property onsets

Return the onset column if it exists.

reset_mapper(new_mapper)

Set mapper to a different view of the file.

Parameters:

new_mapper (ColumnMapper) – A column mapper to be associated with this base input.

property series_a: Series

Return the assembled dataframe as a series.

Returns:

the assembled dataframe with columns merged.

Return type:

pd.Series

property series_filtered: Series | None

Return the assembled dataframe as a series, with rows that have the same onset combined.

Returns:

the assembled dataframe with columns merged, and the rows filtered together.

Return type:

Union[pd.Series, None]

set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')

Replace the specified cell with transformed text.

Parameters:
  • row_number (int) – The row number of the spreadsheet to set.

  • column_number (int) – The column number of the spreadsheet to set.

  • new_string_obj (HedString) – Object with text to put in the given cell.

  • tag_form (str) – Version of the tags (short_tag, long_tag, base_tag, etc.)

Notes

Any attribute of a HedTag that returns a string is a valid value of tag_form.

Raises:
  • ValueError – If there is not a loaded dataframe.

  • KeyError – If the indicated row/column does not exist.

  • AttributeError – If the indicated tag_form is not an attribute of HedTag.
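The note above says any string-valued attribute of HedTag is a valid tag_form. This means the form is selected by attribute name, which behaves like Python's getattr. The sketch below illustrates this with a hypothetical stand-in class, ToyTag, and a hypothetical helper, render_tag. Neither is the real HedTag, and the library may do more than a bare getattr.

```python
class ToyTag:
    """Hypothetical stand-in for HedTag exposing two string-valued forms."""

    def __init__(self, short_tag: str, long_tag: str) -> None:
        self.short_tag = short_tag
        self.long_tag = long_tag


def render_tag(tag: ToyTag, tag_form: str = "short_tag") -> str:
    # getattr raises AttributeError for a form the object does not define,
    # which matches the documented failure mode for an invalid tag_form.
    return getattr(tag, tag_form)


tag = ToyTag("Sensory-event", "Event/Sensory-event")
print(render_tag(tag))              # Sensory-event
print(render_tag(tag, "long_tag"))  # Event/Sensory-event
```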

shrink_defs(hed_schema)

Shrinks any def-expand found in the underlying dataframe.

Parameters:

hed_schema (HedSchema or None) – The schema to use to identify defs.

to_csv(file=None) str | None

Write to file or return as a string.

Parameters:

file (str, file-like, or None) – Location to save this file. If None, return as string.

Returns:

None if file is given or the contents as a str if file is None.

Return type:

Union[str, None]

Raises:

OSError – If the file cannot be opened.

to_excel(file)

Output to an Excel file.

Parameters:

file (str or file-like) – Location to save this base input.

Raises:
  • ValueError – If an empty file object was passed.

  • OSError – If the file cannot be opened.

validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None) list[dict]

Creates a SpreadsheetValidator and returns all issues with this file.

Parameters:
  • hed_schema (HedSchema) – The schema to use for validation.

  • extra_def_dicts (list of DefinitionDict or DefinitionDict) – All definitions to use for validation.

  • name (str) – The name to report errors from this file as.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in this file.

Return type:

list[dict]

property worksheet_name

The worksheet name.

ColumnMapper

class ColumnMapper(sidecar=None, tag_columns=None, column_prefix_dictionary=None, optional_tag_columns=None, warn_on_missing_column=False)[source]

Bases: object

Translates tabular file columns into HED tag streams for validation and analysis.

ColumnMapper is the low-level engine behind TabularInput and SpreadsheetInput. It resolves column definitions from a Sidecar and/or explicit parameters into a per-column transform pipeline that produces HED strings row-by-row.

Use this class directly when you need to:

  • Build a custom tabular reader that doesn’t subclass BaseInput.

  • Inspect or override column mappings before validating (e.g. dynamic column selection at runtime).

  • Reuse a single mapper across many DataFrames for performance.

For the common case (reading a BIDS events file), prefer TabularInput which wraps ColumnMapper automatically.

Notes

  • All column numbers are 0-based.

  • The column_prefix_dictionary parameter is treated as a shorthand for creating value columns: {"col": "Description"} becomes {"col": "Description/#"} internally.
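The shorthand described in the last note can be pictured as a small dictionary rewrite. The helper expand_prefix_shorthand below is hypothetical and only mirrors the documented transformation {"col": "Description"} → {"col": "Description/#"}. Dropping a trailing "/" before appending "/#" is an assumption, not confirmed library behavior. In real code you would simply pass column_prefix_dictionary={"col": "Description"} to ColumnMapper.

```python
def expand_prefix_shorthand(column_prefix_dictionary: dict) -> dict:
    """Rewrite {"col": "Prefix"} entries into value-column templates "Prefix/#".

    Hypothetical helper; stripping a trailing "/" is an assumption.
    """
    return {
        col: prefix.rstrip("/") + "/#"
        for col, prefix in column_prefix_dictionary.items()
    }


print(expand_prefix_shorthand({"col": "Description"}))
# {'col': 'Description/#'}
```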

static check_for_blank_names(column_map, allow_blank_names) list[dict][source]

Validate there are no blank column names.

Parameters:
  • column_map (iterable) – A list of column names.

  • allow_blank_names (bool) – Only find issues if False.

Returns:

A list of dicts, one per issue.

Return type:

list[dict]

check_for_mapping_issues(allow_blank_names=False) list[dict][source]

Find all issues given the current column_map, tag_columns, etc.

Parameters:

allow_blank_names (bool) – Only flag blank names if False.

Returns:

All issues found as a list of dicts.

Return type:

list[dict]

property column_prefix_dictionary

Return the column_prefix_dictionary with numbers turned into names where possible.

Returns:

A column_prefix_dictionary with column labels as keys.

Return type:

dict

get_column_mapping_issues() list[dict][source]

Get all the issues with finalizing the column mapping (duplicate columns, missing required columns, etc.).

Notes

  • This is deprecated and now a wrapper for “check_for_mapping_issues()”.

Returns:

A list of dictionaries describing all issues found when mapping column names to numbers.

Return type:

list[dict]

get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict[source]

Return def dicts from every column description.

Parameters:
  • hed_schema (HedSchema) – A HED schema object to use for extracting definitions.

  • extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

get_tag_columns()[source]

Return the column numbers or names that are mapped to be HedTags.

Note: This is NOT the tag_columns or optional_tag_columns parameter, though they set it.

Returns:

A list of the column numbers or names that are ColumnType.HEDTags: 0-based integers if the mapping is integer-based, otherwise column names.

Return type:

list

get_transformers()[source]

Return the transformers to use on a dataframe.

Returns:

A tuple containing:
  • dict ({str or int: func}): The functions to use to transform each column.

  • need_categorical (list of int): A list of columns to treat as categorical.

Return type:

tuple(dict, list)

set_column_map(new_column_map=None)[source]

Set the column number to name mapping.

Parameters:

new_column_map (list or dict) – Either an ordered list of the column names or a dictionary mapping column_number to column name. In both cases, column numbers start at 0.
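Both accepted shapes reduce to the same 0-based number-to-name mapping. The sketch below uses a hypothetical helper, normalize_column_map, to show that equivalence. It is an illustration, not the library's internal code.

```python
def normalize_column_map(new_column_map=None) -> dict:
    """Return a 0-based {column_number: column_name} dict (hypothetical helper)."""
    if new_column_map is None:
        return {}
    if isinstance(new_column_map, dict):
        # Already number -> name; copy so the caller's dict is not aliased.
        return dict(new_column_map)
    # Ordered list of names: each position becomes the 0-based column number.
    return dict(enumerate(new_column_map))


print(normalize_column_map(["onset", "duration", "HED"]))
# {0: 'onset', 1: 'duration', 2: 'HED'}
```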

set_column_prefix_dictionary(column_prefix_dictionary, finalize_mapping=True)[source]

Set the column prefix dictionary.

set_tag_columns(tag_columns=None, optional_tag_columns=None, finalize_mapping=True)[source]

Set tag columns and optional tag columns.

Parameters:
  • tag_columns (list) – A list of ints or strings identifying the columns that contain the HED tags. If None, clears the existing tag_columns.

  • optional_tag_columns (list) – A list of ints or strings identifying the columns that contain the HED tags, but whose absence is not an error. If None, clears the existing optional_tag_columns.

  • finalize_mapping (bool) – If True, regenerate the internal mapping now; otherwise the change has no effect until the mapping is finalized.

property sidecar_column_data

Pass through to get the sidecar ColumnMetadata.

Returns:

The column metadata defined by this sidecar.

Return type:

dict({str: ColumnMetadata})

property tag_columns

Return the known tag and optional tag columns with numbers as names when possible.

Returns:

A list of all tag and optional tag columns as labels.

Return type:

list of str or int

ColumnMetadata

class ColumnMetadata(column_type=None, name=None, source=None)[source]

Bases: object

Column in a ColumnMapper.

static expected_pound_sign_count(column_type) tuple[int, int][source]

Return how many pound signs a column string should have.

Parameters:

column_type (ColumnType) – The type of the column.

Returns:

  • The expected count: 0 or 1.

  • The type of the error we should issue.

Return type:

tuple[int, int]
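The expected count is a check on "#" placeholders in a column's HED template. Below is a minimal sketch, assuming the rule is that value columns need exactly one "#" while tag and categorical columns need none. The enum is a local mirror of the documented ColumnType values. The helpers expected_pound_signs and pound_signs_ok are hypothetical, and they omit the error type that the real method also returns.

```python
from enum import Enum


class ColumnType(Enum):
    """Local mirror of the documented ColumnType members and values."""
    Unknown = None
    Ignore = "ignore"
    Categorical = "categorical"
    Value = "value"
    HEDTags = "hed_tags"


def expected_pound_signs(column_type: ColumnType) -> int:
    # Assumed rule: a value column's template needs exactly one "#" placeholder;
    # every other column type is expected to contain none.
    return 1 if column_type is ColumnType.Value else 0


def pound_signs_ok(hed_template: str, column_type: ColumnType) -> bool:
    return hed_template.count("#") == expected_pound_signs(column_type)


print(pound_signs_ok("Label/#", ColumnType.Value))        # True
print(pound_signs_ok("Label/#", ColumnType.Categorical))  # False
```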

get_hed_strings() Series[source]

Return the HED strings for this entry as a series.

Returns:

The HED strings for this series (potentially empty).

Return type:

pd.Series

property hed_dict: dict | str

The HED strings for any given entry.

Returns:

A string or dict of strings for this column.

Return type:

Union[dict, str]

set_hed_strings(new_strings) bool[source]

Set the HED strings for this entry.

Parameters:

new_strings (pd.Series, dict, or str) – The HED strings to set. This should generally be the return value from get_hed_strings.

Returns:

True if the strings were successfully set, False otherwise.

Return type:

bool

property source_dict: dict | str

The raw dict for this entry (if it exists).

Returns:

A string or dict of strings for this column.

Return type:

Union[dict, str]

ColumnType

class ColumnType(*values)[source]

Bases: Enum

The overall type of a column in a ColumnMapper, e.g., whether to treat it as HED tags.

Mostly internal to ColumnMapper-related code.

Categorical = 'categorical'
HEDTags = 'hed_tags'
Ignore = 'ignore'
Unknown = None
Value = 'value'

Query models

Classes and functions for searching and querying HED annotations.

QueryHandler

class QueryHandler(expression_string)[source]

Bases: object

Parse a search expression into a form that can be used to search a HED string.

search(hed_string_obj) list[source]

Search for the query in the given HED string.

Parameters:

hed_string_obj (HedString) – The string to search.

Returns:

List of search results. Generally you should treat this as a bool: the list is non-empty (truthy) if a match was found.

Return type:

list[any]

SearchResult

class SearchResult(group, children)[source]

Bases: object

Holder for and manipulation of search results.

Represents a query match result consisting of:

  • group: The containing HedGroup where matches were found.

  • children: The specific matched elements (tags/groups) within that group (NOT all children of the group — only those that satisfied the query).

Example: When searching for “Red” in the HED string “(Red, Blue, Green)”:

  • group = the containing group (Red, Blue, Green)

  • children = [Red] (only the matched tag)
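The example above can be modeled with plain Python objects. In this sketch, ToySearchResult and toy_search are hypothetical stand-ins for SearchResult and QueryHandler.search. The matcher is a case-insensitive exact comparison on direct group members, much simpler than real HED queries. The sketch shows that children holds only the matched tag, and that the returned list is truthy when something matched.

```python
from dataclasses import dataclass, field


@dataclass
class ToySearchResult:
    """Hypothetical stand-in for SearchResult."""
    group: tuple                                  # the containing group
    children: list = field(default_factory=list)  # only the matched elements


def toy_search(group: tuple, term: str) -> list:
    """Toy matcher: case-insensitive exact match on the group's direct members."""
    matched = [tag for tag in group if tag.casefold() == term.casefold()]
    return [ToySearchResult(group, matched)] if matched else []


group = ("Red", "Blue", "Green")
results = toy_search(group, "red")
print(bool(results), results[0].children)  # True ['Red']
```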

has_same_children(other)[source]

Checks if these two results have the same children by identity (not equality).

Parameters:

other (SearchResult) – Another search result to compare with this one.

Returns:

True if both results have the same group and identical children.

Return type:

bool

has_same_tags(other)

Checks if these two results have the same children by identity (not equality).

Parameters:

other (SearchResult) – Another search result to compare with this one.

Returns:

True if both results have the same group and identical children.

Return type:

bool

merge_and_result(other)[source]

Returns a new result with the combined children from this and other.

Parameters:

other (SearchResult) – Another search result to merge with this one.

Returns:

A new SearchResult containing unique children from both results.

Return type:

SearchResult

Raises:

ValueError – If the groups are not the same.
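The identity-versus-equality distinction used by has_same_children and merge_and_result is easiest to see with objects that can be equal in name but distinct in memory. This is a sketch under stated assumptions. Tag and ToyResult are hypothetical classes. The group is compared here with `is`, and merged children keep first-seen order; both are assumptions rather than confirmed library behavior.

```python
class Tag:
    """Minimal hypothetical tag object, so identity can differ from equality."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyResult:
    """Hypothetical stand-in for SearchResult illustrating merge semantics."""

    def __init__(self, group, children) -> None:
        self.group = group
        self.children = list(children)

    def has_same_children(self, other: "ToyResult") -> bool:
        # Same group object, and children compared pairwise by identity (`is`).
        if self.group is not other.group or len(self.children) != len(other.children):
            return False
        return all(a is b for a, b in zip(self.children, other.children))

    def merge_and_result(self, other: "ToyResult") -> "ToyResult":
        if self.group is not other.group:
            raise ValueError("Cannot merge results from different groups")
        merged = list(self.children)
        for child in other.children:
            # Skip children already present, again using identity.
            if not any(child is existing for existing in merged):
                merged.append(child)
        return ToyResult(self.group, merged)


red, blue = Tag("Red"), Tag("Blue")
group = [red, blue]
merged = ToyResult(group, [red]).merge_and_result(ToyResult(group, [blue, red]))
print([t.name for t in merged.children])  # ['Red', 'Blue']
```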

get_query_handlers

get_query_handlers(queries, query_names=None) tuple[list[QueryHandler | None], list[QueryHandler | None], list][source]

Return a list of query handlers, query names, and issues if any.

Parameters:
  • queries (list) – A list of query strings.

  • query_names (list or None) – A list of column names for the results of the queries. If missing, the names default to query_1, query_2, etc.

Returns:

A tuple containing:
  • list: QueryHandlers for successfully parsed queries or None.

  • list: str names to assign to results of the queries or None.

  • list: issues if any of the queries could not be parsed or other errors occurred.

Return type:

tuple
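The naming half of get_query_handlers can be sketched without parsing any queries. The helper name_queries below is hypothetical. It applies the documented default names (query_1, query_2, ...) and collects naming problems as issues instead of raising. The exact issue wording and the duplicate check are assumptions.

```python
def name_queries(queries: list, query_names=None) -> tuple:
    """Hypothetical helper: default the query names and collect naming issues."""
    issues = []
    if not query_names:
        # Documented default naming scheme: query_1, query_2, ...
        query_names = [f"query_{i}" for i in range(1, len(queries) + 1)]
    if len(query_names) != len(queries):
        issues.append(f"{len(query_names)} names given for {len(queries)} queries")
    if len(set(query_names)) != len(query_names):
        issues.append("query names must be unique")
    return query_names, issues


print(name_queries(["Red", "Blue"]))
# (['query_1', 'query_2'], [])
```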

search_hed_objs

search_hed_objs(hed_objs, queries, query_names) DataFrame[source]

Return a DataFrame of factors based on results of queries.

Parameters:
  • hed_objs (list) – A list of HedString objects (empty or None entries produce 0s).

  • queries (list) – A list of query strings or QueryHandler objects.

  • query_names (list) – A list of column names for results of queries.

Returns:

Contains the factor vectors with results of the queries.

Return type:

pd.DataFrame

Raises:

ValueError – If query names are invalid or duplicated.
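The "factor vectors" returned here are one 0/1 column per query, with one entry per input row. Below is a standard-library sketch of that shape. The helper factor_table is hypothetical. It uses plain predicates in place of QueryHandler objects and returns a dict of lists; passing that dict to pd.DataFrame would give a table like the one described above.

```python
def factor_table(hed_strs: list, matchers: list, names: list) -> dict:
    """Return {name: [0/1 per row]}; None or empty rows always score 0.

    `matchers` are plain predicates standing in for QueryHandler.search.
    """
    if len(names) != len(matchers) or len(set(names)) != len(names):
        raise ValueError("Need one unique name per query")
    table = {name: [] for name in names}
    for hed in hed_strs:
        for name, match in zip(names, matchers):
            table[name].append(1 if hed and match(hed) else 0)
    return table


rows = ["Red, Blue", None, "", "Green"]
matchers = [lambda s: "Red" in s, lambda s: "Green" in s]
print(factor_table(rows, matchers, ["has_red", "has_green"]))
# {'has_red': [1, 0, 0, 0], 'has_green': [0, 0, 0, 1]}
```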

DataFrame utilities

Functions for transforming HED strings within pandas DataFrames.

convert_to_form

convert_to_form(df, hed_schema, tag_form, columns=None)[source]

Convert all tags in the underlying dataframe to the specified form (in place).

Parameters:
  • df (pd.DataFrame or pd.Series) – The dataframe or series to modify.

  • hed_schema (HedSchema) – The schema to use to convert tags.

  • tag_form (str) – HedTag property to convert tags to.

  • columns (list) – The columns to modify on the dataframe.

expand_defs

expand_defs(df, hed_schema, def_dict, columns=None)[source]

Expands any def tags found in the dataframe. Converts in place.

Parameters:
  • df (pd.DataFrame or pd.Series) – The dataframe or series to modify.

  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • def_dict (DefinitionDict) – The definitions to expand.

  • columns (list or None) – The columns to modify on the dataframe.

shrink_defs

shrink_defs(df, hed_schema, columns=None)[source]

Shrink (in place) any def-expand tags found in the specified columns in the dataframe.

Parameters:
  • df (pd.DataFrame or pd.Series) – The dataframe or series to modify.

  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • columns (list or None) – The columns to modify on the dataframe.
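expand_defs and shrink_defs are inverse rewrites between a Def/Name tag and a (Def-expand/Name, (contents)) group. The string-level toy below shows that round trip. It is a sketch only. toy_expand_defs and toy_shrink_defs are hypothetical. They match exact case, ignore definition placeholders and nested parentheses, and use regular expressions instead of the schema-aware parsing the real functions perform.

```python
import re

# Toy patterns: exact-case "Def/Name" and "(Def-expand/Name, (...))" without nesting.
DEF_TAG = re.compile(r"\bDef/([\w-]+)")
DEF_EXPAND_GROUP = re.compile(r"\(Def-expand/([\w-]+), \(([^()]*)\)\)")


def toy_expand_defs(hed: str, defs: dict) -> str:
    def expand(match):
        name = match.group(1)
        # Unknown definitions are left untouched in this sketch.
        return f"(Def-expand/{name}, ({defs[name]}))" if name in defs else match.group(0)
    return DEF_TAG.sub(expand, hed)


def toy_shrink_defs(hed: str) -> str:
    return DEF_EXPAND_GROUP.sub(lambda m: f"Def/{m.group(1)}", hed)


defs = {"MyCue": "Sensory-event, Visual-presentation"}
expanded = toy_expand_defs("Def/MyCue, Red", defs)
print(expanded)                   # (Def-expand/MyCue, (Sensory-event, Visual-presentation)), Red
print(toy_shrink_defs(expanded))  # Def/MyCue, Red
```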

process_def_expands

process_def_expands(hed_strings, hed_schema, known_defs=None, ambiguous_defs=None) tuple[DefinitionDict, dict, dict][source]

Gather the def-expand tags in the strings and compare them with known definitions to find any differences.

Parameters:
  • hed_strings (list or pd.Series) – A list of HED strings to process.

  • hed_schema (HedSchema) – The schema to use.

  • known_defs (DefinitionDict or list or str or None) – A DefinitionDict or anything its constructor takes. These are the known definitions going in, that must match perfectly.

  • ambiguous_defs (dict) – A dictionary containing ambiguous definitions. The format is TBD; currently the keys are definition names and the values are lists of lists of HED tags.

Returns:

A tuple containing the DefinitionDict, the ambiguous definitions, and a dictionary of error lists keyed by definition name.

Return type:

tuple[DefinitionDict, dict, dict]

sort_dataframe_by_onsets

sort_dataframe_by_onsets(df)[source]

Sort a dataframe by the onset column.

Parameters:

df (pd.DataFrame) – The dataframe to sort.

Returns:

The sorted dataframe, or the original dataframe if it didn’t have an onset column.

Return type:

pd.DataFrame
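Sorting by onsets, together with the related needs_sorting check on the input classes, can be sketched with rows as dicts. Both helpers below are hypothetical. They assume every onset parses as a float (BIDS "n/a" onsets would need extra handling). They use a stable sort, so rows that share an onset keep their original relative order; whether the library sorts stably is an assumption.

```python
def sort_rows_by_onset(rows: list) -> list:
    """Stable-sort dict rows by numeric onset; return the input unchanged without an onset column."""
    if not rows or "onset" not in rows[0]:
        return rows
    # sorted() is stable, so rows sharing an onset keep their original order.
    return sorted(rows, key=lambda row: float(row["onset"]))


def needs_sorting(rows: list) -> bool:
    """True only if an onset column exists and its values are out of order."""
    if not rows or "onset" not in rows[0]:
        return False
    onsets = [float(row["onset"]) for row in rows]
    return any(a > b for a, b in zip(onsets, onsets[1:]))


rows = [{"onset": "2.0", "HED": "B"}, {"onset": "0.5", "HED": "A"}, {"onset": "2.0", "HED": "C"}]
print(needs_sorting(rows))                            # True
print([r["HED"] for r in sort_rows_by_onset(rows)])   # ['A', 'B', 'C']
```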

filter_series_by_onset

filter_series_by_onset(series, onsets)[source]

Return the series, with rows that have the same onset combined.

Parameters:
  • series (pd.Series or pd.DataFrame) – The series to filter. If a dataframe, it filters the "HED" column.

  • onsets (pd.Series) – The onset column to filter by.

Returns:

the series with rows filtered together.

Return type:

Union[pd.Series, pd.DataFrame]
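Combining rows with the same onset can be pictured as folding later rows into the first row that has that onset. Below is a plain-list sketch using a hypothetical helper, combine_by_onset. It assumes the strings are joined with a comma and that the absorbed rows are blanked rather than dropped, so the row count is preserved. Both are assumptions about the real function's output.

```python
def combine_by_onset(hed_strings: list, onsets: list) -> list:
    """Join strings that share an onset into the first such row; later duplicates become ''."""
    first_row_for_onset = {}
    combined = list(hed_strings)
    for i, onset in enumerate(onsets):
        if onset in first_row_for_onset:
            first = first_row_for_onset[onset]
            combined[first] = f"{combined[first]},{hed_strings[i]}"
            combined[i] = ""
        else:
            first_row_for_onset[onset] = i
    return combined


print(combine_by_onset(["Red", "Blue", "Green"], [0.0, 0.0, 1.5]))
# ['Red,Blue', '', 'Green']
```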

split_delay_tags

split_delay_tags(series, hed_schema, onsets)[source]

Sorts the series based on Delay tags, so that the onsets are in order after delay is applied.

Parameters:
  • series (pd.Series or None) – The series of tags to split/sort.

  • hed_schema (HedSchema) – The schema to use to identify tags.

  • onsets (pd.Series or None)

Returns:

If onsets were provided, a dataframe with 3 columns:
  • "HED": The HED strings (still str).

  • "onset": The updated onsets.

  • "original_index": The original source line. Multiple lines can have the same original source line.

Return type:

Union[pd.DataFrame, None]

Note: This dataframe may be longer than the original series, but it will never be shorter.
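The note about length follows from how delayed annotations are split out. The toy below is hypothetical: toy_split_delays is not the library function, and it skips parsing Delay tags entirely. Each input row is a pre-split list of (delay_seconds, hed_text) pieces. Undelayed pieces stay at the row's onset, each delayed piece becomes its own row at onset + delay, and every output row records its original_index. Because every input row yields at least one output row, the result is never shorter than the input.

```python
def toy_split_delays(rows: list, onsets: list) -> list:
    """rows[i] is a list of (delay_seconds, hed_text) pieces for original row i."""
    out = []
    for index, (pieces, onset) in enumerate(zip(rows, onsets)):
        # Undelayed pieces stay together at the original onset (possibly as "").
        immediate = [hed for delay, hed in pieces if delay == 0]
        out.append({"HED": ", ".join(immediate), "onset": onset, "original_index": index})
        for delay, hed in pieces:
            if delay != 0:
                # Each delayed piece becomes its own row at the shifted onset.
                out.append({"HED": hed, "onset": onset + delay, "original_index": index})
    # Stable sort keeps same-onset rows in their generated order.
    return sorted(out, key=lambda row: row["onset"])


rows = [[(0, "Sensory-event"), (1.5, "Press")], [(0, "Cue")]]
result = toy_split_delays(rows, [0.0, 1.0])
print([(r["HED"], r["onset"], r["original_index"]) for r in result])
# [('Sensory-event', 0.0, 0), ('Cue', 1.0, 1), ('Press', 1.5, 0)]
```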