Models
Core data models for working with HED data structures.
Core models
The fundamental data structures for HED annotations and tags.
HedString
- class hed.models.hed_string.HedString(hed_string, hed_schema, def_dict=None, _contents=None)
Bases: HedGroup
A HED string with its schema and definitions.
- OPENING_GROUP_CHARACTER = '('
- CLOSING_GROUP_CHARACTER = ')'
- __init__(hed_string, hed_schema, def_dict=None, _contents=None)
Constructor for the HedString class.
- Parameters:
hed_string (str) – A HED string consisting of tags and tag groups.
hed_schema (HedSchema) – The schema to use to identify tags.
def_dict (DefinitionDict or None) – The definition dictionary used to identify Def and Def-expand tags.
_contents ([HedGroup and/or HedTag] or None) – Create a HedString from this exact list of children. Does not make a copy.
Notes
The HedString object parses its component tags and groups into a tree-like structure.
- static from_hed_strings(hed_strings) HedString
Create a new HedString from a list of HedStrings.
- property is_group
Always False since the underlying string is not a group with parentheses.
- copy() HedString
Return a deep copy of this string.
- Returns:
The copied group.
- Return type:
HedString
- remove_definitions()
Remove definition tags and groups from this string.
This does not validate definitions and will blindly remove invalid ones as well.
- shrink_defs() HedString
Replace Def-expand tags with Def tags.
This does not validate them and will blindly shrink invalid ones as well.
- Returns:
self
- Return type:
HedString
- expand_defs() HedString
Replace Def tags with Def-expand tags.
This performs only minimal validation.
- Returns:
self
- Return type:
HedString
- get_as_original() str
Return the original form of this string.
- Returns:
The string with all the tags in their original form.
- Return type:
str
Notes
The returned string may have some extraneous spaces removed.
- static split_into_groups(hed_string, hed_schema, def_dict=None) list
Split the HED string into a parse tree.
- Parameters:
hed_string (str) – A HED string consisting of tags and tag groups to be processed.
hed_schema (HedSchema) – HED schema to use to identify tags.
def_dict (DefinitionDict) – The definitions to identify.
- Returns:
A list of HedTag and/or HedGroup.
- Return type:
list
- Raises:
ValueError – If the string is significantly malformed, such as mismatched parentheses.
Notes
The parse tree consists of tag groups, tags, and delimiters.
- static split_hed_string(hed_string) list[tuple[bool, tuple[int, int]]]
Split a HED string into delimiters and tags.
- Parameters:
hed_string (str) – The HED string to split.
- Returns:
A list of tuples where each tuple is (is_hed_tag, (start_pos, end_pos)).
- Return type:
list[tuple[bool, tuple[int, int]]]
Notes
- The tuple format is as follows:
is_hed_tag (bool): A (possible) HED tag if True, delimiter if not.
start_pos (int): Index of start of string in hed_string.
end_pos (int): Index of end of string in hed_string.
This function does not validate tags or delimiters in any form.
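The (is_hed_tag, (start_pos, end_pos)) format can be illustrated with a simplified, standalone splitter. `split_hed_string_sketch` is a hypothetical helper, not part of hedtools: it treats `,`, `(`, and `)` as delimiters, reports each delimiter character as its own span, and trims spaces from tag spans. The library may instead group adjacent delimiters and whitespace into a single span, so treat this only as a sketch of the output shape.

```python
def split_hed_string_sketch(hed_string: str) -> list[tuple[bool, tuple[int, int]]]:
    """Hypothetical simplified splitter returning (is_hed_tag, (start_pos, end_pos)) tuples.

    In this sketch end_pos is exclusive, so hed_string[start_pos:end_pos] recovers each piece.
    """
    delimiters = ",()"
    result = []
    i = 0
    n = len(hed_string)
    while i < n:
        if hed_string[i] in delimiters:
            # Each delimiter character is reported as its own non-tag span.
            result.append((False, (i, i + 1)))
            i += 1
            continue
        start = i
        while i < n and hed_string[i] not in delimiters:
            i += 1
        # Trim surrounding spaces so the span covers only the (possible) tag text.
        end = i
        while start < end and hed_string[start].isspace():
            start += 1
        while end > start and hed_string[end - 1].isspace():
            end -= 1
        if start < end:
            result.append((True, (start, end)))
    return result


text = "Red, (Blue, Green)"
for is_tag, (start, end) in split_hed_string_sketch(text):
    print(is_tag, (start, end), repr(text[start:end]))
```

As in the note above, nothing here checks whether a span is a valid tag; the splitter only locates candidate tags and delimiters.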
- validate(allow_placeholders=True, error_handler=None) list[dict]
Validate the string using the schema.
- Parameters:
allow_placeholders (bool) – Allow placeholders in the string.
error_handler (ErrorHandler or None) – The error handler to use; a default one is created if none is passed.
- Returns:
A list of issues found in the HED string.
- Return type:
list[dict]
- find_top_level_tags(anchor_tags, include_groups=2) list
Find top-level groups that contain an anchor tag.
At most one tag is located per top-level group.
- Parameters:
anchor_tags (container) – A list, set, or other container of short_base_tags to find groups by.
include_groups (TopTagReturnType or int) – Controls what is returned. Use TopTagReturnType constants for clarity: TAGS (0) returns only anchor tags; GROUPS (1) returns only groups; BOTH (2, default) returns (tag, group) pairs.
- Returns:
The returned result depends on include_groups.
- Return type:
list
HedTag
- class hed.models.hed_tag.HedTag(hed_string, hed_schema, span=None, def_dict=None)
Bases: object
A single HED tag.
Notes
HedTag keeps track of its original value and position, as well as pointers to the relevant HED schema information, if any.
- __init__(hed_string, hed_schema, span=None, def_dict=None)
Create a HedTag.
- Parameters:
hed_string (str) – Source HED string for this tag.
hed_schema (HedSchema) – The schema used to calculate canonical forms on creation.
span (tuple[int, int] or None) – The start and end indices of the tag in hed_string.
def_dict (DefinitionDict or None) – The definition dictionary used to identify Def and Def-expand tags.
- property schema_namespace: str
Library namespace for this tag, if one exists.
- Returns:
The library namespace, including the colon.
- Return type:
str
- property short_tag: str
Short form including value or extension.
- Returns:
The short form of the tag, including value or extension.
- Return type:
str
- property base_tag: str
Long form without value or extension.
- Returns:
The long form of the tag, without value or extension.
- Return type:
str
- property short_base_tag: str
Short form without value or extension.
- Returns:
The short, non-extension portion of a tag.
- Return type:
str
Notes
ParentNodes/Def/DefName would return just “Def”.
- property org_base_tag: str
Original form without value or extension.
- Returns:
The original form of the tag, without value or extension.
- Return type:
str
Notes
Warning: This could be empty if the original tag had a name_prefix prepended, e.g., a column where “Label/” is prepended, so that the column value has no base portion.
- tag_modified() bool
Return True if the tag has been modified from its original form.
- Returns:
True if the tag is modified.
- Return type:
bool
Notes
Modifications can include adding a column name_prefix.
- property tag: str
Return the user-set form of the tag, or the original tag if no user form is set.
- Returns:
The custom user-set form of the tag.
- Return type:
str
- property extension: str
Get the extension or value of the tag.
Generally this is just the portion after the last slash. Returns an empty string if there is no extension or value.
- Returns:
The extension or value of the tag.
- Return type:
str
Notes
This tag must have been computed first.
- property long_tag: str
Long form including value or extension.
- Returns:
The long form of this tag.
- Return type:
str
- property org_tag: str
Return the original unmodified tag.
- Returns:
The original unmodified tag.
- Return type:
str
- property expanded: bool
Return whether this tag is currently expanded.
This is always False unless expandable is set. At present it is primarily used for Def/Def-expand tags.
- Returns:
True if this is currently expanded.
- Return type:
bool
- property expandable: 'HedGroup' | 'HedTag' | None
Return what this tag expands to.
At present this is primarily used for Def/Def-expand tags.
It is set lazily the first time it is accessed.
- is_column_ref() bool
Return whether this tag is a column reference from a sidecar.
You should see these only when directly accessing sidecar strings; tools should otherwise remove them.
- Returns:
True if this is a column reference.
- Return type:
bool
- __str__() str
Convert this HedTag to a string.
- Returns:
The original tag if a new tag has not been set (e.g., by converting short to long).
- Return type:
str
- get_stripped_unit_value(extension_text) tuple[str | None, str | None]
Return the extension divided into value and units, if the units are valid.
- Parameters:
extension_text (str) – The text to split, in case it’s a portion of a tag.
- Returns:
A tuple (value, units). The value is the extension portion with the units removed, or None if the units are invalid. The units are None if no units of the right unit class are found.
- Return type:
tuple[str | None, str | None]
Examples
‘Duration/3 ms’ will return (‘3’, ‘ms’)
- value_as_default_unit() float | None
Return the value converted to default units if possible, or None if invalid.
- Returns:
The extension value in default units. If there are no default units, the extension value is assumed to already be in default units.
- Return type:
Union[float, None]
Examples
‘Duration/300 ms’ will return 0.3
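The split-then-convert behavior described by get_stripped_unit_value() and value_as_default_unit() can be sketched without a schema. In this sketch `strip_unit_sketch`, `value_as_default_unit_sketch`, and the `TIME_UNIT_FACTORS` table are hypothetical; a real HED schema defines unit classes and conversion factors, and the three entries below are assumptions chosen only to reproduce the ‘Duration/300 ms’ example.

```python
from typing import Optional, Tuple

# Hypothetical conversion factors to the default unit "s". A real HED schema defines
# unit classes and their conversions; these entries are assumptions for illustration.
TIME_UNIT_FACTORS = {"s": 1.0, "ms": 0.001, "minute": 60.0}


def strip_unit_sketch(extension_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split '3 ms' into ('3', 'ms'); return (None, None) if the units are not recognized."""
    parts = extension_text.strip().split()
    if not parts:
        return None, None
    if len(parts) == 1:
        # No units given: the whole text is the value.
        return parts[0], None
    value, units = " ".join(parts[:-1]), parts[-1]
    if units not in TIME_UNIT_FACTORS:
        return None, None
    return value, units


def value_as_default_unit_sketch(extension_text: str) -> Optional[float]:
    """Convert a value with optional units to the default unit, or return None if invalid."""
    value, units = strip_unit_sketch(extension_text)
    if value is None:
        return None
    try:
        number = float(value)
    except ValueError:
        return None
    # With no units, the value is assumed to already be in default units.
    factor = TIME_UNIT_FACTORS[units] if units else 1.0
    return number * factor


print(strip_unit_sketch("3 ms"))  # ('3', 'ms')
print(value_as_default_unit_sketch("300 ms"))
```

Keeping the unit lookup separate from the numeric conversion mirrors the two documented methods: the first only validates and splits, the second builds on it.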
- property unit_classes: dict
Return a dict of all the unit classes this tag accepts.
- Returns:
A dict of unit classes this tag accepts.
- Return type:
dict
Notes
Returns an empty dict if this is not a unit class tag.
The dictionary has the unit class name as the key and HedSchemaEntry as the value.
- property value_classes: dict
Return a dict of all the value classes this tag accepts.
- Returns:
A dictionary of HedSchemaEntry value classes this tag accepts.
- Return type:
dict
Notes
Returns an empty dict if this is not a value class tag.
The dictionary has the value class name as the key and HedSchemaEntry as the value.
- property attributes: dict
Return a dict of all the attributes this tag has, or an empty dict if this is not a value tag.
- Returns:
A dict of attributes this tag has.
- Return type:
dict
Notes
Returns empty dict if this is not a unit class tag.
The dictionary has unit name as the key and HedSchemaEntry as value.
- tag_exists_in_schema() bool
Return whether the schema entry for this tag exists.
- Returns:
True if this tag exists.
- Return type:
bool
Notes
This does NOT ensure that this is a valid tag.
- is_takes_value_tag() bool
Return True if this is a takes-value tag.
- Returns:
True if this is a takes-value tag.
- Return type:
bool
- is_unit_class_tag() bool
Return True if this is a unit class tag.
- Returns:
True if this is a unit class tag.
- Return type:
bool
- is_value_class_tag() bool
Return True if this is a value class tag.
- Returns:
True if this is a tag with a value class.
- Return type:
bool
- is_basic_tag() bool
Return True if this is a known tag with no extension or value.
- Returns:
True if this is a known tag without extension or value.
- Return type:
bool
- get_tag_unit_class_units() list
Get the unit class units associated with a particular tag.
- Returns:
A list containing the unit class units associated with a particular tag, or an empty list.
- Return type:
list
- property default_unit
Get the default unit class unit for this tag.
Only a tag with a single unit class can have default units.
- Returns:
The default unit entry for this tag, or None.
- Return type:
UnitEntry or None
- base_tag_has_attribute(tag_attribute) bool
Check whether the tag has a specific attribute.
This is primarily used to check for attributes such as TopLevelTag on Definitions.
- is_placeholder() bool
Return whether this tag has a placeholder in it.
- Returns:
True if it has a placeholder.
- Return type:
bool
HedGroup
- class hed.models.hed_group.HedGroup(hed_string='', startpos=None, endpos=None, contents=None)
Bases: object
A single parenthesized HED string.
- __init__(hed_string='', startpos=None, endpos=None, contents=None)
Create a HedGroup object, which is empty unless contents is given.
- Parameters:
hed_string (str or None) – Source HED string for this group.
startpos (int or None) – Starting index of the group (including parentheses) in hed_string.
endpos (int or None) – Position after the end of the group (including parentheses) in hed_string.
contents (list or None) – A list of HedTags and/or HedGroups that will be set as the contents of this group. Mostly used during definition expansion.
- static replace(item_to_replace, new_contents)
Replace an existing tag or group.
Note: This is a static method that relies on the parent attribute of item_to_replace.
- remove(items_to_remove: Iterable[HedTag | HedGroup])
Remove any tags/groups in items_to_remove.
- Parameters:
items_to_remove (list) – List of HedGroups and/or HedTags to remove by identity.
Notes
Any groups that become empty will also be pruned.
If you pass a child and parent group, the child will also be removed from the parent.
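The identity-based removal and pruning described above can be sketched with plain nested lists standing in for groups. `Tag` and `remove_and_prune` are hypothetical names; unlike the real HedGroup.remove(), which modifies the group in place, this sketch returns a new structure so the behavior is easy to inspect.

```python
class Tag:
    """Hypothetical stand-in for HedTag; removal below matches by identity, not text."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return self.text


def remove_and_prune(group, items_to_remove):
    """Return a copy of group (a list) without items_to_remove, pruning groups that become empty."""
    remove_ids = {id(item) for item in items_to_remove}
    result = []
    for child in group:
        if id(child) in remove_ids:
            continue  # removed by identity, like HedGroup.remove()
        if isinstance(child, list):
            pruned = remove_and_prune(child, items_to_remove)
            if child and not pruned:
                continue  # this subgroup became empty, so drop it
            result.append(pruned)
        else:
            result.append(child)
    return result


red, blue, green = Tag("Red"), Tag("Blue"), Tag("Green")
hed = [red, [blue], [green, Tag("Square")]]
print(remove_and_prune(hed, [blue]))  # [Red, [Green, Square]]
```

Because matching is by identity, a second `Tag("Red")` object with the same text would survive the removal of `red`.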
- sorted() HedGroup
Return a sorted copy of this HED group.
- Returns:
The sorted copy.
- Return type:
HedGroup
- property is_group
True if this is a parenthesized group.
- get_all_tags() list
Return HedTags, including descendants.
- Returns:
A list of all the tags in this group, including descendants.
- Return type:
list
- get_all_groups(also_return_depth=False) list
Return HedGroups, including descendants and self.
- tags() list
Return the direct child tags of this group.
- Returns:
All tags directly in this group, filtering out HedGroup children.
- Return type:
list
- groups() list
Return the direct child groups of this group.
- Returns:
All groups directly in this group, filtering out HedTag children.
- Return type:
list
- get_first_group() HedGroup
Return the first group in this HED string or group.
Useful for things like Def-expand groups, which contain only a single inner group.
- Returns:
The first group.
- Return type:
HedGroup
- Raises:
ValueError – If there are no groups.
- get_original_hed_string() str
Get the original HED string.
- Returns:
The original string with no modification.
- Return type:
str
- __str__() str
Convert this HedGroup to a string.
- Returns:
The group as a string, including any modified HedTags.
- Return type:
str
- get_as_short() str
Return this HedGroup as a short tag string.
- Returns:
The group as a string with all tags as short tags.
- Return type:
str
- get_as_long() str
Return this HedGroup as a long tag string.
- Returns:
The group as a string with all tags as long tags.
- Return type:
str
- get_as_indented(tag_attribute='short_tag') str
Return the string in a multiline indented format.
- find_placeholder_tag() HedTag | None
Return a placeholder tag, if present in this group.
- Returns:
The placeholder tag if found.
- Return type:
Union[HedTag, None]
Notes
Assumes a valid HedString with no erroneous “#” characters.
- __eq__(other)
Test whether other is equal to this object.
Note: This does not account for sorting. Objects must be in the same order to match.
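Since equality is order-sensitive, one way to compare groups regardless of child order is to compare sorted copies. The sketch below uses nested tuples of strings as stand-ins for groups; `canonical` is a hypothetical helper, and sorting by `repr` is an arbitrary but deterministic choice that need not match the ordering used by HedGroup.sorted(). With real objects, the analogous idea would be comparing the results of sorted() on both groups, assuming sorted() orders children deterministically.

```python
def canonical(group):
    """Hypothetical helper: recursively sort a nested group so child order no longer matters.

    Tags are strings and subgroups are tuples; sorting by repr keeps both kinds comparable.
    """
    children = [canonical(child) if isinstance(child, tuple) else child for child in group]
    return tuple(sorted(children, key=repr))


a = ("Red", ("Blue", "Square"))
b = (("Square", "Blue"), "Red")
print(a == b)                        # False: order matters, as with HedGroup.__eq__
print(canonical(a) == canonical(b))  # True once both sides are sorted
```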
- find_tags(search_tags, recursive=False, include_groups=2) list
Find the base tags and their containing groups.
Comparison property:
short_base_tag (schema short name without any extension or value). Rationale: callers pass bare tag names such as "Event" or "Def" and must match regardless of any extension or value the tag carries in the source string. Using short_base_tag strips the extension/value, so "Def/MyDef" is found by searching for "Def".
- Parameters:
search_tags (container) – A container of short_base_tags to locate.
recursive (bool) – If True, also check subgroups.
include_groups (0, 1 or 2) – Specify return values. If 0: return a list of the HedTags. If 1: return a list of the HedGroups containing the HedTags. If 2: return a list of tuples (HedTag, HedGroup) for the found tags.
- Returns:
The contents of the list depend on the value of include_groups.
- Return type:
list
- find_wildcard_tags(search_tags, recursive=False, include_groups=2) list
Find tags whose short form starts with a given prefix (implicit trailing wildcard).
Comparison property:
short_tag (schema short name including any extension or value). Rationale: the query is a prefix such as "Def/" or "Eve"; the match must cover the extension/value as well, so that "Def/MyDef" is found by "Def/" but not by an unrelated tag that merely shares the same base. short_tag is used (not short_base_tag) so that value-bearing tags like "Duration/3 s" can be matched by a prefix query such as "Duration/". Note: prefix matching is anchored to the start of short_tag only, so "Eve" finds "Event" but not "Sensory-event".
- Parameters:
search_tags (container) – A container of the starts of short tags to search.
recursive (bool) – If True, also check subgroups.
include_groups (0, 1 or 2) – Specify return values. If 0: return a list of the HedTags. If 1: return a list of the HedGroups containing the HedTags. If 2: return a list of tuples (HedTag, HedGroup) for the found tags.
- Returns:
The contents of the list depend on the value of include_groups.
- Return type:
list
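The difference between matching on short_base_tag (find_tags) and prefix matching on short_tag (find_wildcard_tags) can be shown on stand-in objects. `TagStub` and both `*_sketch` functions are hypothetical; the case-insensitive comparison is an assumption of this sketch, and deriving short_base_tag by splitting at the first slash only holds for simple short forms like the ones below.

```python
from dataclasses import dataclass


@dataclass
class TagStub:
    """Hypothetical stand-in holding only the short_tag of a schema-identified tag."""
    short_tag: str

    @property
    def short_base_tag(self) -> str:
        # Assumes any value/extension follows the first slash of the short form,
        # which holds for tags like "Def/MyDef" and "Duration/3 s".
        return self.short_tag.split("/", 1)[0]


def find_tags_sketch(tags, search_tags):
    """Exact match on short_base_tag (case-insensitive in this sketch)."""
    wanted = {tag.casefold() for tag in search_tags}
    return [t for t in tags if t.short_base_tag.casefold() in wanted]


def find_wildcard_tags_sketch(tags, search_tags):
    """Prefix match anchored at the start of short_tag (case-insensitive in this sketch)."""
    prefixes = [tag.casefold() for tag in search_tags]
    return [t for t in tags if any(t.short_tag.casefold().startswith(p) for p in prefixes)]


tags = [TagStub("Def/MyDef"), TagStub("Event"), TagStub("Sensory-event"), TagStub("Duration/3 s")]
print([t.short_tag for t in find_tags_sketch(tags, ["Def"])])                # ['Def/MyDef']
print([t.short_tag for t in find_wildcard_tags_sketch(tags, ["Eve"])])       # ['Event']
print([t.short_tag for t in find_wildcard_tags_sketch(tags, ["Duration/"])]) # ['Duration/3 s']
```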
- find_exact_tags(exact_tags, recursive=False, include_groups=1) list
Find tags that match exactly, including any extension or value.
Comparison property:
HedTag.__eq__, which compares short_tag.casefold() (falling back to org_tag.casefold() for unrecognised tags). Rationale: callers pass a slash-path string such as "def/mydef" and need an exact full-path match; the extension/value is part of the identity ("Def/Foo" must not match "Def/Bar"). Because HedTag.__str__ returns short_tag when the tag is schema-identified, a tag written in long form in the source HED string (e.g., "Event/Sensory-event") will still be found by a short-form query ("Sensory-event"); the schema normalises them to the same short_tag. Unrecognised tags fall back to a case-insensitive comparison of the original text.
- Parameters:
- Returns:
A list of tuples. The contents depend on the value of include_groups.
- Return type:
list
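The equality rule described above (casefolded short_tag, falling back to the casefolded original text for unrecognised tags) can be mirrored by a small stand-in class. `ExactTagStub` and `find_exact_tags_sketch` are hypothetical; they reproduce only the comparison rule as stated, not the library's group handling or return shapes.

```python
from typing import Optional


class ExactTagStub:
    """Hypothetical stand-in that mirrors the described HedTag.__eq__ rule."""

    def __init__(self, org_tag: str, short_tag: Optional[str] = None):
        self.org_tag = org_tag
        self.short_tag = short_tag  # None means the schema did not recognise the tag

    def _key(self) -> str:
        # Compare short_tag when known, otherwise fall back to the original text.
        return (self.short_tag or self.org_tag).casefold()

    def __eq__(self, other):
        if isinstance(other, str):
            return self._key() == other.casefold()
        if isinstance(other, ExactTagStub):
            return self._key() == other._key()
        return NotImplemented

    def __hash__(self):
        return hash(self._key())


def find_exact_tags_sketch(tags, exact_tags):
    """Return the tags equal to any query string, extension/value included."""
    return [tag for tag in tags if any(tag == query for query in exact_tags)]


tags = [
    ExactTagStub("Event/Sensory-event", short_tag="Sensory-event"),  # written in long form
    ExactTagStub("Def/Foo", short_tag="Def/Foo"),
    ExactTagStub("My-unknown-tag"),  # not identified by the schema
]
print([t.org_tag for t in find_exact_tags_sketch(tags, ["sensory-event"])])  # ['Event/Sensory-event']
print([t.org_tag for t in find_exact_tags_sketch(tags, ["def/bar"])])        # []
```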
- find_def_tags(recursive=False, include_groups=3) list
Find Def and Def-expand tags.
- Parameters:
recursive (bool) – If True, also check subgroups.
include_groups (int, 0, 1, 2, 3) – Options for return values. If 0: return only Def and Def-expand tags. If 1: return only Def tags and Def-expand groups. If 2: return only groups containing Defs, or Def-expand groups. If 3 or any other value: return all three as a tuple.
- Returns:
A list of tuples. The contents depend on the value of include_groups.
- Return type:
list
- find_tags_with_term(term, recursive=False, include_groups=2) list
Find tags whose schema ancestry includes the given term.
Comparison property:
tag_terms, a tuple of all path components in the tag’s long-form schema path, all casefolded (e.g., ("event", "sensory-event") for the Sensory-event tag). Rationale: this implements HED’s ancestor search. A bare query term such as "Event" must match not only the Event tag itself but also every descendant (Sensory-event, Agent-action, etc.), because those descendants inherit the Event parent. tag_terms encodes the full ancestry, so membership testing (term in tag.tag_terms) handles all descendants in O(k) time, where k is the schema depth. This requires a schema-identified tag; unidentified tags have an empty tag_terms tuple and will not be found.
- Parameters:
- Returns:
A list of tuples. The contents depend on the value of include_groups.
- Return type:
list
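The ancestor search reduces to a membership test on the casefolded path components. In this sketch `tag_terms` and `find_tags_with_term_sketch` are hypothetical helpers operating on long-form paths written out by hand, rather than on schema-identified tags.

```python
def tag_terms(long_base_tag: str) -> tuple:
    """Casefolded path components of a long-form tag path (value/extension not included)."""
    return tuple(part.casefold() for part in long_base_tag.split("/"))


def find_tags_with_term_sketch(long_base_tags, term):
    """Return every tag whose ancestry (its own path) contains term."""
    term = term.casefold()
    return [tag for tag in long_base_tags if term in tag_terms(tag)]


# Long-form base paths written out by hand for illustration.
tags = ["Event", "Event/Sensory-event", "Event/Agent-action", "Item/Object"]
print(find_tags_with_term_sketch(tags, "Event"))
# ['Event', 'Event/Sensory-event', 'Event/Agent-action']
print(find_tags_with_term_sketch(tags, "sensory-event"))
# ['Event/Sensory-event']
```

Querying a leaf term such as "sensory-event" finds only that tag, while querying a parent finds the whole subtree.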
DefinitionDict
- class hed.models.definition_dict.DefinitionDict(def_dicts=None, hed_schema=None)
Bases: object
Gathers definitions from a single source.
- __init__(def_dicts=None, hed_schema=None)
Definitions to be considered a single source.
- Parameters:
def_dicts (str or list or DefinitionDict) – A DefinitionDict, a list of DefinitionDicts/strings, or a single string whose definitions should be added.
hed_schema (HedSchema or None) – Required if passing strings or lists of strings; unused otherwise.
- Raises:
TypeError – Bad type passed as def_dicts.
- add_definitions(defs, hed_schema=None)
Add definitions from dict(s) or string(s) to this dict.
- Parameters:
defs (list, DefinitionDict, dict, or str) – DefinitionDict or list of DefinitionDicts/strings/dicts whose definitions should be added.
hed_schema (HedSchema or None) – Required if passing strings or lists of strings; unused otherwise.
Notes
- The dict form expects DefinitionEntries in the same form as a DefinitionDict.
- A str or list of strings will be parsed using hed_schema.
- You can mix and match types; e.g., [DefinitionDict, str, list of str] would be valid input.
- Raises:
TypeError – Bad type passed as defs.
- get(def_name) DefinitionEntry | None
Get the definition entry for the definition name.
Not case-sensitive.
- Parameters:
def_name (str) – Name of the definition to retrieve.
- Returns:
Definition entry for the requested definition.
- Return type:
Union[DefinitionEntry, None]
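Case-insensitive lookup amounts to normalizing the key on both insertion and retrieval. `EntryStub` and `DefinitionLookupSketch` are hypothetical stand-ins; this sketch uses str.casefold(), whereas the exact normalization the library applies (the DefinitionEntry name is described as lower-cased) is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntryStub:
    """Hypothetical stand-in for DefinitionEntry (name is the normalized label)."""
    name: str
    contents: Optional[str]
    takes_value: bool


class DefinitionLookupSketch:
    """Hypothetical dict wrapper showing case-insensitive lookup, as get() describes."""

    def __init__(self):
        self._defs = {}

    def add(self, label: str, contents: Optional[str], takes_value: bool = False) -> None:
        key = label.casefold()
        self._defs[key] = EntryStub(key, contents, takes_value)

    def get(self, def_name: str) -> Optional[EntryStub]:
        # Normalize the query the same way keys were normalized on insertion.
        return self._defs.get(def_name.casefold())


defs = DefinitionLookupSketch()
defs.add("MyDef", "(Red)")
print(defs.get("MYDEF"))  # EntryStub(name='mydef', contents='(Red)', takes_value=False)
print(defs.get("Other"))  # None
```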
- items()
Return the dictionary of definitions.
Alias for .defs.items().
- Returns:
The definitions as (name, DefinitionEntry) pairs.
- Return type:
def_entries ({str: DefinitionEntry})
- property issues
Return issues about duplicate definitions.
- check_for_definitions(hed_string_obj, error_handler=None) list[dict]
Check a string for definition tags, adding them to self.
- Parameters:
hed_string_obj (HedString) – A single HED string to gather definitions from.
error_handler (ErrorHandler or None) – Error context used to identify where definitions are found.
- Returns:
List of issues encountered in checking for definitions. Each issue is a dictionary.
- Return type:
list[dict]
- get_definition_entry(def_tag)
Get the entry for a given Def tag.
Does not validate at all.
- Parameters:
def_tag (HedTag) – Source HED tag that may be a Def or Def-expand tag.
- Returns:
The definition entry, if it exists.
- Return type:
DefinitionEntry or None
DefinitionEntry
- class hed.models.definition_entry.DefinitionEntry(name, contents, takes_value, source_context)
Bases: object
Stores the resolved contents of a single HED Definition.
A DefinitionEntry is created when a Definition/ tag group is parsed and stored in a DefinitionDict. It captures:
- name: the lower-cased label portion (without Definition/).
- contents: the inner HedGroup of the definition (None if the definition body is empty).
- takes_value: whether exactly one tag inside contains a # placeholder (i.e., the definition expects a run-time value via Def/name/value).
- source_context: the error-context stack captured at parse time, used to produce precise error messages when the definition is later expanded.
Use this class directly when you need to:
- Iterate over a DefinitionDict and inspect individual definition bodies or their placeholder status.
- Build tooling that expands, serialises, or analyses HED definitions programmatically.
Most users never need this class; get_def_entry() and expand_def_tag() handle the common workflows.
- __init__(name, contents, takes_value, source_context)
Initialize info for a single definition.
- Parameters:
name (str) – The label portion of this name (not including Definition/).
contents (HedGroup) – The contents of this definition (which could be None).
takes_value (bool) – If True, expects ONE tag to have a single # sign in it.
source_context (list, None) – List (stack) of dictionaries giving context for reporting errors.
- get_definition(replace_tag, placeholder_value=None, return_copy_of_tag=False) HedGroup | None
Return a copy of the definition with the tag expanded and the placeholder plugged in.
Returns None if placeholder_value is passed for a definition that does not take a value, or if it is missing for one that does.
- Parameters:
- Returns:
The contents of this definition (including the def tag itself).
- Return type:
Union[HedGroup, None]
- Raises:
ValueError – Something internally went wrong with finding the placeholder tag. This should not be possible.
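The placeholder rule can be illustrated at the string level. `get_definition_sketch` is a hypothetical function that works on plain strings rather than HedGroup objects, and the definition body below is an illustrative string that the sketch does not validate against any schema. Because takes_value means exactly one tag contains a single #, a plain string replacement is enough here.

```python
from typing import Optional


def get_definition_sketch(label: str, contents: str, takes_value: bool,
                          placeholder_value: Optional[str] = None) -> Optional[str]:
    """Hypothetical string-level version of the rule described above.

    Returns None when a value is supplied for a definition that takes none, or when a
    required value is missing; otherwise returns a Def-expand group string.
    """
    if takes_value != (placeholder_value is not None):
        return None
    if takes_value:
        # Exactly one '#' is expected in a takes-value definition, so replace() hits only it.
        contents = contents.replace("#", placeholder_value)
        label = f"{label}/{placeholder_value}"
    return f"(Def-expand/{label}, {contents})"


print(get_definition_sketch("MyDef", "(Acceleration/# m-per-s^2, Red)", True, "3"))
# (Def-expand/MyDef/3, (Acceleration/3 m-per-s^2, Red))
print(get_definition_sketch("MyDef", "(Acceleration/# m-per-s^2, Red)", True))
# None
```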
- __eq__(other)
Check equality based on name, contents, and takes_value.
- Parameters:
other (DefinitionEntry) – Another DefinitionEntry to compare with.
- Returns:
True if name, contents, and takes_value are equal, False otherwise.
- Return type:
bool
DefExpandGatherer
- class hed.models.def_expand_gather.DefExpandGatherer(hed_schema, known_defs=None, ambiguous_defs=None, errors=None)
Bases: object
Gather definitions from a series of Def-expand tags, including possibly ambiguous ones.
Notes: The def-dict contains the known definitions. After validation, it also contains resolved definitions. The errors contain the definition contents that are known to be in error. The ambiguous_defs contain the definitions that cannot be resolved based on the data.
- __init__(hed_schema, known_defs=None, ambiguous_defs=None, errors=None)
Initialize the DefExpandGatherer class.
- Parameters:
hed_schema (HedSchema) – The HED schema to be used for processing.
known_defs (str or list or DefinitionDict) – A dictionary of known definitions.
ambiguous_defs (dict or None) – An optional dictionary of ambiguous def-expand definitions.
errors (dict or None) – An optional dictionary to store errors keyed by definition names.
- process_def_expands(hed_strings, known_defs=None) tuple[DefinitionDict, dict, dict]
Process the HED strings containing Def-expand tags.
- Parameters:
- Returns:
A tuple containing the DefinitionDict, the ambiguous definitions, and a dictionary of error lists keyed by definition name.
- Return type:
tuple[DefinitionDict, dict, dict]
Constants
Enumerations and named constants used across the models layer.
DefTagNames
- class hed.models.model_constants.DefTagNames
Bases: object
Source names for definitions, def labels, and expanded labels.
- DEF_KEY = 'Def'
- DEF_EXPAND_KEY = 'Def-expand'
- DEFINITION_KEY = 'Definition'
- ONSET_KEY = 'Onset'
- OFFSET_KEY = 'Offset'
- INSET_KEY = 'Inset'
- DURATION_KEY = 'Duration'
- DELAY_KEY = 'Delay'
- TEMPORAL_KEYS = {'Inset', 'Offset', 'Onset'}
- DURATION_KEYS = {'Delay', 'Duration'}
- ALL_TIME_KEYS = {'Delay', 'Duration', 'Inset', 'Offset', 'Onset'}
- TIMELINE_KEYS = {'Delay', 'Inset', 'Offset', 'Onset'}
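The time-related sets listed above are related by simple set algebra. This standalone sketch copies the values shown (it does not import hedtools) and checks those relationships; `is_time_tag` is a hypothetical helper, and its case-sensitive comparison is an assumption.

```python
# Values copied from the DefTagNames listing above; a standalone mirror, not an import.
TEMPORAL_KEYS = {"Inset", "Offset", "Onset"}
DURATION_KEYS = {"Delay", "Duration"}
ALL_TIME_KEYS = {"Delay", "Duration", "Inset", "Offset", "Onset"}
TIMELINE_KEYS = {"Delay", "Inset", "Offset", "Onset"}

# ALL_TIME_KEYS is the union of the temporal and duration keys.
assert ALL_TIME_KEYS == TEMPORAL_KEYS | DURATION_KEYS
# TIMELINE_KEYS is every time key except Duration.
assert TIMELINE_KEYS == ALL_TIME_KEYS - {"Duration"}


def is_time_tag(short_base_tag: str) -> bool:
    """Hypothetical helper: check a tag's short base name against ALL_TIME_KEYS."""
    return short_base_tag in ALL_TIME_KEYS


print(is_time_tag("Onset"), is_time_tag("Event"))  # True False
```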
TopTagReturnType
- class hed.models.model_constants.TopTagReturnType(value)
Bases: IntEnum
Return-type selector for find_top_level_tags().
Pass one of these constants as the include_groups argument to control whether the method returns anchor tags, containing groups, or (tag, group) pairs.
- TAGS = 0
Return only the anchor HedTag objects.
- GROUPS = 1
Return only the HedGroup objects that contain each anchor tag.
- BOTH = 2
Return (tag, group) tuples pairing each anchor tag with its containing group.
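Because TopTagReturnType is an IntEnum, its members compare equal to the plain integers 0, 1, and 2, which is why include_groups accepts either form. The sketch below mirrors the constants in a standalone enum (not imported from hedtools) and uses a hypothetical `select_results` helper to shape (tag, group) pairs; the tags and groups are plain strings for illustration.

```python
from enum import IntEnum


class TopTagReturnTypeSketch(IntEnum):
    """Standalone mirror of the constants listed above (not imported from hedtools)."""
    TAGS = 0
    GROUPS = 1
    BOTH = 2


def select_results(pairs, include_groups=TopTagReturnTypeSketch.BOTH):
    """Hypothetical helper: shape (tag, group) pairs the way include_groups describes."""
    if include_groups == TopTagReturnTypeSketch.TAGS:
        return [tag for tag, _ in pairs]
    if include_groups == TopTagReturnTypeSketch.GROUPS:
        return [group for _, group in pairs]
    return list(pairs)


pairs = [("Onset", "(Def/MyDef, Onset)"), ("Offset", "(Def/MyDef, Offset)")]
print(select_results(pairs, TopTagReturnTypeSketch.TAGS))  # ['Onset', 'Offset']
print(select_results(pairs, 1))  # a plain int works because IntEnum members equal ints
```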
Input models
Models for handling different types of input data.
BaseInput
- class hed.models.base_input.BaseInput(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, name=None, allow_blank_names=True)
Bases: object
Superclass representing a basic columnar file.
- TEXT_EXTENSION = ['.tsv', '.txt']
- EXCEL_EXTENSION = ['.xlsx']
- __init__(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, name=None, allow_blank_names=True)
Constructor for the BaseInput class.
- Parameters:
file (str or file-like or pd.DataFrame) – An xlsx/tsv file to open.
file_type (str or None) – “.xlsx” (Excel), “.tsv” or “.txt” (tab-separated text). Derived from file if file is a filename. Ignored if file is a pandas DataFrame.
worksheet_name (str or None) – Name of the Excel workbook worksheet to use. (Not applicable to tsv files.)
has_column_names (bool) – True if the file has column names. This value is ignored if you pass in a pandas DataFrame.
mapper (ColumnMapper or None) – Indicates which columns have HED tags. See SpreadsheetInput or TabularInput for examples of how to use a built-in ColumnMapper.
name (str or None) – Optional name used for this file when reporting errors.
allow_blank_names (bool) – If True, column names can be blank.
- Raises:
HedFileError – For various issues.
- Notes: Reasons for raising HedFileError include:
The file is blank.
An invalid dataframe was passed with size 0.
An invalid extension was provided.
A duplicate or empty column name appears.
The indicated file cannot be opened.
The specified worksheet name does not exist.
The sidecar file or tabular file had an invalid format and could not be read.
- reset_mapper(new_mapper)
Set mapper to a different view of the file.
- Parameters:
new_mapper (ColumnMapper) – A column mapper to be associated with this base input.
- property dataframe
The underlying dataframe.
- property dataframe_a: DataFrame
Return the assembled dataframe. (The name dataframe_a is probably a placeholder.)
- Returns:
The assembled dataframe.
- Return type:
pd.DataFrame
- property series_a: Series
Return the assembled dataframe as a series.
- Returns:
The assembled dataframe with columns merged.
- Return type:
pd.Series
- property series_filtered: Series | None
Return the assembled dataframe as a series, with rows that have the same onset combined.
- Returns:
The assembled dataframe with columns merged and rows that share an onset combined.
- Return type:
Union[pd.Series, None]
- property onsets
Return the onset column if it exists.
- property loaded_workbook
The underlying loaded workbook.
- property worksheet_name
The worksheet name.
- convert_to_form(hed_schema, tag_form)
Convert all tags in the underlying dataframe to the specified form.
- convert_to_short(hed_schema)
Convert all tags in the underlying dataframe to short form.
- Parameters:
hed_schema (HedSchema) – The schema to use to convert tags.
- convert_to_long(hed_schema)
Convert all tags in the underlying dataframe to long form.
- Parameters:
hed_schema (HedSchema or None) – The schema to use to convert tags.
- shrink_defs(hed_schema)
Shrink any Def-expand tags found in the underlying dataframe.
- Parameters:
hed_schema (HedSchema or None) – The schema to use to identify defs.
- expand_defs(hed_schema, def_dict)
Expand any Def tags found in the underlying dataframe.
- Parameters:
hed_schema (HedSchema or None) – The schema to use to identify defs.
def_dict (DefinitionDict) – The definitions to expand.
- to_excel(file)
Output to an Excel file.
- Parameters:
file (str or file-like) – Location to save this base input.
- Raises:
ValueError – If empty file object was passed.
OSError – If the file cannot be opened.
- property columns: list[str]
Return a list of the column names.
Empty if there are no column names.
- Returns:
The column names.
- Return type:
list[str]
- column_metadata() dict[int, ColumnMetadata]
Return the metadata for each column.
- Returns:
Number/ColumnMetadata pairs.
- Return type:
dict[int, ColumnMetadata]
- set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')
Replace the specified cell with transformed text.
- Parameters:
Notes
Any attribute of a HedTag that returns a string is a valid value of tag_form.
- Raises:
ValueError – If there is not a loaded dataframe.
KeyError – If the indicated row/column does not exist.
AttributeError – If the indicated tag_form is not an attribute of HedTag.
- get_worksheet(worksheet_name=None) Workbook | None
Get the requested worksheet.
- Parameters:
worksheet_name (str or None) – The name of the requested worksheet, or None to get the first one.
- Returns:
The requested workbook.
- Return type:
Union[openpyxl.workbook.Workbook, None]
Notes
If None, returns the first worksheet.
- Raises:
KeyError – If the specified worksheet name does not exist.
- validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None) list[dict]
Create a SpreadsheetValidator and return all issues with this file.
- Parameters:
hed_schema (HedSchema) – The schema to use for validation.
extra_def_dicts (list of DefDict or DefDict) – All definitions to use for validation.
name (str) – The name to report errors from this file as.
error_handler (ErrorHandler) – Error context to use. Creates a new one if None.
- Returns:
A list of issues found in this file.
- Return type:
list[dict]
- assemble(mapper=None, skip_curly_braces=False) DataFrame
Assemble the HED strings.
- Parameters:
mapper (ColumnMapper or None) – Generally pass None here unless you want special behavior.
skip_curly_braces (bool) – If True, don’t plug curly brace values into columns.
- Returns:
The assembled dataframe.
- Return type:
pd.DataFrame
- static combine_dataframe(dataframe) Series
Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.
- Parameters:
dataframe (pd.DataFrame) – The dataframe to combine.
- Returns:
The assembled series.
- Return type:
pd.Series
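The per-row logic of combine_dataframe() can be sketched in plain Python, without pandas, by treating each row as a dict of column name to cell text. `combine_rows_sketch` is hypothetical, and both the ", " separator and the column order it produces are assumptions of this sketch; a pandas version would apply the same logic across each row of the dataframe.

```python
from typing import Dict, List, Optional


def combine_rows_sketch(rows: List[Dict[str, Optional[str]]]) -> List[str]:
    """Hypothetical stand-in for combine_dataframe: one HED string per row."""
    combined = []
    for row in rows:
        # Skip missing cells and cells that are empty or only whitespace.
        parts = [cell.strip() for cell in row.values() if cell is not None and cell.strip()]
        combined.append(", ".join(parts))
    return combined


rows = [
    {"trial_type": "Sensory-event", "HED": "Red"},
    {"trial_type": "Agent-action", "HED": ""},
    {"trial_type": None, "HED": "   "},
]
print(combine_rows_sketch(rows))  # ['Sensory-event, Red', 'Agent-action', '']
```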
- get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict
Return the definition dict for this file.
Note: The base class implementation returns just extra_def_dicts.
- Parameters:
hed_schema (HedSchema) – Identifies tags to find definitions (if needed).
extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.
- Returns:
A single definition dict representing all the data (and extra def dicts).
- Return type:
DefinitionDict
Sidecar¶
- class hed.models.sidecar.Sidecar(files, name=None)[source]¶
Bases:
objectContents of a JSON file or JSON files.
- __iter__()[source]¶
An iterator to go over the individual column metadata.
- Returns:
An iterator over the column metadata values.
- Return type:
iterator
- property all_hed_columns: list[str]¶
Return all columns that are HED compatible.
- Returns:
A list of all valid HED columns by name.
- Return type:
list[str]
- property def_dict: DefinitionDict¶
Definitions from this sidecar.
Generally you should instead call get_def_dict to get the relevant definitions.
- Returns:
The definitions for this sidecar.
- Return type:
DefinitionDict
- property column_data¶
Generate the ColumnMetadata for this sidecar.
- Returns:
The column metadata defined by this sidecar.
- Return type:
dict[str, ColumnMetadata]
- get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict[source]¶
Return the definition dict for this sidecar.
- Parameters:
hed_schema (HedSchema) – Identifies tags to find definitions.
extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.
- Returns:
A single definition dict representing all the data (and extra def dicts).
- Return type:
DefinitionDict
- save_as_json(save_filename)[source]¶
Save column metadata to a JSON file.
- Parameters:
save_filename (str) – Path to save file.
- get_as_json_string() str[source]¶
Return this sidecar’s column metadata as a JSON string.
- Returns:
The JSON string representing this sidecar.
- Return type:
str
- load_sidecar_file(file)[source]¶
Load column metadata from a given JSON file.
- Parameters:
file (str or FileLike) – If a string, this is a filename. Otherwise, it is parsed as a file-like object.
- Raises:
HedFileError – If the file was not found or could not be parsed into JSON.
- load_sidecar_files(files)[source]¶
Load JSON from a given file or list of files.
- Parameters:
files (str or FileLike or list) – A string or file-like object representing a JSON file, or a list of such.
- Raises:
HedFileError – If the file was not found or could not be parsed into JSON.
- validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None) list[dict][source]¶
Create a SidecarValidator and validate this sidecar with the schema.
- Parameters:
hed_schema (HedSchema) – The schema to use for validation.
extra_def_dicts (list or DefinitionDict) – Extra def dicts in addition to those in the sidecar.
name (str) – The name to report this sidecar as.
error_handler (ErrorHandler) – Error context to use. Creates a new one if None.
- Returns:
A list of validation issues found in this sidecar.
- Return type:
list[dict]
TabularInput¶
- class hed.models.tabular_input.TabularInput(file=None, sidecar=None, name=None)[source]¶
Bases:
BaseInput
A BIDS tabular file with a sidecar.
- HED_COLUMN_NAME = 'HED'¶
- __init__(file=None, sidecar=None, name=None)[source]¶
Constructor for the TabularInput class.
- Parameters:
- Raises:
HedFileError – For the following issues:
- The file is blank.
- An invalid dataframe was passed with size 0.
- An invalid extension was provided.
- A duplicate or empty column name appears.
OSError – If the indicated file cannot be opened.
ValueError – If this file has no column names.
- get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict[source]¶
Return the definition dict for this file.
- Parameters:
hed_schema (HedSchema) – Used to identify tags to find definitions.
extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.
- Returns:
A single definition dict representing all the data (and extra def dicts).
- Return type:
DefinitionDict
SpreadsheetInput¶
- class hed.models.spreadsheet_input.SpreadsheetInput(file=None, file_type=None, worksheet_name=None, tag_columns=None, has_column_names=True, column_prefix_dictionary=None, name=None)[source]¶
Bases:
BaseInput
A spreadsheet of HED tags.
- __init__(file=None, file_type=None, worksheet_name=None, tag_columns=None, has_column_names=True, column_prefix_dictionary=None, name=None)[source]¶
Constructor for the SpreadsheetInput class.
- Parameters:
file (str or file like) – An xlsx/tsv file to open or a File object.
file_type (str or None) – “.xlsx” for Excel, “.tsv” or “.txt” for TSV data.
worksheet_name (str or None) – The name of the Excel workbook worksheet that contains the HED tags. Not applicable to tsv files. If omitted for Excel, the first worksheet is assumed.
tag_columns (list) – A list of ints or strs identifying the columns that contain HED tags. Integers are 0-based column numbers, so [1] indicates that only the second column has tags.
has_column_names (bool) – True if the file has column names, in which case validation skips the first row.
column_prefix_dictionary (dict or None) – Dictionary with keys that are column numbers/names and values are HED tag prefixes to prepend to the tags in that column before processing.
Notes
If file is a string, file_type is derived from file and this parameter is ignored.
column_prefix_dictionary may be deprecated or renamed. Its entries are no longer treated as prefixes but are converted to value columns: for example, {“key”: “Description”, 1: “Label/”} becomes {“key”: “Description/#”, 1: “Label/#”}. In this example, it is a validation issue if column 1 is also named “key”. As a result, these columns accept only the value portion of the tag.
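The prefix-to-value-column conversion described in the note can be sketched as follows. This is a hypothetical helper that reproduces only the documented example mapping; it is not the hedtools implementation, and the rule of dropping a trailing "/" before appending "/#" is an assumption inferred from the “Label/” → “Label/#” case.

```python
def prefixes_to_value_columns(column_prefix_dictionary):
    """Convert {column: prefix} into {column: "Prefix/#"} value-column templates.

    Hypothetical sketch of the documented conversion, not hedtools code.
    """
    value_columns = {}
    for column, prefix in column_prefix_dictionary.items():
        # Drop a trailing slash so "Label/" and "Label" both become "Label/#".
        value_columns[column] = prefix.rstrip("/") + "/#"
    return value_columns


print(prefixes_to_value_columns({"key": "Description", 1: "Label/"}))
# {'key': 'Description/#', 1: 'Label/#'}
```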
- Raises:
HedFileError – For any of the following issues:
- The file is blank.
- An invalid dataframe was passed with size 0.
- An invalid extension was provided.
- A duplicate or empty column name appears.
- The indicated file cannot be opened.
- The specified worksheet name does not exist.
TimeseriesInput¶
- class hed.models.timeseries_input.TimeseriesInput(file=None, sidecar=None, extra_def_dicts=None, name=None)[source]¶
Bases:
BaseInput
A BIDS time series tabular file.
- HED_COLUMN_NAME = 'HED'¶
ColumnMapper¶
- class hed.models.column_mapper.ColumnMapper(sidecar=None, tag_columns=None, column_prefix_dictionary=None, optional_tag_columns=None, warn_on_missing_column=False)[source]¶
Bases:
object
Translates tabular file columns into HED tag streams for validation and analysis.
ColumnMapper is the low-level engine behind TabularInput and SpreadsheetInput. It resolves column definitions from a Sidecar and/or explicit parameters into a per-column transform pipeline that produces HED strings row by row.
Use this class directly when you need to:
- Build a custom tabular reader that doesn’t subclass BaseInput.
- Inspect or override column mappings before validating (e.g. dynamic column selection at runtime).
- Reuse a single mapper across many DataFrames for performance.
For the common case (reading a BIDS events file), prefer TabularInput, which wraps ColumnMapper automatically.
Notes
All column numbers are 0-based.
The column_prefix_dictionary parameter is treated as a shorthand for creating value columns: {"col": "Description"} becomes {"col": "Description/#"} internally.
- __init__(sidecar=None, tag_columns=None, column_prefix_dictionary=None, optional_tag_columns=None, warn_on_missing_column=False)[source]¶
Constructor for ColumnMapper.
- Parameters:
sidecar (Sidecar) – A sidecar to gather column data from.
tag_columns (list) – A list of ints or strings identifying the columns that contain HED tags. Sidecar column definitions take precedence if there is a conflict with tag_columns.
column_prefix_dictionary (dict) – Dictionary with keys that are column numbers/names and values are HED tag prefixes to prepend to the tags in that column before processing.
optional_tag_columns (list) – A list of ints or strings identifying columns that may contain HED tags. If such a column is otherwise unspecified, its type is set to HEDTags.
warn_on_missing_column (bool) – If True, issue mapping warnings on column names that are missing from the sidecar.
Notes
All column numbers are 0-based.
- The column_prefix_dictionary may be deprecated or renamed in the future.
Its entries are no longer treated as prefixes but are converted to value columns: {“key”: “Description”, 1: “Label/”} becomes {“key”: “Description/#”, 1: “Label/#”}. In this example, it is a validation issue if column 1 is also named “key”. As a result, these columns accept only the value portion of the tag.
- property tag_columns¶
Return the known tag and optional tag columns with numbers as names when possible.
- property column_prefix_dictionary¶
Return the column_prefix_dictionary with numbers turned into names where possible.
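Both properties above report column numbers as names "when possible". A hypothetical sketch of that rule, assuming the known column names are available as a list and that integer identifiers are 0-based indices into it; numbers with no known name are left as numbers. The helper name numbers_to_names is illustrative only and does not exist in hedtools.

```python
def numbers_to_names(columns, column_names):
    """Replace 0-based column numbers with column names where a name is known.

    Hypothetical helper; strings pass through and out-of-range numbers are kept.
    """
    resolved = []
    for column in columns:
        if isinstance(column, int) and 0 <= column < len(column_names):
            resolved.append(column_names[column])
        else:
            resolved.append(column)
    return resolved


names = ["onset", "duration", "HED", "response"]
print(numbers_to_names([2, "response", 7], names))  # ['HED', 'response', 7]
```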
- static check_for_blank_names(column_map, allow_blank_names) list[dict][source]¶
Validate there are no blank column names.
- property sidecar_column_data¶
Pass through to get the sidecar ColumnMetadata.
- Returns:
The column metadata defined by this sidecar.
- Return type:
dict[str, ColumnMetadata]
- get_tag_columns()[source]¶
Return the column numbers or names that are mapped to be HedTags.
Note: This is not the same as the tag_columns or optional_tag_columns parameters, although those parameters set it.
- Returns:
A list of column numbers or names that are ColumnType.HEDTags. Numbers are 0-based; otherwise the column name is used.
- Return type:
list
- set_tag_columns(tag_columns=None, optional_tag_columns=None, finalize_mapping=True)[source]¶
Set tag columns and optional tag columns.
- Parameters:
tag_columns (list) – A list of ints or strings identifying the columns that contain HED tags. If None, clears existing tag_columns.
optional_tag_columns (list) – A list of ints or strings identifying columns that contain HED tags but are not an error if missing. If None, clears existing optional_tag_columns.
finalize_mapping (bool) – If True, regenerate the internal mapping immediately; otherwise the change has no effect until the mapping is finalized.
- set_column_prefix_dictionary(column_prefix_dictionary, finalize_mapping=True)[source]¶
Set the column prefix dictionary.
- check_for_mapping_issues(allow_blank_names=False) list[dict][source]¶
Find all issues given the current column_map, tag_columns, etc.
- get_def_dict(hed_schema, extra_def_dicts=None) DefinitionDict[source]¶
Return def dicts from every column description.
- Parameters:
hed_schema (HedSchema) – A HED schema object to use for extracting definitions.
extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.
- Returns:
A single definition dict representing all the data (and extra def dicts).
- Return type:
DefinitionDict
ColumnMetadata¶
- class hed.models.column_metadata.ColumnMetadata(column_type=None, name=None, source=None)[source]¶
Bases:
object
Column in a ColumnMapper.
- __init__(column_type=None, name=None, source=None)[source]¶
A single column entry in the column mapper.
- Parameters:
column_type (ColumnType or None) – How to treat this column when reading data.
name (str, int, or None) – The column_name or column number identifying this column. If name is a string, you’ll need to use a column map to set the number later.
source (dict or str or None) – Either the entire loaded json sidecar or a single HED string.
- get_hed_strings() Series[source]¶
Return the HED strings for this entry as a series.
- Returns:
The HED strings for this series (potentially empty).
- Return type:
pd.Series
ColumnType¶
- class hed.models.column_metadata.ColumnType(value)[source]¶
Bases:
Enum
The overall column_type of a column in ColumnMapper, e.g. treat it as HED tags.
Mostly internal to ColumnMapper-related code.
- Unknown = None¶
- Ignore = 'ignore'¶
- Categorical = 'categorical'¶
- Value = 'value'¶
- HEDTags = 'hed_tags'¶
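Because ColumnType is an Enum whose values are the strings listed above, a stored value can be mapped back to a member with ordinary Enum lookup. The sketch below defines a local mirror of the documented members so that it runs without hedtools installed; in real code you would import hed.models.column_metadata.ColumnType instead of redefining it.

```python
from enum import Enum


class ColumnType(Enum):
    """Local mirror of the documented ColumnType members (not imported from hed)."""

    Unknown = None
    Ignore = "ignore"
    Categorical = "categorical"
    Value = "value"
    HEDTags = "hed_tags"


# Calling the Enum with a value returns the matching member.
print(ColumnType("hed_tags"))  # ColumnType.HEDTags
print(ColumnType(None))        # ColumnType.Unknown
```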
Query models¶
Classes and functions for searching and querying HED annotations.
QueryHandler¶
- class hed.models.query_handler.QueryHandler(expression_string)[source]¶
Bases:
object
Parse a search expression into a form that can be used to search a HED string.
- __init__(expression_string)[source]¶
Compiles a QueryHandler for a particular expression, so it can be used to search HED strings.
Basic Input Examples:
‘Event’ - Finds any strings with Event, or a descendant tag of Event such as Sensory-event.
‘Event && Action’ - Finds any strings with Event and Action, including descendant tags.
‘Event || Action’ - Same as above, but finds strings with either tag.
‘“Event”’ - Finds the Event tag, but not any descendant tags.
‘Def/DefName/*’ - Finds Def/DefName instances with placeholders, regardless of the value of the placeholder.
‘Eve*’ - Finds any short tags that begin with Eve, such as Event, but not Sensory-event.
‘[Event && Action]’ - Finds a group that contains both Event and Action (at any level).
‘{Event && Action}’ - Finds a group with Event and Action at the same level.
‘{Event && Action:}’ - Finds a group with Event and Action at the same level, and nothing else.
‘{Event && Action:Agent}’ - Finds a group with Event and Action at the same level, and optionally an Agent tag.
Practical Complex Example:
- ‘{(Onset || Offset), (Def || {Def-expand}): ???}’ - A group with an Onset or Offset tag, a Def tag or Def-expand group, and an optional wildcard group.
- Parameters:
expression_string (str) – The query string.
SearchResult¶
- class hed.models.query_util.SearchResult(group, children)[source]¶
Bases:
object
Holder for and manipulation of search results.
Represents a query match result consisting of:
group: The containing HedGroup where matches were found.
children: The specific matched elements (tags/groups) within that group (NOT all children of the group — only those that satisfied the query).
Example: When searching for “Red” in the HED string “(Red, Blue, Green)”:
group = the containing group (Red, Blue, Green)
children = [Red] (only the matched tag)
- __init__(group, children)[source]¶
Initialize a search result.
- Parameters:
group (HedGroup) – The group where the children were found.
children (HedTag, HedGroup, or list) – The matched child elements (tags or groups) that satisfied the query condition. Can be a single tag or group that matched, a list of tags or groups that matched, or an empty list (for negation, or when the group matched but no specific children did).
- merge_and_result(other)[source]¶
Returns a new result with the combined children from this and other.
- Parameters:
other (SearchResult) – Another search result to merge with this one.
- Returns:
A new SearchResult containing unique children from both results.
- Return type:
SearchResult
- Raises:
ValueError – If the groups are not the same.
- has_same_children(other)[source]¶
Checks if these two results have the same children by identity (not equality).
- Parameters:
other (SearchResult) – Another search result to compare with this one.
- Returns:
True if both results have the same group and identical children.
- Return type:
bool
- has_same_tags(other)¶
Checks if these two results have the same children by identity (not equality).
- Parameters:
other (SearchResult) – Another search result to compare with this one.
- Returns:
True if both results have the same group and identical children.
- Return type:
bool
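The merge and comparison rules above hinge on identity rather than equality. The following stand-in class illustrates that distinction; it is a hypothetical sketch, not the hedtools SearchResult. It assumes that "the same group" means the same object and that child order does not matter when comparing, and it uses a toy Tag class whose instances compare equal by name.

```python
class Tag:
    """Toy tag: two Tag("Red") objects are equal but not identical."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Tag) and self.name == other.name


class SimpleSearchResult:
    """Hypothetical stand-in for SearchResult: the containing group plus matched children."""

    def __init__(self, group, children):
        self.group = group
        # Accept a single tag/group or a list, as the documented constructor does.
        self.children = list(children) if isinstance(children, list) else [children]

    def merge_and_result(self, other):
        """Return a new result holding each child from both results once (by identity)."""
        if self.group is not other.group:
            raise ValueError("Cannot merge results from different groups")
        merged = list(self.children)
        for child in other.children:
            if not any(child is existing for existing in merged):
                merged.append(child)
        return SimpleSearchResult(self.group, merged)

    def has_same_children(self, other):
        """True if both share a group and hold the same child objects (order ignored)."""
        if self.group is not other.group:
            return False
        return {id(c) for c in self.children} == {id(c) for c in other.children}


group = object()  # any object can stand in for the containing HedGroup
red, blue = Tag("Red"), Tag("Blue")
r1 = SimpleSearchResult(group, red)
r2 = SimpleSearchResult(group, [red, blue])
merged = r1.merge_and_result(r2)
print([t.name for t in merged.children])                             # ['Red', 'Blue']
print(r1.has_same_children(SimpleSearchResult(group, Tag("Red"))))   # False: equal, not identical
```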
get_query_handlers¶
- hed.models.query_service.get_query_handlers(queries, query_names=None) tuple[list[QueryHandler | None], list[QueryHandler | None], list][source]¶
Return a list of query handlers, query names, and issues if any.
- Parameters:
- Returns:
- A tuple containing:
list: QueryHandlers for successfully parsed queries or None.
list: str names to assign to results of the queries or None.
list: issues if any of the queries could not be parsed or other errors occurred.
- Return type:
tuple
search_hed_objs¶
- hed.models.query_service.search_hed_objs(hed_objs, queries, query_names) DataFrame[source]¶
Return a DataFrame of factors based on results of queries.
- Parameters:
- Returns:
Contains the factor vectors with results of the queries.
- Return type:
pd.DataFrame
- Raises:
ValueError – If query names are invalid or duplicated.
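A factor DataFrame has one column per named query and one row per HED object. The sketch below shows that shape with plain Python, assuming 0/1 indicator values and a precomputed match list per query; the function name build_factors is hypothetical and a dict of lists stands in for the pandas DataFrame. It also applies the documented check that query names must be valid and unique.

```python
def build_factors(match_lists, query_names):
    """Build one 0/1 factor column per query (hypothetical sketch).

    match_lists[q][i] is True when query q matched HED object i.
    Returns a dict of column name -> list of ints, standing in for a DataFrame.
    """
    if len(query_names) != len(match_lists):
        raise ValueError("Need exactly one name per query")
    if len(set(query_names)) != len(query_names):
        raise ValueError("Query names must be unique")
    return {name: [1 if hit else 0 for hit in hits] for name, hits in zip(query_names, match_lists)}


factors = build_factors([[True, False, True], [False, False, True]], ["has_event", "has_action"])
print(factors)  # {'has_event': [1, 0, 1], 'has_action': [0, 0, 1]}
```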
String-based search (experimental)¶
Warning
This facility is experimental. Its API (classes, function signatures, and
behaviour) may change in future releases without notice. Do not rely on
hed.models.string_search or hed.models.schema_lookup as a stable
public interface. Import directly from those sub-modules rather than from
the top-level hed package.
Search functions that operate on raw HED strings without requiring pre-parsed HedString objects
or a loaded schema. See also HED search details for a full comparison of all three
search implementations.
StringQueryHandler¶
- class hed.models.string_search.StringQueryHandler(expression_string)[source]¶
Bases:
QueryHandler
Execute HED queries against raw HED strings without requiring a schema.
Subclasses QueryHandler and reuses its tokeniser and expression-tree compiler unchanged. Only search() is overridden to accept a raw string rather than a HedString.
The compiled expression tree is evaluated against a StringNode tree produced by parse_hed_string(). Because StringNode duck-types the HedGroup/HedTag interface expected by the expression classes, no changes to query_expressions are required.
Ancestor search¶
Without a schema_lookup the system falls back to literal term matching (the bare query "Event" matches only the tag "event", not its descendants). Pass a lookup dict from generate_schema_lookup() to enable full ancestor search on short-form strings.
Example:
from hed.models.string_search import StringQueryHandler
handler = StringQueryHandler("Event && Action")
bool(handler.search("Event, Action"))  # True (literal match)
- Parameters:
expression_string (str) – The HED query expression, using the same syntax as QueryHandler.
- search(raw_string, schema_lookup=None)[source]¶
Search for the compiled query in a raw HED string.
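- Parameters:
raw_string (str) – The raw HED string to search.
schema_lookup (dict or None) – Optional schema lookup dict for ancestor search; see generate_schema_lookup().
- Returns:
List of SearchResult objects. Evaluate it as a bool: True when at least one match was found.
- Return type:
list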
StringNode¶
- class hed.models.string_search.StringNode(text=None, is_group=False, parent=None, depth=0, schema_lookup=None)[source]¶
Bases:
object
Lightweight tree node representing a parsed fragment of a raw HED string.
A single StringNode acts as both a group (when is_group=True, analogous to HedGroup) and a tag leaf (when is_group=False and the node is a direct child of a group, analogous to HedTag). The root node is always treated as the top-level string container (analogous to HedString): it is never parenthesised (is_group=False) but still participates in group traversal.
Duck-typing contract with the expression evaluation layer:
- is_group: bool.
- _parent: parent StringNode or None.
- children: direct child nodes (tags and groups).
- tags(): direct child tag (non-group leaf) nodes.
- groups(): direct child group nodes.
- get_all_groups(): all group-like nodes in the subtree, including self.
- get_all_tags(): all leaf tag nodes in the subtree.
- find_tags_with_term(term, ...): ancestor-aware tag search.
- find_exact_tags(exact_tags, ...): casefold-exact tag search.
- find_wildcard_tags(search_tags, ...): prefix tag search.
- tag_terms: tuple of casefolded ancestry components (set on leaves).
- short_tag: casefolded tag text (set on leaves), used by wildcard search.
- Parameters:
text (str or None) – Casefolded tag text for leaf nodes; None for the root node.
is_group (bool) – True if this node represents a parenthesised group.
parent (StringNode or None) – The parent node.
depth (int) – Nesting depth (root = 0).
schema_lookup (dict or None) – Optional schema lookup dict (generate_schema_lookup()) used to populate tag_terms for ancestor search.
- tags()[source]¶
Return direct child tag (leaf, non-group) nodes.
- Returns:
Direct children that are leaves.
- Return type:
list
- groups()[source]¶
Return direct child group nodes.
- Returns:
Direct children that are parenthesised groups.
- Return type:
list
- get_all_groups()[source]¶
Return all group-like nodes in this subtree, including self.
Mirrors HedGroup.get_all_groups(), which always includes the receiver (even when the receiver is a HedString with is_group=False).
- Returns:
All group-like nodes, self first.
- Return type:
list
- get_all_tags()[source]¶
Return all leaf tag nodes in this subtree (depth-first).
- Returns:
All leaf nodes in the subtree.
- Return type:
list
- find_tags_with_term(term, recursive=False, include_groups=2)[source]¶
Find leaf tags whose tag_terms include term (ancestor search).
When no schema lookup was provided at parse time, tag_terms for a leaf is derived from the slash-separated components of the tag text, so long-form strings get ancestor search for free; short-form strings produce literal matching only.
- Parameters:
- Returns:
Depends on include_groups.
- Return type:
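The fallback rule above (derive tag_terms from the slash-separated components unless a lookup supplies them) can be sketched in a few lines. This is a hypothetical illustration, not the StringNode code: the helper names are invented, and the lookup shape (casefolded short name mapped to a tuple of casefolded ancestry terms) is assumed from the generate_schema_lookup() description.

```python
def tag_terms_for(tag_text, schema_lookup=None):
    """Return casefolded ancestry terms for one tag (hypothetical helper, not hedtools)."""
    text = tag_text.strip().casefold()
    if schema_lookup is not None and text in schema_lookup:
        # Assumed lookup shape: casefolded short name -> tuple of ancestry terms.
        return tuple(schema_lookup[text])
    # Fallback: the slash-separated components of the tag text itself.
    return tuple(part for part in text.split("/") if part)


def has_term(tag_text, term, schema_lookup=None):
    """Ancestor-style match: True if term appears anywhere in the tag's terms."""
    return term.casefold() in tag_terms_for(tag_text, schema_lookup)


lookup = {"sensory-event": ("event", "sensory-event")}
print(has_term("Event/Sensory-event", "Event"))    # True: long form carries its ancestor
print(has_term("Sensory-event", "Event"))          # False: short form, literal match only
print(has_term("Sensory-event", "Event", lookup))  # True: lookup supplies the ancestry
```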
- find_exact_tags(exact_tags, recursive=False, include_groups=1)[source]¶
Find leaf tags whose casefolded text exactly matches any entry in exact_tags.
- find_wildcard_tags(search_tags, recursive=False, include_groups=2)[source]¶
Find leaf tags whose short_tag starts with any entry in search_tags.
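The difference between the exact and wildcard searches comes down to casefolded equality versus a casefolded prefix test. A hypothetical sketch over plain short-form tag strings (the helper names are invented and not part of hedtools); note that, as in the ‘Eve*’ query example, the prefix test does not match Sensory-event.

```python
def find_exact(tags, exact_tags):
    """Tags whose casefolded text equals an entry in exact_tags (hypothetical helper)."""
    wanted = {tag.casefold() for tag in exact_tags}
    return [tag for tag in tags if tag.casefold() in wanted]


def find_wildcard(tags, search_tags):
    """Tags whose casefolded text starts with any entry in search_tags (hypothetical helper)."""
    prefixes = tuple(prefix.casefold() for prefix in search_tags)
    return [tag for tag in tags if tag.casefold().startswith(prefixes)]


short_tags = ["Event", "Sensory-event", "Event-context", "Action"]
print(find_exact(short_tags, ["event"]))   # ['Event']
print(find_wildcard(short_tags, ["Eve"]))  # ['Event', 'Event-context']
```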
- __eq__(other)[source]¶
Compare with another StringNode or a string.
When other is a string, compares self.text (already casefolded) against other.casefold(). This mirrors HedTag.__eq__ and is required so that tag in exact_tags works in find_exact_tags().
- Parameters:
other (StringNode or str) – The value to compare against.
- Returns:
True if equal.
- Return type:
bool
parse_hed_string¶
- hed.models.string_search.parse_hed_string(raw_string, schema_lookup=None)[source]¶
Parse a raw HED string into a StringNode tree.
Uses split_hed_string() to tokenise the input (the same splitter used by the full HED parser) and builds a lightweight StringNode tree without constructing any HedTag or HedGroup objects.
The root node is a non-parenthesised container (is_group=False) that mirrors HedString. Parenthesised sub-groups become child StringNode instances with is_group=True. Individual tag strings become leaf StringNode instances with their text stored casefolded.
- Parameters:
raw_string (str) – A raw HED string such as "(Red, Square), Blue".
schema_lookup (dict or None) – Optional mapping produced by generate_schema_lookup(). When provided, leaf tag_terms are populated from the lookup, enabling ancestor search on short-form strings.
- Returns:
Root node of the parsed tree.
- Return type:
StringNode
Notes
Malformed strings (unbalanced parentheses) produce a partial tree; no exception is raised at parse time — mirroring HedString behaviour.
Whitespace is stripped from tag text. Empty tag tokens are ignored.
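To make the tree shape concrete, here is a toy parser that follows the rules listed above: a non-group root, parenthesised groups as child nodes, casefolded and stripped leaf text, empty tokens ignored, and no exception on unbalanced parentheses. It is a hypothetical sketch with its own character-level tokenizer; it does not use split_hed_string(), and SketchNode and parse_sketch are invented names.

```python
class SketchNode:
    """Hypothetical minimal node: casefolded text for leaves, children for groups."""

    def __init__(self, text=None, is_group=False, parent=None):
        self.text = text
        self.is_group = is_group
        self.parent = parent
        self.children = []


def parse_sketch(raw_string):
    """Build a SketchNode tree from a raw HED string (toy tokenizer, not split_hed_string)."""
    root = SketchNode()
    current = root
    token = []

    def flush():
        # Emit the pending token as a casefolded leaf; blank tokens are ignored.
        text = "".join(token).strip()
        token.clear()
        if text:
            current.children.append(SketchNode(text.casefold(), parent=current))

    for ch in raw_string:
        if ch == "(":
            flush()
            group = SketchNode(is_group=True, parent=current)
            current.children.append(group)
            current = group
        elif ch == ")":
            flush()
            # An unmatched ")" at the root is ignored instead of raising.
            if current.parent is not None:
                current = current.parent
        elif ch == ",":
            flush()
        else:
            token.append(ch)
    flush()  # Unclosed groups simply stay open, leaving a partial tree.
    return root


root = parse_sketch("(Red, Square), Blue")
print([(c.is_group, c.text) for c in root.children])  # [(True, None), (False, 'blue')]
print([c.text for c in root.children[0].children])    # ['red', 'square']
```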
string_search¶
- hed.models.string_search.string_search(strings, query, schema_lookup=None)[source]¶
Search a list of HED strings using a query expression.
Compiles the query once and applies it to every element, returning a list of booleans.
None, float('nan'), and empty strings evaluate to False.
- Parameters:
strings (list) – The HED strings to search.
query (str) – A HED query expression (same syntax as QueryHandler).
schema_lookup (dict or None) – Optional schema lookup dict for ancestor search; see generate_schema_lookup().
- Returns:
One boolean per input string.
- Return type:
list[bool]
Example:
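# Assumes events is a pandas DataFrame (e.g. loaded from a BIDS events.tsv file) with a HED column.
from hed.models.string_search import string_search
mask = string_search(events["HED"].tolist(), "Sensory-event")
matching_rows = [row for row, m in zip(events.itertuples(), mask) if m]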
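The per-element rule (compile once, apply to each entry, and treat None, NaN, and empty strings as no match) is easy to mirror in plain Python. In this hypothetical sketch a simple literal, case-insensitive tag check stands in for a compiled query; search_mask and has_sensory_event are invented names, not hedtools functions.

```python
import math


def search_mask(strings, matches):
    """Apply a match predicate to each entry, treating None, NaN, and "" as no match.

    Hypothetical sketch of the documented string_search rule; `matches` stands in
    for a compiled query.
    """
    mask = []
    for value in strings:
        if value is None or (isinstance(value, float) and math.isnan(value)) or value == "":
            mask.append(False)
        else:
            mask.append(bool(matches(value)))
    return mask


def has_sensory_event(text):
    """Literal, case-insensitive check for the tag Sensory-event (stand-in query)."""
    tags = text.replace("(", ",").replace(")", ",").split(",")
    return any(tag.strip().casefold() == "sensory-event" for tag in tags)


print(search_mask(["Sensory-event, Red", None, float("nan"), "", "Agent-action"], has_sensory_event))
# [True, False, False, False, False]
```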
Schema lookup utilities¶
Pre-generate and persist a tag-ancestor lookup dictionary from a HedSchema
for use with StringQueryHandler.
- hed.models.schema_lookup.generate_schema_lookup(schema)[source]¶
Build a schema lookup table mapping short tag names to their tag_terms.
Walks the tags section of schema (or all component schemas in a HedSchemaGroup) and collects each tag’s tag_terms tuple, keyed by the tag’s casefolded short name.
- Parameters:
schema (HedSchema or HedSchemaGroup) – The loaded HED schema.
- Returns:
Mapping short_tag_casefold → tag_terms tuple as stored in the schema entry.
- Return type:
dict
Notes
The /# value-placeholder entry of a tag is skipped (it shares the parent tag’s short name with a trailing /#, which the schema loader already strips).
For HedSchemaGroup, all member schemas are merged; later schemas overwrite earlier ones on key collision.
Library namespace prefixes (e.g. "sc:" in "sc:Event") are not stripped; include the namespace when searching if needed.
- hed.models.schema_lookup.save_schema_lookup(lookup, path)[source]¶
Serialise a schema lookup dict to a JSON file.
Values (tuples) are saved as JSON arrays and restored as tuples on load.
- Parameters:
lookup (dict[str, tuple]) – The lookup dict from generate_schema_lookup().
path (str or Path) – Destination file path.
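The lookup rules above (casefolded keys, skipped /# entries, later schemas winning on collision) and the JSON round trip (tuples saved as arrays, restored as tuples) can be sketched with the standard library alone. Schemas are modelled here as lists of (short name, long name) pairs rather than real HedSchema objects, the second schema is made up to force a collision, and build_lookup, save_lookup, and load_lookup are hypothetical names, not hedtools functions.

```python
import json
import tempfile
from pathlib import Path


def build_lookup(*schemas):
    """Merge short-name -> tag_terms entries from toy schemas (hypothetical sketch).

    Value-placeholder entries ending in "/#" are skipped, keys are casefolded,
    and later schemas overwrite earlier ones on key collision.
    """
    lookup = {}
    for schema in schemas:
        for short_name, long_name in schema:
            if short_name.endswith("/#"):
                continue
            lookup[short_name.casefold()] = tuple(long_name.casefold().split("/"))
    return lookup


def save_lookup(lookup, path):
    """Write the lookup as JSON; json.dumps stores each tuple as an array."""
    Path(path).write_text(json.dumps(lookup, indent=2), encoding="utf-8")


def load_lookup(path):
    """Read the JSON back and convert each array to a tuple again."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: tuple(value) for key, value in data.items()}


# Toy entries; the "/#" entry and the second schema are illustrative only.
first = [("Event", "Event"), ("Sensory-event", "Event/Sensory-event"), ("Sensory-event/#", "Event/Sensory-event/#")]
second = [("Event", "Made-up-root/Event")]
lookup = build_lookup(first, second)
print(lookup)  # {'event': ('made-up-root', 'event'), 'sensory-event': ('event', 'sensory-event')}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lookup.json"
    save_lookup(lookup, path)
    restored = load_lookup(path)
print(restored == lookup)  # True
```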
DataFrame utilities¶
Functions for transforming HED strings within pandas DataFrames.
convert_to_form¶
expand_defs¶
- hed.models.df_util.expand_defs(df, hed_schema, def_dict, columns=None)[source]¶
Expand any Def tags found in the dataframe.
Converts in place.
- Parameters:
df (pd.DataFrame or pd.Series) – The dataframe or series to modify.
hed_schema (HedSchema or None) – The schema to use to identify defs.
def_dict (DefinitionDict) – The definitions to expand.
columns (list or None) – The columns to modify on the dataframe.
shrink_defs¶
process_def_expands¶
- hed.models.df_util.process_def_expands(hed_strings, hed_schema, known_defs=None, ambiguous_defs=None) tuple[DefinitionDict, dict, dict][source]¶
Gather Def-expand tags in the strings and compare them with known definitions to find any differences.
- Parameters:
hed_strings (list or pd.Series) – A list of HED strings to process.
hed_schema (HedSchema) – The schema to use.
known_defs (DefinitionDict or list or str or None) – A DefinitionDict or anything its constructor accepts. These are the known definitions going in, which must match exactly.
ambiguous_defs (dict) – A dictionary containing ambiguous definitions. The format is TBD; currently each key is a definition name and each value is a list of lists of HED tags.
- Returns:
A tuple containing the DefinitionDict, the ambiguous definitions, and a dictionary of error lists keyed by definition name.
- Return type:
tuple[DefinitionDict, dict, dict]
sort_dataframe_by_onsets¶
filter_series_by_onset¶
- hed.models.df_util.filter_series_by_onset(series, onsets)[source]¶
Return the series, with rows that have the same onset combined.
- Parameters:
series (pd.Series or pd.DataFrame) – The series to filter. If a dataframe, its “HED” column is filtered.
onsets (pd.Series) – The onset column to filter by.
- Returns:
The series with rows that share an onset combined.
- Return type:
Union[Series, DataFrame]
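The core idea of combining rows that share an onset can be illustrated with a dictionary keyed by onset. This is a hypothetical sketch, not the hedtools implementation: it returns one joined string per distinct onset in first-seen order and assumes a ", " separator, whereas the library returns a series whose exact row layout may differ.

```python
def combine_by_onset(hed_strings, onsets, separator=", "):
    """Join the non-empty HED strings of rows that share an onset (hypothetical sketch).

    Returns a dict mapping each distinct onset to its joined string, in first-seen order.
    """
    grouped = {}
    for text, onset in zip(hed_strings, onsets):
        parts = grouped.setdefault(onset, [])
        if text:
            parts.append(text)
    return {onset: separator.join(parts) for onset, parts in grouped.items()}


print(combine_by_onset(["Sensory-event", "Red", "Agent-action"], [1.0, 1.0, 2.5]))
# {1.0: 'Sensory-event, Red', 2.5: 'Agent-action'}
```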