Validator

Validation tools for HED data structures and annotations.

Core validator classes

HedValidator

class HedValidator(hed_schema, def_dicts=None, definitions_allowed=False)

Bases: object

Top-level validation of HED strings.

This module contains the HedValidator class, which validates the tags in a HED string (HedString) against a schema. Whole files are handled by SpreadsheetValidator (.tsv and Excel) and SidecarValidator (JSON sidecars). After creating a HedValidator, call validate() to get the list of validation issues.

check_tag_formatting(original_tag) → list[dict]

Report repeated or erroneous slashes.

Parameters:

original_tag (HedTag) – The original tag that is used to report the error.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list[dict]
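
As an illustration only: assuming check_tag_formatting relies on the class attribute pattern_doubleslash listed below (the listing does not state this), the pattern can be exercised on its own. The helper find_slash_problems is hypothetical and not part of the library.

```python
import re

# Copied from HedValidator.pattern_doubleslash: a run of two or more
# spaces/tabs/slashes, a leading slash, or a trailing slash.
pattern_doubleslash = re.compile('([ \\t/]{2,}|^/|/$)')


def find_slash_problems(tag_text: str) -> list[tuple[int, str]]:
    """Hypothetical helper: (start index, matched text) for every hit."""
    return [(m.start(), m.group(0)) for m in pattern_doubleslash.finditer(tag_text)]


print(find_slash_problems("Item/Object"))    # []
print(find_slash_problems("Item//Object"))   # [(4, '//')]
print(find_slash_problems("/Item"))          # [(0, '/')]
print(find_slash_problems("Item/"))          # [(4, '/')]
print(find_slash_problems("Item / Object"))  # [(4, ' / ')]
```

The last case shows that whitespace around a slash also forms a run the pattern matches.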

pattern_doubleslash = re.compile('([ \\t/]{2,}|^/|/$)')

run_basic_checks(hed_string, allow_placeholders) → list[dict]

Run basic validation checks on a HED string.

Parameters:
  • hed_string (HedString) – The HED string to validate.

  • allow_placeholders (bool) – Whether placeholders are allowed in the HED string.

Returns:

A list of issues found during validation. Each issue is represented as a dictionary.

Return type:

list[dict]

Notes

  • This method performs initial validation checks on the HED string, including character validation and tag validation.

  • It checks for invalid characters, calculates canonical forms, and validates individual tags.

  • If any issues are found during these checks, the method stops and returns the issues immediately.

  • The method also validates definition tags if applicable.

run_full_string_checks(hed_string) → list[dict]

Run all full-string validation checks on a HED string.

Parameters:

hed_string (HedString) – The HED string to validate.

Returns:

A list of issues found during validation. Each issue is represented as a dictionary.

Return type:

list[dict]

Notes

  • This method iterates through a series of validation checks defined in the checks list.

  • Each check is a callable function that takes hed_string as input and returns a list of issues.

  • If any check returns issues, the method stops and returns those issues immediately.

  • If no issues are found, an empty list is returned.
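
The early-return behavior described in these notes can be sketched as a loop that stops at the first check reporting issues. This is a minimal sketch, not the library's code: run_checks_until_failure, the two example checks, and their issue codes are hypothetical, and real issues are richer dictionaries.

```python
from typing import Callable

Issue = dict
Check = Callable[[str], list[Issue]]


def run_checks_until_failure(hed_string: str, checks: list[Check]) -> list[Issue]:
    """Run checks in order and return the first non-empty issue list."""
    for check in checks:
        issues = check(hed_string)
        if issues:
            return issues  # stop at the first check that reports anything
    return []  # every check passed


# Hypothetical checks, for illustration only.
def check_not_empty(s: str) -> list[Issue]:
    return [] if s.strip() else [{"code": "EMPTY_STRING"}]


def check_balanced_parens(s: str) -> list[Issue]:
    return [] if s.count("(") == s.count(")") else [{"code": "PARENTHESES_MISMATCH"}]


checks = [check_not_empty, check_balanced_parens]
print(run_checks_until_failure("(Red, Blue)", checks))  # []
print(run_checks_until_failure("(Red, Blue", checks))   # [{'code': 'PARENTHESES_MISMATCH'}]
print(run_checks_until_failure("   ", checks))          # [{'code': 'EMPTY_STRING'}]
```

One plausible reason for stopping early is that a structural problem such as an unclosed parenthesis would otherwise trigger a cascade of follow-on errors from later checks.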

validate(hed_string, allow_placeholders, error_handler=None) → list[dict]

Validate the HED string object using the schema.

Parameters:
  • hed_string (HedString) – The string to validate.

  • allow_placeholders (bool) – Whether to allow placeholders in the string.

  • error_handler (ErrorHandler or None) – The error handler to use; a default one is created if None.

Returns:

A list of issues found in the HED string.

Return type:

list[dict]

validate_units(original_tag, validate_text=None, report_as=None, error_code=None, index_offset=0, allow_placeholders=True) → list[dict]

Validate the units and value class of a tag's value.

Parameters:
  • original_tag (HedTag) – The source tag.

  • validate_text (str) – The text to validate, if not the full extension.

  • report_as (HedTag) – Report the error as coming from a different tag; used mostly for definitions that expand.

  • error_code (str) – The error code to report instead of the default; used mostly for Def/Def-expand tags.

  • index_offset (int) – Offset into the extension at which validate_text starts.

  • allow_placeholders (bool) – Whether placeholders are allowed (affects value class validation for “#”).

Returns:

Issues found in the units or value class.

Return type:

list[dict]

Specialized validators

SidecarValidator

class SidecarValidator(hed_schema)

Bases: object

Validates HED annotations in a BIDS JSON sidecar against a HED schema.

reserved_category_values = ['n/a']
reserved_column_names = ['HED']

validate(sidecar, extra_def_dicts=None, name=None, error_handler=None) → list[dict]

Validate the input data using the schema.

Parameters:
  • sidecar (Sidecar) – Input data to be validated.

  • extra_def_dicts (list or DefinitionDict) – Extra definition dictionaries to use in addition to those in the sidecar.

  • name (str) – The name under which to report issues for this sidecar.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in the sidecar.

Return type:

list[dict]

validate_structure(sidecar, error_handler) → list[dict]

Validate the raw structure of this sidecar.

Parameters:
  • sidecar (Sidecar) – The sidecar to validate.

  • error_handler (ErrorHandler) – The error handler to use for error context.

Returns:

A list of issues found with the structure.

Return type:

list[dict]

SpreadsheetValidator

class SpreadsheetValidator(hed_schema)

Bases: object

Validates HED annotations in a tabular (TSV/Excel) spreadsheet against a HED schema.

ONSET_TOLERANCE = 1e-07
TEMPORAL_ANCHORS = re.compile('onset|inset|offset|delay')

validate(data, def_dicts=None, name=None, error_handler=None) → list[dict]

Validate the input data using the schema.

Parameters:
  • data (BaseInput) – Input data to be validated.

  • def_dicts (list of DefinitionDict or DefinitionDict) – All definitions to use for validation.

  • name (str) – The name under which to report errors from this file.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in the data.

Return type:

list[dict]
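
SpreadsheetValidator's two class constants can be tried out directly. The sketch below assumes ONSET_TOLERANCE is an absolute tolerance for comparing onset times in seconds and that TEMPORAL_ANCHORS is matched against lower-cased text; both are assumptions, since the listing gives only the values. The helper functions are hypothetical.

```python
import re

# Constants copied from SpreadsheetValidator.
ONSET_TOLERANCE = 1e-07
TEMPORAL_ANCHORS = re.compile('onset|inset|offset|delay')


def mentions_temporal_anchor(text: str) -> bool:
    # The pattern has no IGNORECASE flag, so lower-case the text first
    # (an assumption about how callers normalize tag text).
    return TEMPORAL_ANCHORS.search(text.lower()) is not None


def same_onset(a: float, b: float) -> bool:
    # Assumed use of the tolerance: onsets closer than 1e-7 count as equal.
    return abs(a - b) < ONSET_TOLERANCE


print(mentions_temporal_anchor("(Def/MyGo, Onset)"))  # True
print(mentions_temporal_anchor("Red, Blue"))          # False
print(same_onset(0.1 + 0.2, 0.3))                     # True
print(same_onset(1.0, 1.001))                         # False
```

A tolerance like this avoids treating values such as 0.1 + 0.2 and 0.3 as different onsets because of floating-point rounding.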

DefValidator

class DefValidator(def_dicts=None, hed_schema=None)

Bases: DefinitionDict

Validates Def/ and Def-expand/ tags, as well as temporal groups (Onset, Inset, and Offset).

add_definitions(defs, hed_schema=None)

Add definitions from dict(s) or string(s) to this dictionary.

Parameters:
  • defs (list, DefinitionDict, dict, or str) – DefinitionDict or list of DefinitionDicts/strings/dicts whose definitions should be added.

  • hed_schema (HedSchema or None) – Required if passing strings or lists of strings, unused otherwise.

Notes

  • The dict form expects DefinitionEntry values in the same form as a DefinitionDict.

  • A str or list of str is parsed using hed_schema.

  • Types can be mixed; e.g., [DefinitionDict, str, list of str] is valid input.

Raises:

TypeError – Bad type passed as defs.

check_for_definitions(hed_string_obj, error_handler=None) → list[dict]

Check a string for Definition tags and add them to this dictionary.

Parameters:
  • hed_string_obj (HedString) – A single HED string to gather definitions from.

  • error_handler (ErrorHandler or None) – Error context used to identify where definitions are found.

Returns:

List of issues encountered in checking for definitions. Each issue is a dictionary.

Return type:

list[dict]

get(def_name) → DefinitionEntry | None

Get the definition entry for the definition name.

The lookup is not case-sensitive.

Parameters:

def_name (str) – Name of the definition to retrieve.

Returns:

The definition entry for the requested definition, or None if it is not found.

Return type:

Union[DefinitionEntry, None]
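
A case-insensitive lookup like the one get() describes can be sketched by normalizing keys on both insertion and lookup. Entry and CaseInsensitiveDefs are hypothetical stand-ins for DefinitionEntry and DefValidator, and whether the library normalizes with casefold() or lower() is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for DefinitionEntry."""
    name: str
    contents: str


class CaseInsensitiveDefs:
    """Sketch: store keys case-folded so lookups ignore case."""

    def __init__(self) -> None:
        self.defs: dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        self.defs[entry.name.casefold()] = entry

    def get(self, def_name: str) -> Optional[Entry]:
        # Normalize the query the same way as the stored keys.
        return self.defs.get(def_name.casefold())


defs = CaseInsensitiveDefs()
defs.add(Entry("MyGo", "(Action, Go)"))
print(defs.get("mygo"))  # Entry(name='MyGo', contents='(Action, Go)')
print(defs.get("Stop"))  # None
```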

static get_as_strings(def_dict) → dict[str, str]

Convert the entries to strings of their contents.

Parameters:

def_dict (dict) – A dict of definitions

Returns:

A mapping from each definition name to its contents as a string.

Return type:

dict[str,str]

get_definition_entry(def_tag)

Get the entry for a given def tag.

Does not validate at all.

Parameters:

def_tag (HedTag) – Source HED tag that may be a Def or Def-expand tag.

Returns:

The definition entry if it exists, otherwise None.

Return type:

DefinitionEntry or None

property issues

Return issues about duplicate definitions.

items()

Return the (name, DefinitionEntry) pairs of the definitions dictionary.

Alias for .defs.items()

Returns:

The definitions as (name, DefinitionEntry) pairs.

Return type:

ItemsView[str, DefinitionEntry]

validate_def_tags(hed_string_obj) → list[dict]

Validate Def/Def-expand tags.

Parameters:

hed_string_obj (HedString) – The HED string to process.

Returns:

Issues found related to validating defs. Each issue is a dictionary.

Return type:

list[dict]

validate_def_value_units(def_tag, hed_validator, allow_placeholders=False) → list[dict]

Equivalent to HedValidator.validate_units for the special case of a Def or Def-expand tag.

validate_onset_offset(hed_string_obj) → list[dict]

Validate Onset/Offset groups.

Parameters:

hed_string_obj (HedString) – The HED string to check.

Returns:

A list of issues found in validating onsets (i.e., out of order onsets, unknown def names).

Return type:

list[dict]

OnsetValidator

class OnsetValidator

Bases: object

Validates onset/offset pairs.

static check_for_banned_tags(hed_string) → list[dict]

Return an issue for every banned tag found (used for files without an onset column).

Parameters:

hed_string (HedString) – The string to check.

Returns:

The validation issues associated with the banned tags. Each issue is a dictionary.

Return type:

list[dict]

validate_temporal_relations(hed_string_obj) → list[dict]

Validate Onset/Inset/Offset tag relations.

Parameters:

hed_string_obj (HedString) – The HED string to check.

Returns:

A list of issues found in validating onsets (i.e., out of order onsets, repeated def names).

Return type:

list[dict]
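
The pairing that OnsetValidator checks can be illustrated with a simplified model. The sketch assumes time-ordered (anchor, definition name) pairs, treats Onset as opening a definition, requires Inset and Offset to refer to an open one, and lets Offset close it. It is hypothetical: the real methods work on HedString objects, and their exact rules (for example, for repeated Onsets) are not given here.

```python
def check_temporal_sequence(events: list[tuple[str, str]]) -> list[str]:
    """Simplified model of Onset/Inset/Offset pairing (hypothetical helper).

    events: (anchor, def_name) pairs in time order.
    Returns human-readable problem descriptions.
    """
    active: set[str] = set()
    problems: list[str] = []
    for anchor, def_name in events:
        key = def_name.lower()  # definition names are not case-sensitive
        if anchor == "Onset":
            active.add(key)
        elif anchor in ("Inset", "Offset"):
            if key not in active:
                problems.append(f"{anchor} of {def_name} has no matching Onset")
            elif anchor == "Offset":
                active.discard(key)  # the event is now closed
        else:
            problems.append(f"unknown anchor {anchor!r}")
    return problems


print(check_temporal_sequence([("Onset", "Go"), ("Inset", "Go"), ("Offset", "Go")]))
# []
print(check_temporal_sequence([("Onset", "Go"), ("Offset", "Go"), ("Inset", "Go")]))
# ['Inset of Go has no matching Onset']
```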

ReservedChecker

class ReservedChecker

Bases: object

Thread-safe singleton that loads reserved tag rules and checks groups for compliance.

check_reserved_compatibility(group, reserved_tags)

Check that the reserved tags can be used together and that none are duplicated.

Parameters:
  • group (HedTagGroup) – A group to be checked.

  • reserved_tags (list of HedTag) – A list of reserved tags in this group.

check_tag_requirements(group, reserved_tags)

Check the tag requirements within the group.

Parameters:
  • group (HedTagGroup) – A group to be checked.

  • reserved_tags (list of HedTag) – A list of reserved tags in this group.

Notes: This is only called when there are some reserved incompatible tags.

get_def_information(group, reserved_tags) → list[list]

Get definition information for reserved tags.

Parameters:
  • group (HedGroup) – The HED group to check.

  • reserved_tags (list of HedTag) – The reserved tags to process.

Returns:

A list containing [requires_defs, defs].

Return type:

list[list]

get_group_requirements(reserved_tags) → tuple[float, float]

Return the maximum and minimum number of groups required for these reserved tags.

Parameters:

reserved_tags (list of HedTag) – The reserved tags to be checked.

Returns:

The maximum required and the minimum required.

Return type:

tuple[float, float]

get_incompatible(tag, reserved_tags) → list

Return the list of tags that cannot be in the same group with tag.

Parameters:
  • tag (HedTag) – Reserved tag to be tested.

  • reserved_tags (list of HedTag) – Reserved tags (no duplicates).

Returns:

List of incompatible tags.

Return type:

list[HedTag]

static get_instance()

Return the singleton ReservedChecker instance, creating it on first call.

Returns:

The shared singleton instance.

Return type:

ReservedChecker
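
get_instance() is described as returning a thread-safe singleton created on first call. A common way to build one in Python is double-checked locking, sketched below; ReservedRules is a hypothetical stand-in, and whether ReservedChecker uses exactly this pattern is an assumption.

```python
import threading
from typing import Optional


class ReservedRules:
    """Sketch of a lazily created, thread-safe singleton."""

    _instance: Optional["ReservedRules"] = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        # Hypothetical: the real class would load its reserved-tag rules here.
        self.rules: dict = {}

    @staticmethod
    def get_instance() -> "ReservedRules":
        # Fast path: no locking once the instance exists.
        if ReservedRules._instance is None:
            with ReservedRules._lock:
                # Re-check inside the lock: another thread may have created it.
                if ReservedRules._instance is None:
                    ReservedRules._instance = ReservedRules()
        return ReservedRules._instance


results = []
threads = [threading.Thread(target=lambda: results.append(ReservedRules.get_instance()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(r is results[0] for r in results))  # True
```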

get_reserved(group)

Return the list of reserved tags found directly within the given HED group.

Parameters:

group (HedGroup) – The group to inspect.

Returns:

Tags in the group whose short base tag is a reserved name.

Return type:

list[HedTag]

reserved_reqs_path = '/home/runner/work/hed-resources/hed-resources/submodules/hed-python/hed/validator/data/reservedTags.json'

Validator utilities

CharValidator

class CharValidator(modern_allowed_char_rules=False)

Bases: object

Class responsible for basic character-level validation of a string or tag.

DEFAULT_ALLOWED_PLACEHOLDER_CHARS = '.+-^ _#'
INVALID_STRING_CHARS = '[]{}~'
INVALID_STRING_CHARS_PLACEHOLDERS = '[]~'
TAG_ALLOWED_CHARS = '-_/'

check_for_invalid_extension_chars(original_tag, validate_text, error_code=None, index_offset=0) → list[dict]

Report invalid characters in extension/value.

Parameters:
  • original_tag (HedTag) – The original tag that is used to report the error.

  • validate_text (str) – The text to validate, if not the full extension.

  • error_code (str) – The error code to report instead of the default; used mostly for Def/Def-expand tags.

  • index_offset (int) – Offset into the extension at which validate_text starts.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list
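
The index_offset parameter matters when validate_text is only a slice of the extension: positions found inside the slice are shifted so they point into the full extension. The sketch below shows only that arithmetic; find_bad_chars and its rule (letters, digits, plus an allowed set) are hypothetical, not the library's character rules.

```python
def find_bad_chars(validate_text: str, allowed: str, index_offset: int = 0) -> list[tuple[int, str]]:
    """Return (position, char) for each character outside the allowed set.

    Positions are shifted by index_offset so they index into the full
    extension instead of into validate_text.
    """
    return [
        (index_offset + i, ch)
        for i, ch in enumerate(validate_text)
        if not (ch.isalnum() or ch in allowed)
    ]


extension = "first-part second!part"
validate_text = extension[11:]  # "second!part" starts at index 11 of the extension
print(find_bad_chars(validate_text, "-_"))                   # [(6, '!')]
print(find_bad_chars(validate_text, "-_", index_offset=11))  # [(17, '!')]
```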

check_invalid_character_issues(hed_string, allow_placeholders) → list[dict]

Report invalid characters.

Parameters:
  • hed_string (str) – A HED string.

  • allow_placeholders (bool) – Allow placeholder and curly brace characters.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list

Notes

  • Invalid tag characters are defined by self.INVALID_STRING_CHARS, or by self.INVALID_STRING_CHARS_PLACEHOLDERS when allow_placeholders is True.

check_tag_invalid_chars(original_tag, allow_placeholders) → list[dict]

Report invalid characters in the given tag.

Parameters:
  • original_tag (HedTag) – The original tag that is used to report the error.

  • allow_placeholders (bool) – Allow placeholder characters (#) if True.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list
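
check_invalid_character_issues picks between the two class constants depending on allow_placeholders. A minimal sketch of that selection, reusing the constant values shown above (invalid_char_positions is a hypothetical helper, and the real method may check more than these characters):

```python
INVALID_STRING_CHARS = '[]{}~'
INVALID_STRING_CHARS_PLACEHOLDERS = '[]~'


def invalid_char_positions(hed_string: str, allow_placeholders: bool) -> list[tuple[int, str]]:
    # Curly braces are tolerated only when placeholders are allowed.
    invalid = INVALID_STRING_CHARS_PLACEHOLDERS if allow_placeholders else INVALID_STRING_CHARS
    return [(i, ch) for i, ch in enumerate(hed_string) if ch in invalid]


print(invalid_char_positions("{cond}, Red~", allow_placeholders=False))  # [(0, '{'), (5, '}'), (11, '~')]
print(invalid_char_positions("{cond}, Red~", allow_placeholders=True))   # [(11, '~')]
```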

CharRexValidator

class CharRexValidator(modern_allowed_char_rules=False)

Bases: CharValidator

Character-level validator that extends CharValidator with value-class-specific checks (get_problem_chars and is_valid_value).

The constants DEFAULT_ALLOWED_PLACEHOLDER_CHARS, INVALID_STRING_CHARS, INVALID_STRING_CHARS_PLACEHOLDERS, and TAG_ALLOWED_CHARS and the methods check_for_invalid_extension_chars(), check_invalid_character_issues(), and check_tag_invalid_chars() are inherited unchanged from CharValidator; see that class for their documentation.

get_problem_chars(in_str, cname)

Return a list of (index, char) pairs for characters in in_str not allowed by the value class cname.

Parameters:
  • in_str (str) – The string to check.

  • cname (str) – The value class name used to look up allowed character classes.

Returns:

Each tuple contains the character index and the offending character.

Return type:

list[tuple[int, str]]

is_valid_value(in_string, cname)

Check whether in_string is a valid whole-word value for class cname.

Parameters:
  • in_string (str) – The string to validate.

  • cname (str) – The value class name to look up the word-level regex for.

Returns:

  • True if no word-level regex is defined for cname (class imposes no constraint).

  • A re.Match object if in_string matches the word-level regex (valid value).

  • False if in_string does not match the word-level regex (invalid value).

Return type:

True | re.Match | False
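
Because is_valid_value returns True, a re.Match, or False, callers can treat the result as a truth value. The sketch below reproduces that three-way contract with a hypothetical WORD_REGEX table; the regular expression for numericClass is an illustration, not the schema's definition.

```python
import re
from typing import Optional, Union

# Hypothetical word-level regexes keyed by value class name.
WORD_REGEX: dict[str, re.Pattern] = {
    "numericClass": re.compile(r"-?\d+(\.\d+)?([eE][-+]?\d+)?"),
}


def is_valid_value(in_string: str, cname: str) -> Union[bool, re.Match]:
    pattern: Optional[re.Pattern] = WORD_REGEX.get(cname)
    if pattern is None:
        return True  # no word-level constraint for this class
    match = pattern.fullmatch(in_string)
    return match if match else False  # a Match object is truthy


print(is_valid_value("anything", "textClass"))       # True
print(bool(is_valid_value("3.5", "numericClass")))   # True
print(is_valid_value("3.5.1", "numericClass"))       # False
```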

UnitValueValidator

class UnitValueValidator(modern_allowed_char_rules=False, value_validators=None)

Bases: object

Validates units and value classes.

DATE_TIME_VALUE_CLASS = 'dateTimeClass'
DIGIT_OR_POUND_EXPRESSION = '^(-?[\\d.]+(?:e-?\\d+)?|#)$'
NAME_VALUE_CLASS = 'nameClass'
NUMERIC_VALUE_CLASS = 'numericClass'
TEXT_VALUE_CLASS = 'textClass'

check_tag_unit_class_units_are_valid(original_tag, validate_text, report_as=None, error_code=None, allow_placeholders=True) → list[dict]

Report incorrect unit class or units.

Parameters:
  • original_tag (HedTag) – The original tag that is used to report the error.

  • validate_text (str) – The text to validate.

  • report_as (HedTag) – Report errors as coming from this tag, rather than original_tag.

  • error_code (str) – Override error codes.

  • allow_placeholders (bool) – Whether placeholders are allowed (affects value class validation for “#”).

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list
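
The class constant DIGIT_OR_POUND_EXPRESSION accepts either a numeric-looking string or a lone #, the placeholder character. The snippet below only exercises the regular expression as written; where and how UnitValueValidator applies it is not stated in this listing.

```python
import re

# Copied from UnitValueValidator.DIGIT_OR_POUND_EXPRESSION.
DIGIT_OR_POUND_EXPRESSION = '^(-?[\\d.]+(?:e-?\\d+)?|#)$'
digit_or_pound = re.compile(DIGIT_OR_POUND_EXPRESSION)

# Prints True for the first four inputs and False for the last three.
for text in ["3.5", "-2", "1e-3", "#", "abc", "1E5", "1e+3"]:
    print(f"{text!r}: {bool(digit_or_pound.match(text))}")
```

Note that [\d.]+ also accepts strings such as 1.2.3 or a lone period, so the pattern is a coarse filter rather than a full numeric parser.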

check_tag_value_class_valid(original_tag, validate_text, report_as=None) → list[dict]

Report an invalid value portion.

Parameters:
  • original_tag (HedTag) – The original tag that is used to report the error.

  • validate_text (str) – The text to validate.

  • report_as (HedTag) – Report errors as coming from this tag, rather than original_tag.

Returns:

Validation issues.

Return type:

list

static report_value_char_errors(class_name, errors, report_as)

Build validation issues for specific invalid characters within a value class string.

Parameters:
  • class_name (str) – The value class name that detected the errors.

  • errors (list[tuple[str, int]]) – Character/index pairs of invalid characters.

  • report_as (HedTag) – The tag object used as context in error reporting.

Returns:

Validation issue dictionaries.

Return type:

list[dict]

static report_value_errors(error_dict, class_valid, report_as)

Build validation issues from per-class character error and validity dicts.

Parameters:
  • error_dict (dict) – Mapping of class name to list of (char, index) problem tuples.

  • class_valid (dict) – Mapping of class name to a validity result (True, re.Match, or False) indicating whether the full value passed word-level format validation for that class.

  • report_as (HedTag) – The tag object used as context in error reporting.

Returns:

Validation issue dictionaries.

Return type:

list[dict]

validate_value_class_type(unit_or_value_portion, valid_types) → bool

Report invalid unit or value class values.

Parameters:
  • unit_or_value_portion (str) – The value portion to validate.

  • valid_types (list) – The names of value class or unit class types (e.g. dateTime or dateTimeClass).

Returns:

True if this is one of the valid_types validators.

Return type:

bool

DuplicateChecker

class DuplicateChecker

Bases: object

Detects duplicate tags and groups within a HED annotation.

check_for_duplicates(group) → list[dict]

Find duplicates in a HED group and return the errors found.

Parameters:

group (HedGroup) – The HED group to be checked.

Returns:

List of validation issues (empty if no duplicates are detected).

Return type:

list

get_hash(group) → int | None

Return a unique hash for the group, provided the group contains no duplicates.

Parameters:

group (HedGroup) – The HED group to be checked.

Returns:

Unique hash or None if duplicates were detected within the group.

Return type:

Union[int, None]

Note: As a side effect, this method will clear the issues list if no duplicates are found.
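
HED tag groups are unordered and tags are case-insensitive, so (A, B) and (b, a) describe the same group. One way to detect duplicates under those rules is to build an order- and case-insensitive key for every tag and subgroup and count the keys, as sketched below. This is a hypothetical illustration over plain lists of strings; DuplicateChecker's actual hashing over HedGroup objects is not shown here.

```python
from collections import Counter
from typing import Union

# A group is a list whose items are tag strings or nested groups (lists).
Group = list


def canonical(item: Union[str, Group]):
    """Order-insensitive, case-insensitive key for a tag or a group."""
    if isinstance(item, str):
        return item.lower()
    # Sort children by repr so strings and tuples can be ordered together.
    return tuple(sorted((canonical(child) for child in item), key=repr))


def find_duplicates(group: Group) -> list:
    """Return the canonical key of every item that repeats within a group."""
    counts = Counter(canonical(child) for child in group)
    duplicates = [key for key, n in counts.items() if n > 1]
    for child in group:
        if isinstance(child, list):
            duplicates.extend(find_duplicates(child))  # check nested groups too
    return duplicates


print(find_duplicates(["Red", "Blue"]))           # []
print(find_duplicates(["Red", "red"]))            # ['red']
print(find_duplicates([["A", "B"], ["b", "a"]]))  # [('a', 'b')]
print(find_duplicates(["X", ["Y", "Y"]]))         # ['y']
```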

GroupValidator

class GroupValidator(hed_schema)

Bases: object

Validation of attributes across groups of HED tags.

This covers checks such as required tags, unique tags, and top-level tag placement.

check_for_required_tags(tags) → list

Report missing required tags.

Parameters:

tags (list of HedTag) – The tags to check.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list

check_multiple_unique_tags_exist(tags) → list

Report if multiple identical unique tags exist.

A unique term can appear only once in a given HedString. Unique terms are those with the ‘unique’ attribute in the schema.

Parameters:

tags (list of HedTag) – The tags to check.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list
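
The two checks above can be sketched over plain tag names. REQUIRED_TAGS and UNIQUE_TAGS are hypothetical stand-ins for tags carrying the schema's required and unique attributes, group_attribute_issues is a hypothetical helper, and comparing lower-cased names is an assumption.

```python
from collections import Counter

# Hypothetical schema data: which tags carry the required / unique attributes.
REQUIRED_TAGS = {"taga"}
UNIQUE_TAGS = {"tagb"}


def group_attribute_issues(tags: list[str]) -> list[str]:
    """Sketch of required/unique checks over the tags of one HED string."""
    names = [t.lower() for t in tags]
    issues = [f"missing required tag {r}" for r in sorted(REQUIRED_TAGS) if r not in names]
    counts = Counter(names)
    issues += [f"unique tag {u} appears {counts[u]} times"
               for u in sorted(UNIQUE_TAGS) if counts[u] > 1]
    return issues


print(group_attribute_issues(["TagA", "TagB"]))  # []
print(group_attribute_issues(["TagB", "tagb"]))
# ['missing required tag taga', 'unique tag tagb appears 2 times']
```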

static check_tag_level_issue(original_tag_list, is_top_level, is_group) → list

Report tags incorrectly positioned in hierarchy.

Parameters:
  • original_tag_list (list of HedTag) – HedTags containing the original tags.

  • is_top_level (bool) – If True, this group is a “top level tag group”.

  • is_group (bool) – If True, the tags are enclosed in parentheses (a tag group).

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list

run_all_tags_validators(hed_string_obj) → list[dict]

Report invalid multi-tag properties in a HED string, e.g. missing required tags or repeated unique tags.

Parameters:

hed_string_obj (HedString) – A HedString object.

Returns:

The issues associated with the tags in the HED string. Each issue is a dictionary.

Return type:

list

run_tag_level_validators(hed_string_obj) → list[dict]

Report invalid groups at each level.

Parameters:

hed_string_obj (HedString) – A HedString object.

Returns:

Issues associated with each level in the HED string. Each issue is a dictionary.

Return type:

list

Notes

  • This pertains to the top-level, all groups, and nested groups.

static validate_duration_tags(hed_string_obj) → list

Validate Duration/Delay tag groups.

Parameters:

hed_string_obj (HedString) – The HED string to check.

Returns:

Issues found in validating durations (e.g., extra tags or groups present, or a missing group).

Return type:

list

StringValidator

class StringValidator

Bases: object

Runs checks on the raw string that depend on multiple characters, e.g. mismatched parentheses.

CLOSING_GROUP_CHARACTER = ')'
COMMA = ','
OPENING_GROUP_CHARACTER = '('

static check_count_tag_group_parentheses(hed_string) → list[dict]

Report unmatched parentheses.

Parameters:

hed_string (str) – A HED string.

Returns:

A list of validation issues. Each issue is a dictionary.

Return type:

list
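
A count-based parenthesis check, as the method name suggests, compares the numbers of opening and closing characters. The sketch below is hypothetical (check_paren_counts and its issue dictionary are illustrations), and it also shows the limitation of counting alone: a string such as )Red( has equal counts and passes.

```python
def check_paren_counts(hed_string: str) -> list[dict]:
    """Hypothetical count-based check: compare '(' and ')' totals."""
    opening = hed_string.count("(")
    closing = hed_string.count(")")
    if opening == closing:
        return []
    # Illustrative issue shape; real HED issue dictionaries carry more fields.
    return [{"code": "PARENTHESES_MISMATCH", "opening": opening, "closing": closing}]


print(check_paren_counts("(Red, (Blue, Green))"))  # []
print(check_paren_counts("(Red, Blue"))
# [{'code': 'PARENTHESES_MISMATCH', 'opening': 1, 'closing': 0}]
print(check_paren_counts(")Red("))  # [] even though the order is wrong
```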

check_delimiter_issues_in_hed_string(hed_string) → list[dict]

Report missing commas or commas in value tags.

Parameters:

hed_string (str) – A HED string.

Returns:

A validation issues list. Each issue is a dictionary.

Return type:

list

run_string_validator(hed_string_obj)

Run all string-level structural checks on a HED string object.

Parameters:

hed_string_obj (HedString) – The parsed HED string to validate.

Returns:

Validation issue dictionaries.

Return type:

list[dict]

TagValidator

class TagValidator

Bases: object

Validation for individual HED tags.

CAMEL_CASE_EXPRESSION = '([A-Z]+\\s*[a-z-]*)+'

check_capitalization(original_tag) → list[dict]

Report a warning if a tag is incorrectly capitalized.

Parameters:

original_tag (HedTag) – The original tag used to report the warning.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list
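
CAMEL_CASE_EXPRESSION suggests that camel-case tag names are exempt from the capitalization warning. The sketch below shows one plausible rule using that pattern: warn when a slash-separated node differs from its capitalize() form and contains no upper-case run. The rule and the helper needs_capitalization_warning are assumptions, not the library's documented behavior.

```python
import re

# Copied from TagValidator.CAMEL_CASE_EXPRESSION.
CAMEL_CASE_EXPRESSION = '([A-Z]+\\s*[a-z-]*)+'


def needs_capitalization_warning(tag: str) -> bool:
    """Assumed rule: warn if a path node is not capitalized and shows no camel case."""
    for node in tag.split("/"):
        if node != node.capitalize() and not re.search(CAMEL_CASE_EXPRESSION, node):
            return True
    return False


print(needs_capitalization_warning("Sensory-event"))   # False
print(needs_capitalization_warning("sensory-event"))   # True
print(needs_capitalization_warning("Item/DataValue"))  # False (camel case tolerated)
print(needs_capitalization_warning("Item/red"))        # True
```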

static check_for_placeholder(original_tag, is_definition=False) → list[dict]

Report invalid placeholder characters.

Parameters:
  • original_tag (HedTag) – The HedTag to be checked.

  • is_definition (bool) – If True, placeholders are allowed.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list

Notes

  • An invalid placeholder may appear in the extension/value portion of a tag.

static check_tag_exists_in_schema(original_tag) → list[dict]

Report a tag that is invalid, or that is given a value but does not take one.

Parameters:

original_tag (HedTag) – The original tag that is used to report the error.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list

check_tag_is_deprecated(original_tag) → list[dict]

Return a validation issue if the tag carries the DeprecatedFrom attribute.

Parameters:

original_tag (HedTag) – The tag to check.

Returns:

A singleton list with a deprecation issue, or an empty list.

Return type:

list[dict]

static check_tag_requires_child(original_tag) → list[dict]

Report a tag that has the requireChild attribute but is used as a leaf (without a child).

Parameters:

original_tag (HedTag) – The original tag that is used to report the error.

Returns:

Validation issues. Each issue is a dictionary.

Return type:

list

run_individual_tag_validators(original_tag, allow_placeholders=False, is_definition=False) → list[dict]

Run the validators on an individual tag.

This ignores most illegal characters except in extensions.

Parameters:
  • original_tag (HedTag) – The tag to validate.

  • allow_placeholders (bool) – Allow value class or extensions to be placeholders rather than a specific value.

  • is_definition (bool) – This tag is part of a Definition, not a normal line.

Returns:

The validation issues associated with the tag. Each issue is a dictionary.

Return type:

list