Validator¶
Validation tools for HED data structures and annotations.
Core validator classes¶
HedValidator¶
- class HedValidator(hed_schema, def_dicts=None, definitions_allowed=False)[source]¶
Bases: object
Top level validation of HED strings.
This module contains the HedValidator class, which is used to validate the tags in a HED string or a file. The supported file types include .tsv, .txt, and .xlsx. To get the validation issues after creating a HedValidator instance, call the get_validation_issues() function.
- pattern_doubleslash = re.compile('([ \\t/]{2,}|^/|/$)')¶
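To make the pattern_doubleslash expression concrete, the sketch below recompiles the same expression with the standard re module and applies it to a few made-up strings. The helper name has_slash_problem and the sample strings are hypothetical; how HedValidator itself applies the pattern is not shown on this page.

```python
import re

# Same expression as the documented class attribute.
pattern_doubleslash = re.compile('([ \\t/]{2,}|^/|/$)')


def has_slash_problem(text):
    """Return True for a run of 2+ spaces/tabs/slashes, or a leading or trailing slash.

    Hypothetical helper, not part of the documented API.
    """
    return pattern_doubleslash.search(text) is not None


samples = ["Item/Object", "Item//Object", "/Item", "Item/", "Item / Object"]
flagged = [s for s in samples if has_slash_problem(s)]
```

Note that the character class mixes spaces, tabs, and slashes, so a slash surrounded by spaces is flagged as well as a literal double slash.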
- run_basic_checks(hed_string, allow_placeholders) list[dict][source]¶
Run basic validation checks on a HED string.
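- Parameters:
hed_string (HedString) – The HED string to validate.
allow_placeholders (bool) – Allow placeholders in the string.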
- Returns:
A list of issues found during validation. Each issue is represented as a dictionary.
- Return type:
list[dict]
Notes
This method performs initial validation checks on the HED string, including character validation and tag validation.
It checks for invalid characters, calculates canonical forms, and validates individual tags.
If any issues are found during these checks, the method stops and returns the issues immediately.
The method also validates definition tags if applicable.
- run_full_string_checks(hed_string) list[dict][source]¶
Run all full-string validation checks on a HED string.
- Parameters:
hed_string (HedString) – The HED string to validate.
- Returns:
A list of issues found during validation. Each issue is represented as a dictionary.
- Return type:
list[dict]
Notes
This method iterates through a series of validation checks defined in the checks list.
Each check is a callable function that takes hed_string as input and returns a list of issues.
If any check returns issues, the method stops and returns those issues immediately.
If no issues are found, an empty list is returned.
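The stop-at-first-failure behaviour described in these notes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: the two check functions, the CHECKS list, and the issue codes are all hypothetical, and plain strings stand in for HedString objects.

```python
def check_parentheses_balanced(text):
    """Hypothetical check: compare the counts of '(' and ')'."""
    if text.count("(") != text.count(")"):
        return [{"code": "EXAMPLE_PARENTHESES", "message": "Unbalanced parentheses"}]
    return []


def check_no_empty_tag(text):
    """Hypothetical check: two commas with nothing but spaces between them."""
    if ",," in text.replace(" ", ""):
        return [{"code": "EXAMPLE_EMPTY_TAG", "message": "Empty tag between commas"}]
    return []


# Ordered list of callables; each takes the string and returns a list of issues.
CHECKS = [check_parentheses_balanced, check_no_empty_tag]


def run_checks(text, checks):
    """Run the checks in order and return the issues from the first one that fails."""
    for check in checks:
        issues = check(text)
        if issues:
            return issues  # later checks are skipped
    return []
```

With this structure, a string that fails two checks only reports the issues from the earlier one, so the order of the list determines which problem a user sees first.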
- validate(hed_string, allow_placeholders, error_handler=None) list[dict][source]¶
Validate the HED string object using the schema.
- Parameters:
hed_string (HedString) – the string to validate.
allow_placeholders (bool) – allow placeholders in the string.
error_handler (ErrorHandler or None) – the error handler to use, creates a default one if none passed.
- Returns:
A list of issues for HED string.
- Return type:
list[dict]
- validate_units(original_tag, validate_text=None, report_as=None, error_code=None, index_offset=0, allow_placeholders=True) list[dict][source]¶
Validate units and value classes
- Parameters:
original_tag (HedTag) – The source tag
validate_text (str) – The text to validate, if it is not the full extension.
report_as (HedTag) – Report the error as coming from this tag instead of original_tag. Mostly for definitions that expand.
error_code (str) – Error code to report in place of the default. Again, mostly for Def/Def-expand tags.
index_offset (int) – Offset into the extension at which validate_text starts.
allow_placeholders (bool) – Whether placeholders are allowed (affects value class validation for “#”)
- Returns:
Issues found from units
- Return type:
list[dict]
Specialized validators¶
SidecarValidator¶
- class SidecarValidator(hed_schema)[source]¶
Bases: object
Validates HED annotations in a BIDS JSON sidecar against a HED schema.
- reserved_category_values = ['n/a']¶
- reserved_column_names = ['HED']¶
- validate(sidecar, extra_def_dicts=None, name=None, error_handler=None) list[dict][source]¶
Validate the input data using the schema
- Parameters:
sidecar (Sidecar) – Input data to be validated.
extra_def_dicts (list or DefinitionDict) – extra def dicts in addition to sidecar
name (str) – The name to report this sidecar as
error_handler (ErrorHandler) – Error context to use. Creates a new one if None.
- Returns:
A list of issues associated with each level in the HED string.
- Return type:
list[dict]
- validate_structure(sidecar, error_handler) list[dict][source]¶
Validate the raw structure of this sidecar.
- Parameters:
sidecar (Sidecar) – the sidecar to validate
error_handler (ErrorHandler) – The error handler to use for error context.
- Returns:
A list of issues found with the structure.
- Return type:
list[dict]
SpreadsheetValidator¶
- class SpreadsheetValidator(hed_schema)[source]¶
Bases: object
Validates HED annotations in a tabular (TSV/Excel) spreadsheet against a HED schema.
- ONSET_TOLERANCE = 1e-07¶
- TEMPORAL_ANCHORS = re.compile('onset|inset|offset|delay')¶
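The two class constants above can be tried out in isolation. The sketch below copies their values and wraps them in two hypothetical helpers, has_temporal_anchor and same_onset. How SpreadsheetValidator actually uses the constants is not documented here, so lowercasing the text before searching and comparing onsets with an absolute tolerance are both assumptions.

```python
import math
import re

# Values copied from the documented class attributes.
ONSET_TOLERANCE = 1e-07
TEMPORAL_ANCHORS = re.compile('onset|inset|offset|delay')


def has_temporal_anchor(hed_text):
    """Hypothetical helper: True if any anchor word occurs in the lowercased text."""
    return TEMPORAL_ANCHORS.search(hed_text.lower()) is not None


def same_onset(a, b):
    """Hypothetical helper: treat two onset times as equal within ONSET_TOLERANCE."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=ONSET_TOLERANCE)
```

The pattern is lowercase only, so a caller that wants to match a capitalized tag such as Onset needs either to lowercase the input, as assumed here, or to compile with re.IGNORECASE.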
- validate(data, def_dicts=None, name=None, error_handler=None) list[dict][source]¶
Validate the input data using the schema
- Parameters:
data (BaseInput) – Input data to be validated.
def_dicts (list of DefDict or DefDict) – all definitions to use for validation
name (str) – The name to report errors from this file as
error_handler (ErrorHandler) – Error context to use. Creates a new one if None.
- Returns:
A list of issues for HED string
- Return type:
list[dict]
DefValidator¶
- class DefValidator(def_dicts=None, hed_schema=None)[source]¶
Bases: DefinitionDict
Validates Def/ and Def-expand/, as well as temporal groups: Onset, Inset, and Offset.
- add_definitions(defs, hed_schema=None)¶
Add definitions from dict(s) or string(s) to this dict.
- Parameters:
defs (list, DefinitionDict, dict, or str) – DefinitionDict or list of DefinitionDicts/strings/dicts whose definitions should be added.
hed_schema (HedSchema or None) – Required if passing strings or lists of strings, unused otherwise.
Notes
- The dict form expects DefinitionEntries in the same form as a DefinitionDict.
- A str or list of strings is parsed using the hed_schema.
- Types can be mixed and matched, e.g. [DefinitionDict, str, list of str] would be valid input.
- Raises:
TypeError – Bad type passed as defs.
- check_for_definitions(hed_string_obj, error_handler=None) list[dict]¶
Check string for definition tags, adding them to self.
- Parameters:
hed_string_obj (HedString) – A single HED string to gather definitions from.
error_handler (ErrorHandler or None) – Error context used to identify where definitions are found.
- Returns:
List of issues encountered in checking for definitions. Each issue is a dictionary.
- Return type:
list[dict]
- get(def_name) DefinitionEntry | None¶
Get the definition entry for the definition name.
Not case-sensitive
- Parameters:
def_name (str) – Name of the definition to retrieve.
- Returns:
Definition entry for the requested definition.
- Return type:
Union[DefinitionEntry, None]
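A lookup that is not case-sensitive and returns None for unknown names can be sketched with a dictionary keyed by normalized names. The class CaseInsensitiveDefs below is a hypothetical stand-in, not DefinitionDict; the choice of str.casefold for normalization is an assumption.

```python
class CaseInsensitiveDefs:
    """Hypothetical store that normalizes definition names before storing and looking up."""

    def __init__(self):
        self._defs = {}

    def add(self, name, entry):
        # Store under the case-folded name so lookups ignore case.
        self._defs[name.casefold()] = entry

    def get(self, name):
        # dict.get returns None when the name is unknown.
        return self._defs.get(name.casefold())


defs = CaseInsensitiveDefs()
defs.add("MyDefinition", {"contents": "(Red, Blue)"})
```

Because a missing name yields None, callers should test the result before using it rather than assume an entry exists.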
- get_definition_entry(def_tag)¶
Get the entry for a given def tag.
Does not validate at all.
- Parameters:
def_tag (HedTag) – Source HED tag that may be a Def or Def-expand tag.
- Returns:
The definition entry if it exists
- Return type:
DefinitionEntry or None
- property issues¶
Return issues about duplicate definitions.
- items()¶
Return the dictionary of definitions.
Alias for .defs.items()
- Returns:
The definitions as (name, DefinitionEntry) pairs.
- Return type:
Items view of {str: DefinitionEntry}
- validate_def_tags(hed_string_obj) list[dict][source]¶
Validate Def/Def-Expand tags.
- Parameters:
hed_string_obj (HedString) – The HED string to process.
hed_validator (HedValidator) – Used to validate the placeholder replacement.
- Returns:
Issues found related to validating defs. Each issue is a dictionary.
- Return type:
list[dict]
OnsetValidator¶
ReservedChecker¶
- class ReservedChecker[source]¶
Bases: object
Thread-safe singleton that loads reserved tag rules and checks groups for compliance.
- check_reserved_compatibility(group, reserved_tags)[source]¶
Check that the reserved tags can be used together and that there are no duplicates.
- check_tag_requirements(group, reserved_tags)[source]¶
Check the tag requirements within the group.
- Parameters:
Notes: This is only called when there are some reserved incompatible tags.
- get_def_information(group, reserved_tags) list[list][source]¶
Get definition information for reserved tags.
- get_group_requirements(reserved_tags) tuple[float, float][source]¶
Returns the maximum and minimum number of groups required for these reserved tags.
- get_incompatible(tag, reserved_tags) list[source]¶
Return the list of tags that cannot be in the same group with tag.
- static get_instance()[source]¶
Return the singleton ReservedChecker instance, creating it on first call.
- Returns:
The shared singleton instance.
- Return type:
ReservedChecker
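One common way to build a thread-safe singleton with a static get_instance() accessor is double-checked locking. The sketch below shows that general pattern on a hypothetical RuleRegistry class. It is not the ReservedChecker source, and whether ReservedChecker uses this exact locking scheme is an assumption.

```python
import threading


class RuleRegistry:
    """Hypothetical singleton standing in for a class that loads its rules once."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        # A real class would read its rules file here; this is a placeholder.
        self.rules = {"loaded": True}

    @staticmethod
    def get_instance():
        """Return the shared instance, creating it on the first call."""
        if RuleRegistry._instance is None:          # fast path, no lock needed
            with RuleRegistry._lock:
                if RuleRegistry._instance is None:  # re-check while holding the lock
                    RuleRegistry._instance = RuleRegistry()
        return RuleRegistry._instance
```

The second check inside the lock is what prevents two threads that both saw None from each constructing an instance.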
- get_reserved(group)[source]¶
Return the list of reserved tags found directly within the given HED group.
- reserved_reqs_path = '/home/runner/work/hed-resources/hed-resources/submodules/hed-python/hed/validator/data/reservedTags.json'¶
Validator utilities¶
CharValidator¶
- class CharValidator(modern_allowed_char_rules=False)[source]¶
Bases: object
Class responsible for basic character level validation of a string or tag.
- DEFAULT_ALLOWED_PLACEHOLDER_CHARS = '.+-^ _#'¶
- INVALID_STRING_CHARS = '[]{}~'¶
- INVALID_STRING_CHARS_PLACEHOLDERS = '[]~'¶
- TAG_ALLOWED_CHARS = '-_/'¶
- check_for_invalid_extension_chars(original_tag, validate_text, error_code=None, index_offset=0) list[dict][source]¶
Report invalid characters in extension/value.
- Parameters:
original_tag (HedTag) – The original tag that is used to report the error.
validate_text (str) – The text to validate, if it is not the full extension.
error_code (str) – Error code to report in place of the default. Mostly for Def/Def-expand tags.
index_offset (int) – Offset into the extension at which validate_text starts.
- Returns:
Validation issues. Each issue is a dictionary.
- Return type:
list[dict]
- check_invalid_character_issues(hed_string, allow_placeholders) list[dict][source]¶
Report invalid characters.
- Parameters:
- Returns:
Validation issues. Each issue is a dictionary.
- Return type:
list[dict]
Notes
- Invalid tag characters are defined by self.INVALID_STRING_CHARS or self.INVALID_STRING_CHARS_PLACEHOLDERS.
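A character scan driven by these two constants might look like the sketch below. The function find_invalid_chars is hypothetical and returns plain (index, char) tuples rather than issue dictionaries. The mapping of allow_placeholders to the shorter character set is an assumption inferred from the constant names.

```python
# Values copied from the documented class attributes.
INVALID_STRING_CHARS = '[]{}~'
INVALID_STRING_CHARS_PLACEHOLDERS = '[]~'


def find_invalid_chars(hed_string, allow_placeholders):
    """Hypothetical helper: list (index, char) for each disallowed character.

    Assumption: when placeholders are allowed, curly braces are tolerated,
    so the shorter character set applies.
    """
    if allow_placeholders:
        invalid = INVALID_STRING_CHARS_PLACEHOLDERS
    else:
        invalid = INVALID_STRING_CHARS
    return [(i, ch) for i, ch in enumerate(hed_string) if ch in invalid]
```

The only difference between the two sets is the pair of curly braces, so the flag changes the outcome only for strings that contain them.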
CharRexValidator¶
- class CharRexValidator(modern_allowed_char_rules=False)[source]¶
Bases: CharValidator
Class responsible for basic character level validation of a string or tag.
- DEFAULT_ALLOWED_PLACEHOLDER_CHARS = '.+-^ _#'¶
- INVALID_STRING_CHARS = '[]{}~'¶
- INVALID_STRING_CHARS_PLACEHOLDERS = '[]~'¶
- TAG_ALLOWED_CHARS = '-_/'¶
- check_for_invalid_extension_chars(original_tag, validate_text, error_code=None, index_offset=0) list[dict]¶
Report invalid characters in extension/value.
- Parameters:
original_tag (HedTag) – The original tag that is used to report the error.
validate_text (str) – The text to validate, if it is not the full extension.
error_code (str) – Error code to report in place of the default. Mostly for Def/Def-expand tags.
index_offset (int) – Offset into the extension at which validate_text starts.
- Returns:
Validation issues. Each issue is a dictionary.
- Return type:
list[dict]
- check_invalid_character_issues(hed_string, allow_placeholders) list[dict]¶
Report invalid characters.
- Parameters:
- Returns:
Validation issues. Each issue is a dictionary.
- Return type:
list[dict]
Notes
- Invalid tag characters are defined by self.INVALID_STRING_CHARS or self.INVALID_STRING_CHARS_PLACEHOLDERS.
- check_tag_invalid_chars(original_tag, allow_placeholders) list[dict]¶
Report invalid characters in the given tag.
- get_problem_chars(in_str, cname)[source]¶
Return a list of (index, char) pairs for characters in in_str not allowed by the value class cname.
- is_valid_value(in_string, cname)[source]¶
Check whether in_string is a valid whole-word value for class cname.
- Parameters:
- Returns:
- True if no word-level regex is defined for cname (the class imposes no constraint).
- A re.Match object if in_string matches the word-level regex (valid value).
- False if in_string does not match the word-level regex (invalid value).
- Return type:
True | re.Match | False
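The three-way return value is easy to misread, so here is a small sketch of a function with the same return contract. The function classify_value, the WORD_PATTERNS table, and the class name exampleNumericClass are hypothetical; the real method looks up its patterns elsewhere, and its use of a full-string match is an assumption.

```python
import re

# Hypothetical table of word-level patterns keyed by value class name.
WORD_PATTERNS = {"exampleNumericClass": re.compile(r"-?\d+(\.\d+)?")}


def classify_value(in_string, cname):
    """Return True, a re.Match, or False, mirroring the documented contract."""
    pattern = WORD_PATTERNS.get(cname)
    if pattern is None:
        return True  # no constraint defined for this class
    match = pattern.fullmatch(in_string)
    return match if match else False
```

Since a re.Match object is truthy, a caller can write `if result:` to accept both valid cases, and use `result is True` only when it needs to tell an unconstrained class apart from a matched one.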
UnitValueValidator¶
- class UnitValueValidator(modern_allowed_char_rules=False, value_validators=None)[source]¶
Bases: object
Validates units.
- DATE_TIME_VALUE_CLASS = 'dateTimeClass'¶
- DIGIT_OR_POUND_EXPRESSION = '^(-?[\\d.]+(?:e-?\\d+)?|#)$'¶
- NAME_VALUE_CLASS = 'nameClass'¶
- NUMERIC_VALUE_CLASS = 'numericClass'¶
- TEXT_VALUE_CLASS = 'textClass'¶
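To see which strings DIGIT_OR_POUND_EXPRESSION accepts, the sketch below compiles the documented expression and wraps it in a hypothetical helper, looks_numeric_or_placeholder. How UnitValueValidator applies the expression internally is not shown on this page.

```python
import re

# Value copied from the documented class attribute.
DIGIT_OR_POUND_EXPRESSION = '^(-?[\\d.]+(?:e-?\\d+)?|#)$'
_digit_or_pound = re.compile(DIGIT_OR_POUND_EXPRESSION)


def looks_numeric_or_placeholder(text):
    """Hypothetical helper: True for a number-like string or the '#' placeholder."""
    return _digit_or_pound.match(text) is not None
```

The expression is a loose filter rather than a strict number grammar: it accepts a lowercase exponent with an optional minus sign but not a plus sign, and because the character class allows any mix of digits and dots, a string such as 1.2.3 also matches.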
- check_tag_unit_class_units_are_valid(original_tag, validate_text, report_as=None, error_code=None, allow_placeholders=True) list[dict][source]¶
Report incorrect unit class or units.
- Parameters:
original_tag (HedTag) – The original tag that is used to report the error.
validate_text (str) – The text to validate.
report_as (HedTag) – Report errors as coming from this tag, rather than original_tag.
error_code (str) – Override error codes.
allow_placeholders (bool) – Whether placeholders are allowed (affects value class validation for “#”)
- Returns:
Validation issues. Each issue is a dictionary.
- Return type:
list[dict]
- check_tag_value_class_valid(original_tag, validate_text, report_as=None) list[dict][source]¶
Report an invalid value portion.
- static report_value_char_errors(class_name, errors, report_as)[source]¶
Build validation issues for specific invalid characters within a value class string.
- static report_value_errors(error_dict, class_valid, report_as)[source]¶
Build validation issues from per-class character error and validity dicts.
- Parameters:
error_dict (dict) – Mapping of class name to list of (char, index) problem tuples.
class_valid (dict) – Mapping of class name to a validity result (True, re.Match, or False) indicating whether the full value passed word-level format validation for that class.
report_as (HedTag) – The tag object used as context in error reporting.
- Returns:
Validation issue dictionaries.
- Return type:
DuplicateChecker¶
- class DuplicateChecker[source]¶
Bases: object
Detects duplicate tags and groups within a HED annotation.
- check_for_duplicates(group) list[dict][source]¶
Find duplicates in a HED group and return the errors found.
- get_hash(group) int | None[source]¶
Return the unique hash for the group, provided it contains no duplicates.
- Parameters:
group (HedGroup) – The HED group to be checked.
- Returns:
Unique hash or None if duplicates were detected within the group.
- Return type:
Union[int, None]
Note: As a side effect, this method will clear the issues list if no duplicates are found.
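One way to get a hash that is independent of tag order and that doubles as a duplicate detector is to hash the set of child hashes recursively. The sketch below illustrates that idea on nested Python lists. It is not the DuplicateChecker algorithm: group_hash and the list-based group model are hypothetical, and the issue-list side effect mentioned above is not modelled.

```python
def group_hash(group):
    """Return an order-independent hash for a nested group, or None on duplicates.

    Hypothetical model: a group is a list whose items are tag strings or nested lists.
    """
    seen = set()
    for item in group:
        if isinstance(item, list):
            item_hash = group_hash(item)
            if item_hash is None:
                return None  # a nested group already contains duplicates
        else:
            item_hash = hash(item)
        if item_hash in seen:
            return None  # the same tag or an equivalent subgroup appears twice
        seen.add(item_hash)
    # frozenset ignores order, so reordered groups hash identically.
    return hash(frozenset(seen))
```

Because subgroups are reduced to order-independent hashes before comparison, two subgroups that list the same tags in a different order are detected as duplicates of each other.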
GroupValidator¶
- class GroupValidator(hed_schema)[source]¶
Bases: object
Validation for attributes across groups of HED tags.
This covers things like Required, Unique, top-level tags, etc.
- check_multiple_unique_tags_exist(tags) list[source]¶
Report if multiple identical unique tags exist
A unique Term can only appear once in a given HedString. Unique terms are terms with the ‘unique’ property in the schema.
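The uniqueness rule can be illustrated with a simple counting pass. In the sketch below, find_repeated_unique_tags is a hypothetical helper, tags are plain strings, and the set of unique terms is passed in directly instead of being read from a schema; the tag name Example-unique is made up.

```python
from collections import Counter


def find_repeated_unique_tags(tags, unique_terms):
    """Hypothetical helper: return the unique terms that occur more than once."""
    counts = Counter(tag for tag in tags if tag in unique_terms)
    return sorted(tag for tag, n in counts.items() if n > 1)


# 'Example-unique' plays the role of a term carrying the 'unique' property.
repeated = find_repeated_unique_tags(
    ["Example-unique", "Red", "Example-unique", "Red"],
    unique_terms={"Example-unique"},
)
```

Tags without the unique property, such as Red in this example, may repeat without being reported by this particular check.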
- static check_tag_level_issue(original_tag_list, is_top_level, is_group) list[source]¶
Report tags incorrectly positioned in hierarchy.
- run_all_tags_validators(hed_string_obj) list[dict][source]¶
Report invalid multi-tag properties in a HED string, e.g. required tags.
- run_tag_level_validators(hed_string_obj) list[dict][source]¶
Report invalid groups at each level.
- Parameters:
hed_string_obj (HedString) – A HedString object.
- Returns:
Issues associated with each level in the HED string. Each issue is a dictionary.
- Return type:
list[dict]
Notes
This pertains to the top-level, all groups, and nested groups.
StringValidator¶
- class StringValidator[source]¶
Bases: object
Runs checks on the raw string that depend on multiple characters, e.g. mismatched parentheses.
- CLOSING_GROUP_CHARACTER = ')'¶
- COMMA = ','¶
- OPENING_GROUP_CHARACTER = '('¶
- static check_count_tag_group_parentheses(hed_string) list[dict][source]¶
Report unmatched parentheses.
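A count-based parentheses check, as the method name suggests, can be sketched with str.count. The function count_parentheses and the shape of the issue dictionary are hypothetical; only the two character constants come from the class above.

```python
# Values copied from the documented class attributes.
OPENING_GROUP_CHARACTER = '('
CLOSING_GROUP_CHARACTER = ')'


def count_parentheses(hed_string):
    """Hypothetical helper: report a mismatch between opening and closing counts."""
    n_open = hed_string.count(OPENING_GROUP_CHARACTER)
    n_close = hed_string.count(CLOSING_GROUP_CHARACTER)
    if n_open == n_close:
        return []
    return [{"message": f"{n_open} opening vs {n_close} closing parentheses"}]
```

A pure count comparison does not look at ordering, so a string such as `)Red(` passes this sketch even though its parentheses are not properly nested; catching that would need a running balance instead.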
TagValidator¶
- class TagValidator[source]¶
Bases: object
Validation for individual HED tags.
- CAMEL_CASE_EXPRESSION = '([A-Z]+\\s*[a-z-]*)+'¶
- check_capitalization(original_tag) list[dict][source]¶
Report a warning if the tag capitalization is incorrect.
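To get a feel for CAMEL_CASE_EXPRESSION, the sketch below compiles the documented expression and tests whole terms against it. The helper matches_camel_case is hypothetical, and using a full-string match is an assumption; check_capitalization may apply the expression differently.

```python
import re

# Value copied from the documented class attribute.
CAMEL_CASE_EXPRESSION = '([A-Z]+\\s*[a-z-]*)+'
_camel = re.compile(CAMEL_CASE_EXPRESSION)


def matches_camel_case(term):
    """Hypothetical helper: True if the whole term fits the expression."""
    return _camel.fullmatch(term) is not None
```

Under this reading, each segment must start with at least one uppercase letter, hyphens are allowed in the lowercase part, and digits are not matched at all.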
- static check_for_placeholder(original_tag, is_definition=False) list[dict][source]¶
Report invalid placeholder characters.
- Parameters:
- Returns:
Validation issues. Each issue is a dictionary.
- Return type:
list[dict]
Notes
Invalid placeholder may appear in the extension/value portion of a tag.
- static check_tag_exists_in_schema(original_tag) list[dict][source]¶
Report a tag that is invalid or that doesn’t take a value.
- check_tag_is_deprecated(original_tag) list[dict][source]¶
Return a validation issue if the tag carries the DeprecatedFrom attribute.
- static check_tag_requires_child(original_tag) list[dict][source]¶
Report if the tag is a leaf with the ‘requireChild’ attribute.
- run_individual_tag_validators(original_tag, allow_placeholders=False, is_definition=False) list[dict][source]¶
Runs the validators on the individual tags.
This ignores most illegal characters except in extensions.
- Parameters:
- Returns:
The validation issues associated with the tags. Each issue is a dictionary.
- Return type:
list[dict]