Summarize HED validation

The summarize_hed_validation operation runs the HED validator on the requested data and produces a summary of the errors (and, optionally, the warnings) that it finds. For more information on HED validation, see the HED validation guide.

Purpose

Use this operation to:

  • Validate HED annotations in event files and sidecars

  • Identify annotation errors and warnings

  • Check consistency of definitions and tag usage

  • Verify onset/offset matching and temporal consistency

  • Document validation status for datasets

Parameters

The summarize_hed_validation operation requires the summary_name and summary_filename parameters. It also accepts the optional append_timecode parameter and an optional boolean parameter, check_for_warnings. If check_for_warnings is false (the default), the summary reports only errors and omits warnings.

Parameters for the summarize_hed_validation operation.

Parameter           Type   Description
summary_name        str    A unique name used to identify this summary.
summary_filename    str    A unique file basename to use for saving this summary.
append_timecode     bool   (Optional: Default false) If true, append a time code to filename.
check_for_warnings  bool   (Optional: Default false) If true, warnings are reported in addition to errors.
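The effect of check_for_warnings can be sketched as a simple severity filter. The Issue class and the filter_issues function below are hypothetical illustrations, not the internal API of the HED tools. The sketch assumes only that each issue carries a severity of either ERROR or WARNING, as in the sample output for this operation.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """One validation issue (hypothetical structure, for illustration only)."""
    code: str
    severity: str  # assumed to be "ERROR" or "WARNING"
    message: str


def filter_issues(issues, check_for_warnings=False):
    """Always keep errors; keep warnings only if check_for_warnings is true."""
    if check_for_warnings:
        return list(issues)
    return [issue for issue in issues if issue.severity == "ERROR"]


found = [
    Issue("HED_UNKNOWN_COLUMN", "WARNING", "Column named 'onset' found in file ..."),
    Issue("EXAMPLE_ERROR", "ERROR", "A hypothetical error for illustration."),
]
print(len(filter_issues(found)))                           # 1: errors only (default)
print(len(filter_issues(found, check_for_warnings=True)))  # 2: errors and warnings
```

Because the default is false, a summary produced without setting check_for_warnings lists only the error-level issues.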

The summarize_hed_validation operation is a HED operation. The calling program must therefore provide a HED schema version and, usually, a JSON sidecar containing the HED annotations.

The validation process takes place in two stages. First, the JSON sidecar is validated on its own. Only after that are the data file and its assembled HED annotations validated. This strategy is used because a single error in the JSON sidecar can generate an error message for every line in the corresponding data file.

If the JSON sidecar has errors (warnings do not count), validation stops. Neither the data file nor its assembled HED annotations are validated.

If the JSON sidecar has no errors, the validator assembles the HED annotation for each line of the data file and validates each assembled annotation. It also checks file-wide consistency, such as whether onsets and offsets are properly matched.
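The control flow of the two-stage strategy can be sketched in a few functions. This is a toy model under stated assumptions, not the real HED validator.

- The sidecar is represented as a plain dict that maps each column to its category values and annotation strings.
- The checking rules are simplified stand-ins: an empty annotation is a sidecar error, and a category value missing from the sidecar is a warning.
- The markers "onset:X" and "offset:X" are made-up tokens that stand in for HED's Onset/Offset groups in the file-wide check.

```python
"""Toy sketch of two-stage validation (simplified rules, not the HED validator)."""


def validate_sidecar(sidecar):
    """Stage 1 (toy rule): an empty annotation string is an ERROR."""
    issues = []
    for column, entries in sidecar.items():
        for key, hed_string in entries.items():
            if not hed_string.strip():
                issues.append(("ERROR", f"Empty annotation for {column}={key}"))
    return issues


def assemble_row(sidecar, row, row_index, issues):
    """Join the sidecar annotations for one data row; unknown keys are WARNINGs."""
    parts = []
    for column, value in row.items():
        entries = sidecar.get(column)
        if entries is None:
            continue  # column has no annotations in this toy model
        if value in entries:
            parts.append(entries[value])
        else:
            issues.append(("WARNING", f"row {row_index}: key '{value}' missing in {column}"))
    return ", ".join(parts)


def check_onsets_offsets(assembled):
    """File-wide toy check: 'offset:X' needs an earlier unmatched 'onset:X'."""
    open_names = set()
    issues = []
    for row_index, annotation in enumerate(assembled):
        for token in (t.strip() for t in annotation.split(",")):
            if token.startswith("onset:"):
                open_names.add(token[len("onset:"):])
            elif token.startswith("offset:"):
                name = token[len("offset:"):]
                if name in open_names:
                    open_names.remove(name)
                else:
                    issues.append(("ERROR", f"row {row_index}: offset:{name} without matching onset"))
    return issues


def validate_two_stage(sidecar, rows):
    """Validate the sidecar first; only if it has no errors, validate the rows."""
    issues = validate_sidecar(sidecar)
    if any(severity == "ERROR" for severity, _ in issues):
        # Stop here: one sidecar error could otherwise repeat on every row.
        return issues
    assembled = [assemble_row(sidecar, row, i, issues) for i, row in enumerate(rows)]
    issues.extend(check_onsets_offsets(assembled))
    return issues
```

With a sidecar that defines go but not baloney, a row whose trial_type is baloney yields one WARNING, much like the HED_SIDECAR_KEY_MISSING issue in the Results section. If the sidecar itself contains an error, the rows are never examined.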

Example

A JSON file with a single summarize_hed_validation summarization operation.

[{
   "operation": "summarize_hed_validation",
   "description": "Summarize validation errors in the sample dataset.",
   "parameters": {
       "summary_name": "AOMIC_sample_validation",
       "summary_filename": "AOMIC_sample_validation",
       "check_for_warnings": true
   }
}]
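If you generate remodel files from a script rather than writing them by hand, you can build the operation above in Python with the standard json module. The helper make_validation_op is hypothetical and not part of the HED tools. It only reproduces the structure of the JSON example and checks that the two required parameters are non-empty.

```python
import json


def make_validation_op(summary_name, summary_filename, description="",
                       check_for_warnings=False):
    """Build one summarize_hed_validation entry (hypothetical helper)."""
    if not summary_name or not summary_filename:
        raise ValueError("summary_name and summary_filename are required")
    return {
        "operation": "summarize_hed_validation",
        "description": description,
        "parameters": {
            "summary_name": summary_name,
            "summary_filename": summary_filename,
            "check_for_warnings": check_for_warnings,
        },
    }


# A remodel file is a JSON list of operation objects.
operations = [
    make_validation_op(
        "AOMIC_sample_validation",
        "AOMIC_sample_validation",
        description="Summarize validation errors in the sample dataset.",
        check_for_warnings=True,
    )
]
print(json.dumps(operations, indent=4))
```

The helper leaves out append_timecode, as the example does. That parameter is optional and defaults to false.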

Results

To demonstrate the output of the validation operation, we modified the first row of the sample remodel event file so that the trial_type column contained the value baloney rather than go. This modification generates a warning because the meaning of baloney is not defined in the sample remodel sidecar file. The following text summary shows the result of running the example operation on the modified file.

Text summary of summarize_hed_validation operation on a modified sample data file.

Summary name: AOMIC_sample_validation
Summary type: hed_validation
Summary filename: AOMIC_sample_validation

Summary details:

Dataset: [1 sidecar files, 1 event files]
   task-stopsignal_acq-seq_events.json: 0 issues
   sub-0013_task-stopsignal_acq-seq_events.tsv: 6 issues

Individual files:

   sub-0013_task-stopsignal_acq-seq_events.tsv: 1 sidecar files
      task-stopsignal_acq-seq_events.json has no issues
      sub-0013_task-stopsignal_acq-seq_events.tsv issues:
            HED_UNKNOWN_COLUMN: WARNING: Column named 'onset' found in file, but not specified as a tag column or identified in sidecars.
            HED_UNKNOWN_COLUMN: WARNING: Column named 'duration' found in file, but not specified as a tag column or identified in sidecars.
            HED_UNKNOWN_COLUMN: WARNING: Column named 'response_time' found in file, but not specified as a tag column or identified in sidecars.
            HED_UNKNOWN_COLUMN: WARNING: Column named 'response_accuracy' found in file, but not specified as a tag column or identified in sidecars.
            HED_UNKNOWN_COLUMN: WARNING: Column named 'response_hand' found in file, but not specified as a tag column or identified in sidecars.
            HED_SIDECAR_KEY_MISSING[row=0,column=2]: WARNING: Category key 'baloney' does not exist in column.  Valid keys are: ['succesful_stop', 'unsuccesful_stop', 'go']

This summary was produced using HED schema version 8.1.0 (hed_version="8.1.0" when creating the dispatcher), with the sample remodel sidecar file passed to do_op.
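To document validation status across many files, you may want to tally the issues in a saved text summary. The sketch below parses issue lines of the shape shown in the summary above.

- It assumes every issue line starts with an upper-case code, optionally followed by a bracketed location, then a severity of ERROR or WARNING.
- Only WARNING appears in the sample, so the ERROR spelling is an assumption.
- The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shapes, based on the sample text summary:
#   CODE: SEVERITY: message
#   CODE[row=R,column=C]: SEVERITY: message
ISSUE_LINE = re.compile(r"^\s*([A-Z_]+)(?:\[[^\]]*\])?:\s*(ERROR|WARNING):")


def tally_issues(summary_text):
    """Count (code, severity) pairs found in a text validation summary."""
    counts = Counter()
    for line in summary_text.splitlines():
        match = ISSUE_LINE.match(line)
        if match:
            counts[match.groups()] += 1
    return counts


def severity_totals(counts):
    """Collapse (code, severity) counts into per-severity totals."""
    totals = Counter()
    for (_, severity), n in counts.items():
        totals[severity] += n
    return totals
```

Applied to the sample summary, this sketch would count five HED_UNKNOWN_COLUMN warnings and one HED_SIDECAR_KEY_MISSING warning. That agrees with the "6 issues" line for the event file.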

Notes

  • Requires HED schema: You must specify a HED schema version when creating the Dispatcher

  • Requires sidecar: Usually needs a JSON sidecar with HED annotations

  • Two-stage validation: Sidecar validated first, then assembled annotations

  • Sidecar errors prevent data file validation (one sidecar error could otherwise cascade into an error on every data line)

  • Warnings vs. errors: Errors are more severe; set check_for_warnings to true to report both

  • Each issue includes an error code and a severity level; row-level issues also give their location (for example, [row=0,column=2])

  • Essential quality assurance step before using HED-dependent operations

  • Run this operation early in your pipeline to catch annotation problems