Tools¶
Utility functions and data processing tools for HED operations.
Analysis tools¶
EventManager¶
- class hed.tools.analysis.event_manager.EventManager(input_data, hed_schema, extra_defs=None)[source]¶
Bases: object
Manager of events of temporal extent.
- __init__(input_data, hed_schema, extra_defs=None)[source]¶
Create an event manager for an events file. Manages events of temporal extent.
- Parameters:
input_data (TabularInput) – Represents an events file with its sidecar.
hed_schema (HedSchema) – HED schema used.
extra_defs (DefinitionDict) – Extra definitions not included in the input_data information.
- Raises:
HedFileError – If there are any unmatched offsets.
Notes: Keeps the events of temporal extent by their starting index in the events file. These events are separated from the rest of the annotations, which are contained in self.hed_strings.
- unfold_context(remove_types=None)[source]¶
Unfold the event information into a tuple based on context.
- Parameters:
remove_types (list or None) – List of types to remove. If None, defaults to empty list.
- Returns:
Union[list(str), HedString]: The information without the events of temporal extent.
Union[list(str), HedString, None]: The onsets of the events of temporal extent.
Union[list(str), HedString, None]: The ongoing context information.
- Return type:
tuple[Union[list(str), HedString], Union[list(str), HedString, None], Union[list(str), HedString, None]]
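The split between onsets and ongoing context can be pictured with a minimal, library-independent sketch. TemporalSpan and unfold are hypothetical names, not part of hed. The sketch assumes each event of temporal extent has already been reduced to a contents string, a starting row, and an exclusive ending row; the real method works on HedString objects and may differ in detail.

```python
from typing import NamedTuple


class TemporalSpan(NamedTuple):
    """Hypothetical stand-in for an event of temporal extent."""
    contents: str
    start_index: int
    end_index: int  # first row at which the event is no longer active


def unfold(num_rows, spans):
    """Return (onsets, contexts): per-row lists of event contents.

    onsets[i] holds events that start at row i; contexts[i] holds events
    that started earlier and are still active at row i.
    """
    onsets = [[] for _ in range(num_rows)]
    contexts = [[] for _ in range(num_rows)]
    for span in spans:
        onsets[span.start_index].append(span.contents)
        for row in range(span.start_index + 1, min(span.end_index, num_rows)):
            contexts[row].append(span.contents)
    return onsets, contexts


spans = [TemporalSpan("(Blue, Square)", 0, 3), TemporalSpan("(Red, Circle)", 2, 4)]
onsets, contexts = unfold(5, spans)
```

In this sketch a row never lists an event in both its onsets and its context, which mirrors the separation of the second and third tuple elements described above.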
- get_type_defs(types)[source]¶
Return a list of definition names (lower case) that correspond to any of the specified types.
EventChecker¶
- class hed.tools.analysis.event_checker.EventChecker(hed_obj, line_number, original_line_number=None, error_handler=None)[source]¶
Bases: object
Validates that HED-annotated events meet quality requirements such as having a top-level event tag.
- EVENT_TAGS = {'Agent-action', 'Data-feature', 'Event', 'Experiment-control', 'Experiment-structure', 'Measurement-event', 'Sensory-event'}¶
- NON_TASK_EVENTS = {'Data-feature', 'Experiment-control', 'Experiment-structure', 'Measurement-event'}¶
- TASK_ROLES = {'Cue', 'Experimental-stimulus', 'Feedback', 'Incidental', 'Instructional', 'Mishap', 'Participant-response', 'Task-activity', 'Warning'}¶
- ACTION_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Done-indication', 'Imagined-action', 'Inappropriate-action', 'Incorrect-action', 'Indeterminate-action', 'Miss', 'Near-miss', 'Omitted-action', 'Ready-indication'}¶
- STIMULUS_ROLES = {'Distractor', 'Expected', 'Extraneous', 'Go-signal', 'Meaningful', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Penalty', 'Planned', 'Priming', 'Query', 'Reward', 'Stop-signal', 'Target', 'Threat', 'Timed', 'Unexpected', 'Unplanned'}¶
- ALL_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Cue', 'Distractor', 'Done-indication', 'Expected', 'Experimental-stimulus', 'Extraneous', 'Feedback', 'Go-signal', 'Imagined-action', 'Inappropriate-action', 'Incidental', 'Incorrect-action', 'Indeterminate-action', 'Instructional', 'Meaningful', 'Mishap', 'Miss', 'Near-miss', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Omitted-action', 'Participant-response', 'Penalty', 'Planned', 'Priming', 'Query', 'Ready-indication', 'Reward', 'Stop-signal', 'Target', 'Task-activity', 'Threat', 'Timed', 'Unexpected', 'Unplanned', 'Warning'}¶
- __init__(hed_obj, line_number, original_line_number=None, error_handler=None)[source]¶
Constructor for the EventChecker class.
- Parameters:
hed_obj (HedString) – The HED string to check.
line_number (int or None) – The index of the HED string in the file.
original_line_number (int or None) – The original line number in the file.
error_handler (ErrorHandler) – The ErrorHandler object to use for error handling.
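As a rough illustration of the "top-level event tag" requirement, the following sketch tests a flat, comma-separated annotation against the EVENT_TAGS set listed above. The helper has_event_tag is hypothetical. The sketch assumes short-form tags with no nested groups and compares case-sensitively, whereas the real class operates on a HedString.

```python
EVENT_TAGS = {"Agent-action", "Data-feature", "Event", "Experiment-control",
              "Experiment-structure", "Measurement-event", "Sensory-event"}


def has_event_tag(annotation):
    """Return True if any tag of a flat annotation string is an event tag."""
    tags = {tag.strip() for tag in annotation.split(",") if tag.strip()}
    return bool(tags & EVENT_TAGS)


with_event = has_event_tag("Sensory-event, Visual-presentation")
without_event = has_event_tag("Red, Square")
```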
EventsChecker¶
- class hed.tools.analysis.event_checker.EventsChecker(hed_schema, input_data, name=None)[source]¶
Bases: object
Class to check for event tag quality errors in an event file.
- REMOVE_TYPES = ['Condition-variable', 'Task']¶
- __init__(hed_schema, input_data, name=None)[source]¶
Constructor for the EventsChecker class.
- Parameters:
hed_schema (HedSchema) – The HedSchema object to check.
input_data (TabularInput) – The input data object to check.
name (str) – The name to display for this file for error purposes.
- validate_event_tags()[source]¶
Verify that the events in the HED strings validly represent events.
- Returns:
Each element is a dictionary with ‘code’ and ‘message’ keys.
- Return type:
- insert_issue_details(issues)[source]¶
Inserts issue details as part of the ‘message’ key for a list of issues.
- Parameters:
issues (list) – List of issues to get details for.
- static get_issue_details(data_info, side_data)[source]¶
Get the source details for the issue.
- Parameters:
data_info (pd.Series) – The row information from the original tsv.
side_data (pd.Series) – The sidecar data.
- Returns:
The HED associated with the relevant columns.
- Return type:
EventsSummary¶
- class hed.tools.analysis.events_summary.EventsSummary(hed_schema, file, sidecar=None, name=None)[source]¶
Bases: object
Summarizes HED event annotations for a tabular file, grouping tags by stimulus/response categories.
- REMOVE_TYPES = ['Condition-variable', 'Task']¶
- MATCH_TYPES = ['Experimental-stimulus', 'Participant-response', 'Cue', 'Feedback', 'Instructional', 'Sensory-event', 'Agent-action']¶
- EXCLUDED_PARENTS = {'data-marker', 'data-resolution', 'grayscale', 'hsv-color', 'informational-property', 'luminance', 'luminance-contrast', 'opacity', 'organizational-property', 'quantitative-value', 'relation', 'rgb-color', 'spatiotemporal-value', 'statistical-value', 'task-effect-evidence', 'task-relationship'}¶
- CUTOFF_TAGS = {'blue-color', 'brown-color', 'cyan-color', 'gray-color', 'green-color', 'orange-color', 'pink-color', 'purple-color', 'red-color', 'visual-presentation', 'white-color', 'yellow-color'}¶
- FILTERED_TAGS = {'action', 'agent', 'agent-cognitive-state', 'agent-emotional-state', 'agent-physiological-state', 'agent-postural-state', 'agent-property', 'agent-state', 'agent-task-role', 'agent-trait', 'anatomical-item', 'auditory-attribute', 'auditory-device', 'biological-artifact', 'biological-item', 'body-part', 'categorical-class-value', 'categorical-judgment-value', 'categorical-level-value', 'categorical-location-value', 'categorical-orientation-value', 'categorical-value', 'computing-device', 'dara-source-type', 'data-property', 'data-value', 'data-variability-attribute', 'device', 'display-device', 'document', 'environmental-property', 'event', 'face-part', 'geometric-object', 'gustatory-attribute', 'head-part', 'input-device', 'io-device', 'item', 'language-item', 'lower-extremity-part', 'man-made-object', 'media', 'media-clip', 'move-body-part', 'natural-object', 'nonbiological-artifact', 'object', 'olfactory-attribute', 'output-device', 'physical-value', 'property', 'recording-device', 'sensory-attribute', 'sensory-presentation', 'sensory-property', 'spatial-property', 'spectral-property', 'tactile-attribute', 'task-action-type', 'task-attentional-demand', 'task-event-role', 'task-property', 'task-stimulus-role', 'temporal-property', 'torso-part', 'upper-extremity-part', 'visual-attribute', 'visualization'}¶
HedTagManager¶
- class hed.tools.analysis.hed_tag_manager.HedTagManager(event_manager, remove_types=None)[source]¶
Bases: object
Manager for the HED tags from a columnar file.
- __init__(event_manager, remove_types=None)[source]¶
Create a tag manager for one tabular file.
- Parameters:
event_manager (EventManager) – an event manager for the tabular file.
remove_types (list or None) – List of type tags (such as condition-variable) to remove. If None, defaults to empty list.
- get_hed_objs(include_context=True, replace_defs=False)[source]¶
Return a list of HED string objects of same length as the tabular file.
HedTagCount¶
- class hed.tools.analysis.hed_tag_counts.HedTagCount(hed_tag, file_name)[source]¶
Bases: object
Counts for a particular HedTag in a particular file.
- set_value(hed_tag)[source]¶
Update the tag term value counts for a HedTag.
- Parameters:
hed_tag (HedTag or None) – Item to use to update the value counts.
- get_summary() dict[source]¶
Return a dictionary summary of the events and files for this tag.
- Returns:
dictionary summary of events and files that contain this tag.
- Return type:
dict
HedTagCounts¶
- class hed.tools.analysis.hed_tag_counts.HedTagCounts(name, total_events=0)[source]¶
Bases: object
Counts of HED tags for a group of columnar files.
- Parameters:
- update_tag_counts(hed_string_obj, file_name)[source]¶
Update the tag counts based on a HedString object.
- organize_tags(tag_template) tuple[source]¶
Organize tags into categories as specified by the tag_template.
- Parameters:
tag_template (dict) – A dictionary whose keys are titles and values are lists of HED tags (str).
- Returns:
A tuple containing two elements:
- dict: Keys are tags (strings) and values are lists of HedTagCount for items fitting the template.
- list: HedTagCount objects corresponding to tags that don’t fit the template.
- Return type:
tuple
- merge_tag_dicts(other_dict)[source]¶
Merge the information from another dictionary with this object’s tag dictionary.
- Parameters:
other_dict (dict) – Dictionary of tag, HedTagCount to merge.
- get_summary() dict[source]¶
Return a summary object containing the tag count information of this summary.
- Returns:
Keys are ‘name’, ‘files’, ‘total_events’, and ‘details’.
- Return type:
dict
- static create_template(tags) dict[source]¶
Creates a dictionary with keys based on list of keys in tags dictionary.
- Parameters:
tags (dict) – dictionary of tags and key lists.
- Returns:
Dictionary with keys in key lists and values are empty lists.
- Return type:
dict
Note: This class is used to organize the results of the tags based on a template for display.
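The template mechanism can be sketched without the library. In the sketch below, create_template follows the description above (keys taken from the key lists, values empty lists), while organize is a simplified, hypothetical stand-in for organize_tags. It assumes plain tag-name-to-count pairs and exact name matches, rather than HedTagCount objects.

```python
def create_template(tags):
    """Map every tag named in the key lists of `tags` to an empty list."""
    return {key: [] for key_list in tags.values() for key in key_list}


def organize(counts, tags):
    """Split `counts` (tag name -> count) into template matches and leftovers."""
    template = create_template(tags)
    unmatched = []
    for tag, count in counts.items():
        if tag in template:
            template[tag].append((tag, count))
        else:
            unmatched.append((tag, count))
    return template, unmatched


tags = {"Sensory events": ["sensory-event", "visual-presentation"],
        "Actions": ["agent-action"]}
template, unmatched = organize({"sensory-event": 12, "red": 3}, tags)
```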
HedTypeManager¶
- class hed.tools.analysis.hed_type_manager.HedTypeManager(event_manager)[source]¶
Bases: object
Manager for type factors and type definitions.
- __init__(event_manager)[source]¶
Create a variable manager for one tabular file for all type variables.
- Parameters:
event_manager (EventManager) – An event manager for the tabular file.
- Raises:
HedFileError – On errors such as unmatched onsets or missing definitions.
- property types¶
Return a list of types managed by this manager.
- Returns:
Type tag names.
- Return type:
- add_type(type_name)[source]¶
Add a type variable to be managed by this manager.
- Parameters:
type_name (str) – Type tag name of the type to be added.
- get_factor_vectors(type_tag, type_values=None, factor_encoding='one-hot')[source]¶
Return a DataFrame of factor vectors for the indicated HED tag and values.
- Parameters:
- Returns:
DataFrame containing the factor vectors as the columns.
- Return type:
Union[pd.DataFrame, None]
- get_type_tag_factor(type_tag, type_value)[source]¶
Return the HedTypeFactors for a specified value and extension.
HedType¶
- class hed.tools.analysis.hed_type.HedType(event_manager, name, type_tag='condition-variable')[source]¶
Bases: object
Manager of a type variable and its associated context.
- __init__(event_manager, name, type_tag='condition-variable')[source]¶
Create a variable manager for one type-variable for one tabular file.
- Parameters:
event_manager (EventManager) – Event manager instance
name (str) – Name of the tabular file as a unique identifier.
type_tag (str) – Lowercase short form of the tag to be managed.
- Raises:
HedFileError – On errors such as unmatched onsets or missing definitions.
- property total_events¶
Return the total number of events in the associated event list.
- Returns:
Number of events.
- Return type:
- get_type_value_factors(type_value)[source]¶
Return the HedTypeFactors associated with type_value or None.
- Parameters:
type_value (str) – The tag corresponding to the type’s value (such as the name of the condition variable).
- Returns:
Union[HedTypeFactors, None]
- get_type_value_level_info(type_value)[source]¶
Return type variable corresponding to type_value.
- Parameters:
type_value (str)
- Returns:
- property type_variables¶
Return the set of type-value names (keys) found in this HedType.
- get_summary()[source]¶
Return a summary dict mapping each type-value name to its factor summary.
- Returns:
Keys are type-value name strings; values are factor summary dicts.
- Return type:
- get_type_factors(type_values=None, factor_encoding='one-hot')[source]¶
Create a dataframe with the indicated type tag values as factors.
HedTypeDefs¶
- class hed.tools.analysis.hed_type_defs.HedTypeDefs(definitions, type_tag='condition-variable')[source]¶
Bases: object
Manager for definitions associated with a type such as condition-variable.
- Properties:
def_map (dict): keys are definition names, values are dict {type_values, description, tags}.
Example: A definition ‘famous-face-cond’ with contents:
‘(Condition-variable/Face-type,Description/A face that should be recognized.,(Image,(Face,Famous)))’
would have type_values [‘face-type’]. All items are strings not objects.
- __init__(definitions, type_tag='condition-variable')[source]¶
Create a definition manager for a type of variable.
- Parameters:
definitions (dict or DefinitionDict) – A dictionary of DefinitionEntry objects.
type_tag (str) – Lower-case HED tag string representing the type managed.
- property type_def_names¶
Return list of names of definitions that have this type-variable.
- Returns:
definition names that have this type.
- Return type:
- property type_names¶
Return list of names of the type-variables associated with type definitions.
- Returns:
type names associated with the type definitions
- Return type:
HedTypeFactors¶
- class hed.tools.analysis.hed_type_factors.HedTypeFactors(type_tag, type_value, number_elements)[source]¶
Bases: object
Holds index of positions for a variable type for a columnar file.
- ALLOWED_ENCODINGS = ('categorical', 'one-hot')¶
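The difference between the two allowed encodings can be illustrated with plain lists. This is a sketch only: the function names are hypothetical, and the use of ‘n/a’ for rows without a level in the categorical encoding is an assumption rather than something stated in this reference.

```python
def one_hot(levels, row_levels):
    """Return {level: [0/1, ...]} with a 1 wherever a row has that level."""
    return {level: [int(value == level) for value in row_levels] for level in levels}


def categorical(row_levels, missing="n/a"):
    """Return a single column with the level name, or `missing` if no level."""
    return [value if value is not None else missing for value in row_levels]


row_levels = ["famous", None, "unfamiliar", "famous"]
hot = one_hot(["famous", "unfamiliar"], row_levels)
cat = categorical(row_levels)
```

One-hot encoding produces one 0/1 column per level, while categorical encoding collapses the same information into a single column of level names.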
HedTypeCount¶
- class hed.tools.analysis.hed_type_counts.HedTypeCount(type_value, type_tag, file_name=None)[source]¶
Bases: object
Manager of the counts of tags for one type tag such as Condition-variable or Task.
- Parameters:
Examples
HedTypeCount(‘SymmetricCond’, ‘condition-variable’) keeps counts of Condition-variable/SymmetricCond.
HedTypeCounts¶
- class hed.tools.analysis.hed_type_counts.HedTypeCounts(name, type_tag)[source]¶
Bases: object
Manager for summaries of tag counts for columnar files.
- update_summary(type_sum, total_events=0, file_id=None)[source]¶
Update this summary based on the type variable map.
- add_descriptions(type_defs)[source]¶
Add descriptions to this summary based on the type definitions.
- Parameters:
type_defs (HedTypeDefs) – Contains the information about the value of a type.
- update(counts)[source]¶
Update count information based on counts in another HedTypeCounts.
- Parameters:
counts (HedTypeCounts) – Information to use in the update.
TabularSummary¶
- class hed.tools.analysis.tabular_summary.TabularSummary(value_cols=None, skip_cols=None, name='', categorical_limit=None)[source]¶
Bases: object
Summarize the contents of columnar files.
- __init__(value_cols=None, skip_cols=None, name='', categorical_limit=None)[source]¶
Constructor for a BIDS tabular file summary.
- extract_sidecar_template() dict[source]¶
Extract a BIDS sidecar-compatible dictionary.
- Returns:
A sidecar template that can be converted to JSON.
- Return type:
dict
- update(data, name=None)[source]¶
Update the counts based on data (DataFrame, filename, or list of filenames).
- update_summary(tab_sum)[source]¶
Add TabularSummary values to this object.
- Parameters:
tab_sum (TabularSummary) – A TabularSummary to be combined.
Notes
The value_cols and skip_cols are updated as long as they are not contradictory.
A new skip column cannot be used.
- static extract_summary(summary_info) TabularSummary[source]¶
Create a TabularSummary object from a serialized summary.
- static get_columns_info(dataframe, skip_cols=None) dict[str, dict][source]¶
Extract unique value counts for columns.
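A minimal sketch of per-column unique value counting, assuming rows are given as dictionaries instead of a DataFrame. The name columns_info and the conversion of values to strings are illustrative assumptions, not the library's implementation.

```python
from collections import Counter


def columns_info(rows, skip_cols=None):
    """Count unique values per column for rows given as dicts."""
    skip = set(skip_cols or [])
    info = {}
    for row in rows:
        for column, value in row.items():
            if column not in skip:
                info.setdefault(column, Counter())[str(value)] += 1
    return {column: dict(counter) for column, counter in info.items()}


rows = [{"onset": 1.0, "trial_type": "go"},
        {"onset": 2.5, "trial_type": "stop"},
        {"onset": 4.0, "trial_type": "go"}]
info = columns_info(rows, skip_cols=["onset"])
```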
- static make_combined_dicts(file_dictionary, skip_cols=None) tuple[TabularSummary, dict[str, TabularSummary]][source]¶
Return combined and individual summaries.
- Parameters:
file_dictionary (FileDictionary) – Dictionary of file name keys and full path.
skip_cols (list) – Names of the columns to skip.
- Returns:
A combined summary of all files in the dictionary.
A dictionary where keys are file names and values are individual TabularSummary objects.
- Return type:
tuple[TabularSummary, dict[str, TabularSummary]]
ColumnNameSummary¶
- class hed.tools.analysis.column_name_summary.ColumnNameSummary(name='')[source]¶
Bases: object
Summarize the unique column names in a dataset.
FileDictionary¶
- class hed.tools.analysis.file_dictionary.FileDictionary(collection_name, file_list, key_indices=(0, 2), separator='_')[source]¶
Bases: object
A file dictionary keyed by entity pair indices.
Notes
The entities are identified as 0, 1, … depending on order in the base filename.
The entity key-value pairs are assumed separated by ‘_’ unless a separator is provided.
- __init__(collection_name, file_list, key_indices=(0, 2), separator='_')[source]¶
Create a dictionary with full paths as values.
- Parameters:
Notes
This dictionary is used for cross listing BIDS style files for different studies.
Examples
If key_indices is (0, 2), the key generated for /tmp/sub-001_task-FaceCheck_run-01_events.tsv is sub-001_run-01.
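The key construction can be sketched as follows. The helper make_key is hypothetical and only follows the notes above: entities are numbered by their order in the base filename and joined with the separator. How the real class treats indices beyond the number of entities is not described here; this sketch skips them.

```python
import os


def make_key(file_path, key_indices=(0, 2), separator="_"):
    """Build a key from the entity pairs at `key_indices` of the base filename."""
    basename = os.path.splitext(os.path.basename(file_path))[0]
    pieces = basename.split(separator)
    return separator.join(pieces[i] for i in key_indices if i < len(pieces))


key = make_key("/tmp/sub-001_task-FaceCheck_run-01_events.tsv")
```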
- property name¶
Name of this dictionary.
- property key_list¶
Keys in this dictionary.
- property file_dict¶
Dictionary of path values in this dictionary.
- property file_list¶
List of path values in this dictionary.
- iter_files()[source]¶
Iterator over the files in this dictionary.
- Yields:
- str – Key into the dictionary. - file: File path.
- key_diffs(other_dict)[source]¶
Return symmetric key difference with another dict.
- Parameters:
other_dict (FileDictionary)
- Returns:
The symmetric difference of the keys in this dictionary and the other one.
- Return type:
- static make_file_dict(file_list, key_indices=(0, 2), separator='_')[source]¶
Return a dictionary of files using entity keys.
KeyMap¶
- class hed.tools.analysis.key_map.KeyMap(key_cols, target_cols=None, name='')[source]¶
Bases: object
A map of unique column values for remapping columns.
- target_cols¶
Optional list of column names that will be inserted into data and later remapped.
- Type:
list or None
Notes: This mapping converts all columns in the mapping to strings. The remapping does not support other types of columns.
- __init__(key_cols, target_cols=None, name='')[source]¶
Information for remapping columns of tabular files.
- property columns¶
Return the column names of the columns managed by this map.
- Returns:
Column names of the columns managed by this map.
- Return type:
- make_template(additional_cols=None, show_counts=True)[source]¶
Return a dataframe template.
- Parameters:
- Returns:
A dataframe containing the template.
- Return type:
DataFrame
- Raises:
HedFileError – If additional columns are not disjoint from the key columns.
Notes
The template consists of the unique key columns in this map plus additional columns.
- remap(data)[source]¶
Remap the columns of a dataframe or columnar file.
- Parameters:
data (DataFrame, str) – Columnar data (either DataFrame or filename) whose columns are to be remapped.
- Returns:
New dataframe with columns remapped.
List of row numbers that had no correspondence in the mapping.
- Return type:
- Raises:
HedFileError – If data is missing some of the key columns.
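The remapping idea can be sketched with plain dictionaries. The function below is a hypothetical simplification: rows are dicts, the map is keyed by tuples of stringified key-column values (in line with the note that the mapping converts columns to strings), and a KeyError stands in for HedFileError.

```python
def remap(rows, key_cols, mapping):
    """Return (new_rows, missing) for rows remapped through `mapping`.

    `mapping` maps tuples of stringified key-column values to a dict of
    target columns; `missing` lists row numbers with no entry in the map.
    """
    new_rows, missing = [], []
    for index, row in enumerate(rows):
        absent = [col for col in key_cols if col not in row]
        if absent:
            raise KeyError(f"row {index} is missing key columns {absent}")
        key = tuple(str(row[col]) for col in key_cols)
        new_row = dict(row)
        if key in mapping:
            new_row.update(mapping[key])
        else:
            missing.append(index)
        new_rows.append(new_row)
    return new_rows, missing


mapping = {("1",): {"trial_type": "go"}, ("2",): {"trial_type": "stop"}}
rows = [{"code": 1}, {"code": 3}, {"code": 2}]
new_rows, missing = remap(rows, ["code"], mapping)
```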
- update(data, allow_missing=True)[source]¶
Update the existing map with information from data.
- Parameters:
- Raises:
HedFileError – If there are missing keys and allow_missing is False.
TemporalEvent¶
- class hed.tools.analysis.temporal_event.TemporalEvent(contents, start_index, start_time)[source]¶
Bases: object
A single event process with starting and ending times.
Note: the contents have the Onset and duration removed.
Annotation utilities¶
Utilities to facilitate annotation of events in BIDS.
- hed.tools.analysis.annotation_util.check_df_columns(df, required_cols=('column_name', 'column_value', 'description', 'HED')) list[str][source]¶
Return a list of the specified columns that are missing from a dataframe.
- hed.tools.analysis.annotation_util.df_to_hed(dataframe, description_tag=True) dict[source]¶
Create sidecar-like dictionary from a 4-column dataframe.
- Parameters:
dataframe (DataFrame) – A four-column Pandas DataFrame with specific columns.
description_tag (bool) – If True description tag is included.
- Returns:
A dictionary compatible with BIDS JSON tabular file that includes HED.
- Return type:
dict
Notes
The DataFrame must have the columns with names: column_name, column_value, description, and HED.
- hed.tools.analysis.annotation_util.extract_tags(hed_string, search_tag) tuple[str, list[str]][source]¶
Extract all instances of specified tag from a tag_string.
- hed.tools.analysis.annotation_util.generate_sidecar_entry(column_name, column_values=None) dict[source]¶
Create a sidecar column dictionary for column.
- Parameters:
column_name (str) – Name of the column.
column_values – List of column values.
- hed.tools.analysis.annotation_util.hed_to_df(sidecar_dict, col_names=None) DataFrame[source]¶
Return a 4-column dataframe of HED portions of sidecar.
- Parameters:
- Returns:
Four-column spreadsheet representing HED portion of sidecar.
- Return type:
DataFrame
Notes
The returned DataFrame has columns: column_name, column_value, description, and HED.
- hed.tools.analysis.annotation_util.merge_hed_dict(sidecar_dict, hed_dict)[source]¶
Update a JSON sidecar based on the hed_dict values.
- hed.tools.analysis.annotation_util.series_to_factor(series) list[int][source]¶
Convert a series to an integer factor list.
- Parameters:
series (pd.Series) – Series to be converted to a list.
- Returns:
list[int] - contains 0’s and 1’s, empty, ‘n/a’ and np.nan are converted to 0.
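A stdlib-only sketch of the conversion described above, using a plain list in place of a pd.Series. The name to_zero_one is hypothetical, and the sketch covers only the cases listed here (empty, ‘n/a’, and NaN become 0); it makes no claim about how the library treats other values.

```python
import math


def to_zero_one(values):
    """Map empty strings, 'n/a', None, and NaN to 0 and everything else to 1."""
    def is_missing(value):
        if value is None:
            return True
        if isinstance(value, float) and math.isnan(value):
            return True
        return str(value).strip() in ("", "n/a")
    return [0 if is_missing(value) else 1 for value in values]


factor = to_zero_one(["go", "", "n/a", float("nan"), 3])
```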
- hed.tools.analysis.annotation_util.str_to_tabular(tsv_str, sidecar=None) TabularInput[source]¶
Return a TabularInput from a tsv string.
- Parameters:
tsv_str (str) – A string representing a tabular input.
sidecar – An optional Sidecar object.
- hed.tools.analysis.annotation_util.strs_to_hed_objs(hed_strings, hed_schema) list[HedString] | None[source]¶
Returns a list of HedString objects from a list of strings.
- Parameters:
hed_strings (string or list) – String or strings representing HED annotations.
hed_schema (HedSchema or HedSchemaGroup) – Schema version for the strings.
- Returns:
A list of HedString objects or None.
- Return type:
list[HedString] | None
- hed.tools.analysis.annotation_util.strs_to_sidecar(sidecar_strings) Sidecar | None[source]¶
Return a Sidecar from a sidecar given as a string or from a list of sidecars given as strings.
- hed.tools.analysis.annotation_util.to_factor(data, column=None) list[int][source]¶
Convert data to an integer factor list.
BIDS tools¶
BidsDataset¶
- class hed.tools.bids.bids_dataset.BidsDataset(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]¶
Bases: object
A BIDS dataset representation primarily focused on HED evaluation.
- schema¶
The schema used for evaluation.
- Type:
- __init__(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]¶
Constructor for a BIDS dataset.
- Parameters:
root_path (str) – Root path of the BIDS dataset.
schema (HedSchema or HedSchemaGroup) – A schema that overrides the one specified in dataset.
suffixes (list or None) – File name suffixes of items to include. If not provided, defaults to [‘events’, ‘participants’]. If None or empty list, includes all files.
exclude_dirs (list or None) – Directory names to exclude from traversal. If not provided, defaults to [‘sourcedata’, ‘derivatives’, ‘code’, ‘stimuli’]. If None or empty list, no directories are excluded.
- get_file_group(suffix)[source]¶
Return the file group of files with the specified suffix.
- Parameters:
suffix (str) – Suffix of the BidsFileGroup to be returned.
- Returns:
The requested tabular group.
- Return type:
Union[BidsFileGroup, None]
- validate(check_for_warnings=False, schema=None)[source]¶
Validate the dataset.
- Parameters:
check_for_warnings (bool) – If True, check for warnings.
schema (HedSchema or HedSchemaGroup or None) – The schema used for validation.
- Returns:
List of issues encountered during validation. Each issue is a dictionary.
- Return type:
BidsFile¶
- class hed.tools.bids.bids_file.BidsFile(file_path)[source]¶
Bases: object
A BIDS file with entity dictionary.
Notes
This class may hold the merged sidecar giving metadata for this file as well as contents.
- __init__(file_path)[source]¶
Constructor for a file path.
- Parameters:
file_path (str) – Full path of the file.
- property contents¶
Return the current contents of this object.
- get_key(entities=None)[source]¶
Return a key for this BIDS file given a list of entities.
- Parameters:
entities (tuple) – A tuple of strings representing entities.
- Returns:
A key based on this object.
- Return type:
Notes
If entities is None, then the file path is used as the key.
- set_contents(content_info=None, overwrite=False)[source]¶
Set the contents of this object.
- Parameters:
content_info (Any) – The contents appropriate for this object, such as a JSON dictionary.
overwrite (bool) – If False and the contents are not empty, do nothing.
Notes
Do not set if the contents are already set and overwrite is False.
BidsFileGroup¶
- class hed.tools.bids.bids_file_group.BidsFileGroup(root_path, file_list, suffix='events')[source]¶
Bases: object
Container for BIDS files with a specified suffix.
- suffix¶
The file suffix specifying the class of file represented in this group (e.g., events).
- Type:
- sidecar_dir_dict¶
Dictionary whose keys are directory paths and values are lists of sidecars in the corresponding directory.
- Type:
- summarize(value_cols=None, skip_cols=None)[source]¶
Return a TabularSummary of group files.
- Parameters:
- Returns:
A summary of the number of values in different columns if tabular group.
- Return type:
Union[TabularSummary, None]
Notes
The columns that are not value_cols or skip_cols are summarized by counting the number of times each unique value appears in that column.
- get_task_names()[source]¶
Return a sorted list of unique task names found in the file group’s TSV and JSON filenames.
- Returns:
Sorted list of unique task name strings (the xxxx portion of task-xxxx entities).
- Return type:
Notes
Parses both sidecar_dict and datafile_dict file paths. The BIDS task- entity is matched case-insensitively.
- validate(hed_schema, extra_def_dicts=None, check_for_warnings=False)[source]¶
Validate the sidecars and datafiles and return a list of issues.
- Parameters:
hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
check_for_warnings (bool) – If True, include warnings in the check.
- Returns:
A list of validation issues found. Each issue is a dictionary.
- Return type:
- validate_sidecars(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶
Validate merged sidecars.
- Parameters:
hed_schema (HedSchema) – HED schema for validation.
extra_def_dicts (DefinitionDict) – Extra definitions.
error_handler (ErrorHandler) – Error handler to use.
- Returns:
A list of validation issues found. Each issue is a dictionary.
- Return type:
- validate_datafiles(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶
Validate the datafiles and return an error list.
- Parameters:
hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
error_handler (ErrorHandler) – Error handler to use.
- Returns:
A list of validation issues found. Each issue is a dictionary.
- Return type:
Notes: This will clear the contents of the datafiles if they were not previously set.
- static create_file_group(root_path, file_list, suffix)[source]¶
Construct a BidsFileGroup from a list of files sharing the given suffix.
- Parameters:
- Returns:
The constructed group, or None if it contains no sidecars or data files.
- Return type:
BidsFileGroup or None
BidsSidecarFile¶
- class hed.tools.bids.bids_sidecar_file.BidsSidecarFile(file_path)[source]¶
Bases: BidsFile
A BIDS sidecar file.
- __init__(file_path)[source]¶
Construct a BIDS sidecar from a file.
- Parameters:
file_path (str) – The real path of the sidecar.
- is_sidecar_for(obj)[source]¶
Return True if this is a sidecar for obj.
- Parameters:
obj (BidsFile) – A BidsFile object to check.
- Returns:
True if this is a BIDS parent of obj and False otherwise.
- Return type:
Notes
A sidecar is a sidecar for itself.
- set_contents(content_info=None, name='unknown', overwrite=False)[source]¶
Set the contents of the sidecar.
- Parameters:
Notes
- The handling of content_info is as follows:
None: This object’s file_path is used.
dict: This is interpreted as a JSON dictionary.
BidsTabularFile¶
- class hed.tools.bids.bids_tabular_file.BidsTabularFile(file_path)[source]¶
Bases: BidsFile
A BIDS tabular file including its associated sidecar.
- __init__(file_path)[source]¶
Constructor for a BIDS tabular file.
- Parameters:
file_path (str) – Path of the tabular file.
- set_contents(content_info=None, overwrite=False)[source]¶
Set the contents of this tabular file (a TabularInput object). Its sidecar should already be set.
- Parameters:
content_info (None) – This always uses the internal file_path to create the contents.
overwrite (bool) – If False (the default), do not overwrite existing contents if any.
BIDS utilities¶
BIDS utility functions for schema loading, sidecar merging, and inheritance chain resolution.
- hed.tools.bids.bids_util.get_schema_from_description(root_path)[source]¶
Load the HED schema version declared in the BIDS dataset_description.json.
- hed.tools.bids.bids_util.parse_bids_filename(file_path)[source]¶
Split a filename into BIDS-relevant components.
- Parameters:
file_path (str) – Path to be parsed.
- Returns:
Dictionary with keys ‘basename’, ‘suffix’, ‘prefix’, ‘ext’, ‘bad’, and ‘entities’.
- Return type:
Notes
Splits into BIDS suffix, extension, and a dictionary of entity name-value pairs.
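The splitting can be sketched with the standard library. This is a simplified, hypothetical parser and not the library's implementation: it omits the ‘prefix’ key because its meaning is not described here, treats ‘basename’ as the filename without its extension, and collects pieces that are not well-formed name-value pairs under ‘bad’, all of which are assumptions.

```python
import os


def parse_name(file_path):
    """Split a BIDS-style filename into suffix, extension, and entities."""
    stem, ext = os.path.splitext(os.path.basename(file_path))
    *pairs, suffix = stem.split("_")
    entities, bad = {}, []
    for pair in pairs:
        name, sep, value = pair.partition("-")
        if sep and name and value:
            entities[name] = value
        else:
            bad.append(pair)  # piece is not a well-formed name-value pair
    return {"basename": stem, "suffix": suffix, "ext": ext,
            "entities": entities, "bad": bad}


info = parse_name("/data/sub-001_task-FaceCheck_run-01_events.tsv")
```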
- hed.tools.bids.bids_util.update_entity(name_dict, entity)[source]¶
Update the dictionary with a new entity.
- hed.tools.bids.bids_util.get_merged_sidecar(root_path, tsv_file)[source]¶
Return a merged sidecar dict following BIDS inheritance rules for a given TSV file.
- hed.tools.bids.bids_util.walk_back(root_path, file_path)[source]¶
Yield inherited sidecar file paths from the directory of file_path back toward root_path.
Traverses parent directories from the file’s location up to root_path, yielding any sidecar JSON files that apply to the given TSV according to BIDS inheritance rules.
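The traversal order can be sketched on pure paths, without touching the file system. The generator below is hypothetical and shows only which directories are visited, from the file's own directory up to root_path; the real function additionally yields the applicable sidecar files found in each of them.

```python
from pathlib import PurePosixPath


def dirs_back_to_root(root_path, file_path):
    """Yield directories from the file's folder up to and including root_path."""
    root = PurePosixPath(root_path)
    current = PurePosixPath(file_path).parent
    while True:
        yield str(current)
        # Stop at the dataset root, or at the top of the path if the
        # file does not live under root_path at all.
        if current == root or current == current.parent:
            break
        current = current.parent


dirs = list(dirs_back_to_root("/data/ds", "/data/ds/sub-001/eeg/sub-001_task-x_events.tsv"))
```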
- hed.tools.bids.bids_util.get_candidates(source_dir, tsv_file_dict)[source]¶
Return sidecar JSON files in source_dir that are applicable to tsv_file_dict.
- hed.tools.bids.bids_util.matches_criteria(json_file_dict, tsv_file_dict)[source]¶
Return True if a candidate sidecar JSON file applies to the given TSV file.
A sidecar applies when its extension is .json, its suffix matches the TSV, and all BIDS entities in the JSON filename have equal values in the TSV filename.
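The three conditions can be written out directly. The sketch assumes both arguments are dictionaries with ‘ext’, ‘suffix’, and ‘entities’ keys, as described for parse_bids_filename; the function name sidecar_applies is hypothetical.

```python
def sidecar_applies(json_info, tsv_info):
    """Return True if the parsed JSON filename applies to the parsed TSV filename."""
    if json_info["ext"] != ".json":
        return False
    if json_info["suffix"] != tsv_info["suffix"]:
        return False
    tsv_entities = tsv_info["entities"]
    # Every entity named in the sidecar must have the same value in the TSV.
    return all(tsv_entities.get(name) == value
               for name, value in json_info["entities"].items())


tsv = {"ext": ".tsv", "suffix": "events",
       "entities": {"sub": "001", "task": "FaceCheck", "run": "01"}}
task_sidecar = {"ext": ".json", "suffix": "events", "entities": {"task": "FaceCheck"}}
other_sidecar = {"ext": ".json", "suffix": "events", "entities": {"task": "Rest"}}
```

A sidecar with no entities at all, such as a dataset-level events.json, satisfies the entity condition trivially and therefore applies to every TSV with the same suffix.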
Utility functions¶
DataFrame utilities¶
Data handling utilities involving dataframes.
- hed.tools.util.data_util.add_columns(df, column_list, value='n/a')[source]¶
Add specified columns to df if not there.
- hed.tools.util.data_util.check_match(ds1, ds2, numeric=False)[source]¶
Check that two Pandas data series have the same values.
- hed.tools.util.data_util.delete_columns(df, column_list)[source]¶
Delete the specified columns from a dataframe.
- Parameters:
df (DataFrame) – Pandas dataframe from which to delete columns.
column_list (list) – List of candidate column names for deletion.
Notes
The deletion of columns is done in place.
This does not raise an error if df does not have a column in the list.
- hed.tools.util.data_util.delete_rows_by_column(df, value, column_list=None)[source]¶
Delete rows where columns have this value.
- Parameters:
Notes
All values are converted to string before testing.
Deletion is done in place.
- hed.tools.util.data_util.get_eligible_values(values, values_included)[source]¶
Return a list of the items from values that are in values_included or None if no values_included.
- hed.tools.util.data_util.get_new_dataframe(data)[source]¶
Get a new dataframe representing a tsv file.
- Parameters:
data (DataFrame or str) – DataFrame or filename representing a tsv file.
- Returns:
A dataframe containing the contents of the tsv file or, if data was a DataFrame to start with, a new copy of the DataFrame.
- Return type:
DataFrame
- Raises:
HedFileError – If a filename is given and it cannot be read into a DataFrame.
- hed.tools.util.data_util.get_row_hash(row, key_list)[source]¶
Get a hash key from key column values for row.
- Parameters:
row (DataSeries)
key_list (list)
- Returns:
Hash key constructed from the entries of row in the columns specified by key_list.
- Return type:
- Raises:
HedFileError – If row does not have all of the columns in key_list.
- hed.tools.util.data_util.get_value_dict(tsv_path, key_col='file_basename', value_col='sampling_rate')[source]¶
Get a dictionary of two columns of a dataframe.
- Parameters:
- Returns:
Dictionary with key_col values as the keys and the corresponding value_col values as the values.
- Return type:
dict
- Raises:
HedFileError – When tsv_path does not correspond to a file that can be read into a DataFrame.
- hed.tools.util.data_util.make_info_dataframe(col_info, selected_col)[source]¶
Get a dataframe from selected columns.
- Parameters:
- Returns:
A two-column dataframe whose first column contains the values from the dictionary whose key is selected_col and whose second column contains the corresponding counts. The returned value is None if selected_col is not a top-level key in col_info.
- Return type:
dataframe
- hed.tools.util.data_util.replace_na(df)[source]¶
Replace n/a values with np.nan in place, taking care of categorical columns.
- hed.tools.util.data_util.replace_values(df, values=None, replace_value='n/a', column_list=None)[source]¶
Replace string values in specified columns.
- Parameters:
df (DataFrame) – Dataframe whose values will be replaced.
values (list, None) – List of strings to replace. If None, only empty strings are replaced.
replace_value (str) – String replacement value.
column_list (list, None) – List of columns in which to do replacement. If None all columns are processed.
- Returns:
Number of values replaced.
- Return type:
int
- hed.tools.util.data_util.reorder_columns(data, col_order, skip_missing=True)[source]¶
Create a new dataframe with columns reordered.
- Parameters:
- Returns:
A new reordered dataframe.
- Return type:
DataFrame
- Raises:
HedFileError – If col_order contains columns not in data and skip_missing is False, or if data corresponds to a filename from which a dataframe cannot be created.
File/IO utilities¶
Utilities for generating and handling file names.
- hed.tools.util.io_util.check_filename(test_file, name_prefix=None, name_suffix=None, extensions=None)[source]¶
Return True if correct extension, suffix, and prefix.
- Parameters:
test_file (str) – Path of filename to test.
name_prefix (list, str, None) – An optional name_prefix or list of prefixes to accept for the base filename.
name_suffix (list, str, None) – An optional name_suffix or list of suffixes to accept for the base file name.
extensions (list, str, None) – An optional extension or list of extensions to accept.
- Returns:
True if the file name has the appropriate format.
- Return type:
bool
Notes
Everything is converted to lower case prior to testing so this test should be case-insensitive.
None indicates that all are accepted.
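The kind of test described here can be sketched as follows. This is a minimal approximation under stated assumptions: the names `has_expected_name` and `_as_list` are hypothetical, the suffix is tested against the name with its matched extension removed, and the library's actual matching rules may differ in detail.

```python
import os


def _as_list(value):
    """Normalize None, a string, or a list to a list of lower-case strings."""
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]
    return [v.lower() for v in value if v]


def has_expected_name(test_file, name_prefix=None, name_suffix=None, extensions=None):
    """Hypothetical case-insensitive check of prefix, suffix, and extension."""
    basename = os.path.basename(test_file).lower()
    prefixes = _as_list(name_prefix)
    suffixes = _as_list(name_suffix)
    exts = _as_list(extensions)
    if prefixes and not basename.startswith(tuple(prefixes)):
        return False
    if exts:
        matched = [e for e in exts if basename.endswith(e)]
        if not matched:
            return False
        # Strip the longest matching extension before testing the suffix.
        stem = basename[:-len(max(matched, key=len))]
    else:
        stem = os.path.splitext(basename)[0]
    if suffixes and not stem.endswith(tuple(suffixes)):
        return False
    return True


ok = has_expected_name("/data/Sub-01_task-rest_EVENTS.TSV",
                       name_prefix="sub", name_suffix="_events", extensions=[".tsv"])
```

Lower-casing both the file name and the criteria is what makes the comparison case-insensitive, and an argument left as None skips the corresponding test.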
- hed.tools.util.io_util.get_allowed(value, allowed_values=None, starts_with=True)[source]¶
Return the portion of the value that matches a value in allowed_values or None if no match.
- Parameters:
- Returns:
The portion of value that matches one of the allowed_values.
- Return type:
Notes
The match is done in lower case.
- hed.tools.util.io_util.get_alphanumeric_path(pathname, replace_char='_')[source]¶
Replace sequences of non-alphanumeric characters in string (usually a path) with specified character.
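A replacement of this kind is commonly written with a regular expression. The sketch below uses the hypothetical name `to_alphanumeric` and assumes that "alphanumeric" means ASCII letters and digits. It illustrates the idea rather than reproducing the library's code.

```python
import re


def to_alphanumeric(pathname, replace_char="_"):
    """Hypothetical helper: collapse each run of non-alphanumeric characters."""
    # The '+' makes a whole run of separators collapse into one replacement character.
    return re.sub(r"[^a-zA-Z0-9]+", replace_char, pathname)


flat = to_alphanumeric("/data/sub-01/run 1.tsv")
```

Each run of separators is replaced by a single character, so consecutive punctuation does not produce repeated underscores.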
- hed.tools.util.io_util.get_full_extension(filename)[source]¶
Return the full extension of a file, including the period.
- hed.tools.util.io_util.get_unique_suffixes(file_paths, extensions=None)[source]¶
Get unique suffixes from file paths with specified extensions.
- hed.tools.util.io_util.extract_suffix_path(path, prefix_path)[source]¶
Return the suffix of path after prefix path has been removed.
Notes
This function is useful for creating files within BIDS datasets.
- hed.tools.util.io_util.clean_filename(filename)[source]¶
Replace invalid characters with underscores.
- hed.tools.util.io_util.get_basename(file_path)[source]¶
Return the base filename (without extension) for the given path.
- hed.tools.util.io_util.get_filtered_by_element(file_list, elements)[source]¶
Filter a file list by whether the base names have a substring matching any of the members of elements.
- hed.tools.util.io_util.get_filtered_list(file_list, name_prefix=None, name_suffix=None, extensions=None)[source]¶
Get list of filenames satisfying the criteria.
Everything is converted to lower case prior to testing so this test should be case-insensitive.
- hed.tools.util.io_util.get_file_list(root_path, name_prefix=None, name_suffix=None, extensions=None, exclude_dirs=None)[source]¶
Return paths satisfying various conditions.
- Parameters:
root_path (str) – Full path of the directory tree to be traversed (no ending slash).
name_prefix (list, str, None) – An optional prefix for the base filename.
name_suffix (list, str, None) – An optional suffix for the base filename.
extensions (list, None) – A list of extensions to be selected.
exclude_dirs (list, None) – A list of paths to be excluded.
- Returns:
The full paths.
- Return type:
list
Notes: Excluded directories are given as paths relative to the root path.
- hed.tools.util.io_util.get_path_components(root_path, this_path)[source]¶
Get a list of the remaining components after root path.
- Parameters:
- Returns:
A list of the remaining directory components leading to the file.
- Return type:
Union[list, None]
Notes: this_path must be a descendant of root_path.
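The idea can be sketched with `pathlib`, under the assumption that the final component of this_path is a file name and is therefore excluded. The name `dir_components` is hypothetical. Unlike the documented function, the sketch always returns a list and raises ValueError when this_path is not under root_path.

```python
from pathlib import PurePosixPath


def dir_components(root_path, this_path):
    """Hypothetical helper: directory names between root_path and the file in this_path."""
    # relative_to raises ValueError if this_path is not a descendant of root_path.
    relative = PurePosixPath(this_path).relative_to(PurePosixPath(root_path))
    # Drop the final component, assumed here to be the file name.
    return list(relative.parts[:-1])


parts = dir_components("/data", "/data/sub-01/eeg/sub-01_task-rest_events.tsv")
```

For a file directly under the root, the result is an empty list.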
- hed.tools.util.io_util.get_timestamp()[source]¶
Return a timestamp string suitable for using in filenames.
- Returns:
Represents the current time.
- Return type:
str
- hed.tools.util.io_util.get_task_from_file(file_path)[source]¶
Return the task name entity from a BIDS-type file path.
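In BIDS file names the task appears as a `task-<label>` entity, so the extraction can be sketched with a regular expression. The name `task_from_name` is hypothetical, and returning an empty string when no task entity is present is an assumption of this sketch, not documented behavior.

```python
import os
import re


def task_from_name(file_path):
    """Hypothetical helper: return the value of the task entity, or '' if absent."""
    basename = os.path.basename(file_path)
    # The entity must start the name or follow an underscore; its value ends at '_' or '.'.
    match = re.search(r"(?:^|_)task-([^_.]+)", basename)
    return match.group(1) if match else ""


task = task_from_name("/data/sub-01/eeg/sub-01_task-rest_run-1_events.tsv")
```

Only the file name is searched, so directory names that happen to contain `task-` do not affect the result.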
Schema utilities¶
Utilities