Tools¶
Utility functions and data processing tools for HED operations.
Analysis tools¶
EventManager¶
- class EventManager(input_data, hed_schema, extra_defs=None)[source]¶
Bases: object
Manager of events of temporal extent.
- static compress_strings(list_to_compress)[source]¶
Compress a list of lists of strings into a single str with comma-separated elements.
- get_type_defs(types)[source]¶
Return a list of definition names (lower case) that correspond to any of the specified types.
- unfold_context(remove_types=None)[source]¶
Unfold the event information into a tuple based on context.
- Parameters:
remove_types (list or None) – List of types to remove. If None, defaults to empty list.
- Returns:
Union[list(str), HedString]: The information without the events of temporal extent.
Union[list(str), HedString, None]: The onsets of the events of temporal extent.
Union[list(str), HedString, None]: The ongoing context information.
- Return type:
tuple[Union[list(str), HedString], Union[list(str), HedString, None], Union[list(str), HedString, None]]
EventChecker¶
- class EventChecker(hed_obj, line_number, original_line_number=None, error_handler=None)[source]¶
Bases: object
Validates that HED-annotated events meet quality requirements such as having a top-level event tag.
- ACTION_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Done-indication', 'Imagined-action', 'Inappropriate-action', 'Incorrect-action', 'Indeterminate-action', 'Miss', 'Near-miss', 'Omitted-action', 'Ready-indication'}¶
- ALL_ROLES = {'Appropriate-action', 'Correct-action', 'Correction', 'Cue', 'Distractor', 'Done-indication', 'Expected', 'Experimental-stimulus', 'Extraneous', 'Feedback', 'Go-signal', 'Imagined-action', 'Inappropriate-action', 'Incidental', 'Incorrect-action', 'Indeterminate-action', 'Instructional', 'Meaningful', 'Mishap', 'Miss', 'Near-miss', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Omitted-action', 'Participant-response', 'Penalty', 'Planned', 'Priming', 'Query', 'Ready-indication', 'Reward', 'Stop-signal', 'Target', 'Task-activity', 'Threat', 'Timed', 'Unexpected', 'Unplanned', 'Warning'}¶
- EVENT_TAGS = {'Agent-action', 'Data-feature', 'Event', 'Experiment-control', 'Experiment-structure', 'Measurement-event', 'Sensory-event'}¶
- NON_TASK_EVENTS = {'Data-feature', 'Experiment-control', 'Experiment-structure', 'Measurement-event'}¶
- STIMULUS_ROLES = {'Distractor', 'Expected', 'Extraneous', 'Go-signal', 'Meaningful', 'Newly-learned', 'Non-informative', 'Non-target', 'Not-meaningful', 'Novel', 'Oddball', 'Penalty', 'Planned', 'Priming', 'Query', 'Reward', 'Stop-signal', 'Target', 'Threat', 'Timed', 'Unexpected', 'Unplanned'}¶
- TASK_ROLES = {'Cue', 'Experimental-stimulus', 'Feedback', 'Incidental', 'Instructional', 'Mishap', 'Participant-response', 'Task-activity', 'Warning'}¶
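The constants above are plain Python sets, so questions about them reduce to set operations. The following is a minimal sketch, not the EventChecker implementation: the two sets are copied from the listing above, while `summarize_event_tags` and its return keys are hypothetical names introduced only for illustration. The real class works on HED objects rather than bare tag names.

```python
# Copied from the EventChecker constants listed above.
EVENT_TAGS = {"Agent-action", "Data-feature", "Event", "Experiment-control",
              "Experiment-structure", "Measurement-event", "Sensory-event"}
NON_TASK_EVENTS = {"Data-feature", "Experiment-control",
                   "Experiment-structure", "Measurement-event"}


def summarize_event_tags(tags):
    """Hypothetical helper: report which event tags appear in a list of tag names."""
    found = set(tags) & EVENT_TAGS
    return {
        "event_tags": sorted(found),
        "has_event_tag": bool(found),
        # True only when at least one event tag is present and all are non-task events.
        "only_non_task": bool(found) and found <= NON_TASK_EVENTS,
    }


summary = summarize_event_tags(["Sensory-event", "Red", "Data-feature"])
print(summary)
```

Keeping the categories as sets makes subset and intersection checks one-liners, which is presumably why the class exposes them in this form.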
EventsChecker¶
- class EventsChecker(hed_schema, input_data, name=None)[source]¶
Bases: object
Class to check for event tag quality errors in an event file.
- REMOVE_TYPES = ['Condition-variable', 'Task']¶
- static get_issue_details(data_info, side_data)[source]¶
Get the source details for the issue.
- Parameters:
data_info (pd.Series) – The row information from the original tsv.
side_data (pd.Series) – The sidecar data.
- Returns:
The HED associated with the relevant columns.
- Return type:
- get_onset_lines(line)[source]¶
Get the lines in the input data with the same line numbers as the data_frame.
EventsSummary¶
- class EventsSummary(hed_schema, file, sidecar=None, name=None)[source]¶
Bases: object
Summarizes HED event annotations for a tabular file, grouping tags by stimulus/response categories.
- CUTOFF_TAGS = {'blue-color', 'brown-color', 'cyan-color', 'gray-color', 'green-color', 'orange-color', 'pink-color', 'purple-color', 'red-color', 'visual-presentation', 'white-color', 'yellow-color'}¶
- EXCLUDED_PARENTS = {'data-marker', 'data-resolution', 'grayscale', 'hsv-color', 'informational-property', 'luminance', 'luminance-contrast', 'opacity', 'organizational-property', 'quantitative-value', 'relation', 'rgb-color', 'spatiotemporal-value', 'statistical-value', 'task-effect-evidence', 'task-relationship'}¶
- FILTERED_TAGS = {'action', 'agent', 'agent-cognitive-state', 'agent-emotional-state', 'agent-physiological-state', 'agent-postural-state', 'agent-property', 'agent-state', 'agent-task-role', 'agent-trait', 'anatomical-item', 'auditory-attribute', 'auditory-device', 'biological-artifact', 'biological-item', 'body-part', 'categorical-class-value', 'categorical-judgment-value', 'categorical-level-value', 'categorical-location-value', 'categorical-orientation-value', 'categorical-value', 'computing-device', 'dara-source-type', 'data-property', 'data-value', 'data-variability-attribute', 'device', 'display-device', 'document', 'environmental-property', 'event', 'face-part', 'geometric-object', 'gustatory-attribute', 'head-part', 'input-device', 'io-device', 'item', 'language-item', 'lower-extremity-part', 'man-made-object', 'media', 'media-clip', 'move-body-part', 'natural-object', 'nonbiological-artifact', 'object', 'olfactory-attribute', 'output-device', 'physical-value', 'property', 'recording-device', 'sensory-attribute', 'sensory-presentation', 'sensory-property', 'spatial-property', 'spectral-property', 'tactile-attribute', 'task-action-type', 'task-attentional-demand', 'task-event-role', 'task-property', 'task-stimulus-role', 'temporal-property', 'torso-part', 'upper-extremity-part', 'visual-attribute', 'visualization'}¶
- MATCH_TYPES = ['Experimental-stimulus', 'Participant-response', 'Cue', 'Feedback', 'Instructional', 'Sensory-event', 'Agent-action']¶
- REMOVE_TYPES = ['Condition-variable', 'Task']¶
HedTagManager¶
- class HedTagManager(event_manager, remove_types=None)[source]¶
Bases: object
Manager for the HED tags from a columnar file.
- get_hed_obj(hed_str, remove_types=False, remove_group=False)[source]¶
Return a HED string object with the types removed.
HedTagCount¶
- class HedTagCount(hed_tag, file_name)[source]¶
Bases: object
Counts for a particular HedTag in a particular file.
- get_empty()[source]¶
Return a copy of this entry with counts reset to zero.
- Returns:
A new instance with the same tag name but zeroed event/file counts.
- Return type:
HedTagCount
HedTagCounts¶
- class HedTagCounts(name, total_events=0)[source]¶
Bases: object
Counts of HED tags for a group of columnar files.
- Parameters:
- static create_template(tags) dict[source]¶
Creates a dictionary with keys based on list of keys in tags dictionary.
- Parameters:
tags (dict) – dictionary of tags and key lists.
- Returns:
Dictionary with keys in key lists and values are empty lists.
- Return type:
dict
Note: This class is used to organize the results of the tags based on a template for display.
- get_summary() dict[source]¶
Return a summary object containing the tag count information of this summary.
- Returns:
Keys are ‘name’, ‘files’, ‘total_events’, and ‘details’.
- Return type:
dict
- merge_tag_dicts(other_dict)[source]¶
Merge the information from another dictionary with this object’s tag dictionary.
- Parameters:
other_dict (dict) – Dictionary of tag, HedTagCount to merge.
- organize_tags(tag_template) tuple[source]¶
Organize tags into categories as specified by the tag_template.
- Parameters:
tag_template (dict) – A dictionary whose keys are titles and values are lists of HED tags (str).
- Returns:
A tuple containing two elements:
dict: Keys are tags (strings) and values are lists of HedTagCount for items fitting the template.
list: HedTagCount objects corresponding to tags that don’t fit the template.
- Return type:
tuple
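The description of create_template above (keys taken from the key lists, values set to empty lists) can be sketched in a few lines. This is an illustrative reconstruction from the docstring only; `create_template_sketch` is a hypothetical name, and any case normalization the library may apply to the keys is not shown here.

```python
def create_template_sketch(tags):
    """Return a dict with one empty list per key named in the key lists of `tags`."""
    return {key: [] for key_list in tags.values() for key in key_list}


# Titles map to lists of tag names, as in the tag_template passed to organize_tags.
tag_groups = {
    "Sensory events": ["Sensory-event", "Visual-presentation"],
    "Actions": ["Move"],
}
template = create_template_sketch(tag_groups)
print(template)
```

The empty lists are the slots that a later organizing step would fill with the counts matching each tag.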
HedTypeManager¶
- class HedTypeManager(event_manager)[source]¶
Bases: object
Manager for type factors and type definitions.
- add_type(type_name)[source]¶
Add a type variable to be managed by this manager.
- Parameters:
type_name (str) – Type tag name of the type to be added.
- get_factor_vectors(type_tag, type_values=None, factor_encoding='one-hot')[source]¶
Return a DataFrame of factor vectors for the indicated HED tag and values.
- Parameters:
- Returns:
DataFrame containing the factor vectors as the columns.
- Return type:
Union[pd.DataFrame, None]
- get_type_tag_factor(type_tag, type_value)[source]¶
Return the HedTypeFactors for a specified value and extension.
HedType¶
- class HedType(event_manager, name, type_tag='condition-variable')[source]¶
Bases: object
Manager of a type variable and its associated context.
- get_summary()[source]¶
Return a summary dict mapping each type-value name to its factor summary.
- Returns:
Keys are type-value name strings; values are factor summary dicts.
- Return type:
dict
- get_type_factors(type_values=None, factor_encoding='one-hot')[source]¶
Create a dataframe with the indicated type tag values as factors.
- static get_type_list(type_tag, item)[source]¶
Find a list of the given type tag from a HedTag, HedGroup, or HedString.
- get_type_value_factors(type_value)[source]¶
Return the HedTypeFactors associated with type_value or None.
- Parameters:
type_value (str) – The tag corresponding to the type’s value (such as the name of the condition variable).
- Returns:
Union[HedTypeFactors, None]
- get_type_value_level_info(type_value)[source]¶
Return type variable corresponding to type_value.
- Parameters:
type_value (str)
Returns:
- property total_events¶
Return the total number of events in the associated event list.
- Returns:
Number of events.
- Return type:
int
HedTypeDefs¶
- class HedTypeDefs(definitions, type_tag='condition-variable')[source]¶
Bases: object
Manager for definitions associated with a type such as condition-variable.
- Properties:
def_map (dict): keys are definition names, values are dict {type_values, description, tags}.
Example: A definition ‘famous-face-cond’ with contents:
‘(Condition-variable/Face-type,Description/A face that should be recognized.,(Image,(Face,Famous)))’
would have type_values [‘face_type’]. All items are strings not objects.
- property type_def_names¶
Return a list of the names of definitions that have this type-variable.
- Returns:
Definition names that have this type.
- Return type:
list
HedTypeFactors¶
- class HedTypeFactors(type_tag, type_value, number_elements)[source]¶
Bases: object
Holds index of positions for a variable type for a columnar file.
- ALLOWED_ENCODINGS = ('categorical', 'one-hot')¶
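The two allowed encodings differ in how many columns they produce. The sketch below is a generic illustration of one-hot versus categorical factor encoding, not the HedTypeFactors implementation: `encode_levels`, the `"factor"` column name, and the use of `"n/a"` for events without a level are all assumptions made for this example.

```python
ALLOWED_ENCODINGS = ("categorical", "one-hot")


def encode_levels(levels, encoding="one-hot"):
    """Encode a per-event list of level names (None = no level) as factor columns."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"Unsupported encoding: {encoding}")
    names = sorted({level for level in levels if level is not None})
    if encoding == "one-hot":
        # One 0/1 column per level.
        return {name: [int(level == name) for level in levels] for name in names}
    # Categorical: a single column holding the level name for each event.
    return {"factor": [level if level is not None else "n/a" for level in levels]}


levels = ["left", None, "right", "left"]
one_hot = encode_levels(levels)
categorical = encode_levels(levels, "categorical")
print(one_hot)
print(categorical)
```

One-hot columns are convenient for regression design matrices, while a single categorical column is more compact when each event has at most one level.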
HedTypeCount¶
- class HedTypeCount(type_value, type_tag, file_name=None)[source]¶
Bases: object
Manager of the counts of tags for one type tag such as Condition-variable or Task.
- Parameters:
Examples
HedTypeCount(‘SymmetricCond’, ‘condition-variable’) keeps counts of Condition-variable/SymmetricCond.
HedTypeCounts¶
- class HedTypeCounts(name, type_tag)[source]¶
Bases: object
Manager for summaries of tag counts for columnar files.
- add_descriptions(type_defs)[source]¶
Update this summary based on the type variable map.
- Parameters:
type_defs (HedTypeDefs) – Contains the information about the value of a type.
- get_summary()[source]¶
Return the information in the manager as a dictionary.
- Returns:
Dict with keys ‘name’, ‘type_tag’, ‘files’, ‘total_events’, and ‘details’.
- Return type:
dict
- update(counts)[source]¶
Update count information based on counts in another HedTypeCounts.
- Parameters:
counts (HedTypeCounts) – Information to use in the update.
TabularSummary¶
- class TabularSummary(value_cols=None, skip_cols=None, name='', categorical_limit=None)[source]¶
Bases: object
Summarize the contents of columnar files.
- extract_sidecar_template() dict[source]¶
Extract a BIDS sidecar-compatible dictionary.
- Returns:
A sidecar template that can be converted to JSON.
- Return type:
dict
- static extract_summary(summary_info) TabularSummary[source]¶
Create a TabularSummary object from a serialized summary.
- static get_columns_info(dataframe, skip_cols=None) dict[str, dict][source]¶
Extract unique value counts for columns.
- static make_combined_dicts(file_dictionary, skip_cols=None) tuple[TabularSummary, dict[str, TabularSummary]][source]¶
Return combined and individual summaries.
- Parameters:
file_dictionary (FileDictionary) – Dictionary of file name keys and full path.
skip_cols (list) – List of names of the columns to skip.
- Returns:
A combined summary of all files in the dictionary.
A dictionary where keys are file names and values are individual TabularSummary objects.
- Return type:
tuple[TabularSummary, dict[str, TabularSummary]]
- update(data, name=None)[source]¶
Update the counts based on data (DataFrame, filename, or list of filenames).
- update_summary(tab_sum)[source]¶
Add TabularSummary values to this object.
- Parameters:
tab_sum (TabularSummary) – A TabularSummary to be combined.
Notes
The value_cols and skip_cols are updated as long as they are not contradictory.
A new skip column cannot be used.
ColumnNameSummary¶
- class ColumnNameSummary(name='')[source]¶
Bases: object
Summarize the unique column names in a dataset.
- get_summary(as_json=False)[source]¶
Return summary as an object or in JSON.
- Parameters:
as_json (bool) – If False (the default), return the underlying summary object; otherwise, transform it to JSON.
FileDictionary¶
- class FileDictionary(collection_name, file_list, key_indices=(0, 2), separator='_')[source]¶
Bases: object
A file dictionary keyed by entity pair indices.
Notes
The entities are identified as 0, 1, … depending on order in the base filename.
The entity key-value pairs are assumed separated by ‘_’ unless a separator is provided.
- property file_dict¶
Dictionary of path values in this dictionary.
- property file_list¶
List of path values in this dictionary.
- iter_files()[source]¶
Iterator over the files in this dictionary.
- Yields:
- str – Key into the dictionary.
- file – File path.
- key_diffs(other_dict)[source]¶
Return symmetric key difference with another dict.
- Parameters:
other_dict (FileDictionary)
- Returns:
The symmetric difference of the keys in this dictionary and the other one.
- Return type:
- property key_list¶
Keys in this dictionary.
- static make_file_dict(file_list, key_indices=(0, 2), separator='_')[source]¶
Return a dictionary of files using entity keys.
- static make_key(key_string, indices=(0, 2), separator='_')[source]¶
Create a key from specified entities.
- property name¶
Name of this dictionary.
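The notes for FileDictionary say that entities are numbered 0, 1, … by their order in the base filename and are separated by ‘_’ by default. A minimal sketch of how a key could be built from those rules follows. It is not the library's make_key: `make_key_sketch` is a hypothetical name, and skipping out-of-range indices is an assumption, since the listing does not say how that case is handled.

```python
import os


def make_key_sketch(key_string, indices=(0, 2), separator="_"):
    """Join the pieces of a base filename selected by position."""
    base = os.path.splitext(os.path.basename(key_string))[0]
    pieces = base.split(separator)
    # Assumption: indices beyond the available pieces are silently skipped.
    return separator.join(pieces[i] for i in indices if i < len(pieces))


key = make_key_sketch("/data/sub-01/sub-01_ses-1_task-rest_events.tsv")
print(key)
```

With the default indices (0, 2), the first and third entity pairs of the filename form the key, so files from different sessions of the same subject and task would share a key.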
KeyMap¶
- class KeyMap(key_cols, target_cols=None, name='')[source]¶
Bases: object
A map of unique column values for remapping columns.
- target_cols¶
Optional list of column names that will be inserted into data and later remapped.
- Type:
list or None
Notes: This mapping converts all columns in the mapping to strings. The remapping does not support other types of columns.
- property columns¶
Return the column names of the columns managed by this map.
- Returns:
Column names of the columns managed by this map.
- Return type:
- make_template(additional_cols=None, show_counts=True)[source]¶
Return a dataframe template.
- Parameters:
- Returns:
A dataframe containing the template.
- Return type:
DataFrame
- Raises:
HedFileError – If additional columns are not disjoint from the key columns.
Notes
The template consists of the unique key columns in this map plus additional columns.
- remap(data)[source]¶
Remap the columns of a dataframe or columnar file.
- Parameters:
data (DataFrame, str) – Columnar data (either DataFrame or filename) whose columns are to be remapped.
- Returns:
New dataframe with columns remapped.
List of row numbers that had no correspondence in the mapping.
- Return type:
- Raises:
HedFileError – If data is missing some of the key columns.
- static remove_quotes(df, columns=None)[source]¶
Remove quotes from the specified columns and convert to string.
- Parameters:
df (DataFrame) – Dataframe to process by removing quotes.
columns (list) – List of column names. If None, all columns are used.
Notes
Replacement is done in place.
- update(data, allow_missing=True)[source]¶
Update the existing map with information from data.
- Parameters:
- Raises:
HedFileError – If there are missing keys and allow_missing is False.
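The remap method described above returns both the remapped data and the row numbers that had no match in the map. The following sketch shows that idea on plain lists of dictionaries rather than DataFrames. It is an assumption-laden illustration, not KeyMap itself: `remap_rows` and the shape of `key_map` are hypothetical, and unmatched rows are simply left unchanged here.

```python
def remap_rows(rows, key_cols, key_map):
    """Add target values to each row whose key columns match an entry in key_map.

    rows: list of dicts; key_map: tuple of key values (as strings) -> dict of targets.
    Returns the new rows and the indices of rows with no matching key.
    """
    remapped, missing = [], []
    for index, row in enumerate(rows):
        # Keys are compared as strings, mirroring the note that the map
        # converts all of its columns to strings.
        key = tuple(str(row[col]) for col in key_cols)
        new_row = dict(row)
        targets = key_map.get(key)
        if targets is None:
            missing.append(index)
        else:
            new_row.update(targets)
        remapped.append(new_row)
    return remapped, missing


rows = [{"type": "go", "code": 1}, {"type": "stop", "code": 2}]
key_map = {("go",): {"label": "Go-signal"}}
remapped, missing = remap_rows(rows, ["type"], key_map)
print(remapped, missing)
```

Returning the unmatched row numbers alongside the data lets the caller decide whether missing keys are an error or something to fill in later.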
TemporalEvent¶
Annotation utilities¶
Utilities to facilitate annotation of events in BIDS.
- check_df_columns(df, required_cols=('column_name', 'column_value', 'description', 'HED')) list[str][source]¶
Return a list of the specified columns that are missing from a dataframe.
- df_to_hed(dataframe, description_tag=True) dict[source]¶
Create sidecar-like dictionary from a 4-column dataframe.
- Parameters:
dataframe (DataFrame) – A four-column Pandas DataFrame with specific columns.
description_tag (bool) – If True description tag is included.
- Returns:
A dictionary compatible with BIDS JSON tabular file that includes HED.
- Return type:
dict
Notes
The DataFrame must have the columns with names: column_name, column_value, description, and HED.
- extract_tags(hed_string, search_tag) tuple[str, list[str]][source]¶
Extract all instances of the specified tag from hed_string.
- generate_sidecar_entry(column_name, column_values=None) dict[source]¶
Create a sidecar column dictionary for column.
- Parameters:
column_name (str) – Name of the column.
column_values – List of column values.
- hed_to_df(sidecar_dict, col_names=None) DataFrame[source]¶
Return a 4-column dataframe of HED portions of sidecar.
- Parameters:
- Returns:
Four-column spreadsheet representing HED portion of sidecar.
- Return type:
DataFrame
Notes
The returned DataFrame has columns: column_name, column_value, description, and HED.
- series_to_factor(series) list[int][source]¶
Convert a series to an integer factor list.
- Parameters:
series (pd.Series) – Series to be converted to a list.
- Returns:
list[int] – A list of 0s and 1s; empty values, ‘n/a’, and np.nan are converted to 0.
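The conversion rule for series_to_factor can be illustrated on a plain Python list. This is only a sketch under stated assumptions: `to_factor` is a hypothetical name, it takes a list instead of a pandas Series, and it treats every value other than an empty string, 'n/a', or NaN as 1, which the docstring above does not spell out.

```python
import math


def to_factor(values):
    """Map empty strings, 'n/a', and NaN to 0 and (by assumption) everything else to 1."""
    factor = []
    for value in values:
        is_nan = isinstance(value, float) and math.isnan(value)
        if is_nan or value in ("", "n/a"):
            factor.append(0)
        else:
            factor.append(1)
    return factor


factor = to_factor(["x", "", "n/a", float("nan"), "y"])
print(factor)
```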
- str_to_tabular(tsv_str, sidecar=None) TabularInput[source]¶
Return a TabularInput from a tsv string.
- Parameters:
tsv_str (str) – A string representing a tabular input.
sidecar – An optional Sidecar object.
- strs_to_hed_objs(hed_strings, hed_schema) list[HedString] | None[source]¶
Return a list of HedString objects from a list of strings.
- Parameters:
hed_strings (string or list) – String or strings representing HED annotations.
hed_schema (HedSchema or HedSchemaGroup) – Schema version for the strings.
- Returns:
A list of HedString objects or None.
- Return type:
list[HedString] | None
- strs_to_sidecar(sidecar_strings) Sidecar | None[source]¶
Return a Sidecar from a sidecar given as a string or as a list of sidecar strings.
BIDS tools¶
BidsDataset¶
- class BidsDataset(root_path, schema=None, suffixes=<object object>, exclude_dirs=<object object>)[source]¶
Bases: object
A BIDS dataset representation primarily focused on HED evaluation.
- schema¶
The schema used for evaluation.
- Type:
- get_file_group(suffix)[source]¶
Return the file group of files with the specified suffix.
- Parameters:
suffix (str) – Suffix of the BidsFileGroup to be returned.
- Returns:
The requested tabular group.
- Return type:
Union[BidsFileGroup, None]
- validate(check_for_warnings=False, schema=None)[source]¶
Validate the dataset.
- Parameters:
check_for_warnings (bool) – If True, check for warnings.
schema (HedSchema or HedSchemaGroup or None) – The schema used for validation.
- Returns:
List of issues encountered during validation. Each issue is a dictionary.
- Return type:
list
BidsFile¶
- class BidsFile(file_path)[source]¶
Bases: object
A BIDS file with entity dictionary.
Notes
This class may hold the merged sidecar giving metadata for this file as well as contents.
- property contents¶
Return the current contents of this object.
- get_key(entities=None)[source]¶
Return a key for this BIDS file given a list of entities.
- Parameters:
entities (tuple) – A tuple of strings representing entities.
- Returns:
A key based on this object.
- Return type:
Notes
If entities is None, then the file path is used as the key.
- set_contents(content_info=None, overwrite=False)[source]¶
Set the contents of this object.
- Parameters:
content_info (Any) – The contents appropriate for this object (for example, a JSON dictionary).
overwrite (bool) – If False and the contents are not empty, do nothing.
Notes
The contents are not set if they are already set and overwrite is False.
BidsFileGroup¶
- class BidsFileGroup(root_path, file_list, suffix='events')[source]¶
Bases: object
Container for BIDS files with a specified suffix.
- suffix¶
The file suffix specifying the class of file represented in this group (e.g., events).
- Type:
- sidecar_dir_dict¶
Dictionary whose keys are directory paths and values are list of sidecars in the corresponding directory.
- Type:
- static create_file_group(root_path, file_list, suffix)[source]¶
Construct a BidsFileGroup from a list of files sharing the given suffix.
- Parameters:
- Returns:
The constructed group, or None if it contains no sidecars or data files.
- Return type:
BidsFileGroup or None
- get_task_names()[source]¶
Return a sorted list of unique task names found in the file group’s TSV and JSON filenames.
- Returns:
Sorted list of unique task name strings (the xxxx portion of task-xxxx entities).
- Return type:
Notes
Parses both sidecar_dict and datafile_dict file paths.
The BIDS task- entity is matched case-insensitively.
- summarize(value_cols=None, skip_cols=None)[source]¶
Return a TabularSummary of the group files.
- Parameters:
- Returns:
A summary of the number of values in different columns if tabular group.
- Return type:
Union[TabularSummary, None]
Notes
The columns that are not value_cols or skip_cols are summarized by counting
the number of times each unique value appears in that column.
- validate(hed_schema, extra_def_dicts=None, check_for_warnings=False)[source]¶
Validate the sidecars and datafiles and return a list of issues.
- Parameters:
hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
check_for_warnings (bool) – If True, include warnings in the check.
- Returns:
A list of validation issues found. Each issue is a dictionary.
- Return type:
list
- validate_datafiles(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶
Validate the datafiles and return an error list.
- Parameters:
hed_schema (HedSchema) – Schema to apply to the validation.
extra_def_dicts (DefinitionDict) – Extra definitions that come from outside.
error_handler (ErrorHandler) – Error handler to use.
- Returns:
A list of validation issues found. Each issue is a dictionary.
- Return type:
list
Notes: This will clear the contents of the datafiles if they were not previously set.
- validate_sidecars(hed_schema, extra_def_dicts=None, error_handler=None)[source]¶
Validate merged sidecars.
- Parameters:
hed_schema (HedSchema) – HED schema for validation.
extra_def_dicts (DefinitionDict) – Extra definitions.
error_handler (ErrorHandler) – Error handler to use.
- Returns:
A list of validation issues found. Each issue is a dictionary.
- Return type:
list
BidsSidecarFile¶
- class BidsSidecarFile(file_path)[source]¶
Bases: BidsFile
A BIDS sidecar file.
- clear_contents()¶
Set the contents attribute of this object to None.
- property contents¶
Return the current contents of this object.
- get_entity(entity_name)¶
Return the entity value for the specified entity.
- get_key(entities=None)¶
Return a key for this BIDS file given a list of entities.
- Parameters:
entities (tuple) – A tuple of strings representing entities.
- Returns:
A key based on this object.
- Return type:
Notes
If entities is None, then the file path is used as the key.
- is_sidecar_for(obj)[source]¶
Return True if this is a sidecar for obj.
- Parameters:
obj (BidsFile) – A BidsFile object to check.
- Returns:
True if this is a BIDS parent of obj and False otherwise.
- Return type:
bool
Notes
A sidecar is a sidecar for itself.
- static merge_sidecar_list(sidecar_list, name='merged_sidecar.json')[source]¶
Merge a list of sidecars into a single sidecar.
BidsTabularFile¶
- class BidsTabularFile(file_path)[source]¶
Bases: BidsFile
A BIDS tabular file including its associated sidecar.
- clear_contents()¶
Set the contents attribute of this object to None.
- property contents¶
Return the current contents of this object.
- get_entity(entity_name)¶
Return the entity value for the specified entity.
- get_key(entities=None)¶
Return a key for this BIDS file given a list of entities.
- Parameters:
entities (tuple) – A tuple of strings representing entities.
- Returns:
A key based on this object.
- Return type:
Notes
If entities is None, then the file path is used as the key.
- set_contents(content_info=None, overwrite=False)[source]¶
Set the contents of this tabular file (a TabularInput object). Its sidecar should already be set.
- Parameters:
content_info (None) – This always uses the internal file_path to create the contents.
overwrite (bool) – If False (the default), do not overwrite existing contents if any.
BIDS utilities¶
BIDS utility functions for schema loading, sidecar merging, and inheritance chain resolution.
- get_candidates(source_dir, tsv_file_dict)[source]¶
Return sidecar JSON files in source_dir that are applicable to tsv_file_dict.
- get_merged_sidecar(root_path, tsv_file)[source]¶
Return a merged sidecar dict following BIDS inheritance rules for a given TSV file.
- get_schema_from_description(root_path)[source]¶
Load the HED schema version declared in the BIDS dataset_description.json.
- matches_criteria(json_file_dict, tsv_file_dict)[source]¶
Return True if a candidate sidecar JSON file applies to the given TSV file.
A sidecar applies when its extension is .json, its suffix matches the TSV, and all BIDS entities in the JSON filename have equal values in the TSV filename.
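The three conditions stated for matches_criteria translate directly into a small predicate. The sketch below assumes each file is described by a dictionary with 'ext', 'suffix', and 'entities' keys, as listed for parse_bids_filename in this section. `sidecar_applies` is a hypothetical name, and the lower-casing of the extension is an assumption rather than documented behavior.

```python
def sidecar_applies(json_info, tsv_info):
    """Return True if a parsed sidecar filename applies to a parsed TSV filename."""
    if json_info["ext"].lower() != ".json":
        return False
    if json_info["suffix"] != tsv_info["suffix"]:
        return False
    tsv_entities = tsv_info["entities"]
    # Every entity named by the sidecar must have the same value in the TSV name.
    return all(tsv_entities.get(name) == value
               for name, value in json_info["entities"].items())


task_sidecar = {"ext": ".json", "suffix": "events", "entities": {"task": "rest"}}
rest_run = {"ext": ".tsv", "suffix": "events",
            "entities": {"sub": "01", "task": "rest"}}
memory_run = {"ext": ".tsv", "suffix": "events",
              "entities": {"sub": "01", "task": "memory"}}
print(sidecar_applies(task_sidecar, rest_run), sidecar_applies(task_sidecar, memory_run))
```

A sidecar with no entities at all passes the last test trivially, which matches the idea of a dataset-level sidecar applying to every file with the same suffix.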
- parse_bids_filename(file_path)[source]¶
Split a filename into BIDS-relevant components.
- Parameters:
file_path (str) – Path to be parsed.
- Returns:
Dictionary with keys ‘basename’, ‘suffix’, ‘prefix’, ‘ext’, ‘bad’, and ‘entities’.
- Return type:
dict
Notes
Splits into BIDS suffix, extension, and a dictionary of entity name-value pairs.
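To make the returned structure concrete, here is a simplified sketch of splitting a BIDS-style filename into suffix, extension, and entity pairs. It is not the library's parser: `parse_name_sketch` is a hypothetical name, the 'prefix' key is omitted because its meaning is not described above, the use of 'bad' for pieces that are not name-value pairs is an assumption, and multi-part extensions such as .tsv.gz are not handled.

```python
import os


def parse_name_sketch(file_path):
    """Split a filename into basename, suffix, extension, and entity name-value pairs."""
    filename = os.path.basename(file_path)
    basename, ext = os.path.splitext(filename)
    # The last underscore-separated piece is taken to be the suffix.
    *entity_parts, suffix = basename.split("_")
    entities, bad = {}, []
    for part in entity_parts:
        name, sep, value = part.partition("-")
        if sep and name and value:
            entities[name] = value
        else:
            bad.append(part)  # Assumption: malformed pieces are collected here.
    return {"basename": basename, "suffix": suffix, "ext": ext,
            "bad": bad, "entities": entities}


parsed = parse_name_sketch("/data/sub-01/sub-01_task-rest_run-1_events.tsv")
print(parsed)
```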
Utility functions¶
DataFrame utilities¶
Data handling utilities involving dataframes.
- delete_columns(df, column_list)[source]¶
Delete the specified columns from a dataframe.
- Parameters:
df (DataFrame) – Pandas dataframe from which to delete columns.
column_list (list) – List of candidate column names for deletion.
Notes
The deletion of columns is done in place.
This does not raise an error if df does not have a column in the list.
- delete_rows_by_column(df, value, column_list=None)[source]¶
Delete rows where columns have this value.
- Parameters:
Notes
All values are converted to string before testing.
Deletion is done in place.
- get_eligible_values(values, values_included)[source]¶
Return a list of the items from values that are in values_included or None if no values_included.
- get_new_dataframe(data)[source]¶
Get a new dataframe representing a tsv file.
- Parameters:
data (DataFrame or str) – DataFrame or filename representing a tsv file.
- Returns:
A dataframe containing the contents of the tsv file or, if data was a DataFrame to start with, a new copy of the DataFrame.
- Return type:
DataFrame
- Raises:
HedFileError – If a filename is given and it cannot be read into a DataFrame.
- get_row_hash(row, key_list)[source]¶
Get a hash key from key column values for row.
- Parameters:
row (DataSeries)
key_list (list)
- Returns:
Hash key constructed from the entries of row in the columns specified by key_list.
- Return type:
- Raises:
HedFileError – If row doesn’t have all the columns in key_list.
- get_value_dict(tsv_path, key_col='file_basename', value_col='sampling_rate')[source]¶
Get a dictionary of two columns of a dataframe.
- Parameters:
- Returns:
Dictionary with key_col values as the keys and the corresponding value_col values as the values.
- Return type:
dict
- Raises:
HedFileError – When tsv_path does not correspond to a file that can be read into a DataFrame.
- make_info_dataframe(col_info, selected_col)[source]¶
Get a dataframe from selected columns.
- Parameters:
- Returns:
A two-column dataframe whose first column contains the values from the dictionary whose key is selected_col and whose second column contains the corresponding counts. The returned value is None if selected_col is not a top-level key in col_info.
- Return type:
- reorder_columns(data, col_order, skip_missing=True)[source]¶
Create a new dataframe with columns reordered.
- Parameters:
- Returns:
A new reordered dataframe.
- Return type:
DataFrame
- Raises:
HedFileError – If col_order contains columns not in data and skip_missing is False.
HedFileError – If data corresponds to a filename from which a dataframe cannot be created.
- replace_values(df, values=None, replace_value='n/a', column_list=None)[source]¶
Replace string values in specified columns.
- Parameters:
df (DataFrame) – Dataframe whose values will be replaced.
values (list, None) – List of strings to replace. If None, only empty strings are replaced.
replace_value (str) – String replacement value.
column_list (list, None) – List of columns in which to do replacement. If None all columns are processed.
- Returns:
Number of values replaced.
- Return type:
int
File/IO utilities¶
Utilities for generating and handling file names.
- check_filename(test_file, name_prefix=None, name_suffix=None, extensions=None)[source]¶
Return True if the file name has the correct extension, suffix, and prefix.
- Parameters:
test_file (str) – Path of filename to test.
name_prefix (list, str, None) – An optional name_prefix or list of prefixes to accept for the base filename.
name_suffix (list, str, None) – An optional name_suffix or list of suffixes to accept for the base file name.
extensions (list, str, None) – An optional extension or list of extensions to accept for the file extension.
- Returns:
True if file has the appropriate format.
- Return type:
bool
Notes
Everything is converted to lower case prior to testing so this test should be case-insensitive.
None indicates that all are accepted.
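The rules in the notes for check_filename (lower-case everything, treat None as "accept all") can be sketched with the standard library. This is an illustration built from the description above, not the library function: `check_filename_sketch` and `_as_lower_list` are hypothetical names, and extensions are assumed to be written with a leading dot.

```python
import os


def _as_lower_list(value):
    """Normalize None, a string, or a list of strings to None or a lower-case list."""
    if value is None:
        return None
    if isinstance(value, str):
        return [value.lower()]
    return [item.lower() for item in value]


def check_filename_sketch(test_file, name_prefix=None, name_suffix=None, extensions=None):
    """Return True if the base filename passes every criterion that is not None."""
    base, ext = os.path.splitext(os.path.basename(test_file).lower())
    prefixes = _as_lower_list(name_prefix)
    suffixes = _as_lower_list(name_suffix)
    allowed_exts = _as_lower_list(extensions)
    if allowed_exts is not None and ext not in allowed_exts:
        return False
    if prefixes is not None and not base.startswith(tuple(prefixes)):
        return False
    if suffixes is not None and not base.endswith(tuple(suffixes)):
        return False
    return True


ok = check_filename_sketch("/d/Sub-01_task-rest_EVENTS.TSV",
                           name_prefix="sub-", name_suffix="_events",
                           extensions=[".tsv"])
print(ok)
```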
- extract_suffix_path(path, prefix_path)[source]¶
Return the suffix of path after prefix_path has been removed.
Notes
This function is useful for creating files within BIDS datasets.
- get_allowed(value, allowed_values=None, starts_with=True)[source]¶
Return the portion of the value that matches a value in allowed_values or None if no match.
- Parameters:
- Returns:
Portion of value that matches one of the allowed_values.
- Return type:
Notes
Matching is done in lower case.
- get_alphanumeric_path(pathname, replace_char='_')[source]¶
Replace sequences of non-alphanumeric characters in string (usually a path) with specified character.
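Replacing each run of non-alphanumeric characters with a single character is a one-line regular expression. The sketch below shows the idea; `alphanumeric_path_sketch` is a hypothetical name, and treating only ASCII letters and digits as alphanumeric is an assumption, since the listing does not say how other characters are classified.

```python
import re


def alphanumeric_path_sketch(pathname, replace_char="_"):
    """Replace each run of characters outside [A-Za-z0-9] with replace_char."""
    return re.sub(r"[^a-zA-Z0-9]+", replace_char, pathname)


flat = alphanumeric_path_sketch("/data/sub-01/events.tsv")
print(flat)
```

Because the pattern matches runs rather than single characters, consecutive separators collapse into one replacement character.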
- get_file_list(root_path, name_prefix=None, name_suffix=None, extensions=None, exclude_dirs=None)[source]¶
Return paths satisfying various conditions.
- Parameters:
root_path (str) – Full path of the directory tree to be traversed (no ending slash).
name_prefix (list, str, None) – An optional prefix for the base filename.
name_suffix (list, str, None) – An optional suffix for the base filename.
extensions (list, None) – A list of extensions to be selected.
exclude_dirs (list, None) – A list of paths to be excluded.
- Returns:
The full paths.
- Return type:
Notes: Exclude directories are paths relative to the root path.
- get_filtered_by_element(file_list, elements)[source]¶
Filter a file list by whether the base names have a substring matching any of the members of elements.
- get_filtered_list(file_list, name_prefix=None, name_suffix=None, extensions=None)[source]¶
Get list of filenames satisfying the criteria.
Everything is converted to lower case prior to testing so this test should be case-insensitive.
- get_path_components(root_path, this_path)[source]¶
Get a list of the remaining components after root path.
- Parameters:
- Returns:
A list of the remaining directory components leading to the file.
- Return type:
Union[list, None]
Notes: this_path must be a descendant of root_path.
- get_task_dict(files)[source]¶
Return a dictionary of the tasks that appear in the file names of a list of files.
- get_timestamp()[source]¶
Return a timestamp string suitable for using in filenames.
- Returns:
Represents the current time.
- Return type:
str
Schema utilities¶
Utilities