Summarize column names¶

The summarize_column_names tracks the unique column name patterns found in data files across the dataset and which files have these column names. This summary is useful for determining whether there are any non-conforming data files.

Often event files associated with different tasks have different column names, and this summary can be used to verify that the files corresponding to the same task have the same column names.

A more problematic issue is when some event files for the same task have reordered column names or use different column names.

Purpose¶

Use this operation to:

Verify consistent column structure across dataset files
Identify files with non-standard column arrangements
Check for column order inconsistencies
Document column patterns for different tasks

Parameters¶

The summarize_column_names operation only has the two parameters required of all summaries.

Parameters for the summarize_column_names operation.

Parameter	Type	Description
summary_name	str	A unique name used to identify this summary.
summary_filename	str	A unique file basename to use for saving this summary.
append_timecode	bool	(Optional: Default false) If true, append a time code to filename.

Example¶

The following example remodeling file produces a summary, which when saved will appear with file name AOMIC_column_names_xxx.txt or AOMIC_column_names_xxx.json where xxx is a timestamp.

A JSON file with a single summarize_column_names summarization operation.

[{
    "operation": "summarize_column_names",
    "description": "Summarize column names.",
    "parameters": {
        "summary_name": "AOMIC_column_names",
        "summary_filename": "AOMIC_column_names"
    }    
}]

Results¶

When this operation is applied to the sample remodel event file, the following text summary is produced.

Result of applying summarize_column_names to the sample remodel file.

Summary name: AOMIC_column_names
Summary type: column_names
Summary filename: AOMIC_column_names

Summary details:

Dataset: Number of files=1
    Columns: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
        sub-0013_task-stopsignal_acq-seq_events.tsv

Individual files:

sub-0013_task-stopsignal_acq-seq_events.tsv: 
   ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
		

Since we are only summarizing one event file, there is only one unique pattern – corresponding to the columns: onset, duration, trial_type, stop_signal_delay, response_time, response_accuracy, response_hand, and response_time.

When the dataset has multiple column name patterns, the summary lists unique pattern separately along with the names of the data files that have this pattern.

The JSON version of the summary is useful for programmatic manipulation, while the text version shown above is more readable.

Notes¶

Simple but powerful operation for quality assurance
Helps identify files with different column structures
Each unique column pattern is listed with its associated files
Use early in analysis pipelines to catch structural issues
JSON output enables automated consistency checking
No complex parameters required - straightforward to use