Remap columns

The remap_columns operation maps combinations of values in m specified columns of a data file into values in n columns using a defined mapping. Remapping is useful during analysis to create columns in event files that are more directly useful or informative for a particular analysis.

Remapping is also important during the initial generation of event files from experimental logs. Experimental control software often writes a code for each type of log entry. Remapping can convert the column containing these codes into one or more columns with more descriptive values.

Purpose

Use this operation to:

  • Convert experimental log codes into meaningful categorical values

  • Combine multiple columns into a single informative column

  • Split one column into multiple columns based on value mappings

  • Translate between different coding schemes

Parameters

Parameters for the remap_columns operation.

| Parameter | Type | Description |
| --- | --- | --- |
| source_columns | list | A list of m names of the source columns for the map. |
| destination_columns | list | A list of n names of the destination columns for the map. |
| map_list | list | A list of mappings. Each element is a list of m source column values followed by n destination values. Mapping source values are treated as strings. |
| ignore_missing | bool | If true, source column values not in the map generate “n/a” destination values instead of errors. |
| integer_sources | list | (Optional) A list of source columns that are integers. The integer_sources must be a subset of source_columns. |

A column cannot be both a source and a destination, and all source columns must be present in the data files. New columns are created for destination columns that are missing from a data file.
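The constraints above can be checked before running the operation. The following is a minimal sketch of such a check, not the tool's own validation code; the function name check_remap_parameters and the sample column names are hypothetical.

```python
def check_remap_parameters(params, file_columns):
    """Return a list of problems found in a remap_columns parameter dict."""
    sources = params["source_columns"]
    destinations = params["destination_columns"]
    integer_sources = params.get("integer_sources", [])  # optional parameter
    problems = []

    # A column cannot be both a source and a destination.
    overlap = sorted(set(sources) & set(destinations))
    if overlap:
        problems.append(f"used as both source and destination: {overlap}")

    # All source columns must be present in the data file.
    missing = [c for c in sources if c not in file_columns]
    if missing:
        problems.append(f"source columns missing from file: {missing}")

    # integer_sources must be a subset of source_columns.
    extra = [c for c in integer_sources if c not in sources]
    if extra:
        problems.append(f"integer_sources not in source_columns: {extra}")

    return problems


# Hypothetical parameters and file header used only for illustration.
good = {"source_columns": ["code"], "destination_columns": ["event_type"],
        "integer_sources": ["code"]}
bad = {"source_columns": ["code", "label"], "destination_columns": ["label"],
       "integer_sources": ["other"]}
header = ["onset", "duration", "code"]

good_problems = check_remap_parameters(good, header)  # no problems expected
bad_problems = check_remap_parameters(bad, header)    # three problems expected
```

Destination columns are deliberately not checked against the file header, since missing destination columns are created.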

The remap_columns operation only works for columns containing strings or integers, as it is meant for remapping categorical codes. You must specify which source columns contain integers so that n/a values can be handled appropriately.

The map_list parameter specifies how each unique combination of values from the source columns will be mapped into the destination columns. If there are m source columns and n destination columns, then each entry in map_list must be a list with m + n elements. The first m elements are the key values from the source columns. The map_list should have targets for all combinations of values that appear in the m source columns unless ignore_missing is true.
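To make the m + n rule concrete, here is a small pure-Python sketch of how a map_list could be turned into a lookup table and applied to one row. It illustrates the behavior described above and is not the tool's implementation; build_lookup, remap_row, and the log codes are hypothetical.

```python
def build_lookup(map_list, m, n):
    """Turn map_list into a dict from m-tuples of strings to lists of n values."""
    lookup = {}
    for entry in map_list:
        if len(entry) != m + n:
            raise ValueError(f"expected {m + n} elements, got {len(entry)}: {entry}")
        key = tuple(str(v) for v in entry[:m])  # source values are matched as strings
        lookup[key] = list(entry[m:])
    return lookup


def remap_row(row, sources, destinations, lookup, ignore_missing):
    """Return a copy of row with the destination columns filled in."""
    key = tuple(str(row[c]) for c in sources)
    if key in lookup:
        values = lookup[key]
    elif ignore_missing:
        values = ["n/a"] * len(destinations)  # unmapped combination becomes n/a
    else:
        raise KeyError(f"no mapping for source values {key}")
    return {**row, **dict(zip(destinations, values))}


# Hypothetical log codes split into two descriptive columns (m = 1, n = 2).
sources, destinations = ["code"], ["event_type", "task_role"]
lookup = build_lookup([[1, "stimulus", "go_signal"],
                       [2, "stimulus", "stop_signal"],
                       [3, "response", "button_press"]], m=1, n=2)

mapped = remap_row({"onset": 0.5, "code": 2}, sources, destinations,
                   lookup, ignore_missing=False)
unmapped = remap_row({"onset": 1.5, "code": 9}, sources, destinations,
                     lookup, ignore_missing=True)
```

Keying the lookup on a tuple of strings means an integer code such as 2 in the data matches a map entry written as either 2 or "2".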

After remapping, the tabular file will contain both source and destination columns. If you wish to replace the source columns with the destination columns, use a remove_columns transformation after the remap_columns.
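The two-step pattern can be sketched by generating a remodeling file that lists both operations in order. The remove_columns parameter names used here (column_names and ignore_missing) are assumptions that should be checked against the remove_columns documentation; the column names are illustrative.

```python
import json

source_columns = ["response_accuracy", "response_hand"]

operations = [
    {
        "operation": "remap_columns",
        "description": "Map response_accuracy and response_hand into a single column.",
        "parameters": {
            "source_columns": source_columns,
            "destination_columns": ["response_type"],
            "map_list": [["correct", "left", "correct_left"],
                         ["correct", "right", "correct_right"],
                         ["incorrect", "left", "incorrect_left"],
                         ["incorrect", "right", "incorrect_right"],
                         ["n/a", "n/a", "n/a"]],
            "ignore_missing": True,
        },
    },
    {
        "operation": "remove_columns",
        "description": "Drop the source columns once response_type exists.",
        "parameters": {
            "column_names": source_columns,  # assumed parameter name
            "ignore_missing": True,          # assumed parameter name
        },
    },
]

# Serialize to the JSON text that would be saved as the remodeling file.
remodel_json = json.dumps(operations, indent=4)
```

Operations in the list run in order, so the removal step must come after the remap step that reads those columns.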

Example

The remap_columns operation in the following example creates a new column called response_type based on the unique values in the combination of columns response_accuracy and response_hand.

A JSON file with a single remap_columns transformation operation.

[{ 
    "operation": "remap_columns",
    "description": "Map response_accuracy and response_hand into a single column.",
    "parameters": {
        "source_columns": ["response_accuracy", "response_hand"],
        "destination_columns": ["response_type"],
        "map_list": [["correct", "left", "correct_left"],
                     ["correct", "right", "correct_right"],
                     ["incorrect", "left", "incorrect_left"],
                     ["incorrect", "right", "incorrect_right"],
                     ["n/a", "n/a", "n/a"]],
        "ignore_missing": true
    }
}]

In this example there are two source columns and one destination column, so each entry in map_list must be a list with three elements: two source values and one destination value. Since all the values in map_list are strings, the optional integer_sources list is not needed.

Results

The results of executing the previous remap_columns operation on the sample remodel event file are:

Mapping columns response_accuracy and response_hand into a response_type column.

| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | response_type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | correct_right |
| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | correct_right |
| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | correct_right |
| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | n/a |
| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | correct_left |
| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | correct_left |

In this example, remap_columns combines the values from columns response_accuracy and response_hand to produce a new column called response_type that specifies both response hand and correctness information using a single code.

Notes

  • Source and destination columns remain after remapping; use remove_columns to clean up

  • Each map_list entry must have exactly m + n elements

  • Source values are treated as strings for matching

  • Use integer_sources to specify which source columns contain integers

  • Set ignore_missing to true so that unmapped value combinations produce n/a destination values instead of errors

  • Useful for both simplifying (many-to-one) and expanding (one-to-many) column structures