Associative Learning and Reinforcement

Scope: Learning of stimulus-stimulus, stimulus-response, and action-outcome contingencies under reinforcement; Pavlovian and instrumental conditioning; extinction; reversal; model-based and model-free reinforcement learning; reward prediction error.

Out of scope: Structure learning without scalar reinforcement (covered under Implicit and Statistical Learning).

This category contains 13 processes.


Associative learning

Process ID: hed_associative_learning

Learning of co-occurrence relations between stimuli or between stimuli and responses.

Tasks

The following tasks engage this process:

Recent references

  • Shanks (2010) Annual Review of Psychology 61:273–301


Extinction

Process ID: hed_extinction

Decrease in a previously reinforced response when reinforcement is withheld; thought to reflect new inhibitory learning rather than erasure of the original association.
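
The contrast between new inhibitory learning and erasure can be pictured with a toy simulation. This is a minimal sketch under stated assumptions, not a model taken from the cited references: the function name, the context labels "A" and "B", the learning rate, and the idea of a single context-gated inhibitory weight are all hypothetical simplifications.

```python
from collections import defaultdict

def simulate_extinction(alpha=0.2, n_acq=30, n_ext=30):
    """Toy model: extinction adds context-gated inhibition and leaves excitation intact."""
    excitatory = 0.0                 # cue-outcome association, shared across contexts
    inhibitory = defaultdict(float)  # cue-no-outcome association, one per context

    def response(context):
        # Net responding is excitation minus the inhibition learned in this context.
        return max(0.0, excitatory - inhibitory[context])

    # Acquisition in context "A": the cue is followed by the outcome (target = 1).
    for _ in range(n_acq):
        excitatory += alpha * (1.0 - response("A"))
    after_acquisition = response("A")

    # Extinction in context "B": cue alone (target = 0); only inhibition changes.
    for _ in range(n_ext):
        inhibitory["B"] += alpha * (response("B") - 0.0)

    return {
        "after_acquisition": after_acquisition,
        "extinction_context": response("B"),
        "renewal_context": response("A"),
        "excitatory_strength": excitatory,
    }

result = simulate_extinction()
```

In this sketch responding falls in the extinction context while the excitatory strength is untouched, so responding returns when the cue is tested back in the acquisition context. An erasure account would instead predict no such recovery.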

Tasks

The following tasks engage this process:

Fundamental references

  • Pavlov (1927) Conditioned Reflexes

  • Bouton (2004) Learning & Memory 11:485–494

Recent references

  • Dunsmoor, Niv, Daw & Phelps (2015) Neuron 88:47–63


Goal-directed behavior

Process ID: hed_goal_directed_behavior

Behavior that is sensitive to current outcome value, characteristic of action–outcome learning.

Tasks

The following tasks engage this process:

Fundamental references

  • Dickinson & Balleine (1994) Animal Learning & Behavior 22:1–18

Recent references

  • Balleine & O’Doherty (2010) Neuropsychopharmacology 35:48–69


Habit

Process ID: hed_habit

Behavior that is insensitive to the current value of its outcome, characteristic of stimulus–response learning.

Tasks

The following tasks engage this process:

Recent references

  • Balleine & O’Doherty (2010) Neuropsychopharmacology 35:48–69


Instrumental conditioning

Process ID: hed_instrumental_conditioning

Also known as: Operant conditioning (Skinnerian terminology; emphasizes the operant response and reinforcement schedules). Merged from a separate entry on 2026-04-18.

Learning that an action produces an outcome. Encompasses both goal-directed (action–outcome) and habitual (stimulus–response) control, and is studied with reinforcement schedules and outcome-devaluation procedures.
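
The logic of an outcome-devaluation test can be sketched in a few lines. This is an illustrative toy, not a published model: the function name, the multiplicative form of the goal-directed term, and the numeric strengths are assumptions chosen only to show why the two controllers come apart after devaluation.

```python
def response_tendencies(outcome_value, action_outcome_strength, habit_strength):
    """Return (goal_directed, habitual) response tendencies for one action."""
    # Goal-directed control consults the current value of the outcome.
    goal_directed = action_outcome_strength * outcome_value
    # Habitual control uses a cached stimulus-response strength and ignores outcome value.
    habitual = habit_strength
    return goal_directed, habitual

# Before devaluation the outcome is valued; afterwards its value is set to zero.
before = response_tendencies(outcome_value=1.0, action_outcome_strength=0.9, habit_strength=0.9)
after = response_tendencies(outcome_value=0.0, action_outcome_strength=0.9, habit_strength=0.9)
```

Under these assumptions the goal-directed tendency drops to zero after devaluation while the habitual tendency is unchanged, which is the behavioral signature the procedure is designed to detect.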

Tasks

The following tasks engage this process:

Recent references

  • Staddon & Cerutti (2003) Annual Review of Psychology 54:115–144


Model-based learning

Process ID: hed_model_based_learning

Reinforcement learning that uses an internal model of the environment’s transition and reward structure to plan.
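
Planning with an internal model can be sketched with value iteration over a known transition and reward structure. The code below is a minimal sketch, assuming a small hypothetical environment: the state names, action names, probabilities, and rewards are invented for illustration, and value iteration is only one of several ways a model could be used to plan.

```python
def action_value(state, action, transitions, rewards, values, gamma):
    """Expected return of one action, computed from the model rather than from cached experience."""
    expected_next = sum(p * values[s2] for s2, p in transitions[state][action].items())
    return rewards[state][action] + gamma * expected_next

def value_iteration(transitions, rewards, gamma=0.9, n_sweeps=50):
    """transitions[s][a] maps next states to probabilities; rewards[s][a] is the expected reward."""
    values = {s: 0.0 for s in transitions}
    for _ in range(n_sweeps):
        values = {
            s: max((action_value(s, a, transitions, rewards, values, gamma)
                    for a in transitions[s]), default=0.0)  # terminal states have no actions
            for s in transitions
        }
    return values

def plan(state, transitions, rewards, values, gamma=0.9):
    """Choose the action with the highest model-derived value."""
    return max(transitions[state],
               key=lambda a: action_value(state, a, transitions, rewards, values, gamma))

# Hypothetical two-stage environment (all names and numbers are illustrative).
transitions = {
    "start": {"left": {"A": 0.7, "B": 0.3}, "right": {"A": 0.3, "B": 0.7}},
    "A": {"collect": {"end": 1.0}},
    "B": {"collect": {"end": 1.0}},
    "end": {},
}
rewards = {
    "start": {"left": 0.0, "right": 0.0},
    "A": {"collect": 1.0},
    "B": {"collect": 0.0},
    "end": {},
}
values = value_iteration(transitions, rewards)
best_first_action = plan("start", transitions, rewards, values)
```

Because the choice is derived from the model, changing an entry in `rewards` or `transitions` and re-running the planner changes the preferred action immediately, without any further experience of the affected state.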

Tasks

The following tasks engage this process:

Fundamental references

  • Daw, Niv & Dayan (2005) Nature Neuroscience 8:1704–1711

Recent references

  • Daw, Gershman, Seymour, Dayan & Dolan (2011) Neuron 69:1204–1215


Model-free learning

Process ID: hed_model_free_learning

Reinforcement learning from cached value estimates updated by prediction errors, without an explicit model of the environment.
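
A cached-value update driven by prediction errors can be sketched as follows. This uses a Q-learning style rule as one common example of a model-free update; the state and action names, learning rate, and discount factor are illustrative assumptions.

```python
def td_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One model-free step: move the cached value toward reward plus discounted next value."""
    # States with no cached entries (for example terminal states) contribute zero.
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    prediction_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * prediction_error
    return prediction_error

# Repeatedly experience: take "left" in "start", receive reward 1, reach a terminal state.
q = {"start": {"left": 0.0, "right": 0.0}}
errors = [td_update(q, "start", "left", reward=1.0, next_state="end") for _ in range(50)]
```

The learner never represents where "left" leads or why it pays off; it only stores a number per state-action pair, and the prediction error shrinks as that number approaches the experienced return.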

Tasks

The following tasks engage this process:

Recent references

  • Daw, Niv & Dayan (2005) Nature Neuroscience 8:1704–1711


Pavlovian conditioning

Process ID: hed_pavlovian_conditioning

Learning that an initially neutral stimulus predicts a biologically significant outcome, so that the stimulus comes to elicit a conditioned response.
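
One standard formalization of this kind of predictive learning is the Rescorla-Wagner rule, which is not named in this entry and is used here only as an illustration. The sketch below assumes a single salience parameter `alpha`, an asymptote `lam`, and a hypothetical stimulus label "tone".

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """trials: list of (stimuli_present, outcome_present). Returns associative strengths."""
    strength = {}
    for stimuli, outcome_present in trials:
        # The prediction is the summed strength of every stimulus present on the trial.
        prediction = sum(strength.get(s, 0.0) for s in stimuli)
        error = (lam if outcome_present else 0.0) - prediction
        # Each present stimulus is updated in proportion to the shared error.
        for s in stimuli:
            strength[s] = strength.get(s, 0.0) + alpha * error
    return strength

# Twenty pairings of a tone with the outcome.
acquired = rescorla_wagner([(["tone"], True)] * 20)
```

As pairings accumulate, the tone's strength approaches the asymptote and the trial-by-trial error approaches zero, which is the sense in which the stimulus comes to predict the outcome.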

Tasks

The following tasks engage this process:

Fundamental references

  • Pavlov (1927) Conditioned Reflexes

Recent references

  • LeDoux (2014) PNAS 111:2871–2878


Policy learning

Process ID: hed_policy_learning

Direct learning of a mapping from states to actions without necessarily estimating values.
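
A direct state-to-action mapping can be learned by adjusting action preferences along the gradient of expected reward. The sketch below is a simplified, single-state illustration with a softmax policy. As a loudly stated assumption, it uses the exact expected gradient with known expected rewards so the example is deterministic; a real learner would estimate the gradient from sampled actions and outcomes. All names and numbers are hypothetical.

```python
import math

def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(prefs, expected_rewards, alpha=0.5):
    """One gradient-ascent step on expected reward with respect to the preferences."""
    probs = softmax(prefs)
    mean_reward = sum(p * r for p, r in zip(probs, expected_rewards))
    # For a softmax policy, d(expected reward)/d(pref_k) = pi_k * (r_k - mean_reward).
    return [h + alpha * p * (r - mean_reward)
            for h, p, r in zip(prefs, probs, expected_rewards)]

prefs = [0.0, 0.0]  # start indifferent between two actions
for _ in range(200):
    prefs = policy_gradient_step(prefs, expected_rewards=[0.2, 0.8])
probs = softmax(prefs)
```

The learned quantity is the preference vector that defines the policy; the preferences are not estimates of reward, and only their differences matter for choice.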

No tasks in the current catalog are linked to this process.


Reinforcement learning

Process ID: hed_reinforcement_learning

Learning from experience to select actions that maximize cumulative reward, typically by updating value estimates with reward prediction errors.
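
"Cumulative reward" is often made concrete as a discounted sum of future rewards. The helper below is a minimal sketch; the discount factor `gamma` is an assumption that this entry does not specify, and setting it to 1.0 gives a plain undiscounted sum.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps ahead is weighted by gamma ** k."""
    total = 0.0
    # Work backwards so each step applies one extra factor of gamma to later rewards.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```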

Tasks

The following tasks engage this process:

Fundamental references

  • Schultz, Dayan & Montague (1997) Science 275:1593–1599

Recent references

  • Niv (2009) Journal of Mathematical Psychology 53:139–154


Reversal learning

Process ID: hed_reversal_learning

Relearning after contingencies between stimuli (or responses) and outcomes are switched.
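
The relearning that follows a contingency switch can be sketched with a simple delta-rule learner. This is a toy under stated assumptions: two hypothetical stimuli "A" and "B", deterministic outcomes, a fixed learning rate, and both stimuli experienced on every trial rather than chosen by the learner.

```python
def run_reversal(n_pre=40, n_post=40, alpha=0.2):
    """Track learned values of two stimuli across an unsignaled contingency reversal."""
    values = {"A": 0.0, "B": 0.0}
    rewarded = "A"
    history = []
    for trial in range(n_pre + n_post):
        if trial == n_pre:
            rewarded = "B"  # the contingencies switch here
        for stimulus in values:
            outcome = 1.0 if stimulus == rewarded else 0.0
            values[stimulus] += alpha * (outcome - values[stimulus])
        history.append(dict(values))
    return history

history = run_reversal()
before_switch = history[39]   # last trial under the original contingencies
after_relearning = history[-1]
```

After the switch the old value must be unlearned while the new one is acquired, so the value ordering of the two stimuli crosses over some trials after the reversal rather than immediately.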

Tasks

The following tasks engage this process:

Fundamental references

  • Iversen & Mishkin (1970) Experimental Brain Research 11:376–386

Recent references

  • Izquierdo, Brigman, Radke, Rudebeck & Holmes (2017) Neuroscience 345:12–26


Reward prediction error

Process ID: hed_reward_prediction_error

Signed difference between received and expected reward; widely thought to be signaled by phasic firing of midbrain dopamine neurons.
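
The definition can be written directly as a one-line computation. The sketch below uses scalar reward values chosen for illustration; the function and variable names are hypothetical.

```python
def reward_prediction_error(received, expected):
    """Positive when the outcome is better than expected, negative when it is worse."""
    return received - expected

unexpected_reward = reward_prediction_error(received=1.0, expected=0.0)  # positive error
predicted_reward = reward_prediction_error(received=1.0, expected=1.0)   # zero error
omitted_reward = reward_prediction_error(received=0.0, expected=1.0)     # negative error
```

The three cases correspond to an unpredicted reward, a fully predicted reward, and the omission of a predicted reward.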

Tasks

The following tasks engage this process:

Fundamental references

  • Schultz, Dayan & Montague (1997) Science 275:1593–1599

Recent references

  • Glimcher (2011) PNAS 108(Suppl 3):15647–15654


Value learning

Process ID: hed_value_learning

Acquisition of the expected value of stimuli, actions, or states from experience with outcomes.

Tasks

The following tasks engage this process:

Recent references

  • Rangel, Camerer & Montague (2008) Nature Reviews Neuroscience 9:545–556