From Scholarpedia
As toddlers, we already know how to attract our parents’ attention by pretending to cry. Learning to anticipate the consequences of our actions is central to shaping our personalities, and is accomplished through many different means, from processing social feedback to acquiring the motor skills for sports, crafts, or handiwork. In our daily lives, much of this fundamental type of predictive learning takes place unnoticed as the brain subconsciously processes the constant stream of stimuli, assesses the importance of each one, and cross-correlates them with our behavior. Operant (or instrumental) conditioning is the process by which we learn about the consequences of our actions, e.g., not to touch a hot plate. The most famous operant conditioning experiment involves the “Skinner box”, in which the psychologist B.F. Skinner trained rats to press a lever for a food reward. The animals were placed in the box and, after some exploring, would at some point press the lever, which led to food pellets being dispensed into the box. The animals quickly learned that they could control food delivery by pressing the lever.
To understand the neurobiological processes that perform these tasks, investigators must reduce the complexity of the environment to controlled, experimental circumstances, ideally involving only a single behavior and its consequence. One of the major obstacles is that in operant conditioning, the neural workings of the operant behavior are not so easy to trace. Gastropods in general are fantastic model systems for elucidating the neural control of behavior and Aplysia in particular is a renowned model system for the study of learning and memory. Using Aplysia to study the neurobiology of operant conditioning is a relatively straightforward strategy.
Aplysia is a snail with virtually no natural predators. In its natural habitat, it is surrounded by its food (seaweed) and only has to raise its head and bite to eat. Probably for these reasons, the animals exhibit only a comparatively small repertoire of spontaneous behaviors that would be suitable for operant conditioning. The logical choice is to study feeding behavior. The situation for studying operant conditioning of Aplysia feeding behavior is almost ideal:
During experiments in which neural activity is measured when an intact animal is taking bites that fail to grasp food, the esophageal nerve shows little activity. However, when the animal grasps and swallows seaweed, bursts of electrical activity in the esophageal nerve accompany the ingestion of food (Brembs, Lorenzetti, Reyes, Baxter, & Byrne, 2002). Presumably, the esophageal nerve transmits information about the presence of food during swallowing to the buccal ganglia.
The activity in the esophageal nerve that accompanies swallowing may be a reward signal. If so, Aplysia that receive stimulation of the esophageal nerve immediately after each bite (contingent reinforcement), so that each stimulation might function as virtual food, should exhibit more biting behavior than a yoked control group, that is, a group in which the animals receive the same sequence of stimulation independently of their behavior. Indeed, in a study testing this prediction, this virtual food appeared to function as a reward for biting: Compared with both the yoked control group and a group that never received any stimulation, Aplysia that received the stimulation after each bite subsequently produced more bites in a test phase without any stimulation. This increase in biting was seen not only immediately after the training, but also 24 hr later (Brembs et al., 2002).
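The logic of the yoked control can be made concrete with a small sketch. It is only an illustration of the design described above, not the procedure or analysis used in the study: the function names, the bite times, and the 1-second pairing window are all assumptions chosen for the example. The contingent animal is stimulated right after each of its own bites, the yoked animal receives exactly the same stimulation sequence, and the script then measures how often a stimulus actually follows a bite in each animal.

```python
from bisect import bisect_right


def contingent_schedule(bite_times, delay=0.0):
    """Stimulation times for the contingent animal: one stimulus per bite."""
    return [t + delay for t in sorted(bite_times)]


def yoked_schedule(master_stim_times):
    """The yoked animal gets the same sequence, regardless of its own bites."""
    return list(master_stim_times)


def fraction_paired(bite_times, stim_times, window=1.0):
    """Fraction of stimuli arriving within `window` seconds after a bite."""
    bites = sorted(bite_times)
    if not stim_times:
        return 0.0
    paired = 0
    for s in stim_times:
        i = bisect_right(bites, s)  # how many bites occurred at or before s
        if i > 0 and s - bites[i - 1] <= window:
            paired += 1
    return paired / len(stim_times)


# Hypothetical bite times in seconds (illustrative values, not data).
master_bites = [2.0, 10.0, 25.0, 40.0]
yoked_bites = [5.0, 24.5, 33.0]

master_stims = contingent_schedule(master_bites)
yoked_stims = yoked_schedule(master_stims)

print(fraction_paired(master_bites, master_stims))  # every stimulus follows a bite
print(fraction_paired(yoked_bites, yoked_stims))    # only chance pairings remain
```

Both animals receive the same number and timing of stimuli, so any difference in later biting can be attributed to the contingency between behavior and stimulation rather than to the stimulation itself.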
Apparently, the reward signal from the esophageal nerve converges on the neural activity in the buccal central pattern generator (CPG) responsible for the behavior. This finding simplified the task of investigating operant conditioning in Aplysia: Instead of behavioral experiments involving the entire animal, researchers could focus on a well-characterized network of comparatively large neurons, numbering in the hundreds. Consequently, the next steps were to characterize the reward signal further and to find the neurons that are modified by the signal. Such detailed experiments required removal of the buccal ganglia from the animal so that researchers could study the neurons neurophysiologically and apply drug treatments that would not be feasible in the intact animal.
Isolated buccal ganglia in a petri dish (in vitro) containing artificial seawater continue to spontaneously produce, in seemingly random order, neural patterns of excitation (buccal motor programs, BMPs) that can be related to the different feeding-related movements in the intact animal (Morton & Chiel, 1993). If these patterns are rewarded with the same type of electric stimulation of the esophageal nerve as in the experiment just described, in vitro operant conditioning takes place. Thus, isolated buccal ganglia that receive electrical stimulation after each BMP resembling a bite in the intact animal (i.e., an ingestion-like BMP, or iBMP; contingent reinforcement) produce more iBMPs than ganglia of the yoked control group (Nargeot, Baxter, & Byrne, 1997). This effect is abolished when methyl-ergonovine, a substance that blocks the action of the neurotransmitter dopamine, is added to the bath, implicating dopamine as the transmitter for the reward signal (Nargeot, Baxter, Patterson, & Byrne, 1999). Dopamine is also considered to be the prime transmitter for reward-related signals in humans and other mammals (Fiorillo, Tobler, & Schultz, 2003; O'Doherty, Dayan, Friston, Critchley, & Dolan, 2003).
Where in the feeding CPG in the buccal ganglion does dopamine act to make it produce more iBMPs? Neurons that can act as switches in the CPG, altering the output to produce different types of BMPs, are good candidates for playing a role in this function. Buccal neuron 51 (B51; Plummer & Kirk, 1990) is active late during an iBMP and is silent when the BMP resembles a movement that would reject an inedible item (a rejection-like BMP, or rBMP; Nargeot et al., 1997). Experimentally activating B51 during a BMP increases the likelihood that the BMP will become an iBMP. Conversely, silencing B51 during a BMP increases the likelihood that the BMP will become an rBMP (Nargeot, Baxter, & Byrne, 1999a). Thus, B51 seems to be a pattern-switching (or decision-making) neuron whose activation state largely determines the type of pattern the CPG will produce: If B51 is easily excited and likely to be active, iBMPs are more likely to occur, but if B51 is more difficult to activate, rBMPs are more likely to be produced. After in vitro operant conditioning, B51 is more easily activated in ganglia that received contingent reward after iBMPs than in yoked controls (Nargeot, Baxter, & Byrne, 1999a). Thus, one mechanism by which in vitro contingent reinforcement may bring about operant learning is by modifying the properties of a pattern-switching neuron to render the CPG more likely to produce the rewarded behavior. Indeed, if stimulations of the esophageal nerve are made contingent simply upon activity in B51 (i.e., when this activity is experimentally induced and not part of a spontaneous BMP), the resulting increase in excitability in B51 alone is sufficient to reproduce some of the results of the in vitro operant conditioning just described (Nargeot, Baxter, & Byrne, 1999b). It is unknown how B51 changes if rBMPs are rewarded. 
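The idea of B51 as a pattern switch whose excitability is changed by contingent reward can be sketched as a toy model. Everything quantitative here is an assumption made for illustration: the integer "drive" and "threshold" values in arbitrary units, the step size, the floor, and the all-or-none classification rule. The experiments describe changes in likelihood, not a deterministic switch, so this is a cartoon of the logic and not a biophysical model of the neuron.

```python
def classify_bmp(b51_drive, b51_threshold):
    """Toy rule: the pattern is ingestion-like only if B51 is recruited."""
    return "iBMP" if b51_drive >= b51_threshold else "rBMP"


def train(drives, reward_trials, threshold, step=5, floor=20):
    """Lower the threshold each time a reward follows an iBMP.

    reward_trials=None means contingent reinforcement (every iBMP is rewarded);
    otherwise rewards arrive on the given trial indices, as in a yoked control.
    Returns the final threshold and the trials on which a reward was delivered.
    """
    delivered = []
    for trial, drive in enumerate(drives):
        is_ibmp = classify_bmp(drive, threshold) == "iBMP"
        rewarded = is_ibmp if reward_trials is None else trial in reward_trials
        if rewarded:
            delivered.append(trial)
            if is_ibmp:  # reward coincides with B51 activity
                threshold = max(floor, threshold - step)
    return threshold, delivered


def count_ibmps(drives, threshold):
    """Number of ingestion-like patterns produced at a given threshold."""
    return sum(classify_bmp(d, threshold) == "iBMP" for d in drives)


# Hypothetical input to B51 on successive BMPs (arbitrary units, not data).
START = 70
contingent_threshold, rewards = train([80, 30, 68, 50, 90, 62], None, START)
yoked_threshold, _ = train([40, 75, 55, 85, 60, 45], rewards, START)

test_drives = [72, 66, 58, 51, 49, 95]
print(count_ibmps(test_drives, contingent_threshold))  # after contingent training
print(count_ibmps(test_drives, yoked_threshold))       # after yoked training
```

In this sketch the yoked preparation receives the same number of rewards on the same trials, but none of them happens to follow B51 activity, so its threshold is unchanged and it produces fewer iBMPs when tested with the same input.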
Is B51 relevant only in the isolated buccal ganglia, or does the in vitro preparation actually provide an accurate picture of the processes that occur inside the intact animal’s central nervous system (i.e., in vivo)? B51 neurons from animals that have undergone the in vivo operant conditioning procedure show a higher excitability than B51 neurons dissected from yoked control animals (Brembs et al., 2002), mirroring the differences seen after in vitro operant conditioning. These experiments show that in vivo and in vitro operant conditioning of Aplysia feeding behavior produce the same kind of neural correlates of the operant memory. Thus, we really can learn about the neural mechanisms of operant conditioning in vivo by studying parts of the isolated nervous system.
Studies of operant conditioning in Aplysia have covered all levels of complexity, from behavior, neural network, and single cells down to the molecules involved in changing the neurons’ properties. Aplysia neurons are so big and robust that they can be taken out of the ganglion and cultured in petri dishes for several days. Based on the evidence for the convergence of a dopamine signal onto B51 activity during iBMPs, a single-cell analogue of operant conditioning can be established (Brembs et al., 2002), as in the following example. B51 is active late during an iBMP, and such activity can be triggered in cultured B51 neurons. Immediately following this activity, a pulse of dopamine is applied to mimic the dopaminergic reward signal that follows an iBMP (in vitro) or a bite (in vivo) in the kind of experiments described above. B51 neurons that have received seven such contingent dopamine applications show a higher excitability than B51 neurons that have received the dopamine exactly between two activations (Brembs et al., 2002). In other words, the effects of the contingent dopamine treatments parallel the effects found after both in vivo and in vitro operant conditioning. The molecular processes inside B51 that are involved in establishing these effects are currently under investigation.
Together, the results obtained thus far are consistent with the following model: In the intact animal, the dopamine-mediated food reward is contingent on B51 activity late during the rewarded behavior. The convergence of behavioral predictor and rewarding consequence in B51 leads to a modification of the biophysical properties of the neuron so that it is more likely to be active. These changes last for at least 24 hr. At least in part, these biophysical changes in B51 contribute in turn to the increased frequency of bites seen after in vivo training.