The basal ganglia are located interior to the cerebral cortex, and they receive prominent input from essentially all of the pallium, both isocortex and allocortex (Swanson 2000). Generally speaking, these inputs form relatively discrete channels that loop back to the same area of cerebral cortex from which they originated, a feature that is similar to the loops that form between the cerebellum and the cerebral cortex (Middleton and Strick 2000). This loop architecture, summarized in Figure 1, provides important perspective for this entry on models of basal ganglia. In the entry basal ganglia, anatomy and physiology are reviewed and the conclusion is reached that the two essential functions of the basal ganglia are action selection and reinforcement learning. In the present entry, key features of the computational architecture of the loops through the basal ganglia are first described. Then specific models of action selection are reviewed. Some of these models are anatomically and physiologically constrained, whereas others are abstract but are nevertheless motivated by behavioral functions of the basal ganglia. The level of modeling also varies from abstract, to population-based, to connectionist, to biophysical, to molecular. Hybrid models combine salient features at different neuroscientific levels.
This entry focuses on the loops through the basal ganglia shown on the left side of Figure 1. Each of these loops functions as a macroscopic signal processing module composed of thousands of microscopic loops. Figure 2 provides an overview of the computational architecture of one macroscopic BG module. The architecture of just two of its microscopic loops is diagrammed and shown interacting with a reinforcement learning (RL) circuit. There is a highly divergent projection from large numbers of cerebral cortical neurons (eight CCs are shown) to the two input nuclei of the BG network, namely the striatum (shaded box containing six spiny neurons (SpNs)) and the subthalamic nucleus (STN). For simplicity, only the divergent projections from CC5 are shown explicitly. The outputs of the two microscopic modules loop back to focal sites in cortex (not explicitly shown) via thalamic neurons T1 and T2. The dashed box labeled RL controls dopamine (DA) neuron firing to regulate reinforcement learning.
SpNs in the striatum are endowed with several features that make them ideal for classifying the complex patterns of activity sent from the cerebral cortex. These features include: (1) a high convergence ratio (Kincaid, Zheng and Wilson 1998) that presents nearly 20,000 different cortical inputs to any given spiny neuron, (2) a 3-factor learning rule that uses reward-predicting training signals from dopamine neurons to consolidate LTP learning (Houk, Adams and Barto 1995), (3) an attentional neuromodulatory factor (Nicola, Surmeier and Malenka 2000) that induces bistability and nonlinear amplification in spiny neurons (Gruber, Solla, Surmeier and Houk 2003), and (4) competition among spiny neurons mediated by presynaptic and postsynaptic collateral inhibition (Plenz 2003; Houk, Bastianen, Fansler, Fishbach, Fraser, Reber, Roy and Simo 2007).
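Feature (2) can be illustrated with a minimal sketch. The multiplicative form of the rule, the learning rate `lr`, and the function name `three_factor_update` are assumptions made for illustration and are not taken from Houk, Adams and Barto (1995); the sketch only shows how a dopamine term can gate an otherwise Hebbian weight change at corticostriatal synapses.

```python
def three_factor_update(weights, pre, post, dopamine, lr=0.1):
    """Return new corticostriatal weights after one learning step.

    Assumed rule (illustrative only): dw_i = lr * pre_i * post * dopamine.
    A synapse changes only if its cortical input was active (pre),
    the spiny neuron fired (post), and dopamine signalled reward.
    """
    return [w + lr * x * post * dopamine for w, x in zip(weights, pre)]


# Hypothetical example: three cortical inputs onto one spiny neuron.
weights = [0.5, 0.5, 0.5]
pre = [1.0, 0.0, 1.0]  # inputs 0 and 2 were active

rewarded = three_factor_update(weights, pre, post=1.0, dopamine=1.0)
unrewarded = three_factor_update(weights, pre, post=1.0, dopamine=0.0)

print(rewarded)    # active synapses are strengthened
print(unrewarded)  # no dopamine, so no weight change
```

Without the dopamine factor the same pre- and postsynaptic coincidence leaves the weights unchanged, which is the sense in which the third factor consolidates learning.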
An unusual feature of loops through the basal ganglia is the presence of multiple inhibitory stages mediated by the neurotransmitter GABA. These are shown by the open arrow heads in the projections colored red. The most direct projection has two inhibitory stages. Since the output neurons (ONs) have high spontaneous rates, promoted by the excitatory projections from STN, when SpN1 fires a burst, it causes ON1 to pause, and this disinhibits discharge of T1, the targeted neuron in thalamus. This appears to initiate repetitive discharge in the reciprocal thalamocortical loop which activates a small cluster of CC neurons, amounting to the selection of action 1.
Indirect pathways through the basal ganglia traverse three inhibitory stages, with the net result of increased firing in targeted ONs. For example, if SpN4 bursts, GPe2 pauses, which disinhibits ON2. This inhibits T2, thus helping to prevent action 2 from being selected. The projection via STN has a balancing action, since it excites both the GPe neurons and the ONs.
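The sign logic of the direct and indirect pathways can be checked with a small rate-based sketch. All weights, baseline rates, and the function name `bg_microloop` are hypothetical values chosen for illustration, not parameters of any published model; the only structure taken from the text is which projections are inhibitory and that STN excites both GPe and the output neurons.

```python
def relu(x):
    """Firing rates cannot be negative."""
    return max(0.0, x)


def bg_microloop(spn_direct, spn_indirect, stn=1.0, cortical_drive=1.0):
    """Steady-state rates for one microscopic loop (arbitrary units).

    Assumed wiring, following the text:
      indirect SpN --| GPe --| ON --| T   (three inhibitory stages)
      direct SpN   --| ON --| T           (two inhibitory stages)
      STN excites both GPe and ON.
    """
    gpe = relu(1.0 + 0.5 * stn - spn_indirect)
    on = relu(1.0 + 0.5 * stn - spn_direct - 0.5 * gpe)
    thalamus = relu(cortical_drive - on)
    return {"GPe": gpe, "ON": on, "T": thalamus}


rest = bg_microloop(spn_direct=0.0, spn_indirect=0.0)
direct_burst = bg_microloop(spn_direct=2.0, spn_indirect=0.0)
indirect_burst = bg_microloop(spn_direct=0.0, spn_indirect=2.0)

for name, state in [("rest", rest), ("direct", direct_burst), ("indirect", indirect_burst)]:
    print(name, state)
```

In this sketch a direct-pathway burst silences the output neuron and releases the thalamic neuron, whereas an indirect-pathway burst silences GPe, raises output-neuron firing, and suppresses the thalamic neuron below its resting rate.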
A newly reported architectural feature (Lei, Jiao, Mar and Reiner 2004) relevant here is that the cortical cells that participate in the thalamocortical loops that are disinhibited by BG output are not the output cells of cortex. These non-output cells probably project to pons and initiate activity in the loop through the cerebellum. This fits with the theory that the loop through the basal ganglia selects a ballpark estimate of an action which then needs to be amplified and refined by the loop through the cerebellum (Houk, Bastianen, Fansler, Fishbach, Fraser, Reber, Roy and Simo 2007). Cortical output cells instead project via striatum to the GPe cells that inhibit undesired actions.
One of the first modern models of the basal ganglia, the Albin-DeLong model (Albin, Young and Penney 1989; DeLong 1990), was motivated by hypokinetic clinical manifestations of Parkinson’s disease. According to this model, over-activity of BG output neurons, due to under-activity of spiny neurons in the direct pathway, inhibits thalamocortical activity to cause akinesia. The corollary model for normal subjects is that bursts of spiny neuron activity in the direct pathway disinhibit their thalamocortical targets to initiate desired actions and bursts of spiny neuron discharge in the indirect pathways inhibit their thalamocortical targets to suppress unwanted actions (Figure 2). This model has been applied to limb movements (Mink 1996) and to eye movements (Hikosaka, Matsumura, Kojima and Gardiner 1993), although the circuit for eye movements is slightly different since they are controlled by a subcortical structure, the superior colliculus (McHaffie, Stanford, Stein, Coizet and Redgrave 2005). The clinical model tends to emphasize the M1 loop in Figure 1 for the control of action, but this has been extended to cognitive loops for the control of thinking (Graybiel 1997).
Wickens’ (1993) theory of the striatum invoked a combination of local biophysical and global connectionist mechanisms – it is a hybrid model. The biophysical level dealt with the neurodynamics of competition between spiny neurons in the striatum (Wickens, Alexander and Miller 1991; Wickens and Oorschot 2000). Spiny neurons were given a potassium ion channel and synaptic input, and the large aspiny interneurons that project to them were given cholinergic synaptic transmission that had modulatory effects on the spiny neuron potassium channel. The aspiny interneurons (not explicitly shown in Figure 2) were given neuromodulatory effects of DA input to them. DA and cholinergic activity reciprocally modulated the degree of competition. Another key feature was the organization of spiny neurons into domains of competition that were based on cellular anatomy. The model predicted many peaks of activity surrounded by troughs of inhibition, which the authors related to the selection of muscle groups controlling actions. This is a more intricate version of the Hikosaka, Takikawa & Kawagoe (2000) neurophysiological model of gaze control, which explains how a salient target for eye movement can be selected through the direct pathway while many alternative targets that are less salient can be de-selected through indirect pathways.
At the global level, Wickens’ (1993) theory expanded on the Albin-DeLong model, postulating that disinhibition of thalamocortical loops leads to amplification in the cortical cell assemblies envisioned by Hebb (1949). Closely associated is Hebb’s postulate that a local learning rule (LTP) would allow the formation of cell assemblies as a function of repeated reactivations, as might occur when one practices an action. Transmission through BG in direct pathways that target a cell assembly would “ignite” positive feedback as in the active state of an attractor network, whereas transmission through indirect BG pathways would serve to dampen positive feedback and flatten the attractor landscape. An intricate balance of basal ganglionic control over the cerebral cortex would be required for stable performance of the cortical network on combined operational and learning time scales. Miller & Wickens (1991) discuss these problems and relate the model to selective attention and to the representation of predictable and controllable events.
The model of motor planning and control by Houk & Wise (1994) includes loops through the cerebellum (CB) in addition to loops through BG. This model’s architecture is founded on anatomical modularity in the spatially distributed network summarized in Figure 1. The signal processing operations in the loops were deduced from extracellular recordings of neuronal activity in behaving animals. The most recent version of this model (Houk 2005) incorporates key cellular and molecular neuroscientific data that highlight unique computational features and special learning rules for spiny neurons in the BG loop and for Purkinje cells in the loop through cerebellar cortex. Each of the CB loops shown in Figure 1 includes a reciprocal excitatory loop through the cerebellar nucleus that transmits positive feedback and behaves like an attractor network, analogous to the Hebbian cortical assemblages in Wickens’ global model. The prominent inhibitory projections from Purkinje cells to nuclear cells are believed to exert powerful control over the fixed points of the CB attractor networks (Houk and Mugnaini 2003). This mechanism might resolve the stability problems faced by Wickens’ global model.
Redgrave, Prescott & Gurney (1999a) combine bottom-up and top-down approaches toward modeling the loops through the basal ganglia. The bottom-up models stem from neuroscientific data and resemble the models already discussed. In contrast, the top-down approach starts with a major behavioral problem that the organism faces. They draw on ideas from ethology and cybernetics to identify and characterize the action selection problem faced by an autonomous being. Their theory is that the basal ganglia are the vertebrate solution to the action selection problem. This theory fits well with the Actor-Critic model discussed in the section below on reinforcement learning. Abstract models of the Actor implement action selection in an AI framework (see Chapter 2 of Sutton & Barto (1998)).
Gurney, Prescott & Redgrave’s (2001) population-based model gave a somewhat different interpretation of the computational architecture of the BG, particularly the GPe stage in Figure 2. They left action selection in the direct pathway through BG and de-selection in the indirect pathway from cortex to STN to ON, but they conceived the pathways to and from GPe as a control loop that stabilizes the former selection pathways and enhances selectivity. In a later article, Humphries & Gurney (2002) introduced a population-based model of the thalamocortical circuit and showed how it was compatible with the intrinsic model of BG processing and actually improved the former’s performance of action selection.
Since many of the loops through the basal ganglia subserve cognitive regions of cerebral cortex, BG architecture clearly participates in the selection of thoughts and plans as well as actions (reviewed by Prescott, Bryson & Seth (2007)). The PBWM model of prefrontal cortex and its loops through the basal ganglia (Hazy, Frank and O'Reilly 2006) has served to explain decision-making performance in several cognitive tasks. PBWM is an abstract implementation of the signal processing operations schematized in Figure 2. Rather than modeling a choice between several potential actions, these authors simulate GO versus NoGO decisions in a variety of cognitive tasks. Since there are actually thousands of microscopic modules that comprise any given macroscopic module, large numbers of potential actions or thoughts might be interesting to investigate. Frank has begun to enhance the PBWM model with neuroscientific features (Frank 2005; O'Reilly and Frank 2006).
The models by Rubchinsky, Kopell & Sigvardt (2003), and by Humphries, Stewart & Gurney (2006) are quite detailed in their biophysics. For example, the latter authors included effects of dopamine in STN and GPe, transmission delays between neurons, and specific distributions of synaptic input over dendrites, parameters that were derived from experimental studies. While the focus was on explaining how various oscillations resulted from feedback between STN and GPe, the model was also able to do action selection in a manner compatible with the Gurney et al. (2001) population-level model. The STN-GPe feedback loop was found to be functionally decoupled by tonic dopamine under normal conditions and recoupled by dopamine depletion which simulates many in vivo experimental conditions and the situation in Parkinson’s disease. There are starting to be many similarly detailed models of BG cells or circuits and more focus on temporal detail, namely the presence of high frequency oscillations in the networks and their models. It is not yet clear whether high frequency oscillations are simply epiphenomena or are actually used by the brain for signal processing.
The striatal beat frequency (SBF) model by Matell and Meck (2000; 2004) relies on lower frequency oscillations (∼5-15 Hz) to learn time intervals of seconds to minutes by rewarding correct actions. To summarize the operations of this model (see Figure 2 in Lustig et al 2005), a reset stimulus starts oscillatory activity in cortex which reverberates at different frequencies in different cortical units. Spiny neurons learn to detect patterns of activity in the cortical input vector, presumably through the operation of competitive pattern classification as summarized in Figure 2. Different spatial patterns of coincidence occur at different time intervals, so spiny neurons learn to detect the pattern that corresponds to the desired time interval on the basis of reinforcement learning. This model was tested by simultaneous recordings from ensembles of cortical and striatal neurons (Matell, Meck & Nicolelis 2003). The data supported an important role for this circuit in interval timing but did not support the oscillatory assumptions.
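The coincidence-detection idea behind the SBF model can be sketched in a few lines. This is a deliberately simplified illustration and not the Matell and Meck implementation: the oscillator frequencies in `FREQS`, the representation of each cortical unit by a (cosine, sine) pair, and the one-shot "learning" that simply stores the cortical pattern present at the rewarded time are all assumptions.

```python
import math

# Hypothetical bank of cortical oscillators spanning roughly 5-15 Hz.
FREQS = [5.0 + 0.37 * k for k in range(27)]


def cortical_pattern(t):
    """State of every oscillator t seconds after the reset stimulus."""
    return [(math.cos(2 * math.pi * f * t), math.sin(2 * math.pi * f * t))
            for f in FREQS]


def learn_interval(t_target):
    """Assumed one-shot learning: store the pattern present at reward time."""
    return cortical_pattern(t_target)


def spiny_response(weights, t):
    """Normalised match between the stored pattern and the current pattern."""
    pattern = cortical_pattern(t)
    total = sum(wc * c + ws * s for (wc, ws), (c, s) in zip(weights, pattern))
    return total / len(FREQS)


weights = learn_interval(3.0)                   # rewarded interval: 3 s
times = [i * 0.01 for i in range(1, 601)]       # probe 0.01 s .. 6.00 s
best_time = max(times, key=lambda t: spiny_response(weights, t))
print(best_time, spiny_response(weights, best_time))
```

Because the oscillators run at different frequencies, the stored pattern recurs closely only near the trained interval, so the simulated spiny neuron responds most strongly there.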
Serial order processing, a crucial feature of higher order intelligence (Lashley 1951), was the focus of the Beiser & Houk (1998) hybrid model, which was mainly connectionist but also included critical biophysical features. The rebound responses of thalamic neurons to disinhibitory output from BG loops were captured by T-type calcium ion channels, and competitive pattern classification used GABA-A collateral inhibition between SpNs. The model’s capacity for encoding the serial order of sensory events results from three computational features that combine in a cooperative manner: (1) a classification of patterns in the cortical input vector by computations within the striatum, (2) working memory of the outcome of pattern classification in cortical-thalamic loops, and (3) a recursion-like operation brought about because the loop deposits the working memory of prior classifications into an updated input vector to striatum from cortex. The updated vector represents not only current events but also prior events that function as temporal context.
Spiny neurons have extensive GABAergic inhibitory collaterals within the striatum (shaded box in Figure 2) capable of making the SpN pattern classification operation competitive. Collateral inhibition is deemed an effective mechanism for mediating competition in some models (Beiser and Houk 1998; Plenz 2003) and an ineffective mechanism in others (Tepper, Koos & Wilson 2004). In Houk et al. (2007), two kinds of collateral inhibition were modeled – postsynaptic and presynaptic. While postsynaptic inhibition was found to be effective, presynaptic inhibition was appreciably more so. Since presynaptic inhibition in the striatum appears to be phylogenetically more recent, this might help to explain the central paradox of schizophrenia that is being modeled by Crow (1997).
In the Botvinick & Plaut (2006) connectionist model, sequence information is encoded through sustained patterns of activation within a simple recurrent network (SRN) architecture. This computationally simple model provides a parsimonious account for numerous benchmark characteristics of immediate serial recall, including data that have been considered to preclude the application of recurrent neural networks in this domain. Unlike most competing accounts, the model deals naturally with findings concerning the role of background knowledge in serial order recall from working memory.
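How a simple recurrent network carries sequence information in its sustained activation can be shown with an untrained forward pass. The weights `W_IN` and `W_REC`, the two-unit hidden layer, and the item codes are hypothetical values picked for illustration; the Botvinick & Plaut model is trained and far larger, so this is only a sketch of the recurrence itself.

```python
import math


def srn_step(x, h_prev, w_in, w_rec):
    """One SRN update: new hidden state from current input and prior state."""
    hidden = []
    for i in range(len(w_in)):
        drive = sum(w_in[i][j] * x[j] for j in range(len(x)))
        context = sum(w_rec[i][k] * h_prev[k] for k in range(len(h_prev)))
        hidden.append(math.tanh(drive + context))
    return hidden


def run_sequence(sequence, w_in, w_rec):
    """Present items one at a time, starting from a silent hidden layer."""
    h = [0.0] * len(w_in)
    for x in sequence:
        h = srn_step(x, h, w_in, w_rec)
    return h


# Hypothetical fixed weights (no training is performed in this sketch).
W_IN = [[1.0, -1.0], [0.5, 0.5]]
W_REC = [[0.5, 0.0], [0.0, 0.5]]
ITEM_A, ITEM_B = [1.0, 0.0], [0.0, 1.0]

state_ab = run_sequence([ITEM_A, ITEM_B], W_IN, W_REC)
state_ba = run_sequence([ITEM_B, ITEM_A], W_IN, W_REC)
print(state_ab, state_ba)
```

The same two items presented in opposite orders leave the hidden layer in different states, which is the property a trained network can exploit to recall serial order.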
Many of our sequential actions rely on long-term memories that might be stored in BG or in cortex. Wickens & Arbuthnott (1993) model sequential actions by interactions in a network of SpNs. In contrast, Berns & Sejnowski (1998) propose a systems-level model based on delays in the loop between GPe and STN (not explicitly shown in Figure 2). When monkeys perform sequential saccades, interesting covariances develop between the striatum and frontal cortex that may serve to mark onsets and offsets of learned sequences (Fujii and Graybiel 2005).
Numerous models have addressed the basal ganglia's role in learning, based on the uncanny resemblance of dopaminergic cell firing to the requirements of an error signal in temporal difference (TD) learning. Such models map the Actor-Critic implementation of TD learning on to the basal ganglia (Barto, 1995; Houk, Adams & Barto, 1995; Montague, Dayan & Sejnowski, 1996; Schultz, Dayan & Montague, 1997), where, roughly, the Actor is mapped on to the selection function of the basal ganglia, and the Critic is mapped on to the RL circuit in Figure 2. As such the dopamine signal is envisaged as the teaching signal that alters the Actor's responses to maximise future reward.
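The Actor-Critic mapping can be sketched for the simplest possible case, a single state with immediate reward. The task, the learning rates, the softmax policy, and the function names are illustrative assumptions rather than details of any of the cited models; with one step per episode the TD error reduces to reward minus predicted value.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into selection probabilities."""
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(rewards, episodes=500, alpha_critic=0.1,
                       alpha_actor=0.1, seed=0):
    """One-state Actor-Critic; rewards[a] is the payoff of action a."""
    rng = random.Random(seed)
    value = 0.0                      # Critic: predicted reward
    prefs = [0.0] * len(rewards)     # Actor: action preferences
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        # TD error (the dopamine-like signal); the episode ends after one
        # step, so the discounted next-state value is zero.
        td_error = rewards[action] - value
        value += alpha_critic * td_error
        for a in range(len(prefs)):
            chosen = 1.0 if a == action else 0.0
            prefs[a] += alpha_actor * td_error * (chosen - probs[a])
    return prefs, value


prefs, value = train_actor_critic(rewards=[1.0, 0.0])
print(softmax(prefs), value)
```

The same scalar error trains both components: the Critic uses it to improve its reward prediction, and the Actor uses it to shift selection toward the rewarded action.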
Joel, Niv and Ruppin (2002) evaluated Actor-Critic models from both anatomical and computational perspectives. They reviewed several such models and concluded that the models are not compatible with current anatomical data.
Although these models are computationally elegant, critics have variously argued that they are biologically implausible.
The Bar-Gad, Morris & Bergman (2003) model promotes an alternative to action selection. It postulates that the main function of loops through BG is to reduce the dimensionality, perhaps using principal components analysis, of the representation of an action that is present in the area of cortex that provides the input. Anatomical evidence supports this theory. There are about 10 times as many cortical inputs to the striatum as there are spiny neurons and about 100 times as many spiny neurons as there are output neurons. The input information needs to be prioritized and compressed. This seems quite compatible with the action selection hypothesis rather than being an alternative. The dimensionality of potential actions (or thoughts) that are salient is likely to be much less than the dimensionality of the activations that are sent to BG from any given area of cerebral cortex.
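The compression idea can be illustrated with a toy principal components computation. The data, the function name, and the use of power iteration are assumptions for illustration; the sketch does not reproduce the network used by Bar-Gad, Morris & Bergman, only the general operation of projecting a higher-dimensional cortical activity vector onto its dominant direction of variance.

```python
import math


def first_principal_component(data, iterations=50):
    """Return the leading principal component and the projected scores."""
    n, dim = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration; assumes the start vector is not orthogonal
    # to the leading component.
    vec = [1.0] * dim
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    scores = [sum(c * v for c, v in zip(row, vec)) for row in centered]
    return vec, scores


# Hypothetical 3-dimensional "cortical" activity that varies along one direction.
cortical_activity = [[float(i), 2.0 * i, -1.0 * i] for i in range(-5, 6)]
component, scores = first_principal_component(cortical_activity)
print(component)   # direction of greatest variance
print(scores)      # one number per sample instead of three
```

Here three correlated input channels are summarized by a single value per sample without loss, because all of the variance lies along one direction; real cortical input would be compressed with some loss.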
Models of basal ganglia networks have also been studied for their dynamic properties rather than to pursue overt functional hypotheses. A trio of STN-GP network models explored separate but complementary dynamical properties of this sub-system. The model of Humphries & Gurney (2001) explored the slow-bursting properties of STN and GP neurons in vitro (Plenz & Kitai, 1999); the population-level model of Gillies, Willshaw & Li (2002) showed the range of dynamic responses that could be achieved, including oscillatory responses; and the more detailed biophysical model of Terman et al. (2002) generalised these results to show how dynamics of the network depended heavily on the pattern of connections within and between the STN and GP.
The uniform architecture of the many loops through the basal ganglia (Figures 1 & 2) has been an important constraint for most models of BG. Spiny neurons in the striatum have computational properties that are well suited for a pattern classification of the input vectors from different areas of cerebral cortex (CC) or from thalamus. The output of any given loop then selects approximate actions or thoughts, or even time intervals. Activity in a corresponding loop through the cerebellum (CB) then proceeds to amplify and refine the approximate action or thought that was selected by the BG, so as to achieve more accuracy in guiding behavior and in thinking. According to this overall model, cooperative computations by BG-CC-CB distributed processing modules lead to more success in the many trying experiences that the world presents.
Internal references
Action Selection, Basal Ganglia, Cerebellum, Models of Cerebellum, PCA, Reinforcement Learning, Reward, Reward Signals