Back to my home page.

NOTE: this page is more than 10 years old and more recent models exist from our group and others. But many of these principles still apply and the animations may still be informative for some.

The basal ganglia (BG) participate in various aspects of cognition and behavior by interacting with and modulating different parts of frontal cortex. Various theorists propose that the BG participate in "action selection" and reinforcement learning. Because of the complexity of the network interactions, computational models have been particularly useful for investigating dynamics of various BG areas during response selection and learning. Extensions of the model described below to include interactions with areas of prefrontal cortex have also enabled examination of BG roles in higher level cognitive functions such as working memory, attentional shifting, and decision making (pfc papers). Other work incorporates the effects of noradrenaline modulation in cortex, integrated within the context of our overall BG modeling framework. This enables us to explore systems-level interactions between dopamine and noradrenaline in learning and action selection (paper, movies), and in response inhibition (forthcoming).

Basal Ganglia Model in Action

NOTE: These mpegs are best viewed with MPlayer (available for Linux, Windows and Mac OSX) or Windows Media Player (for Windows and Mac). The movies run very quickly -- they are much easier to follow if you play them in slow motion. Under Linux, you can download the mpegs, then run "mplayer -speed 0.1 [filename]". This will play the movie at 10% speed. Under Windows, you should download the GUI version of MPlayer, whose control panel lets you replay the video after it ends so that it does not vanish. While the video is playing, use the "{" command (that is, press Shift + "["). Every press of { slows playback by 50%. You can then hit play again and watch at the new speed. Press } to speed up if it becomes too slow.

Step by Step Components

To understand what you're looking at, it is best to read the more detailed description of the computational models and associated biology in the published modeling papers (see Cohen & Frank, 2009 or Wiecki & Frank, 2010 for recent reviews). Here the conceptual components of the model are motivated step by step in a basic model selecting between two responses, before showing movies of a larger model under various conditions. If you are already familiar with BG circuitry, you may want to skip to the full model.

First let's consider the basic "default" function of the BG to suppress responses. A stimulus input pattern (where input units represent the features of the stimulus and environment) is presented, and all potential competing responses initially become noisily activated (or "considered") in pre/motor cortex. In the absence of a striatum, neurons in the BG output nucleus GPi (globus pallidus internal segment) are tonically active, and send inhibitory projections onto the thalamus. Because bottom-up thalamic-cortical activity is required for a response to become sufficiently activated, this inhibition of the thalamus suppresses all responses from getting executed, leading to only noisy cortical activity and no action selection (movie).

Here, the Striatum is added to the model, with "Go" units in the left half of the Striatum layer, and separate columns for each response (here just R1 and R2). To facilitate a response, striatal "Go" units become activated, and inhibit the corresponding column in GPi. This leads to "disinhibition" of corresponding thalamus units, ultimately facilitating the execution of the selected response in cortex (movie). Note that in the above example Go units for R2 also became weakly active, but did not sufficiently inhibit the GPi. This shows how the disinhibition circuitry of the basal ganglia serves to gate the execution of cortical actions, but does not directly excite them.

Here we add "NoGo" units to the right half of the striatum, each again with their corresponding column of units representing NoGo-R1 and NoGo-R2. These NoGo units inhibit the GPe (external segment), which in turn disinhibits the GPi, and therefore has an opposing effect on response selection -- to suppress a particular response from getting executed (movie). Note that in this example noisy cortical activity initially activates the incorrect R2. NoGo-R2 units become activated to suppress the execution of R2 (which had previously been associated with negative feedback in the context of this input stimulus), allowing R1 to become subsequently facilitated.
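The gating logic described in the last three steps can be sketched in a few lines of Python. This is a toy rate-code illustration, not the Leabra implementation: the function names, the unit weights of 1.0, and the multiplicative gating of cortex by thalamus are all assumptions made for the example.

```python
def clip01(x):
    """Keep a toy firing rate inside [0, 1]."""
    return min(1.0, max(0.0, x))


def thalamic_gate(go, nogo, w_gpe=1.0):
    """Thalamic disinhibition for one response column (hypothetical helper).

    go, nogo: striatal Go / NoGo activity in [0, 1].
    w_gpe:    assumed strength of the GPe -> GPi inhibitory projection.
    """
    gpe = clip01(1.0 - nogo)                      # NoGo units inhibit GPe
    # GPi is tonically active; Go units and GPe both inhibit it.
    # The constant (1 + w_gpe) makes GPi fully active at rest.
    gpi = clip01(1.0 + w_gpe - go - w_gpe * gpe)
    return clip01(1.0 - gpi)                      # GPi inhibits thalamus


def executed(cortex, go, nogo):
    """Gate noisy cortical candidates; the BG never excites them directly."""
    return {r: cortex[r] * thalamic_gate(go[r], nogo[r]) for r in cortex}


if __name__ == "__main__":
    cortex = {"R1": 0.6, "R2": 0.5}               # noisy candidate responses
    print(executed(cortex, {"R1": 0.0, "R2": 0.0}, {"R1": 0.0, "R2": 0.0}))
    print(executed(cortex, {"R1": 1.0, "R2": 0.2}, {"R1": 0.0, "R2": 0.0}))
    print(executed(cortex, {"R1": 1.0, "R2": 1.0}, {"R1": 0.0, "R2": 1.0}))
```

With no striatal activity the gate stays closed for every response; a strong Go signal opens it for its own column only; a NoGo signal for the same column closes it again by releasing GPi from GPe inhibition.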

Further contributing to the control circuitry, the subthalamic nucleus (STN) is driven directly by cortical activity via the "hyperdirect pathway" and sends an initial "Global NoGo" or "Hold your Horses" signal, preventing the BG from prematurely selecting any response (movie). The greater the conflict between competing responses, the greater the initial Global NoGo signal. Thus, this dynamically adaptive STN signal allows the striatum to first properly integrate BG Go and NoGo signals to determine the most appropriate response in a given context (in this case R1 and R2 are both good, but R1 ends up winning).
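One way to picture the conflict-dependent Global NoGo signal is the sketch below. The conflict measure (sum of pairwise products of co-active response units) and the weight `w_stn` are assumptions chosen for illustration; in the model itself the STN signal emerges from the network dynamics rather than from an explicit formula.

```python
from itertools import combinations


def conflict(cortex):
    """Assumed conflict measure: large only when several responses are co-active."""
    return sum(a * b for a, b in combinations(cortex, 2))


def gpi_with_stn(gpi_base, cortex, w_stn=1.0):
    """Add a global STN excitation to every GPi column (hypothetical helper).

    gpi_base: GPi activity per response after Go/NoGo effects.
    cortex:   premotor activity per response.
    w_stn:    assumed strength of the STN -> GPi excitatory projection.
    """
    stn = w_stn * conflict(cortex)
    # The same STN drive reaches all columns, hence "Global" NoGo.
    return [min(1.0, g + stn) for g in gpi_base]


if __name__ == "__main__":
    print(conflict([0.8, 0.1]))   # one clear candidate: low conflict
    print(conflict([0.8, 0.7]))   # two strong candidates: high conflict
    print(gpi_with_stn([0.2, 0.9], [0.5, 0.5]))
```

Because the extra GPi excitation scales with conflict, a Go signal has to be stronger (or integrate for longer) before it can open the thalamic gate when several responses compete.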

Finally, dopamine from the SNc modulates the relative balance of Go versus NoGo activity via simulated D1 and D2 receptors. Dopamine leads to excitatory responses on Go units via D1 receptors (in reality D1 stimulation excites only units that are already excited by synaptic input, or in the so-called "up-state", while inhibiting units that have less activity, such as those in the "down-state". This function increases the signal-to-noise ratio of DA effects onto task-relevant Go responses, and is implemented in the model by increasing the gain of the activation function of Go units with increased DA). In contrast, DA is uniformly inhibitory on NoGo units via D2 receptors. This differential effect of DA on Go and NoGo units, via D1 and D2 receptors, affects performance (i.e., more tonic DA leads to more Go and associated response vigor, and faster reaction times) and, critically, learning (see also below) (positive feedback: DA burst follows response), (negative feedback: DA dip follows response).
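The contrast-enhancing D1 effect and the uniformly inhibitory D2 effect can be illustrated as follows. This is a minimal sketch under stated assumptions: a logistic function stands in for the model's activation function, and `theta`, `base_gain`, `gain_per_da` and `w_d2` are made-up parameters, not values from the published model.

```python
import math


def go_activity(net, da, theta=0.5, base_gain=4.0, gain_per_da=8.0):
    """D1-like effect: DA raises the gain of the Go activation function.

    Inputs above threshold (up-state-like) are boosted, inputs below it
    (down-state-like) are suppressed. All parameters are illustrative.
    """
    gain = base_gain + gain_per_da * da
    return 1.0 / (1.0 + math.exp(-gain * (net - theta)))


def nogo_activity(net, da, theta=0.5, gain=4.0, w_d2=0.5):
    """D2-like effect: DA acts as a uniform inhibitory input to NoGo units."""
    return 1.0 / (1.0 + math.exp(-gain * (net - w_d2 * da - theta)))


if __name__ == "__main__":
    for da in (0.0, 1.0):
        print(
            "DA", da,
            "Go strong/weak input:", go_activity(0.8, da), go_activity(0.2, da),
            "NoGo strong/weak input:", nogo_activity(0.8, da), nogo_activity(0.2, da),
        )
```

Raising `da` pushes strongly driven Go units up and weakly driven Go units down, while every NoGo unit is pushed down regardless of its input.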

The net result, as can be seen in the full BG circuitry movies below, is that the BG selects one response if a particular "Go" signal in the striatum is stronger than its corresponding "NoGo" signal, while concurrently suppressing alternative responses. (In actuality, the response that is selected is that with the greatest Go-NoGo difference). Following each response selection, the model is given either positive or negative feedback, which translates into a burst or dip in SNc dopamine, respectively. The resulting effects on striatal activity drive Go or NoGo learning about the response just selected. Don't blink, or you might miss it!
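Stripped of all circuit dynamics, the selection and learning rule just described reduces to something like the sketch below. It is an abstraction for intuition only: the scalar Go/NoGo "weights", the learning rate `lr`, and the symmetric update are assumptions, whereas the network model learns through DA-modulated changes in unit activity and synaptic plasticity.

```python
def clip01(x):
    """Keep a toy weight inside [0, 1]."""
    return min(1.0, max(0.0, x))


def select(go_w, nogo_w):
    """Pick the response with the greatest Go - NoGo difference."""
    return max(go_w, key=lambda r: go_w[r] - nogo_w[r])


def learn(go_w, nogo_w, response, rewarded, lr=0.1):
    """Update only the response just selected (hypothetical rule).

    Positive feedback -> DA burst -> more Go, less NoGo.
    Negative feedback -> DA dip   -> less Go, more NoGo.
    """
    da = 1.0 if rewarded else -1.0
    go_w[response] = clip01(go_w[response] + lr * da)
    nogo_w[response] = clip01(nogo_w[response] - lr * da)


if __name__ == "__main__":
    go_w = {"R1": 0.5, "R2": 0.5}
    nogo_w = {"R1": 0.5, "R2": 0.4}
    first = select(go_w, nogo_w)
    learn(go_w, nogo_w, first, rewarded=False)    # negative feedback
    print(first, "->", select(go_w, nogo_w))
```

After a single dip, the previously favored response loses its Go-NoGo advantage and the alternative is selected on the next presentation.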

Full Model Movies

Below are some mpeg video captures of a BG model that selects among four competing responses (R1-R4). This model has been applied to account for patterns of learning deficits in patients with dysfunction of the basal ganglia dopamine system, such as those with Parkinson's disease and ADHD, and has made novel behavioral predictions, some of which have been confirmed in PD and other populations. See our online papers for these studies and for more detailed biological justifications and modeling considerations.

Early training / positive feedback trial #1

Early training / positive feedback trial #2

In these early training trials, all responses are initially equally active in motor cortex, leading to a high degree of response conflict and a strong initial STN Global NoGo signal. Nevertheless the Go signal for the correct response prevails in Striatum (via random initial synaptic weights or as a result of prior learning). A dopamine burst subsequently "rewards" the response, allowing the striatum to learn about Go vs NoGo activity states for each response at the time of the DA burst. This learning allows the striatum to facilitate selection of the rewarding response in future presentations of the same stimulus context. The mechanisms by which DA affects activity and plasticity are motivated by several biological experiments and are consistent with effects of dopamine D1 and D2 pharmacological agents on activity and long term plasticity.

Early training / negative feedback trial #1

Early training / negative feedback trial #2

In these trials a dopamine dip during negative feedback leads to increased NoGo activity and decreased Go activity (relative to the response selection phase), allowing the striatum to learn to avoid selecting that particular response in the future. Note that the model is never "told" the correct response (as it would be in error-driven learning algorithms): motor cortex continues to activate the incorrect response during negative feedback. Instead, the model has to learn which responses lead to positive vs negative reinforcement by trial and error and associated changes in BG/DA. (The Output layer shows the correct response during feedback, but this is used only for display purposes and does not affect learning.)

After more extended training: Cortical response selection

In this trial the model has been trained for 25 "epochs" on a probabilistic learning task. Note that here, R1 "wins" the competition in motor cortex and sends output activation prior to its associated BG Go signal. This is because as a response is increasingly facilitated by the BG in response to an input stimulus, Hebbian principles drive learning directly between the stimulus input and response units in premotor cortex. In effect, the BG/DA system trains the cortical system. This is consistent with observations in both animals and humans that learning related activity occurs in the striatum prior to frontal cortex, and that the BG/DA system is particularly important in the learning of new behaviors, but less important for well ingrained habits. In this particular example, due to the probabilistic nature of the task, negative feedback was delivered.

Response Sequencing trial:

This trial begins with one response (R3) already active due to it having been selected just before the input stimulus is presented. Note that at the beginning of the trial, R3 motor cortical units are fully active while others are suppressed. But because R3 had not been positively reinforced in response to this new input stimulus, a BG NoGo signal suppresses the initial R3 selection and allows switching to an alternative response (R2). This R2 response is then reinforced with a dopamine burst because it was the correct choice in this particular task context.

Choosing among two conflicting responses: STN Intact

Choosing among two conflicting responses: STN "Lesion"

In these trials, networks were faced with making a choice in response to two simultaneously presented stimulus cues (two columns of input units), each of which had been separately associated with a different response in the past. In this case, R4 is the correct choice, because it had been associated with the highest probability (80%) of reward, whereas R1 had been associated with 60% reward in response to the other stimulus. This is a high-conflict "win/win" decision, in which the STN is important for preventing premature responding. Noise in motor cortex was increased in this example for demonstration purposes. When the STN is intact it prevents early responding and allows integration of noisy signals; as a result the model correctly chooses R4. In contrast, with the STN lesion (inactive STN), the model responds prematurely to R1 as it happens to become more active early in settling of network activity states and is impulsively facilitated.
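The difference between the two movies can be caricatured with hand-made activity traces. Everything here is an assumption for illustration: the traces are invented (not model output), and the STN is reduced to a temporary increase in the response threshold (`stn_boost` for the first `stn_cycles` cycles).

```python
def first_response(traces, threshold, stn_boost=0.0, stn_cycles=0):
    """Return (response, cycle) for the first response to cross threshold.

    traces:     dict of response name -> activity per settling cycle.
    stn_boost:  assumed extra threshold while the Global NoGo signal is on.
    stn_cycles: assumed number of initial cycles the STN signal lasts.
    """
    n_cycles = len(next(iter(traces.values())))
    for t in range(n_cycles):
        thr = threshold + (stn_boost if t < stn_cycles else 0.0)
        winners = [r for r, trace in traces.items() if trace[t] >= thr]
        if winners:
            return max(winners, key=lambda r: traces[r][t]), t
    return None, None


if __name__ == "__main__":
    # R1 gets an early noisy head start; R4 has the stronger learned support.
    traces = {"R1": [0.6, 0.5, 0.5, 0.5], "R4": [0.3, 0.5, 0.7, 0.9]}
    print("STN lesion:", first_response(traces, threshold=0.6))
    print("STN intact:", first_response(traces, threshold=0.6,
                                        stn_boost=0.3, stn_cycles=2))
```

Without the early threshold increase, the response that happens to lead at the start is facilitated immediately; with it, selection is delayed until the better-supported response has pulled ahead.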

Selecting 2 Responses: STN Self-Correction

In this spurious trial, the BG initially facilitates two responses simultaneously, which is not a good thing when having to make choices! However, note that when these two responses are fully excited in premotor cortex, the additional response conflict drives a second STN Global NoGo signal; this leads to excitation of GPi and inhibition of the Thalamus. The lack of bottom-up support for both responses makes it easier for one to dominate and suppress the other (via lateral inhibition that is present in cortex), leading to the selection of just one response. At this point, the conflict in cortex goes down, and the STN Global NoGo signal turns off.

Incorporating Norepinephrine Function into the Model

This model explores the effects of norepinephrine (NE) in modulating cortical response selection processes, as simulated by Aston-Jones, Cohen and colleagues (see Aston-Jones & Cohen, 2005, Annual Review of Neuroscience). Like DA cells in the SNc, firing states of NE-releasing neurons in the locus coeruleus (LC) come in both tonic and phasic modes. In electrophysiological recordings, LC cells release phasic NE bursts during periods of focused attention, infrequent target detection, and good task performance. This phasic NE burst is thought to reflect the outcome of the response selection process and serves to facilitate response execution. In contrast, poor performance is accompanied by a high tonic, but low phasic, state of LC firing. The authors simulated the effects of these LC modes on action selection such that NE modulated the gain of the activation function in cortical response units (Usher et al., 1999). They showed that phasic NE release leads to "sharper" cortical representations and a tighter distribution of reaction times, whereas the high tonic state was associated with noisy activity and more RT variability, as observed in their empirical work with monkeys. They further hypothesized that increases in tonic NE during poor performance may be adaptive, in that it may enable the representation of alternate competing cortical actions during exploration of new behaviors.
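The effect of gain modulation on cortical response units can be sketched as follows. This is a minimal illustration, assuming a logistic activation function and made-up values for `theta`, the gains, and the net inputs; it is not the function or the parameters used by Usher et al. or in our model.

```python
import math


def unit_activity(net, gain, theta=0.5):
    """Logistic activation whose slope is set by NE-dependent gain (assumed form)."""
    return 1.0 / (1.0 + math.exp(-gain * (net - theta)))


def contrast(nets, gain):
    """Activity difference between the two most active response units."""
    acts = sorted((unit_activity(n, gain) for n in nets), reverse=True)
    return acts[0] - acts[1]


if __name__ == "__main__":
    stimulus_driven = [0.7, 0.4]    # learned input favors the first response
    prestim_noise = [0.55, 0.45]    # small random difference before the stimulus
    for gain in (2.0, 10.0):
        print("gain", gain,
              "stimulus contrast:", contrast(stimulus_driven, gain),
              "noise contrast:", contrast(prestim_noise, gain))
```

Higher gain sharpens whatever difference is present in the inputs. Applied only once the stimulus drives cortex (phasic mode), it sharpens the stimulus-evoked representation; applied indiscriminately before the stimulus (high tonic mode), the same gain also amplifies noise.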

The below simulations explore how these effects play out within the context of the overall BG/DA action selection circuitry (see Frank, Scheres & Sherman (2007) and Frank, Santamaria, O'Reilly & Willcutt (2007) for simulation results, discussion, and implications for ADHD). We showed that (a) the tonic LC mode leads to increased representation of multiple cortical responses, (b) more reaction time variability, and (c) more erratic trial-to-trial response switching. In the phasic LC mode, tonic LC firing is low but punctate phasic bursts are elicited via top-down excitatory projections from premotor cortex. In this manner stimulus-evoked premotor activity (which arises from prior stimulus-response learning; see above) elicits a phasic LC burst, which in turn reciprocally modulates the gain of premotor units and facilitates the selection and execution of the desired response. These effects turn out to be especially critical in the presence of noisy cortical activity. To explore effects of LC/NE on noisy premotor activity, we delay the stimulus onset so that noisy activity is present in premotor cortex prior to processing of a task-relevant stimulus (as is likely the case in natural environments, but is typically not simulated).

LC Tonic ("good" noise)

LC Phasic ("good" noise)

In these trials, noisy activity in premotor units prior to stimulus onset happens to favor the correct cortical response (R1) associated with the particular stimulus. In this "good noise" case, both tonic and phasic LC modes are associated with swift facilitation of the correct response.

In the "bad noise" case, noisy premotor activity prior to stimulus onset happens to favor R1 units, but R2 is the correct response for the particular input stimulus. Once the stimulus is presented premotor R2 units begin to become active. This is because the network had already been trained sufficiently such that cortical units had developed strong synaptic strengths directly from the stimulus units in the input layer.

LC Tonic ("bad" noise)

High tonic LC activity and associated NE indiscriminately enhances cortical activity, including initial noisy representations. Thus when R2 units become active in response to the stimulus, these have to compete with the already-active R1 units. In the BG model, in addition to leading to increased inhibitory competition in premotor cortex itself, the resulting increased response conflict drives a strong STN Global NoGo signal (see above), slowing reaction time until R2 can be selected and R1 suppressed. If the correct response units happen to be more active during initial noisy activity (as in the "good noise" case above), this slowing does not occur. The resulting effects across multiple noisy trials lead to increased RT variability, and somewhat slowed overall responses, as is seen in ADHD. This same tonic NE / cortical noise can also lead to exploration of new responses and erratic trial-to-trial switching of responses. Indeed, we found that increased RT variability and trial-to-trial switching were strongly correlated in non-medicated ADHD (and not in controls), suggestive of a common mechanism (Frank et al. (2007)). Critically, these measures were independent from other, putative DA-dependent measures such as positive reinforcement learning and working memory updating.

LC Phasic ("bad" noise)

In contrast, the phasic LC mode performs more efficiently in the face of noise: lower tonic LC activity prior to stimulus onset reduces the effects of membrane potential noise in cortex. The phasic LC burst selectively enhances premotor responses evoked by the stimulus, leading to reduced conflict and swift response facilitation. The resulting distribution of reaction times is narrower for the phasic case, because it is less susceptible to pre-stimulus noise. This phasic mode thereby supports exploitation in our model, as in the models proposed by Aston-Jones & Cohen, and McClure et al. (2006).

This graph from the supplement of Frank, Scheres & Sherman (2007) shows RT variability (standard deviation of the number of processing cycles needed to facilitate a response) and choice accuracy in a simple selection task, as a function of tonic LC firing rate (with no phasic response). Intermediate tonic LC levels are associated with high RT variability, while high tonic (supra-tonic) levels are associated with narrower distributions, with a cost in accuracy. This demonstrates the need for a dynamic LC modulation of cortical gain during exploitative behavior, with low tonic levels prior to a response, but increased levels to facilitate swift response execution when appropriate. This dynamic mode is associated with minimal RT variability (as in the very high tonic levels above), and also high accuracy (not shown here).

The models are implemented in a reinforcement learning version of the Leabra framework (O'Reilly, 1996; see Frank, 2005; 2006 for the reinforcement version), using a middle ground between biophysically detailed neurons and highly abstract connectionist units. Leabra uses point neurons with excitatory, inhibitory, and leak conductances contributing to an integrated membrane potential, which is then thresholded and transformed via an x/(x+1) sigmoidal function to produce a rate code output communicated to other units (discrete spiking can also be used, but produces noisier results). Units in different parts of the BG areas have different underlying parameters determining their baseline excitability, etc, in an effort to be consistent with biology (see Frank, 2006 for a table of specific parameters and relation to BG function).
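For readers unfamiliar with point neurons, the sketch below shows the general form of such a unit: three conductances pull the membrane potential toward their reversal potentials, and the thresholded potential is passed through x/(x+1). The reversal potentials, threshold, gain and time step are illustrative assumptions, not the parameters used in our simulations (see Frank, 2006 for those), and the function names are hypothetical.

```python
def step_vm(vm, g_e, g_i, g_l, dt=0.1, e_e=1.0, e_i=0.15, e_l=0.15):
    """One Euler step of membrane potential integration.

    g_e, g_i, g_l: excitatory, inhibitory and leak conductances.
    e_e, e_i, e_l: assumed reversal potentials (normalized units).
    """
    i_net = g_e * (e_e - vm) + g_i * (e_i - vm) + g_l * (e_l - vm)
    return vm + dt * i_net


def equilibrium_vm(g_e, g_i, g_l, e_e=1.0, e_i=0.15, e_l=0.15):
    """Potential at which the three currents cancel (conductance-weighted mean)."""
    return (g_e * e_e + g_i * e_i + g_l * e_l) / (g_e + g_i + g_l)


def rate_code(vm, theta=0.25, gain=100.0):
    """Thresholded x/(x+1) output; theta and gain are illustrative values."""
    x = gain * max(0.0, vm - theta)
    return x / (x + 1.0)


if __name__ == "__main__":
    vm = 0.15                                   # start at the assumed resting potential
    for _ in range(500):
        vm = step_vm(vm, g_e=0.4, g_i=0.2, g_l=0.1)
    print(vm, equilibrium_vm(0.4, 0.2, 0.1), rate_code(vm))
```

Iterating `step_vm` converges on `equilibrium_vm`, and `rate_code` stays at zero below threshold and saturates toward 1 above it, which is what gives the units their graded but bounded rate-code output.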
