The avian vocal organ (the syrinx) is a versatile organ located at the junction of the primary bronchi and the trachea, where freely movable connective tissue membranes, the labia, are set into oscillatory motion by energy transferred from the airstream propelled from the air sacs. The rapid membrane oscillations generate air pressure perturbations, sound waves that travel through the trachea and beak before leaving the bird in the form of complex songs. At the base of song organization are the syllables, stereotyped acoustic elements arranged into motifs, which can in turn be recombined to form songs, ultimately revealing the action of a sophisticated neural vocal program. The syrinx is therefore a key element in the articulation between the neural vocal program and singing behavior.
This article deals with mathematical models of the syrinx. Their nonlinear nature explains the bulk of the acoustical properties of song, and they can produce synthetic songs, driven by real physiological instructions recorded from muscle activity and air sac pressure during spontaneous singing, that show good qualitative agreement with experimental song recordings.
Birds that sing can be divided into two classes. Roughly one half of the known species are vocal learners, i.e. they need tutors to acquire their songs; for these birds, learning through imitation is essential for normal communication. In particular, birds of the suborder Oscines possess a neural vocal architecture with a motor pathway and a pathway involved in song learning (Amador et al. 2008). The suborder Suboscines presents many neural and anatomical differences with respect to the Oscines: Suboscines are believed to be vocal non-learners, and this difference is manifested in a different central motor control of song. In fact, some Suboscines lack the forebrain vocal nuclei (Farries 2004). Because Oscines are capable of vocal learning, they have been studied extensively, not only at the level of their neural organization but also at the physiological level.
The Oscine tracheobronchial syrinx is a double, bilateral structure located at the junction of the bronchi and the trachea, as schematized in figure 1. In each sound source, the labia are surrounded by a family of muscles. The tracheobronchialis ventralis (vTB) and tracheolateralis (TL) are thought to actively control the separation of the labia, while the syringealis dorsalis (dS) and the tracheobronchialis dorsalis (dTB) close the air passage through the labia. This group of muscles is believed to be responsible for adjusting the oscillatory conditions (Goller F. 1996). On the other hand, the activity of the syringealis ventralis (vS) muscle is correlated with the frequency of the vocalizations, and this muscle has been proposed to control the stiffness of the labia by adjusting the distance between the supporting cartilaginous rings.
Two other types of syrinx have been reported: the tracheal syrinx, with a single sound source, and the bronchial syrinx, in which the two sound sources are located asymmetrically, one in each bronchus.
Despite the differences between Oscines and Suboscines, the physical mechanisms that lead to sound production are common to all birds. Once the labia oscillate, the air perturbation travels upwards through the tract formed by the trachea and the beak. This passage through a confined region affects the final acoustic signal; most importantly, a filtering effect takes place (Riede et al. 2006). From a physical point of view, the problem is that of a tube excited by a sound source at one end and open at the other. The sound waves travel back and forth in the tube, bouncing off the ends and interfering with themselves. Approximating the vocal tract by a tube of length \(L\) whose open end is characterized by an impedance with respect to atmospheric pressure, i.e. by a reflection coefficient \(\alpha\ ,\) the sound wave at the tube entrance \(a(t)\) can be computed as the result of the reflections in the tube\[\tag{1} a(t)=p_{in}(t)-\alpha\, a\left(t-\frac{2L}{c}\right)\ ,\]
where \(p_{in}\) stands for the pressure fluctuations injected into the tract by the airflow modulations and \(c\) is the speed of sound, so that \(2L/c\) is the round-trip travel time of a wave in the tube. The interference enhances the so-called resonant frequencies \(f_n\) and depresses the others. For a lossless tube of length \(L\) excited at one end and open at the other, \(f_n=(2n-1)c/4L\ .\) When the source signal is filtered, the original sound spectrum is modulated by the spectrum of the tract.
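Equation (1) has the form of a feedback comb filter. The sketch below is a minimal discrete-time illustration, assuming a speed of sound of 343 m/s, a hypothetical 3 cm tract, and a round-trip delay that is an integer number of samples (all assumptions, not values from the text). In the frequency domain, eq. (1) gives the transfer function \(H(f)=1/(1+\alpha e^{-i2\pi f\,2L/c})\), whose magnitude peaks where \(f\,2L/c\) is a half-integer, i.e. exactly at \(f_n=(2n-1)c/4L\); the helper `tract_gain` evaluates it.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 °C (assumed value)


def resonances(length_m, n_modes=3, c=SPEED_OF_SOUND):
    """Resonant frequencies f_n = (2n - 1) c / (4 L) of a lossless tube open at one end."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_modes + 1)]


def tract_filter(p_in, alpha, delay_samples):
    """Discrete version of eq. (1): a[n] = p_in[n] - alpha * a[n - delay_samples]."""
    a = [0.0] * len(p_in)
    for n, p in enumerate(p_in):
        echo = a[n - delay_samples] if n >= delay_samples else 0.0
        a[n] = p - alpha * echo
    return a


def tract_gain(freq_hz, alpha, length_m, c=SPEED_OF_SOUND):
    """|H(f)| for eq. (1), with H(f) = 1 / (1 + alpha * exp(-i 2 pi f * 2L / c))."""
    round_trip = 2.0 * length_m / c
    return 1.0 / abs(1.0 + alpha * cmath.exp(-2j * math.pi * freq_hz * round_trip))


# Hypothetical 3 cm tract with reflection coefficient 0.8
f1 = resonances(0.03)[0]  # first resonance, about 2858 Hz
impulse_response = tract_filter([1.0] + [0.0] * 11, alpha=0.8, delay_samples=4)
```

The impulse response shows the echoes of eq. (1) arriving every round trip with alternating sign and decaying by \(\alpha\) each time; at \(f_1\) the gain is \(1/(1-\alpha)\), while at \(2f_1\) it drops to \(1/(1+\alpha)\).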
From an acoustical point of view, birdsongs are complex, organized acoustic compositions. The conventional song unit is the syllable, the smallest repetitive sound gesture. Syllables are easily recognized in the standard representation of birdsong, the sonogram (or spectrogram), which displays the time evolution of the spectral content of the sound, with a color code indicating power intensity. Throughout this article (figures 2, 3, 4 and 5), darker shades of gray correspond to higher spectral intensities.
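A sonogram is a short-time Fourier transform: the signal is cut into overlapping windowed frames and the power spectrum of each frame becomes one column. The minimal NumPy sketch below assumes a Hann window of 256 samples, a hop of 64 samples and a synthetic 2 to 7 kHz linear upsweep; these choices are illustrative only, and in practice a library routine such as `scipy.signal.spectrogram` would normally be used.

```python
import numpy as np


def sonogram(signal, fs, win=256, hop=64):
    """Power spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window for i in range(n_frames)], axis=1)
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win / 2) / fs  # frame centers in seconds
    return freqs, times, power


fs = 22050
dur = 0.1
t = np.arange(int(dur * fs)) / fs
f_start, f_end = 2000.0, 7000.0
# Instantaneous frequency rises linearly; the phase is its time integral
phase = 2 * np.pi * (f_start * t + 0.5 * (f_end - f_start) / dur * t ** 2)
syllable = np.sin(phase)

freqs, times, power = sonogram(syllable, fs)
peak_track = freqs[np.argmax(power, axis=0)]  # dominant frequency in each frame
```

Plotting `power` in grayscale, with darker shades for higher values, would reproduce the convention used in the figures; `peak_track` is a crude estimate of the fundamental frequency trace of this tonal upsweep.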
In figures 2, 3 and 4 we show sonograms from Oscine birds: a chingolo sparrow (Zonotrichia capensis), a cardinal (Cardinalis cardinalis) and a zebra finch (Taeniopygia guttata), respectively. In figure 7, by contrast, a very stereotyped song from the Suboscine Great Kiskadee (Pitangus sulfuratus) is displayed. As the figures show, song programs can result from quite complicated combinations of syllabic repertoires (in some species, such as the brown thrasher, repertoires can contain thousands of syllables), sometimes organized into nearly fixed blocks of syllables called motifs. Pressure studies have shown that a song is not completed on a single breath taken at its beginning; instead, very short dips of pressure below atmospheric pressure occur between syllables, which enable the bird to execute very long song sequences (Hartley 1990). These minibreaths are observable in the pressure time traces of figures 3 and 7. A spectral analysis of the syllables reveals many different fundamental frequency traces: constant, up-sweeps, down-sweeps, n-shaped and more complex curves. Frequency and duration ranges vary across bird species; for instance, the chingolo sparrow presents syllabic fundamental frequencies typically ranging from 2 to 7 kHz and durations from 10 to 300 ms (figure 3).
One of the simplest models that captures the physical principle of energy transfer from the airflow to the tissue membranes was originally introduced in Titze 1988 for the vocal folds, and the corresponding motion was subsequently observed in the syrinx using videography (Larsen et al. 1999). It is based on the observation of surface waves travelling upwards along the folds during the oscillatory cycle, so that the syrinx presents a convergent profile when the labia move away from each other and a divergent profile in the closing semicycle (figure 1). With a convergent profile, during the opening semicycle, the interlabial pressure is close to the bronchial pressure, whereas with a divergent profile, in the closing semicycle, it approaches atmospheric pressure. This asymmetry of the syringeal pressure over the complete cycle ensures a net force in the direction of the velocity of the labia, transferring energy from the airstream to the moving tissue; this eventually compensates for the tissue's dissipation and allows sustained oscillations.
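The energy-transfer argument can be checked numerically. The sketch below imposes a sinusoidal labial motion and sums the work done by the interlabial pressure over one cycle, using a Titze-style pressure \(p=p_{sub}(1-\text{exit}/\text{entrance})\) in which the entrance gap leads the motion by \(+\tau\dot x\) and the exit gap lags by \(-\tau\dot x\). All numerical values, and the function names, are hypothetical and dimensionless; the point is only the sign of the result. With \(\tau>0\) (a travelling surface wave) the net work is positive, whereas with \(\tau=0\) (rigid labia, no profile asymmetry between semicycles) it vanishes.

```python
import math


def interlabial_pressure(x, v, p_sub, entrance_rest, exit_rest, tau):
    """p = p_sub * (1 - exit/entrance) with surface-wave edge kinematics.

    The entrance (lower) gap leads the motion by +tau*v and the exit (upper) gap
    lags by -tau*v, so the profile is convergent while opening (v > 0).
    """
    entrance = entrance_rest + x + tau * v
    exit_gap = exit_rest + x - tau * v
    return p_sub * (1.0 - exit_gap / entrance)


def work_per_cycle(amplitude, omega, p_sub, entrance_rest, exit_rest, tau, n=2000):
    """Net work done by the interlabial pressure over one imposed cycle x = A sin(w t)."""
    period = 2.0 * math.pi / omega
    dt = period / n
    total = 0.0
    for i in range(n):
        t = i * dt
        x = amplitude * math.sin(omega * t)
        v = amplitude * omega * math.cos(omega * t)
        total += interlabial_pressure(x, v, p_sub, entrance_rest, exit_rest, tau) * v * dt
    return total


# Hypothetical dimensionless values: rest gaps 0.10 and 0.09, unit pressure
w_wave = work_per_cycle(0.05, 2 * math.pi, 1.0, 0.10, 0.09, tau=0.05)
w_rigid = work_per_cycle(0.05, 2 * math.pi, 1.0, 0.10, 0.09, tau=0.0)
```

Positive work per cycle is what allows the airflow to offset tissue dissipation; without the phase lag between the edges the pressure force is a function of position only, so it does no net work over a closed cycle.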
Taking straight lines to simplify the geometry of the membranes (figure 1), the positions of the lower (entrance) edge \(a\) and the upper (exit) edge \(b\) of the labium can be written as \( a = a_0 + x + \tau \dot x \) and \(b = b_0 + x - \tau \dot x \ .\) The phenomenological constant \(\tau\), as described in Gardner et al. 2001, corresponds to the period of the resulting standing wave. Using the average interlabial pressure proposed in Titze 1988, \(p_a=p_{sub}(1-b/a)\), where \(p_{sub}\) is the sublabial pressure, as the driving force in Newton's equation of motion, the dynamics of the labium becomes\[\tag{2} \ddot x = -kx-\beta\dot x + a_{lab}p_{sub}\frac{\Delta +2\tau \dot x}{a_0+x+\tau \dot x}-f_0, \]
where \(\Delta=a_0-b_0\) is the profile difference of the syrinx at rest, \(a_{lab}\) is the labial area on which the interlabial pressure acts, \(k\) is the stiffness of the labial tissue and \(\beta\) is a constant characterizing the dissipation in the tissue. The constant force term \(f_0\) controls the stationary position of the labia. The three parameters \(k\), \(\beta\) and \(f_0\) are all expressed per unit mass.
The core of this flapping model can be investigated through an extremely simplified system that preserves the basic mechanism of self-sustained oscillations\[\tag{3} \ddot x = -kx -\left(\beta_1+\beta_2x^2-p_0\right)\dot x - f_0. \]
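Here \(p_0\) stands for the effect of the pressure term of eq. (2), which, to first order in the velocity, contributes a term proportional to \(p_{sub}\dot x\) that acts as a negative dissipation. The first-order nonlinear function \(\beta=\beta_1+\beta_2x^2\) preserves the symmetry of the dissipation: the dissipative force is an odd function of the velocity and an even function of the position. In particular, the nonlinear term \(\beta_2\) prevents the system from reaching large-amplitude oscillations, accounting for the effect of the collision of the labia.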
From a dynamical point of view, this system is a slight variation of the van der Pol oscillator: at the instability, nearly sinusoidal oscillations with negligible harmonic content are born in a Hopf bifurcation, and they become relaxation-like and spectrally richer as they grow in amplitude.
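A minimal sketch of eq. (3), integrated with a fixed-step fourth-order Runge-Kutta scheme, illustrates the Hopf threshold. The parameter values (dimensionless, with \(k=1\), \(\beta_1=0.1\), \(\beta_2=1\), \(f_0=0\)) are hypothetical choices for illustration. The fixed point \(x^*=-f_0/k\) loses stability when the effective damping \(\beta_1+\beta_2x^{*2}-p_0\) changes sign; below that pressure perturbations decay, above it a limit cycle appears whose amplitude, for these weakly nonlinear values, is close to the van der Pol estimate \(2\sqrt{(p_0-\beta_1)/\beta_2}\).

```python
def eq3_rhs(x, v, k, beta1, beta2, p0, f0):
    """Eq. (3) as a first-order system: x' = v, v' = -k x - (beta1 + beta2 x^2 - p0) v - f0."""
    return v, -k * x - (beta1 + beta2 * x * x - p0) * v - f0


def simulate(p0, k=1.0, beta1=0.1, beta2=1.0, f0=0.0, x0=0.1, dt=0.01, steps=40_000):
    """Fixed-step RK4 integration of eq. (3); returns the trajectory x(t)."""
    x, v = x0, 0.0
    xs = []
    for _ in range(steps):
        k1 = eq3_rhs(x, v, k, beta1, beta2, p0, f0)
        k2 = eq3_rhs(x + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1], k, beta1, beta2, p0, f0)
        k3 = eq3_rhs(x + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1], k, beta1, beta2, p0, f0)
        k4 = eq3_rhs(x + dt * k3[0], v + dt * k3[1], k, beta1, beta2, p0, f0)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        v += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        xs.append(x)
    return xs


def late_amplitude(xs, fraction=0.1):
    """Half the peak-to-peak excursion over the last part of the run."""
    tail = xs[-max(1, int(len(xs) * fraction)):]
    return (max(tail) - min(tail)) / 2.0


def hopf_threshold(k=1.0, beta1=0.1, beta2=1.0, f0=0.0):
    """Pressure p0 at which the damping at the fixed point x* = -f0/k vanishes."""
    return beta1 + beta2 * (f0 / k) ** 2


below = late_amplitude(simulate(p0=0.05))  # under threshold: oscillation dies out
above = late_amplitude(simulate(p0=0.20))  # over threshold: self-sustained oscillation
```

A spectrum of the `simulate` output, computed for pressures increasingly far above threshold, would show the harmonics growing with amplitude, as described above.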
In order to generate song, it is then assumed that the bird produces a set of basic gestures, which are identified with the parameters of the model: the elasticity term \(k\) is associated with the activity of the syringeal muscles controlling the stiffness of the labia (vS activity has been shown to be correlated with the frequency of the vocalization and not with airflow gating (Goller et al. 1995)), the parameter \(p_0\) with the air sac pressure, and \(f_0\) with the active airflow gating controlled by the dS and vTB muscles (Laje et al. 2002). As displayed in the lower inset of figure 2, the bifurcation diagram of the system shows a critical value of the pressure \(p_0\) above which self-oscillations are induced through a Hopf bifurcation (shaded region). Because the time scales associated with the duration and with the fundamental frequency of the syllables are well separated, the time course of \(k\) essentially traces the time course of the fundamental frequency. Therefore, each syllable type can be expressed as an elliptic path in the parameter space \((p_0,k)\), as seen superimposed on the bifurcation diagram of figure 2 (lower inset). The onset and termination of phonation are controlled by \(f_0\ ,\) since phonation is prevented when \(f_0\) exceeds a threshold (Laje et al. 2002). The song that results from driving the syrinx and vocal tract model along the elliptic paths is shown in the sonogram of the upper right panel of figure 2, to be compared with the original chingolo song spectrogram in the upper left inset. The minibreaths between syllables are easily explained in this description: they correspond to the short portion of each path that lies outside the oscillation region, at negative pressure values.
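The gesture picture can be sketched as follows, under stated assumptions: a single syllable is a hypothetical elliptic loop in \((p_0,k)\), phonation is assumed to occur wherever \(p_0\) exceeds the Hopf threshold of eq. (3), \(\beta_1+\beta_2(f_0/k)^2\), and near threshold the fundamental is approximated by the linear frequency \(\sqrt{k}/2\pi\) (with \(k\) in s\(^{-2}\)). The pressure values and all function names are illustrative, not fitted to data. Orienting the ellipse so that \(k\) grows during the high-pressure half produces an upsweep; the part of the loop at negative pressure is silent, like a minibreath.

```python
import math


def elliptic_path(n_points, p_center, p_radius, k_low, k_high):
    """Sample one loop of an ellipse in the (p0, k) plane.

    Pressure follows sin(phi) and stiffness follows -cos(phi), so the
    high-pressure (phonating) half of the loop sweeps k upwards: an upsweep.
    """
    k_center = 0.5 * (k_low + k_high)
    k_radius = 0.5 * (k_high - k_low)
    path = []
    for i in range(n_points):
        phi = 2.0 * math.pi * i / n_points
        path.append((p_center + p_radius * math.sin(phi),
                     k_center - k_radius * math.cos(phi)))
    return path


def hopf_pressure(k, beta1, beta2, f0):
    """Pressure above which the fixed point x* = -f0/k of eq. (3) loses stability."""
    return beta1 + beta2 * (f0 / k) ** 2


def fundamental_hz(k):
    """Linear, near-threshold oscillation frequency sqrt(k) / (2 pi), with k in s^-2."""
    return math.sqrt(k) / (2.0 * math.pi)


def gesture_track(path, beta1, beta2, f0):
    """For each path point, the expected fundamental in Hz, or None when silent."""
    return [fundamental_hz(k) if p0 > hopf_pressure(k, beta1, beta2, f0) else None
            for p0, k in path]


# Hypothetical syllable: k spans 2-7 kHz and the pressure dips below zero
k_lo, k_hi = (2 * math.pi * 2000.0) ** 2, (2 * math.pi * 7000.0) ** 2
path = elliptic_path(360, p_center=500.0, p_radius=2000.0, k_low=k_lo, k_high=k_hi)
track = gesture_track(path, beta1=1000.0, beta2=1.0, f0=0.0)
voiced = [f for f in track if f is not None]
```

Feeding the same time-dependent \((p_0(t),k(t))\) into an integrator of eq. (3) would produce the corresponding synthetic waveform rather than just its expected frequency track.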
There is evidence not only that juvenile songbirds reconfigure their brains when acquiring song, but also that adult birds rely on auditory feedback to maintain the quality and stability of their crystallized songs, as experiments on deafened adult birds show (Leonardo & Konishi 1999, Nordeen & Nordeen 1992). It is therefore of special interest to design experiments that manipulate auditory feedback in real time, without interfering with either the motor or the auditory pathways.
In Zysman et al. 2005, a biomimetic device emulating the action of the avian vocal organ was constructed. This electronic circuit integrates equation (3) for the dynamics of the syrinx in analog form, constituting a robust, low-latency device suitable for carrying out online alterations of auditory feedback. Yet such experiments would require measuring physiological variables for long periods of time, keeping cannulae inserted in the air sacs or muscles wired for days after surgery, which is impractical. An alternative was explored: extracting the controlling parameters of the syrinx from the birdsong itself in real time. To that end, two transducers were built: one that reconstructs the air sac pressure from the sound signal in real time, and another that reconstructs the tension of the vS muscle. In tests, recorded cardinal vocalizations were reconstructed in two steps: first, the air sac pressure and the tension of the labia were reconstructed using the transducers; second, the reconstructed functions were fed to the electronic syrinx, producing an output with qualitatively the same features as the recorded songs. Devices like this one open new perspectives for experimental work in the field, making it possible to change biologically relevant parameters within very short response times (approximately 10% of the syllable duration).
Beyond these suggestive features, the model can also be tested and validated by driving it with experimental instructions from singing birds. In Mindlin et al. 2003, the activities of the dTB and vS muscles were recorded simultaneously with the air sac pressure during 13 samples of spontaneous song from two cardinals (Cardinalis cardinalis). With this choice, i.e. one gating muscle and one muscle related to labial stiffness, the experimental signals were associated with the model parameters \(f_0\) and \(k\ .\) Muscle activity was recorded as electromyograms (EMGs), measured by electrodes implanted in the syringeal muscles. EMG records consist of spiky signals that must be rectified and smoothed to obtain envelope curves that can be used as inputs to the model equation (3).
Air sac pressure time traces, in turn, were captured by a cannula inserted into the anterior thoracic air sac and connected to a piezoresistive pressure transducer; the details of the procedure can be found in Goller & Suthers 1996. These time traces were associated with the parameter \(p_0\ .\) Linear relationships were assumed between the smoothed vS and dS muscle activities and the air sac pressure, on one side, and the corresponding model parameters, on the other. Figure 3 displays experimental songs alongside the synthetic songs generated with the syrinx model fed with these physiological parameters.
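The preprocessing chain described above (rectify, smooth, then map linearly onto a model parameter) can be sketched as below. The moving-average smoother, the window length, the synthetic EMG burst and the calibration constants `gain` and `offset` are all hypothetical stand-ins; the original studies may have used a different filter and fitted their own linear coefficients.

```python
import math


def rectify_and_smooth(emg, window):
    """Full-wave rectify an EMG trace, then smooth it with a centered moving average."""
    rectified = [abs(v) for v in emg]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo, hi = max(0, i - half), min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


def linear_map(signal, gain, offset):
    """Hypothetical linear calibration from a smoothed signal to a model parameter."""
    return [offset + gain * s for s in signal]


# Synthetic stand-in for an EMG burst: silent for 500 samples, then a 2-unit oscillation
emg = [0.0] * 500 + [2.0 * math.sin(2 * math.pi * n / 20) for n in range(500)]
env = rectify_and_smooth(emg, window=41)
k_param = linear_map(env, gain=3.0, offset=1.0)
```

The envelope of a rectified sinusoid of amplitude 2 settles near its mean value \(4/\pi\approx 1.27\); the same `linear_map` step would be applied to the air sac pressure trace to obtain \(p_0\).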
The model predicts that, for upsweeps, the threshold value of \(f_0\) that prevents phonation is smaller at the beginning of the syllable than at its end. This prediction of the model was confirmed in all the experimental data analyzed.
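A sketch of where this prediction comes from, based on eq. (3) and assuming that the air sac pressure \(p_0\) takes similar values at the start and at the end of the syllable: setting \(\ddot x=\dot x=0\) gives the fixed point \(x^*=-f_0/k\), and the perturbation \(\xi=x-x^*\) obeys, to linear order,
\[\ddot\xi=-k\xi-\left(\beta_1+\beta_2x^{*2}-p_0\right)\dot\xi\ .\]
Oscillations set in (Hopf bifurcation) when the effective damping vanishes, \(p_0=\beta_1+\beta_2 f_0^2/k^2\). Solving for the gating term, phonation is prevented when
\[|f_0|>f_0^{th}=k\sqrt{\frac{p_0-\beta_1}{\beta_2}}\ .\]
Since \(k\) tracks the fundamental frequency and therefore increases along an upsweep, the threshold \(f_0^{th}\) is lower at the beginning of the syllable than at its end.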
A concern might be raised, however, about whether the sounds produced by these models are perceived by the bird as actual conspecific song. This biologically relevant test was performed by Gardner and coworkers (Gardner et al. 2005). The authors exposed juvenile (25-day-old) canaries to playback of a synthetic song every 2 hours during daylight. Although the synthetic song was designed to avoid the stereotyped canary phrase structure (in which each syllable is repeated many times before switching to a different syllable), the juveniles adopted it as a viable tutor song and achieved a close imitation by 200 days of age (figure 4).
For bird species that produce nearly tonal syllables, like sparrows (figure 2) and cardinals (figure 3), the model described so far holds. Other species, however, produce a wide variety of syllables with variable spectral load (figure 5). In particular, zebra finch syllables show a strong dependence between fundamental frequency and spectral load. In figure 5, adapted from Sitt et al. 2008, syllables were characterized by plotting their fundamental frequency \(f_{aff}\) as a function of their spectral content index (a spectral centroid normalized by the fundamental), \(SCI=\sum_i f_i\epsilon_i/(E f_{aff})\ ,\) where \(E\) is the total spectral energy and \(\epsilon_i\) the spectral energy at frequency \(f_i\ .\) A large amount of experimental data was analyzed, and the syllables clustered in a thin region of this plane.
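The spectral content index can be computed directly from its definition. The sketch below takes \(\epsilon_i\) to be the bins of the FFT power spectrum and assumes \(f_{aff}\) is already known; both are assumptions, since the text does not specify the estimator. A pure tone gives \(SCI=1\), and adding harmonics raises it.

```python
import numpy as np


def spectral_content_index(signal, fs, f_fundamental):
    """SCI = sum_i f_i * eps_i / (E * f_aff), with eps_i taken as power-spectrum bins."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total_energy = power.sum()
    return float((freqs * power).sum() / (total_energy * f_fundamental))


fs = 8000
t = np.arange(fs) / fs  # one second, so 500 Hz and 1000 Hz fall exactly on FFT bins
tone = np.sin(2 * np.pi * 500 * t)
rich = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1000 * t)

sci_tone = spectral_content_index(tone, fs, 500.0)
sci_rich = spectral_content_index(rich, fs, 500.0)
```

With equal power at the fundamental and its first harmonic, the centroid is 750 Hz and the index is 1.5; real syllables would need a windowed spectrum and a separate pitch estimate for \(f_{aff}\).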
An explanation for this behavior is found using the model of eq. (2), expanding the restitution and the dissipation to first nonlinear order, \(k(x)=k_1+k_2x^2\) and \(\beta(x,y)=\beta_1+\beta_2 x^2+\beta_3y^2\ ,\) where \(y=\dot x\ .\)
The bifurcation diagram of this system, adapted from Amador et al. 2008, is displayed in figure 6 in the natural parameter space \((p_{sub},k_1)\). The red line corresponds to a fixed point losing stability in a Hopf bifurcation, already identified as a basic oscillatory mechanism of the syrinx. Interestingly, other solutions appear in this fully developed model. When the parameters leave region 5 and enter region 2, the saddle fixed point and the attractor collide in a saddle-node bifurcation. Because the unstable manifold of the saddle is part of the stable manifold of the attractor, an oscillation is born as the fixed points collide. This oscillation differs in important ways from one born in a Hopf bifurcation. In the latter case, the oscillation is born with zero amplitude and a finite frequency. In contrast, an oscillation born in a saddle-node on limit cycle (SNLC) bifurcation has a finite amplitude and zero frequency. As the parameter is moved further, the frequency starts to increase, but over a region of parameter space of nonzero measure the oscillation presents a critical slowing down whenever the variables approach the region of phase space where the saddle and the attractor collided (the period \(T\) scales as \(|a-a_c|^{-1/2}\ ,\) where \(a\) is a control parameter and \(a_c\) its value at the bifurcation). Because the oscillations are born with finite amplitude and zero frequency (infinite period), they display a rich spectral content.
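The inverse-square-root slowing down near a saddle-node on a limit cycle can be illustrated with the generic phase normal form \(\dot\theta=\omega-a\sin\theta\), which is swapped in here as the simplest system with this bifurcation; it is not the syrinx model of eq. (2). For \(\omega>a\) the phase rotates with period \(2\pi/\sqrt{\omega^2-a^2}\); at \(\omega=a\) a saddle and a node collide on the circle. The sketch measures the period numerically and compares it with that formula; the function names and values are illustrative.

```python
import math


def adler_period(omega, a=1.0, dt=1e-3):
    """Period of d(theta)/dt = omega - a*sin(theta), measured by RK4 integration.

    For omega > a there is no fixed point and theta rotates; the phase spends
    most of each cycle near theta = pi/2, where the fixed points are about to appear.
    """
    def f(theta):
        return omega - a * math.sin(theta)

    theta, t = 0.0, 0.0
    while True:
        k1 = f(theta)
        k2 = f(theta + 0.5 * dt * k1)
        k3 = f(theta + 0.5 * dt * k2)
        k4 = f(theta + dt * k3)
        new_theta = theta + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        if new_theta >= 2.0 * math.pi:
            # Linear interpolation of the crossing time within the last step
            return t + dt * (2.0 * math.pi - theta) / (new_theta - theta)
        theta, t = new_theta, t + dt


def analytic_period(omega, a=1.0):
    """Exact period 2*pi / sqrt(omega^2 - a^2); near omega = a it grows like (omega - a)^(-1/2)."""
    return 2.0 * math.pi / math.sqrt(omega ** 2 - a ** 2)


near = adler_period(1.01)     # distance 0.01 from the bifurcation
farther = adler_period(1.04)  # four times farther away
```

Quadrupling the distance to the bifurcation roughly halves the period, as the \(|a-a_c|^{-1/2}\) law predicts; the long dwell near the ghost of the collided fixed points is what makes the waveform strongly non-sinusoidal.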
The bifurcation line corresponding to the SNLC can be crossed by changing only the sublabial pressure \(p_{sub}\), and the frequency of the oscillations depends strongly on the distance of the pressure from the bifurcation point. Therefore, the model suggests that the fundamental frequency of the vocalizations can be controlled through modulations of the air sac pressure. Placing the parameters in the vicinity of the bifurcation and classifying the simulated sounds in \(SCI\) space as a function of pressure yields a curve that closely fits the experimental points for low-frequency syllables (dotted line of figure 5). This prediction of the model is consistent with the experimental observation of no ventral syringeal muscle activity during low-frequency sounds in zebra finches (Vicario et al. 1991). A plausible interpretation in the framework of this model is that these vocalizations originate in SNLC bifurcations and that their fundamental frequency is controlled only by the air sac pressure (Amador et al. 2008).
In this way, the dynamical signature revealed by the relation between spectral content and fundamental frequency in the zebra finch admits a simple biological explanation in terms of the model: a vocal organ "tuned" to the vicinity of an SNLC bifurcation can produce a variety of spectrally rich sounds driven by the air sac pressure alone, or by a simple combination of air sac pressure and dorsal muscle activity.
Yet another very interesting case supporting the key role of nonlinearities in the syrinx comes from the suborder Suboscines, a group of birds that are not songbirds.
The Great Kiskadee (Pitangus sulfuratus) is a Suboscine with a tracheosyringeal syrinx containing two independently controlled sound generators, each consisting of a pair of oscillating labia. Three pairs of muscles control the syrinx: the extrinsic muscles m. sternotrachealis and m. tracheolateralis and the intrinsic muscle m. obliquus lateralis. The song of the Great Kiskadee consists of a nearly fixed sequence of three syllables with an interesting property: simultaneous recordings of air sac pressure and fundamental frequency look very much alike (both time traces are shown in the right panel of figure 7).
This resemblance was quantified in Amador et al. 2008, and the analysis confirmed a highly significant linear correlation between the fundamental frequency of the syllables and the air sac pressure in this bird, suggesting a weak (if any) dependence of the song on the activity of the syringeal muscles. To confirm that sound generation is independent of syringeal muscle activity, experiments compared the songs of intact birds with those of birds with both tracheosyringeal muscles transected. While equivalent experiments performed in songbirds reveal striking changes in song production and even complete loss of phonation (Suthers & Zollinger 2004, Daley & Goller 2004), the air sac pressure patterns and songs of the Kiskadee remained remarkably unaffected by denervation. Moreover, numerical integration of a flapping model of the syrinx in Amador et al. 2008 showed that the fit between synthetic and natural songs was best when the restitution was expanded to first nonlinear order, \(k(x)=k_1+k_2x^2\ ,\) i.e. a restoring force \(k_1x+k_2x^3\ ,\) while the linear case performed poorly (figure 7, left inset).
In this way, a very simple mechanism for transducing air sac pressure into frequency modulations is revealed: when the linear restitution approximation is replaced by a first-order nonlinear one, the air sac pressure not only accounts for the onset of oscillations but also shifts the midpoint of the labia, where the restoring force is stiffer, thereby transducing higher pressure into higher oscillation frequencies.
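The mechanism can be sketched in a few lines under a strong simplification: the pressure is treated as a static force \(F\) displacing the labium, the equilibrium \(x^*\) solves \(k_1x+k_2x^3=F\), and the small-oscillation frequency about it follows from the local stiffness \(k_1+3k_2x^{*2}\). The values of \(k_1\), \(k_2\) and \(F\) are hypothetical and dimensionless, and this is not the fitting procedure of Amador et al. 2008. With \(k_2>0\) the frequency rises with \(F\); with \(k_2=0\) (linear restitution) it does not change.

```python
import math


def equilibrium(force, k1, k2, tol=1e-12):
    """Solve k1*x + k2*x**3 = force for x >= 0 by bisection (force >= 0, k1 > 0, k2 >= 0)."""
    lo, hi = 0.0, force / k1  # residual is <= 0 at lo and >= 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k1 * mid + k2 * mid ** 3 < force:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def small_oscillation_hz(force, k1, k2):
    """Frequency of small oscillations about the pressure-shifted midpoint."""
    x_star = equilibrium(force, k1, k2)
    k_eff = k1 + 3.0 * k2 * x_star ** 2  # local slope of the restoring force
    return math.sqrt(k_eff) / (2.0 * math.pi)


forces = (0.0, 1.0, 2.0, 4.0)
nonlinear = [small_oscillation_hz(F, 1.0, 1.0) for F in forces]
linear = [small_oscillation_hz(F, 1.0, 0.0) for F in forces]
```

In the nonlinear case the frequency increases monotonically with the applied force, which is the pressure-to-frequency transduction described above, while the linear case returns the same frequency for every force.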
Even though a good deal of the acoustic richness of birdsong emerges from the nonlinear nature of the vocal organ (see Herzel et al. 1994 for the human case), the isolated sound source is not the only possible origin of rich acoustic behavior. In fact, many models rely on the source-filter hypothesis, according to which the sound source and the filtering by the tract are acoustically uncoupled. Although this hypothesis generally holds, there is strong evidence that birds can take advantage of coupling effects to generate complex songs (figure 8). Laje et al. 2005 used a two-dimensional model of the syrinx (like that of eq. 2), restricted to a non-chaotic regime, to study coupling effects specifically. The coupling is taken into account through an expression for the interlabial pressure \(p_0\) that explicitly depends on both the sublabial pressure \(p_s\) and the input pressure at the vocal tract \(p_i\ .\) Here \(p_i\) combines the pressure perturbations injected locally into the vocal tract by the sound source and the pressure resulting from the reflections at the end of the tract, \(p_i(t)=s(t)-\gamma p_i(t-T)\ ,\) where \(s(t)=\alpha(x-\tau y)+\beta(y-\tau \dot y)\ ,\) \(y=\dot x\ ,\) and \(\alpha\) and \(\beta\) are coupling constants. \(T\) is the time for the sound wave to propagate back and forth through the tract, and \(\gamma\) is the reflection coefficient at the tract-atmosphere interface.
Simulations of the complete model of interacting sources and tract show complex periodic and aperiodic solutions, as well as period-doubling bifurcations. When both sources are active at the same time, the simulated sounds display subharmonic content that is not present in the spectra of the individual sources.
Interestingly, the coupling parameters \(\alpha\propto\sqrt\rho\) and \(\beta\propto\sqrt\rho\ ,\) where \(\rho\) is the density of the gas, can be modified if, for instance, atmospheric nitrogen is replaced with the lighter helium (Nowicki 1987). Specific effects could then be expected if the coupling is strong enough: in particular, numerical simulations of a single source with feedback show that as the pressure parameter \(p_s\) is decreased, subharmonic frequencies disappear in a period-halving bifurcation in normal air, while they are not present at all in a helium-like atmosphere. Notably, this effect would not occur if the subharmonics had a different dynamical origin: simulations of more complex syrinx models, such as the asymmetric two-mass model without vocal tract coupling, are less sensitive to changes in atmospheric density, and consequently the subharmonic spectral content is present in both normal air and heliox.
Song production is an interesting animal model for studying the neural bases of complex, experience-driven motor sequence generation. The instructions driving the syrinx are generated by the avian song control system, which is made up of a discrete, interconnected network of nuclei (Nottebohm 1976). These nuclei, built from thousands of neurons, act in a coordinated way to control the syrinx and the respiratory muscles (Wild et al. 2000). At the output level, this neural structure comprises brainstem nuclei: the tracheosyringeal part of the hypoglossal nucleus (nXIIts), which controls the syrinx, and nuclei that indirectly control respiration, such as nucleus retroambigualis (RAm) and nucleus parambigualis (PAm). These output nuclei receive motor commands from the forebrain nucleus RA (nucleus robustus arcopallialis), which in turn receives input from nucleus HVC.
Chronic recordings in singing birds have opened a new perspective in songbird research (see A.C. Yu et al. 1996). Specifically, measurements in HVC and RA during song revealed that, in zebra finches, RA-projecting HVC neurons (HVC(RA)) are sparsely active, each producing at most one brief burst of spikes at a single time in the song motif, and that these neurons burst sequentially, each at a different time in the song (Hahnloser et al. 2001). These observations suggested to Michael Fee and collaborators that the pattern of activation in RA depends on the synaptic connections from HVC(RA) neurons onto RA neurons, and therefore that it is the pattern of HVC(RA)–RA synapses that codes the acoustic features of the bird's song. A computational implementation of this model was presented in Abarbanel et al. 2004. S. Seung and collaborators have used theoretical techniques to unveil the implications of this sparse coding for song learning and for premotor neuronal representations in nucleus RA (Fiete et al. 2007), following a line of research that aims to integrate the still sparse experimental data into operational models in order to understand how the motor pathway is reconfigured during learning (Doya et al. 1995, Troyer et al. 2000).
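A toy reading of the sparse-coding hypothesis, offered only as an illustration and not as the model of Fee and collaborators or the implementation of Abarbanel et al. 2004, makes the idea concrete. Each hypothetical HVC(RA) neuron bursts once, in its own time bin. RA activity is taken, by assumption, to be a linear weighted sum of active inputs. Under these assumptions the RA output in each bin is simply one column of the synaptic weight matrix, so the song is stored in the synapses, and changing the synapses of one HVC(RA) neuron alters the output only at the moment that neuron bursts. All sizes and weights are hypothetical.

```python
import numpy as np


def hvc_bursts(n_neurons):
    """Sparse, sequential code: neuron i bursts once, in time bin i (rows: neurons, cols: time)."""
    return np.eye(n_neurons)


def ra_output(weights, bursts):
    """Linear readout: RA activity per time bin is the weighted sum of active HVC(RA) inputs."""
    return weights @ bursts


rng = np.random.default_rng(0)
n_hvc, n_ra = 40, 6
W = rng.normal(size=(n_ra, n_hvc))  # hypothetical HVC(RA) -> RA synaptic weights
bursts = hvc_bursts(n_hvc)
ra = ra_output(W, bursts)

# Perturb only the synapses of the neuron that bursts in time bin 10
W_changed = W.copy()
W_changed[:, 10] += 1.0
difference = np.abs(ra_output(W_changed, bursts) - ra).max(axis=0)
changed_bins = np.where(difference > 1e-12)[0]
```

The temporal locality of the effect of a synaptic change is one reason why such a sparse code is considered convenient for learning.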
Internal references