Neuroelectromagnetic source imaging (NSI) is the scientific field devoted to modeling and estimating the spatiotemporal dynamics of neuronal currents throughout the brain that generate the electric potentials and magnetic fields measured with noninvasive or invasive electromagnetic (EM) recording technologies (Ha93; Ba01; Ha05). Unlike the images produced by fMRI , which are only indirectly related to neuroelectrical activity through neurovascular coupling (e.g., the BOLD signal), the current source density or activity images that NSI techniques generate are direct estimates of electrical activity in neuronal populations. In the past few decades, researchers have developed better source localization techniques that are robust to noise and that are well informed by anatomy, neurophysiology, magnetic resonance imaging (MRI), and realistic volume conduction physics (Io90; Ge95; Fu99; Gr01; Vr01; Li02; Wo03). State-of-the-art algorithms can localize many simultaneously active sources and even determine their variable spatial extents and single-trial dynamics.
Contents |
The measured EM signals that are generated by the brain are thought to be primarily due to ionic current flow in the apical dendrites of cortical pyramidal neurons and the associated return currents throughout the volume conductor (Ok93; Ok97). Neurons that have dendritic arbors with closed field geometry (e.g., interneurons) produce no externally measurable signals (Hu69). However, some non-pyramidal neurons, such as the Purkinje neurons, do produce externally measurable signals (Ni71). Although still controversial, it is believed that some source localization methods can accurately image the activity of deep brain structures, such as the basal ganglia, amygdala, hippocampus, brain stem, and thalamus (Ri91; Vo96; Te97; Li99). However, single neurons produce weak fields and if the current flow is spatiotemporally incoherent (e.g., a local desynchronization) the fields end up canceling. Thus, EM recordings are particularly suited for studying spatiotemporally coherent and locally synchronized collective neural dynamics. There is a limit to how much current density a patch of cortex can support and this seems to be similar across species (Ok93). Thus, large amplitude fields/potentials entail distributed synchronized oscillations.
All the EM recording technologies share the important benefit of high temporal resolution (< 1 ms). However, they measure different, yet closely related, physical quantities at different spatial scales that are generated by the current density source distribution located throughout the brain, heart, muscles, eyes, and the environment. In principle, the inverse methods described here can be applied to data acquired using magnetoencephalography (MEG), scalp electroencephalography (sEEG), intracranial EEG (iEEG) or Electrocorticography (ECoG), or combinations of these measurement modalities, as long as the appropriate realistic forward models are available to compute the gain vectors needed for localization (Wo03; Wo04; Wo06).
In MEG, an array of sensors is used to measure components of the magnetic vector field surrounding the head (Ha93; Vr01). Although the first magnetoencephalogram was recorded with a single coil (Co68), after the invention of the superconducting quantum interference device (SQUID) (Zi71), MEG data has been recorded with SQUIDs (Co72). Current state-of-the-art systems have many sensors (>275) organized as a helmet-array of magnetometers and/or gradiometers (planar or radial). Distant reference sensors are used to eliminate noise (Vr01). Also, magnetic shielded rooms have been improved and active shielding technology has been developed to further reduce the effects of sources outside the shielded room and to reduce the need for heavily shielded rooms.
In sEEG, an array of electrodes is placed on the scalp surface to measure the electric potential scalar field relative to a reference electrode (Ni05; Nu05). EEG recording technology has progressed much since the first human recordings by Hans Berger in 1924 (Be29) and the later work by Edgar Adrian (Ad34). Modern state-of-the-art EEG systems use caps with as many as 256 active electrodes. Some EEG systems can be used for simultaneous EEG/MEG or EEG/fMRI recordings. Researchers are also working on wireless acquisition and on dry electrode technologies that do not use conductive gel.
In patients undergoing iEEG or ECoG, grid or strip electrode arrays are neurosurgically placed to record the electric potential more closely and undistorted by the skull (La03; Le05; Ca06). Grid electrode arrays are commonly rectangular in shape and have inter-electrode distances of about 1 cm or lower. Invasive measurements of the local field potential (LFP) can be recorded by depth electrodes, electrode grids, and laminar electrodes (Pe54; Ul01; Sa03). Invasive recordings are usually not localized because the LFP is treated as strictly local. However, there is also a local mixing process that generates the LFPs, and electrodes can pick up far field potentials of strong sources. Thus, inverse models are still desired.
A quick overview of several aspects of modeling which directly affect source estimation even though they are not technically inverse modeling per se is presented in this section.
The source model refers to the mathematical model used to approximate the current density. The source model most often used is the equivalent electric dipole model, which approximates the primary current density within a patch or volume as a point source \(\mathbf{j^p(r')}=\mathbf{q}\delta(\mathbf{r'}-\mathbf{r})\ ,\) where \(\delta(\mathbf{r})\) is the Dirac delta function with moment \(\mathbf{q}=\textstyle \int \mathbf{j^p(r')}\mathbf{dr'}\) (Ha93; Ba01). Other source models, such as the blurred dipole (i.e., a dipole with surrounding monopoles), the quadrupole, or higher multipolar expansions, can also be used for forward modeling. In addition, an alternative irrotational current density source model, which excludes the solenoidal component of the current density, can be used to formulate the forward problem as a mapping from the scalar field of local field potentials throughout the brain to the scalar field of measured potentials of EEG or ECoG. This approach reduces the number of unknowns, thereby making the inverse problem less underdetermined. This irrotational source model approach for forward/inverse modeling is called ELECTRA (Gr00). Once the LFP vector is estimated, the irrotational current density vector can be computed by taking its gradient. Although we will assume a dipole source model for the rest of this article, the linear system of equations of ELECTRA can also be solved with many of the inverse methods explained here.
An MRI scan must be obtained from each individual for doing subject-specific anatomical modeling. The T1 scan is segmented into white and gray matter volumes from which topologically correct white and gray matter cortical surfaces are tessellated. Sources can then be constrained to reside only in the gray matter volume by constructing a volumetric source space (i.e., a set of coordinates where candidates dipoles are allowed to have nonzero amplitudes). However, the cortical gray matter volume is usually modeled as a surface since NSI technology probably does not have the spatial resolution to discriminate sources at different cortical layers. With this surface approach, dipole orientations can easily be constrained to point normal to the cortical surfaces (i.e., in the direction of the apical dendrites of pyramidal neurons). Non-cortical structures can be modeled as volumetric source sub-spaces without dipole orientations constraints. MRI scans are also used to segment other head subvolumes (e.g., scalp/skin, skull, CSF, and brain envelope volume) and to tesselate their associated surfaces (e.g., scalp/skin, inner and outer skull, and brain envelope surfaces) which are used for forward modeling of volume conduction. If a subject's MRI is not available, a standardized MRI brain atlas (e.g., MNI) can be be warped to optimally fit the subject's anatomy based on the subject's digitized head shape. The warped atlas can then be used as a standardized volumetric source space and for standardized forward modeling.
To compute the forward EM signals that would be produced by a unit dipole somewhere in the brain, both the sensor and dipole positions and orientations must be expressed in the same coordinate system. This is usually done by transforming (i.e., translating and rotating based on fiducial markers) the sensor positions and orientations to the coordinate system of the MRI where the dipoles are modeled. Errors in finding the true fiducial marker coordinates in the MRI space can result in poor co-registration. For better co-registration an automatic algorithm (e.g., iterative closest point (ICP) matching) can be used to match the digitized head-shape or EEG electrode surface to the skin surface segmented from the MRI. Even with optimal co-registration, one must keep in mind that some of the modeled meshes may be misaligned with the true brain structures not only due to transformation errors but also due to intrinsic structural errors in the MRI data sets (e.g., due to susceptibility artifacts and/or mistaken tissue segmentation).
In order to localize the primary current density one has to model the EM signals produced by both the primary (i.e., impressed) and the secondary (i.e., volume, return) current density throughout the head volume conductor, which in reality has an inhomogenous and anisotropic conductivity profile. Analytic MEG forward solutions can be computed if the volume conductor is approximated by an isotropic sphere or a set of overlapping spheres (Sa87; Hu99). The same is true for EEG but using cocentric spherical shells with different isotropic conductivities. The level of realism of the forward volume conduction head model is particularly important since the measured signals have significant contributions from volume currents. The different volume conductor materials (e.g., scalp, skull layers, CSF, gray matter, white matter) have different conductivities and different levels of anisotropy. These need to be modeled in great spatial detail to compute a realistic bases of gain vectors used to represent the data. Much progress has been made toward realistic EM forward modeling using numerical techniques such as the Boundary Element Method (BEM) and the Finite Element Method (FEM) (Ha87; Ak04; Wo06). BEM assumes homogenous and isotropic conductivity through the volume of each tissue shell (e.g., brain, CSF, skull, skin) but not across the boundaries of the shells. FEM usually also assumes homogeneity and isotropy within each tissue type, but in contrast to BEM, can also be used to model the anisotropy of white matter (using DTI) and that of the skull's spongiform and compact layers (using geometric models). Although realistic modeling exploits any available subject-specific information from Magnetic Resonance Imaging (MRI) (e.g., T1, T2, PD, DTI), standardized BEM or FEM head models can be used as a first approximation for subjects without an MR scan (Fu02; Da06).
The goal of inverse modeling, the topic of this paper, is to estimate the location and strengths of the current sources that generate the measured EM data. This problem of source localization is an ill-posed inverse problem. There are an infinite number of solutions that explain the measured data equally well because silent sources (i.e., sources that generate no measurable EM signals) exist, and these can always be added to a solution without affecting the data fit (He53). Because of this nonuniqueness, a priori information is needed to constrain the space of feasible solutions (Sa87; Ha93). Nonuniqueness is handled by making assumptions about the nature of the sources (e.g., number of sources, anatomical and neurophysiological constraints, prior probability density functions, norms, smoothness, correlation, covariance models, sparsity, diversity measures, spatial extent constraints, etc.). Thus, the accuracy and validity of the estimates depend to some extent on the biological correctness of the assumptions and priors adopted in our models. This is why priors should not only be informed by neurophysiology domain knowledge, but should also be flexible and adaptive to particular data sets. The rest of this paper focuses on different inverse modeling approaches. In such a short space it would be impossible to mention all inverse algorithms that exist. Thus, we focus on three basic approaches that encompass most algorithms: 1) non-linear dipole or source fitting; 2) spatial scanning or beamforming; and 3) source-space based algorithms explained within a general Bayesian framework.
Before localization, data is usually preprocessed with uni- and multivariate techniques. The measured univariate time-series of each electric or magnetic recording channel is typically represented as a row vector, and is usually cleaned by mean subtraction, high- and low-pass filtering, detrending, harmonic analysis, and/or downsampling. Signals can also be transformed to the time-frequency/scale domains with Fourier/wavelet methods, or can be band-pass filtered and Hilbert transformed to extract the instantaneous amplitude and phase (Ta96; Se99; Gr01; Fr07). Frequency-specific noise can easily be filtered out in the frequency or time domains. Filtering or time-frequency analysis is also useful for studying wavelike activity and oscillations at specific frequency bands: the slow oscillation (<1 Hz), delta (1-4 Hz), theta (5-8Hz), alpha (8-12 Hz), mu (8-12 Hz and 18-25 Hz), spindles (~14 Hz), low beta (13-20 Hz), high beta (20-30 Hz), gamma (30-80 Hz), high gamma or omega (80-200 Hz), ripples (~200 Hz), and sigma bursts (~600 Hz).
The continuously and simultaneously acquired time-series of all MEG, EEG, and/or ECoG channels can be concatenated to form a multivariate data matrix \(\mathbf{B}\in\Re^{{d_m}\times{d_t} }\ ,\) where \(d_m\) is the number of measurement channels and \(d_t\) is the number of time points. Correlated noise, which is generated by non-brain sources such as the heart, eyes, and muscles, can be filtered out using blind source separation, subspace methods, and machine learning methods (Ma97; Uu97; Ma02; Ta02; Pa05; Zu07).. Different linear transformations that use only the second-order statistics of the data can be applied to the data for noise removal (e.g. signal space separation (SSP), Principal Component Analysis (PCA)). The denoised \(\mathbf{B}\) matrix can be cut into epochs time-locked to an event (e.g., stimulus onset) for single trial analysis or averaged across epochs to extract the event-related potential and/or field (ERP/F) (Lu05). The ERP/F can then be localized by many different inverse methods. Alternatively, blind source separation algorithms that use higher order statistics or temporal information (e.g., infomax/maximum-likelihood independent component analysis (ICA) or second-order blind identification (SOBI)), can be applied to the entire unaveraged multivariate time series to learn a data representation basis of sensor mixing vectors (associated with maximally independent time-courses) that can be localized separately, and to reject non-brain components (i.e., denoising) (Be95; Ma97; Ma02; Ta02). Time-frequency analysis can be performed on the activation functions of each component, and the source event related spectral perturbation (ERSP), or inter-trial coherence (ITC) can be computed (De04). Convolutive ICA algorithms in the time and time-frequency domains can model more complex spatiotemporal source dynamics (An03; Dr07; Le07).
Regardless of any transformation or averaging on the data, the data to be simultaneously solved can be represented as a \({d_m}\times{d_v}\) real or complex matrix \(\mathbf{B}\ ,\) which contains \(d_v\) measurement column vectors. For example, if \(\mathbf{B}\) is a time-series, \({\mathbf{b}}_\tau = (b_1 ,...,b_\tau )^T\) is the \(d_m\) dimensional measurement column vector at time \(\tau\ .\) But \(\mathbf{B}\) needs not be a time-series, it can also be any given set of vectors obtained from the data that benefit from simultaneous source localization (e.g., a subspace of the data). When \(d_v=1\ ,\) the single measurement problem is recovered. This case is also used for localizing individual sensor maps obtained from a decomposition of the data (e.g., ICA).
The goal of inverse methods is to estimate the location and strengths of the current sources that generate \(\mathbf{B}\ .\) However, because of nonuniqueness a priori assumptions are needed to constrain the space of feasible solutions. The most common assumption is that the measurements were generated by a small number of brain regions that can be modeled using equivalent dipoles. These algorithms minimize a data-fit cost function defined in a multidimensional space with dimension equal to the number of parameters. Once the algorithm converges to a local minima in the multidimensional space of parameters, the optimal parameters (each corresponding to a dimension) are found. The algorithms estimates five nonlinear parameters per dipole: the x, y, and z dipole position values, and the two angles necessary to define dipole orientations in 3D space. However, in the MEG spherical volume conductor model only one angle (on the tangent space of the sphere) is necessary because the radial dipole component is silent. The amplitudes are linear parameters estimated directly from the data as explained below for the cases of uncorrelated or correlated noise.
Parametric dipole fitting algorithms, minimize a data fit cost function such as the Frobenius norm of the residual, \[\tag{1} \min_{\mathbf{s}} ||\mathbf{B}-\mathbf{\hat B}||_F^2=||\mathbf{B}-\mathbf{L_s\hat J_s}||_F^2=||\mathbf{B}-\mathbf{L_sL_s^{\dagger}B}||_F^2=||(\mathbf{I}-\mathbf{L_sL_s^{\dagger}})\mathbf{B})||_F^2=||\mathbf{P_{L_s}^\perp}\mathbf{B}||_F^2\]
where \(\mathbf{s}\) refers to all the nonlinear parameters, \(\mathbf{s}=\lbrace \mathbf{r}_i , \mathbf{\theta}_i \rbrace\ ,\) which are the positions and orientations of all the dipoles in the model that are optimized to minimize the data fit cost (Sc91; Ba01). \(\mathbf{\hat B}\) is the explained forward data given by the generative linear model\[\mathbf{\hat B}=\mathbf{L_s\hat J_s}\ ,\] where \(\mathbf{\hat J_s}=\mathbf{L_s^{\dagger}B}\in\Re^{{d_s}\times{d_v} }\) is the estimated current matrix containing the moments of \(d_s\) dipoles. The ith row vector of \(\mathbf{\hat J_s}\) contains the moments of a dipole located at position \(\mathbf{r}_i\) with orientation \(\mathbf{\theta}_i\ .\) \(\mathbf{L_s}\) is the lead-field matrix containing \(d_s\) m-dimensional column vectors called gain vectors. They are computed for \(d_s\) unit dipoles located at different positions \(\mathbf{r}_i=(x_i,y_i,z_i)^T\) and with different orientations \(\mathbf{\theta}_i=(\phi_i,\omega_i)^T\ .\) The orientations of the dipoles however can be obtained linearly if we optimize only the positions and include the gain vectors of all 3 dipole components pointing in the (x,y,z) directions, so that \(\mathbf{L_s}\) is a \(d_m\) by \(3d_s\) matrix and \(\mathbf{J_s}\) is a \(3d_s\) by \(d_v\) matrix. \(\mathbf{I}\) is the \(d_m\)-dimensional identity matrix, \(\mathbf{L_s}^{\dagger}\) is the pseudoinverse of \(\mathbf{L_s}\ ,\) and \(\mathbf{P_{L_s}^\perp}\) is the orthogonal projection operator onto the null space of \(\mathbf{L_s}\ .\) Note that in this approach the gain matrix needs to be recomputed at each iteration for any given parameters. Also, note that this approach is equivalent to a maximum likelihood estimate of the parameters using a Gaussian likelihood noise model.
In the presence of correlated noise, a modified cost function can be minimized (Sa87). If \(\mathbf{C}_\varepsilon\) is the noise covariance matrix, which can be decomposed as \(\mathbf{C}_\varepsilon=\mathbf{V}\Lambda^2 \mathbf{V}^T\ ,\) where \(\mathbf{V}\) is an orthonormal matrix, and \(\Lambda=diag(\lambda _1 , \ldots ,\lambda_{d_m})\ ,\) then the new cost function can be expressed as \[\tag{2} \min_{\mathbf{s}} \left\| {{\mathbf{PB} } - {\mathbf{P\hat B} } } \right\|_F^2 = \left\| {{\mathbf{PB} } - {\mathbf{PL_s} } {\mathbf{\hat J_s} } } \right\|_F^2\ ,\] where \(\mathbf{P}= \mathbf{V}\Lambda^{-1}\mathbf{V}^T\ .\)
These cost functions are usually minimized using nonlinear optimization algorithms (e.g., Nelder-Meade downhill simplex, Levenberg-Marquardt). Unfortunately, when the number of dipoles is increased (e.g., \(d_s>2\)), the cost suffers from many local minima. Furthermore, it should be noted that by adding a spatial term to the data-fit cost function dipoles can be constrained to reside as close as desired to the gray matter volume. However, such spatial penalties can introduce more local minima problems. Global minimization can be achieved for a time-series containing maybe up to seven dipoles by using more computationally intensive algorithms such as simulated annealing, multistart simplex algorithms, or genetic algorithms (Hu98; Uu98). Alternatively, instead of selecting a point estimate, one can use Markov Chain Monte Carlo algorithms to make Bayesian inferences about the number of sources and their spatial extents and to compute probabilistic maps of activity anatomically constrained to gray matter (Sc99; Be01). It is important to distinguish the cost function from the optimization algorithms. Although the standard costs for dipole fitting have many local minima, other costs, like for example the negative log marginal likelihood (see sparse Bayesian learning section), have less local minima, and can also be minimized with nonlinear dipole fitting algorithms.
An alternative approach to the ill-posed bioelectromagnetic inverse problem is to independently scan for dipoles within a grid containing candidate locations (i.e., source points). Here the goal is to estimate the activity at a source point or region while avoiding the crosstalk from other regions so that these affect as little as possible the estimate at the region of interest.
The simplest spatial filter, a matched filter, is obtained by normalizing the columns of the leadfield matrix and transposing this normalized dictionary. The spatial filter for location s is given by \[\tag{3} {\mathbf{W}}_s^T = \frac{{ {{\mathbf{L} }_s^T } } }{ {\left\| {{\mathbf{L} }_s } \right\|_F } }\ .\] This approach essentially projects the data to the column vectors of the dictionary. Although this guarantees that when only one source is active, the absolute maximum of the estimate corresponds to the true maximum, this filter is not recommended since this single-source assumption is usually not valid, and since the spatial resolution of the filter is so low given the high correlation between dictionary columns. This approach can be extended to fast recursive algorithms, such as matching pursuit and its variants, which sequentially project the data or residual to the non-used dictionary columns to obtain fast suboptimal sparse estimates.
The MUSIC cost function is given by \[\tag{4} M_s = \frac{ {\left\| {\left( {{\mathbf{I} } - {\mathbf{U} }_s {\mathbf{U} }_s^T } \right){\mathbf{L} }_s } \right\|_2^2 } }{ {\left\| {{\mathbf{L} }_s } \right\|_2^2 } } = \frac{ {\left\| {P_{{\mathbf{U} }_s }^ \bot {\mathbf{L} }_s } \right\|_2^2 } }{ {\left\| {{\mathbf{L} }_s } \right\|_2^2 } }\ ,\] where \({\mathbf{B}} = {\mathbf{USV}}^T\) is the singular value decomposition (SVD) of the data, \({\mathbf{U}}_s \) is a matrix with the first \(d_s\) left singular vectors that form the signal subspace, and \(\mathbf{L}_s\) is the gain vector for the dipole located at \(r_i\) and with orientation \(\theta_i\) (obtained from anatomy or using the generalized eigenvalue decomposition) (Mo98). \(P_{\mathbf{U}_s }^ \perp\) is an orthogonal projection operator onto the data noise subspace. The MUSIC map is the reciprocal of the cost function at all locations scanned. This map can be used to guide a recursive parametric dipole fitting algorithm.
In contrast, the minimum variance beamformer attempts to minimize the beamformer output power subject to a unity gain constraint: \[\tag{5} \min_{{\mathbf{W} }_s} \quad tr\left( {{\mathbf{W} }_s^T {\mathbf{CW} }_s } \right)\] subject to \({\mathbf{W} }_s^T {\mathbf{L} }_s = {\mathbf{I} }\) , where \({\mathbf{C}}\) is the data covariance matrix, \({\mathbf{L}}_s\) is the \(d_m\) by 3 gain matrix at source point s, and \({\mathbf{W}}_s\) is the \(d_m\) by 3 spatial filtering matrix ( Va97). The solution to this problem is given by \[\tag{6} {\rm{ }}{\mathbf{W}}_s^T = \left( {{\mathbf{L} }_s^T {\mathbf{C} }^{ - 1} {\mathbf{L} }_s } \right)^{ - 1} {\mathbf{L} }_s^T {\mathbf{C} }^{ - 1}\ .\]
The parametric source activity at source point s is given by \({\mathbf{A}}_{\rm{s}} = {\mathbf{W}}_s^T {\mathbf{B}}\ .\) This can be performed at each source-point of interest to obtain a distributed map of activity. This beamforming approach can be expanded to a more general Bayesian graphical model that uses event timing information to model evoked responses, while suppressing interference and noise sources (Zu07). This approach uses a variational Bayesian EM algorithm to compute the likelihood of a dipole at each grid location.
In synthetic aperture magnetometry (SAM), a nonlinear beamformer, an optimization algorithm is used at each source-point to the find the dipole orientation that maximizes the ratio of the total source power over noise power, the so-called pseudo-Z deviate \[\tag{7} {\rm{ Z = }}\sqrt {\frac{{{\mathbf{W} }_s^T {\mathbf{CW} }_s } }{{{\mathbf{W} }_s^T {\mathbf{C} }_n {\mathbf{W} }_s } } } = \sqrt {\frac{P}{n} }\]
where \({\mathbf{C}}_n\) is the noise covariance, usually based on some control recording or assumed to be a multiple of the identity matrix (Vr01). This maximization generates a scalar beamformer (i.e., \({\mathbf{L}}_s\) is a vector) with optimal dipole orientations in terms of SNR. This improves the spatial resolution of SAM relative to that of LCMV beamforming. To generate statistical parametric maps between an active task period (a) and a control period (c), the so-called pseudo-T statistic can be computed as \[\tag{8} T{\rm{ = }}\frac{{P{\rm{(} }a{\rm{) - } }P{\rm{(} }c{\rm{)} } } }{{n{\rm{(} }a{\rm{) + } }n{\rm{(} }c{\rm{)} } } }\ .\]
Such maps usually have more focal activities since they contrast the differences between two states. Other scalar beamformers can be implemented. For example, an anatomically constrained beamformer (ACB) can be obtained by simply constraining the dipole orientations to be orthogonal to the cortical surfaces (Hi03).
Beamforming can be performed in the frequency domain using the dynamic imaging of coherent sources (DICS) algorithm, whose spatial filter matrix for frequency f is given by \[\tag{9} {\rm{ }}{\mathbf{W}}_s^T (f) = \left( {{\mathbf{L} }_s^T {\mathbf{C} }(f)^{ - 1} {\mathbf{L} }_s } \right)^{ - 1} {\mathbf{L} }_s^T {\mathbf{C} }(f)^{ - 1}\]
where \({\mathbf{C}}(f)\) is the cross-spectral density matrix for frequency f (Gr01). Note that the covariance matrix has simply been replaced by the cross-spectral density matrices.
All the spatial filter vectors explained so far depend on the gain vectors associated only with the region of interest (i.e., they don't depend on the gain vectors associated with the rest of the source space). There are other more direct approaches to spatial filtering that incorporate the gain vectors associated with both the region of interest and the rest of the source space, and that do not necessarily use the measured covariance matrix. In the Backus-Gilbert method, a different spread matrix is computed for each candidate source location (Gr98; Gr99). The goal is to penalize the side lobes of the resolution kernels (i.e., the row vectors of the resolution matrix, defined as \(\mathbf{R}=\mathbf{O}\mathbf{L}\ ,\) where \(\mathbf{L} \) is the leadfield matrix for the entire sourcespace (see next section) and \(\mathbf{O}\) is any optimized linear operator that gives the source estimates when multiplied with the data). This usually results in a wider main lobe. In the spatially optimal fast initial analysis (SOFIA) algorithm, virtual leadfields are constructed that are well concentrated within a region of interest compared to the rest of the sourcespace (Bo99). The region of interest can be moved to every source-point. A similar approach is adopted in the Local Basis Expansion (LBEX) algorithm which solves a generalized eigenvalue problem to maximize the concentration of linear combinations of leadfields (Mi06). As a final remark, it should be emphasized that all of the spatial filtering algorithms presented scan one source-point or local region at a time, but can be expanded with multi-source scanning protocols that search through combinations of sources. Although, multisource scanning methods can recover perfectly synchronized sources (which are usually missed by single-source methods), there is no agreed protocol to scan the immense space of multisource configurations.
Instead of performing low-dimensional nonlinear optimization or spatial scanning, one can assume dipoles at all possible candidate locations of interest within a grid and/or mesh called the sourcespace (e.g., source-points in grey matter) and then solve the underdetermined linear system of equations \[\tag{10} {\mathbf{B}} = {\mathbf{LJ}} + \Upsilon\]
for \(\mathbf{\hat J}\in\Re^{{d_s}\times{d_v} }\) (\(d_s\) now being the number of dipole components throughout the sourcespace). The leadfield matrix \(\mathbf{L}\in\Re^{{d_m}\times{d_s} }\) relates the current space with the measurement space. \(\Upsilon \in \Re ^{d_m {\rm{ } } \times {\rm{ } }d_v } \) is the noise matrix usually assumed to be Gaussian. Since there is no unique solution to this problem, priors are needed to find solutions of interest. The rest of the algorithms can be best presented from a more abstract Bayesian general framework that makes explicit the source prior assumptions using probability density functions. Bayes theorem \[\tag{11} p({\mathbf{J}}|{\mathbf{B}}) = \frac{{p({\mathbf{B} }|{\mathbf{J} })p({\mathbf{J} })} }{{p({\mathbf{B} })} }\]
says that the posterior probability given the measurements is equal to the likelihood multiplied by the marginal prior probability and divided by a normalization factor called the evidence.
A Gaussian likelihood model is assumed, \[\tag{12} p({\mathbf{B}}|{\mathbf{\hat J}}) = \exp \left( { - \frac{1}{{2\sigma ^2 } }\left\| {{\mathbf{B} } - {\mathbf{L\hat J} } } \right\|_F^2 } \right) \ ,\]
together with a very useful family of prior models using generalized Gaussian marginal probability density functions (pdf) \[\tag{13} p({\mathbf{\hat J}}) \propto \exp \left( { - {\mathop{\rm sgn}} (p)\sum\limits_{{\rm{i = 1} } }^{\rm{n} } {\left\| {{\mathbf{\hat J} }{(i,\;\,:)} } \right\|_q^p } } \right)\]
where p specifies the shape of the pdf which determines the sparsity of the estimate, and q usually is 2 and specifies the norm of \({\mathbf{\hat J}}{(i,\,:)}\ ,\) the ith row vector of \({\mathbf{\hat J}}\ .\) Because \(p({\mathbf{B}})\) does not affect the location of the posterior mode, it can be ignored, and thus the MAP point estimate can be computed by \[\tag{14} {\mathbf{\hat J}}_{MAP} = \arg \; \max_{{\mathbf{\hat J} } } \;\ln p({\mathbf{\hat J} }|{\mathbf{B} }) = \;\ln p({\mathbf{B} }|{\mathbf{\hat J} }) + \ln p({\mathbf{\hat J} }) = \ln p_\Upsilon ({\mathbf{B} } - {\mathbf{L\hat J} }) + \ln p({\mathbf{\hat J} })\ .\]
Maximizing the log posterior (which is a monotonic function) rather than the posterior, illustrates how these MAP approaches are equivalent to algorithms that minimize p-norm-like measures \[\tag{15} \min_{\mathbf{\hat J}} {\rm{ }}\left\| {{\mathbf{B} } - {\mathbf{L\hat J} } } \right\|_F^2 + \lambda {\mathop{\rm sgn} } (p)\,\sum_{{\rm{i = 1} } }^{\rm{n} } {\left\| {{\mathbf{\hat J} }{(i,\;:)} } \right\|_q^p } ,\;\forall {\rm{ } }\left\| {{\mathbf{\hat J} }{(i,\;:)} } \right\|_q \ne 0\]
where \(\lambda\) is the noise regularization parameter, which can be fixed across iterations, learned, stabilized (e.g., l-curve, generalized cross validation), or adjusted to achieve a desired representation error using the discrepancy principle, \[\tag{16} {\rm{ }}\left\| {{\mathbf{B} } - {\mathbf{L\hat J} } } \right\|_F^2 = \varepsilon = {\rm{ } }\left\| \Upsilon \right\|_F^2\ .\]
MAP estimates using a Gaussian prior (p=2) are equivalent to noise-regularized minimum-l2-norm solutions, \[\tag{17} {\mathbf{\hat J}} = {\mathbf{L}}^T \left( {{\mathbf{LL} }^T + \lambda {\mathbf{I} } } \right)^{ - 1}{\mathbf{B} }\ ,\]
which are widely distributed and suffer from depth bias (i.e., deep sources are mislocalized to superficial source-points) (Ah92; Wa92; Ge98). Weighted minimum-norm algorithms can be used to partially compensate for this depth bias. The inverse operator is then given by \[\tag{18} {\mathbf{\hat J}} = {\mathbf{W}}_a {\mathbf{W}}_a^T {\mathbf{L}}^T \left( {{\mathbf{LW} }_a {\mathbf{W} }_a^T {\mathbf{L} }^T + \lambda {\mathbf{I} } } \right)^{ - 1} {\mathbf{B} }\]
where \({\mathbf{W}}_a\) is a diagonal matrix (e.g., \({\mathbf{W}}_a = diag\left( {\left\| {l_i } \right\|_2^{ - 1} } \right)\ ,\) \({\mathbf{W}}_a = diag\left( {\left\| {l_i } \right\|_2^{ - 1/2} } \right) \ ,\) 3-D Gaussian function, or fMRI priors) (Io90; Fu99). Non-diagonal \({\mathbf{W}}_a {\mathbf{W}}_a^T\) matrices can be used to incorporate source covariance and smoothness constraints (Pa99). To suppress correlated noise the matrix \(\lambda {\mathbf{I}}\) can be replaced with the noise covariance matrix, \(C_\upsilon\ ,\) which can be estimated from the data. Such approaches are often called adaptive. Likewise, MAP estimates using a Laplacian prior (p=1) are equivalent to obtaining minimum-l1-norm solutions. These are usually computed using linear programming, and have been called recently minimum current estimates (MCE) (Ma95; Uu99), but can be computed efficiently for a time-series using a generalized Focal Underdetermined System Solver (FOCUSS) algorithm \[\tag{19} {\mathbf{\hat J}}_k = {\mathbf{W}}_k {\mathbf{W}}_k^T {\mathbf{L}}^T \left( {{\mathbf{LW} }_k {\mathbf{W} }_k^T {\mathbf{L} }^T + \lambda {\mathbf{I} } } \right)^{ - 1} {\mathbf{B} }\ ,\]
where \[\tag{20} {\mathbf{W}}_k = {\mathbf{W}}_a diag\left( {\left\| {{\mathbf{\hat J} }_{k - 1} (i,:)} \right\|_q^{1 - {\textstyle{p \over 2} } } } \right)\]
and \({\mathbf{\hat J}}_{k - 1}\) is the previous estimated matrix at iteration k-1 (Go95; Go97; Ra99; Ra02; Co05; Ra05; Ra08). Importantly, FOCUSS is equivalent to an Expectation Maximization (EM) algorithm used to find MAP estimates in a hierarchical Bayesian framework in which the hyperparameters are integrated out and the prior pdf is a Gaussian scale mixture (Wi07b; Fi02). To simultaneously localize a very long time-series of any length very quicky, the times-series matrix \(\mathbf{B}\) or its covariance matrix can be decomposed with the SVD and replaced in eq. 19 with the matrix \(\mathbf{U}\left(\mathbf{S}\right)^{-1/2}\ ,\) where \(\mathbf{U}\) and \(\mathbf{S}\) are the left singular vectors and singular values matrices (up to whatever rank desired), respectively. Although FOCUSS works optimally with a Laplacian prior (p=1), it can also be used to find MAP estimates with different generalized Gaussian priors. When p=-2, the Magnetic Field Tomography (MFT) algorithm is recovered if the update rule is based on the current modulus, there is only one iteration, and the a priori weight matrix is a 3D Gaussian used for depth bias compensation (Io90; Ri91; Ta99). Importantly, MAP estimates have different modes for different \(\mathbf{W}_a\) matrices (Wi07b), and computer simulations have shown that \({\mathbf{W}}_a = diag\left( {\left\| {l_i } \right\|_2^{ - 1} } \right)\) and p=1 work optimally (Ra05). If one is not sure whether one should use a Gaussian or Laplacian prior, one can use Markov chain Monte Carlo (MCMC) methods to learn which lp-norm is optimal for that particular data set (Au05).
Another approach to compensate for depth bias is the noise-normalized dynamic statistical parametric map (dSPM) technique (Da00; Li02). First, the linear inverse operator, which is equivalent to a correlated noise-regularized pseudoinverse or Wiener operator, is computed by \[\tag{21} {\mathbf{P}} = {\mathbf{C}}_s {\mathbf{L}}^T \left( {{\mathbf{LC} }_s {\mathbf{L} }^T + {\mathbf{C} }_n } \right)^{ - 1}\]
where \(\mathbf{C}_s\) is the source covariance matrix, usually assumed to be the identity matrix. Then the noise- normalized operator is computed which in the case of fixed dipole orientations is given by \[\tag{22} {\mathbf{P}}_{norm} = diag\left( v \right)^{ - 1/2} {\mathbf{P}}\ ,\]
where \(v = diag\left( {{\mathbf{PC} }_n {\mathbf{P} }^T } \right)\ .\) Note that dSPM performs noise-normalization after the inverse operator \(\mathbf{P}\) has been computed.
An alternative approach for depth-bias compensation is the sLORETA technique (Pa02). In contrast to the dSPM method, the inverse operator is weighted as a function of the resolution matrix, \(\mathbf{R=PL}\ ,\) that is associated with the inverse and forward operators \(\mathbf{P}\) and \(\mathbf{L}\ .\) For fixed dipole orientations, the pseudo-statistics of power and absolute activation are respectively given by \[\tag{23} \varphi_i=\frac{\mathbf{j}_i^2}{\mathbf{R}_{ii}} \text{ and } \phi_i=\sqrt{\varphi_i}\ .\] Thus, the standardized sLORETA inverse operator is given by \[\tag{24} \mathbf{P}_{sloreta}=diag(r)^{-1/2}\mathbf{P}\]
where \(r=diag(\mathbf{R})\ .\) Interestingly, the sLORETA algorithm is equivalent to the first step of the Sparse Bayesian (SBL) algorithm explained in the next section.
Instead of finding MAP point estimates using fixed priors, one can use the evidence framework of sparse Bayesian learning (SBL) to learn adaptive parametrized priors from the data itself (Ma92; Ne96; Ti01; Wi04; Sa04; Ra05; Ra06a; Ra07a; Nu07a; Ra08). This approach is an extremely important alternative because the posterior mode may not be representative of the full posterior, and thus, a better point estimate may be obtained, the posterior mean, by tracking the posterior probability mass. This is achieved by maximizing a tractable Gaussian approximation of the evidence, also known as the type-II likelihood or marginal likelihood \[\tag{25} p({\mathbf{B}};\Sigma _{\mathbf{B}} ) = \int {p({\mathbf{B}}|{\mathbf{J}})p({\mathbf{J}};\Sigma _{\mathbf{B}} )} dJ = N(0,\Sigma _{\mathbf{B}} )\]
or equivalently by minimizing the negative log marginal likelihood \[\tag{26} L\left( \gamma \right) = - \log p({\mathbf{B}};\Sigma _{\mathbf{B}} ) = d_v \log \left| {\Sigma _{\mathbf{B}} } \right| + trace\left( {{\mathbf{B} }^T \Sigma _{\mathbf{B} }^{ - 1} {\mathbf{B} } } \right)\]
where \(\Sigma _{\mathbf{B}} = {\mathbf{L}}\Sigma _{\mathbf{J}} {\mathbf{L}}^T + \Sigma _\varepsilon\ .\) The diagonal matrix \(\Sigma _{\mathbf{J}} = \Gamma = diag(\gamma _i )\) is the prior source covariance matrix which contains the vector of hyperparameters on the diagonal (i.e., the variances). In the ARD framework the precisions (i.e., inverse variances) are Gamma distributed. The matrix \(\Sigma _\varepsilon\) is the noise covariance matrix, which can be assumed to be a multiple of the identity matrix (e.g., \(\sigma_\varepsilon^2\mathbf{I}\ ,\) where \(\sigma_\varepsilon^2\) is the noise variance, a hyperparameter that can also be learned from the data or empirically obtained from the measurements). The evidence maximization is achieved by using an Expectation-Maximization update rule \[\tag{27} \gamma _i^{(k + 1)} = \frac{1}{{d_v r_i } }\left\| {\gamma _i^{(k)} {\mathbf{L} }_{(:,\;i)}^T \left( {\Sigma _{\mathbf{B} }^{(k)} } \right)^{ - 1} {\mathbf{B} } } \right\|_F^2 + \frac{1}{{r_i } }trace\left( {\gamma _i^{(k)} {\mathbf{I} } - \gamma _i^{(k)} {\mathbf{L} }_{(:,\;i)}^T \left( {\Sigma _{\mathbf{B} }^{(k)} } \right)^{ - 1} {\mathbf{L} }_{(:,\;i)}^{} \gamma _i^{(k)} } \right)\]
or alternatively using a fixed-point gradient update rule \[\tag{28} \gamma _i^{(k + 1)} = \frac{1}{{d_v r_i} }\left\| {\gamma _i^{(k)} {\mathbf{L} }_{(:,\;i)}^T \left( {\Sigma _{\mathbf{B} }^{(k)} } \right)^{ - 1} {\mathbf{B} } } \right\|_F^2 \left( {trace\left( {\gamma _i^{(k)} {\mathbf{I} } - \gamma _i^{(k)} {\mathbf{L} }_{(:,\;i)}^T \left( {\Sigma _{\mathbf{B} }^{(k)} } \right)^{ - 1} {\mathbf{L} }_{(:,\;i)}^{} \gamma _i^{(k)} } \right)} \right)^{ - 1}\]
where \(r_i\) is the rank of \({\mathbf{L}}_{(:,\;i)}{\mathbf{L}}_{(:,\;i)}^T\ ,\) and \({\mathbf{L}}_{(:,\;i)}\) is a matrix with column vectors from \(\mathbf{L}\) that are controlled by the same hyperparameter (Ra06b,Wi07b). With fixed dipole orientations \({\mathbf{L}}_{(:,\;i)}\) is a vector, but with loose orientations \({\mathbf{L}}_{(:,\;i)}\) is a \(d_m \times 3\) matrix. For patch source models involving dipoles within a region \({\mathbf{L}}_{(:,\;i)}\) is a matrix containing all gain vectors associated with the local patch of cortex. The gradient udpdate rule is much faster than the EM rule, and is almost identical to the update rule used in the sLORETA/FOCUSS hybrid algorithm, which is not expressed in the ARD framework (Sc05; Ra05). Once the optimal hyperparameters have been learned, the posterior mean is given by \[\tag{29} {\mathbf{\hat J}}=E[{\mathbf{J}}|{\mathbf{B}};\Sigma _{\mathbf{J}} ] = \Sigma _{\mathbf{J}} {\mathbf{L}}^T \left( {\Sigma _{\mathbf{B}} } \right)^{ - 1} {\mathbf{B}}\ .\]
It is important to note that many SBL algorithms are distinguished by the parametrization of the source covariance matrix \(\Sigma _{\mathbf{J}} =\sum_{i = 1}^{d_s } {{\mathbf{C} }_i \gamma _i }\) (Wi07b). In fact, if only a few hyperparameters are used that are controlling many source-points, then the parametrization cannot support sparse estimates. For example, in the restricted maximum likelihood (ReML) algorithm one of the source covariance component is the identity matrix which is controlled by a single hyperparameter (Fr02; Ph05; Ma06; Wi07b).
In standard SBL, \({\mathbf{C}}_i=e_i e_i^T\ ,\) where \(e_i\) is a vector with zeros everywhere except at the ith element, where it is one. This delta function parametrization can be extended to box car functions in which \(e_i\) takes a value of 1 for all three dipole components or for a patch of cortex. More interestingly, the \(e_i\) can be substituted by geodesic basis functions \(\psi _i\) (e.g., a 2-D Gaussian current density function) centered at the ith sourcepoint and with some spatial standard deviation (Sa04; Ra06a; Ra06b; Ra07a; Ra08). More powerfully, the source covariance can be composed of components across many possible spatial scales, by using multiple \(\psi _i\) located at the ith source-point but with different spatial standard deviations (Ra06a; Ra06b; Ra07a; Ra08). This approach can be used to estimate the spatial extent of distributed sources by using a mixture model of geodesic Gaussians at different spatial scales. Such multiscale approach has now been extended also to MAP estimation (Ra07b; Ra08).
Although we have assumed here a non-informative hyperprior on the precisions, \(p(\gamma _i^{-1})=\gamma _i\ ,\) since the degrees of freedom parameter of the Gamma distribution is set to 0, this does not need to be the case (Sa04; Wi07b). The problem of finding optimal hyperpriors to handle multimodal posteriors and to eliminate the use of improper priors has been dealt with by making this parameter non-zero (yet small) or by introducing MCMC stategies (Nu07a; Nu07b). In practice, the non-informative hyperprior works well, and helps avoid the problem of determining the optimal hyperprior. Finally, like explained for MAP estimation, to simultaneously localize a very long time-series of any length very quicky, instead of localizing the times-series matrix \(\mathbf{B}\) one can localize the matrix \(\mathbf{U}\left(\mathbf{S}\right)^{-1/2}\ ,\) where \(\mathbf{U}\) and \(\mathbf{S}\) are the left singular vectors and singular values matrices of \(\mathbf{B}\) (up to whatever rank desired).
The relative strengths of different localization algorithms offer an opportunity to select the most appropriate algorithm, constraints, and priors for a given experiment. If one expects only a few focal sources, then dipole fitting algorithms may be sufficient. If one expects distributed sources, then beamforming, spatial scans, or distributed MAP-based estimation algorithms are appropriate. If one expects sparse sources, then MAP estimation with a Laplacian prior may better reflect the true sources. If one is more interested in finding representative sparse estimates of the whole posterior, then SBL is the right choice. If one expects distributed sources with variable levels of spatial extent, then SBL or MAP (with Laplacian prior) estimation using a mixture model of multiscale basis functions is recommended.
Internal references
Volume Conduction, Electroencephalography (EEG), Magnetoencephalography (MEG), Event-Related Brain Dynamics, Functional Imaging, Functional Magnetic Resonance Imaging