While images are produced directly in a majority of biomedical modalities, in MRI they are only obtained after a reconstruction process. In this chapter, we present the current approaches to MRI reconstruction. We recall in Section 3.1 the continuous model of the data-collection process in MRI and derive its discretized version. In Section 3.2, we review the different approaches that lead to linear reconstructions and we introduce the key concepts of the inverse-problem formalism. This approach maps image reconstruction into an optimization problem, with the possibility to impose a priori constraints that distinguish the solution from other possible candidates and improve reconstruction quality. We justify in Section 3.3 the use of sparsity-promoting priors in MRI and explain how they can be imposed via a proper regularization term. Finally, we review in Section 3.4 the algorithmic procedures that are theoretically capable of achieving the desired reconstructions while being suited to the practical constraints encountered in MRI.
A radiofrequency pulse is emitted to initiate nuclear magnetic resonance (NMR). It excites the spins in a 2D plane or a 3D volume, depending on the type of acquisition. After excitation, the excited spins behave as radiofrequency emitters and have their precessing frequency and phase modified depending on their positions. This is achieved thanks to the time-varying magnetic gradient fields that are applied during the relaxation, defining a trajectory k in the k-space domain. The modulated part of the signal received by a coil of sensitivity S_{i}(r) is given by
m_{i}(k) = ∫ S_{i}(r)ρ(r)e^{−2jπk·r}dr. (1)
The signal ρ is referred to as the object. This signal is proportional to the spin density, but might also depend upon other local characteristics. More details on the derivation of relation (1) are provided in Chapter 2.
For an array of R receiving coils with sensitivities denoted by S_{1},…, S_{R} and a k-space trajectory sampled at N points k_{n}, we represent the measurements concatenated in a global RN× 1 vector
m = ((m_{1,1},…,m_{N,1}),…,(m_{1,i},…,m_{N,i}),…,(m_{1,R},…,m_{N,R}))^{T}. (2)
From here on, we consider that the Fourier domain and, in particular, the sampling points k_{n}, are scaled to make the Nyquist sampling interval unity. This can be done without any loss of generality if the space domain is scaled accordingly. Therefore, we model the object as a linear combination of pixel-domain basis functions ϕ_{p} that are shifted replicates of some generating function ϕ, so that
ρ(r) = ∑_{p} c[p]ϕ_{p}(r), with ϕ_{p}(r)=ϕ(r−p). (3)
Given a sampled version of the coil sensitivity s_{i}[p], the sensitivityweighted object is modeled by
(S_{i}ρ)(r) = ∑_{p} s_{i}[p]c[p]ϕ_{p}(r). (4)
The standard implicit choice for ϕ is Dirac’s delta, even if it is hardly justified from an approximation-theoretic point of view. Different discretizations have been proposed, for example by Sutton et al. [38] with ϕ as a boxcar function or, later, by Delattre et al. [39] with B-splines. It is only recently [40] that the details have been worked out to recover the image for general ϕ that are non-interpolating, which is the case, for instance, for B-splines of degree greater than 1. The image to be reconstructed—i.e., the sampled version of the object ρ(p)—is obtained by filtering the coefficients c[p] with the discrete filter
P(e^{jω}) = ∑_{h} ϕ^{^}(ω+2πh), (5)
where ϕ^{^} denotes the Fourier transform of ϕ.
Since a finite field of view (FOV) determines sets of coefficients c and s_{i} with a finite number M of elements, we handle them as vectors c and s_{i}, keeping the discrete coordinates p as implicit indexing.
Due to sparsity properties that are discussed later in this chapter, it might be preferable to represent the object in terms of wavelet coefficients. In the wavelet formalism, some constraints apply to ϕ: it must be a scaling function that satisfies the properties of a multiresolution analysis [41]. In that case, the wavelets can be defined as linear combinations of the ϕ_{p} and the object is equivalently characterized by its coefficients in the orthonormal wavelet basis. We refer to Mallat’s reference book [42] for a full review of wavelets. There exists a discrete wavelet transform (DWT) that bijectively maps the coefficients c to the wavelet coefficients w that represent the same object ρ in a continuous wavelet basis. In the rest of the chapter, we represent this DWT by the synthesis matrix W. Note that the matrix-vector multiplications c = Ww and w = W^{−1}c have efficient filterbank implementations.
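The bijection between c and w and the filterbank implementation can be sketched with a one-dimensional orthonormal Haar filter bank, a minimal stand-in for the synthesis matrix W (all function names are ours, for illustration only):

```python
import numpy as np

def haar_dwt(c, levels):
    """Haar analysis (the role of W^{-1}): coefficients c -> wavelet coefficients w.
    Orthonormal filter pair: lowpass [1,1]/sqrt(2), highpass [1,-1]/sqrt(2)."""
    w = []
    a = np.asarray(c, dtype=float)
    for _ in range(levels):
        lo = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation (coarse) channel
        hi = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail channel
        w.append(hi)
        a = lo
    w.append(a)                                  # coarsest approximation last
    return w

def haar_idwt(w):
    """Haar synthesis (the role of W): wavelet coefficients w -> coefficients c."""
    a = w[-1]
    for hi in reversed(w[:-1]):
        up = np.empty(2 * a.size)
        up[0::2] = (a + hi) / np.sqrt(2)
        up[1::2] = (a - hi) / np.sqrt(2)
        a = up
    return a

rng = np.random.default_rng(0)
c = rng.standard_normal(16)
w = haar_dwt(c, levels=2)
c_rec = haar_idwt(w)
print(np.allclose(c, c_rec))                    # the mapping is bijective
# orthonormality: the energy of the coefficients is preserved
print(np.isclose(np.sum(c**2), sum(np.sum(band**2) for band in w)))
```

Each pass costs O(M) operations, which is why the filterbank implementation is preferred over an explicit matrix-vector product with W.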
The data-formation model (1) and the object parameterization (4) are combined to model the measurement corresponding to every point k_{n} sampled in k-space. Accordingly, the measurement vector m is related to the coefficients c through the linear relation
m = Ec, (6) 
where the MRI encoding matrix E is formed as
E = (I_{R}⊗E_{0}) [diag(s_{1}),…,diag(s_{R})]^{T}, (7)
with the symbol ⊗ standing for the Kronecker product, I_{R} representing the R× R identity matrix, and E_{0} being the encoding matrix for the same MRI scan with a single homogeneous receiving coil
E_{0} = diag([ϕ^{^}(2πk_{1}),…,ϕ^{^}(2πk_{N})]) [v_{1},…,v_{N}]^{T}. (8)
There, v_{n} are vectors indexed like c such that v_{n}[p] = exp(−2jπk_{n}·p).
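As a sanity check of this construction, consider a 1D Cartesian trajectory with k_{n}=n/M and Dirac's delta as ϕ (so that ϕ^{^} ≡ 1): E_{0} then reduces to the DFT matrix. A minimal numpy sketch (sizes and names chosen for illustration):

```python
import numpy as np

M = 8
p = np.arange(M)                       # pixel positions
k = np.arange(M) / M                   # Cartesian k-space samples (Nyquist-scaled)

# Rows of E0 are phi_hat(2*pi*k_n) * v_n with v_n[p] = exp(-2j*pi*k_n*p);
# here phi is Dirac's delta, so phi_hat = 1 everywhere.
E0 = np.exp(-2j * np.pi * np.outer(k, p))

rng = np.random.default_rng(1)
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)

print(np.allclose(E0 @ c, np.fft.fft(c)))  # E0 coincides with the DFT
```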
Due to the presence of noise and other scanner inaccuracies, the introduction of a new term b, accounting for an additive perturbation, makes the data-formation model
m = E c+b (9) 
more realistic. Equivalently, if the parameters of interest are the wavelet coefficients w, the model becomes
m = M w+b (10) 
with M = EW.
In MRI, the major source of noise is a radiofrequency signal originating from the thermal motion in the object under investigation. When observed with a receiving array of coils, this noise presents non-negligible correlations across channels. In other words, the R× R channel cross-correlation matrix Θ has nonzero off-diagonal entries. Accordingly, the additive perturbation is generally modeled as the realization of a centered multivariate Gaussian process b∼ N(0, Ψ) with covariance matrix Ψ=Θ⊗I_{N}.
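The model b∼ N(0, Θ⊗I_{N}) can be simulated by coloring white noise with a Cholesky factor of Θ. The sketch below uses a real-valued Θ chosen arbitrarily for illustration and recovers it empirically:

```python
import numpy as np

rng = np.random.default_rng(2)
R, N = 3, 200_000                       # coils, k-space samples

# Channel cross-correlation matrix Theta (symmetric, positive definite)
Theta = np.array([[1.0, 0.4, 0.1],
                  [0.4, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])

# Draw b with covariance Theta kron I_N: color white noise with a Cholesky factor
L = np.linalg.cholesky(Theta)
white = rng.standard_normal((R, N))
b = L @ white                           # each column ~ N(0, Theta)

Theta_hat = (b @ b.T) / N               # empirical cross-correlation
print(np.max(np.abs(Theta_hat - Theta)))  # small sampling error
```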
The problem of imaging is to recover the M coefficients c (or, equivalently, w) from the N corrupted measurements m. In this section, we review the popular approaches that lead to reconstructions that depend linearly upon the observations. We show that they are functionally equivalent. Two of these approaches rely on a stochastic interpretation of the problem, where the matrices Ψ and Υ are the known covariance matrices of the noise b and the object c, respectively. The corresponding global variances are given by v_{n} = Tr(Ψ)/N=Tr(Θ) and v_{s} = Tr(Υ)/M. We define the normalized covariance matrices as Ψ_{0}=Ψ/v_{n} and Υ_{0}=Υ/v_{s}. Most linear solutions involve a balancing parameter λ, which is necessarily positive and can be interpreted in terms of the signal-to-noise ratio λ^{−1}=Tr(Υ)/Tr(Ψ)=M v_{s}/(N v_{n}).
Depending on the scanner settings, the encoding matrix E is generally neither square nor invertible. In such cases, the Moore-Penrose pseudoinverse offers a solution to the reconstruction problem. The reconstruction matrix is then defined as
E^{†} = (E^{H}E)^{−1}E^{H}. (11)
The superscript ^{H} denotes the Hermitian transpose, which is used because, in MRI, matrices have complex-valued entries. The problem of inverting a nonsquare matrix is tackled by considering the back-projected^{1} problem
E^{H}m = E^{H}Ec, (12) 
because the matrix E^{H}E is square.
Considering the singular value decomposition E=UΣV^{H}, where Σ is an RN× M matrix whose diagonal entries are the singular values σ_{n}, one gets E^{†}=VΣ^{†}U^{H}, with singular values
σ_{n}^{†} = 1/σ_{n} if σ_{n}≠ 0, and σ_{n}^{†} = 0 otherwise.
The major concern with pseudoinverse reconstruction resides in the propagation of noise. Indeed, very small but nonzero singular values lead to a drastic amplification of the corresponding noise components. This effect is quantified by the condition number, defined as κ(E)=max_{n} σ_{n}/min_{n} σ_{n}. This number, which is greater than or equal to 1, is also representative of the numerical challenge faced when inverting E. A linear inverse problem is termed “ill-conditioned” when the corresponding condition number is large. When the null space of E is not limited to {0}, the problem is said to be ill-posed.
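The SVD-based pseudoinverse and the condition number can be computed directly with NumPy; the sketch below inverts only the singular values above a small threshold, as np.linalg.pinv does, and checks the two implementations against each other:

```python
import numpy as np

rng = np.random.default_rng(3)
E = rng.standard_normal((12, 8)) + 1j * rng.standard_normal((12, 8))

U, s, Vh = np.linalg.svd(E, full_matrices=False)
tol = s.max() * max(E.shape) * np.finfo(float).eps
s_pinv = np.where(s > tol, 1.0 / s, 0.0)        # invert only nonzero singular values
E_pinv = Vh.conj().T @ np.diag(s_pinv) @ U.conj().T

print(np.allclose(E_pinv, np.linalg.pinv(E)))   # matches NumPy's pseudoinverse

kappa = s.max() / s.min()                       # condition number kappa(E)
print(kappa >= 1.0)
```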
The aim of regularized reconstruction schemes is to improve reconstruction with respect to the pseudoinverse approach by limiting the propagation of noise in the images.
It is remarkable that (12) rewrites as ∇_{c}(‖m−Ec‖_{2}^{2})=0, with ∇_{c} standing for the gradient operator with respect to c. The Moore-Penrose pseudoinverse provides a least-squares solution c^{⋆}=E^{†}m to the reconstruction problem because it ensures E^{H}m=E^{H}Ec^{⋆}. This least-squares solution makes sense when the noise term b is independent and identically distributed.
Instead, when the noise correlation matrix Ψ_{0} is available, this knowledge can be exploited using the weighted pseudoinverse
E_{X}^{†} = (E^{H}XE)^{−1}E^{H}X, (13)
with the weighting matrix X=Ψ_{0}^{†}. The advantage of this type of solution is that it takes noise correlations into account and relies less on the noisier samples. Thanks to the relation E^{H}XEE_{X}^{†}m=E^{H}Xm, the weighted pseudoinverse provides a (weighted) least-squares solution.
The approach proposed by Phillips [43] and Twomey [44], for finite-dimensional problems, and by Tikhonov [45], for infinite-dimensional problems, defines the reconstruction as the minimization of the functional
‖m − Ec‖_{X}^{2} + λ‖Rc‖^{2}, (14)
where the notation ‖·‖_{X}, with X positive-definite, stands for a weighted norm such that ‖v‖_{X}^{2}=v^{H}Xv. The functional is a trade-off between a fidelity term, which enforces consistency with the measurements, and a regularization term, which penalizes irregular solutions with respect to the regularization matrix R. The tuning parameter λ balances the influence of these two terms. The role of the regularization term is to limit the amplification of noise, which can be dramatic for ill-conditioned problems (in MRI, see for instance [46]). In practice, it is often designed with a derivation operator to favor smooth solutions. Similar to the weighted pseudoinverse solution, the weighting matrix can be chosen as X=Ψ_{0}^{†}, yielding a reconstruction matrix that weights the samples in inverse proportion to their level of noise. Another common choice is to take X diagonal so as to compensate for an inhomogeneous k-space sampling density [37], which simplifies the computations involved in the reconstruction.
The minimization of a quadratic functional yields a linear solution. Indeed, by taking the gradient of the functional and setting it to zero, we find that the reconstruction matrix writes
F_{QUAD} = (E^{H}XE+λR^{H}R)^{−1}E^{H}X. (15)
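A small numerical sketch of (15), with X = I and a finite-difference R (both chosen here purely for illustration), verifies that the closed-form solution zeroes the gradient of the quadratic functional (14):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 20, 12
E = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
m = rng.standard_normal(N) + 1j * rng.standard_normal(N)

X = np.eye(N)                                   # i.i.d. noise weighting
R = np.eye(M) - np.eye(M, k=1)                  # finite-difference regularization
lam = 0.1

A = E.conj().T @ X @ E + lam * R.T @ R
c_star = np.linalg.solve(A, E.conj().T @ X @ m)  # F_QUAD @ m

# c_star zeroes the gradient of ||m - Ec||_X^2 + lam*||Rc||^2
grad = 2 * (A @ c_star - E.conj().T @ X @ m)
print(np.allclose(grad, 0))
```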
Here, the reconstruction problem is tackled within a stochastic framework. The unknowns c and b are modeled as realizations of centered multivariate Gaussian distributions: c∼ N(0, Υ) and b∼ N(0, Ψ).
According to the numerical model (9), the measurements also follow a multivariate Gaussian distribution m∼ N(0, EΥE^{H}+Ψ).
The maximum a posteriori (MAP) solution is the vector c that maximizes the posterior distribution given the measurements m. Using Bayes’ theorem, the probability density function of the posterior distribution of c writes
p(c ∣ m) ∝ p(m ∣ c) p(c). 
In the present stochastic setting, the probability density function expands as
p(c ∣ m) ∝ exp(−‖m−Ec‖_{Ψ^{†}}^{2}) exp(−‖c‖_{Υ^{†}}^{2}). (16)
Finally, the MAP solution is the vector c that minimizes the functional
‖m−Ec‖_{Ψ_{0}^{†}}^{2}+λ‖c‖_{Υ_{0}^{†}}^{2}. (17)
We introduced the normalized covariance matrices in the latter expression in order to make the parameter λ, which is the inverse of the signal-to-noise ratio, appear explicitly.
Similarly to the previous approaches, the functional to be minimized is composed of quadratic terms. As a consequence, the solution is linear, characterized by the reconstruction matrix
F_{MAP} = (E^{H}Ψ_{0}^{†}E+λΥ_{0}^{†})^{−1}E^{H}Ψ_{0}^{†}. (18)
The Gaussian model used in the MAP approach is hardly justified for true MRI images c. This assumption can be substituted by the constraint that the reconstruction be affine with respect to the measurements. Accordingly, we write the reconstructed image as Fm+g.
To determine adequate parameters F and g, one can rely on the first two moments of the unknown data; that is, the expectation vectors c̄ and b̄ and the covariance matrices Υ and Ψ. According to the data-formation model (9), the expectation and covariance of the reconstruction error e=Fm+g−c are given by
E[e] = F(Ec̄+b̄) + g − c̄ (19)
and
E[ee^{H}] = (FE−I)Υ(FE−I)^{H}+FΨF^{H} + E[e] E[e]^{H}. (20)
An unbiased reconstruction^{2} is obtained when g=c̄−F(Ec̄+b̄). For the choice of F, one would reasonably like to minimize the variance of the reconstruction error. Given that the estimator is unbiased, the variance also corresponds to the expectation of the mean-square error. It is given by the trace of the covariance matrix
Var[e] = Tr((FE−I)Υ(FE−I)^{H}) + Tr(FΨF^{H}). (21)
Interestingly, this relation reveals two distinct contributions to the error: the first term vanishes when FE=I (perfect inversion of the system), while the second accounts for the propagation of noise.
The matrix F that minimizes the error variance, also referred to as the mean-square error, can be computed using matrix calculus. Using the normalized covariance matrices, it reads
F_{MMSE} = Υ_{0}E^{H}(EΥ_{0}E^{H}+λΨ_{0})^{−1}. (22) 
First, Equations (15) and (18) show that quadratic regularization and MAP approaches are equivalent provided that X=Ψ_{0}^{†} and Υ_{0}^{†}=R^{H}R.^{3}
Second, the three following equalities reveal the connection between MAP and LMMSE solutions, (18) and (22), in the case where both matrices Υ_{0} and Ψ_{0} are invertible:
E^{H}Ψ_{0}^{−1}(EΥ_{0}E^{H}+λΨ_{0}) = (E^{H}Ψ_{0}^{−1}EΥ_{0}+λI)E^{H} = (E^{H}Ψ_{0}^{−1}E+λΥ_{0}^{−1})Υ_{0}E^{H},
so that F_{MAP}=(E^{H}Ψ_{0}^{−1}E+λΥ_{0}^{−1})^{−1}E^{H}Ψ_{0}^{−1}=Υ_{0}E^{H}(EΥ_{0}E^{H}+λΨ_{0})^{−1}=F_{MMSE}.
Last, the weighted pseudoinverse solution with X=Ψ_{0}^{†} corresponds to the other solutions in the limiting case where λ tends to 0. This is also the case for the regular Moore-Penrose pseudoinverse when the noise is independent and identically distributed; that is to say, Ψ_{0}=I_{RN}/R. As already mentioned, the pseudoinverse solutions are only valid when noise propagation is negligible. This situation occurs with well-conditioned (κ(E)≈ 1) reconstruction problems that are largely overdetermined (M≪ RN) and/or subject to very little noise (Tr(Υ)≫Tr(Ψ)).
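The equivalence between the MAP and LMMSE reconstruction matrices (18) and (22), a consequence of the matrix-inversion lemma, can be checked numerically on random invertible covariances (sizes and generators chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 10, 6
E = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
lam = 0.5

def spd(n):
    """Random Hermitian positive-definite matrix."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + n * np.eye(n)

Psi0, Ups0 = spd(N), spd(M)              # normalized noise / signal covariances
Psi0_inv = np.linalg.inv(Psi0)
Ups0_inv = np.linalg.inv(Ups0)

F_map = np.linalg.inv(E.conj().T @ Psi0_inv @ E + lam * Ups0_inv) \
        @ E.conj().T @ Psi0_inv
F_mmse = Ups0 @ E.conj().T @ np.linalg.inv(E @ Ups0 @ E.conj().T + lam * Psi0)

print(np.allclose(F_map, F_mmse))        # equal by the matrix-inversion lemma
```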
There is no particular reason for Ψ_{0} to be singular. Most of the time, the correlations between pixels in the image are not modeled; this translates into a diagonal matrix Υ_{0}. When no signal is expected from some pixels of the image (for instance, outside a predetermined ROI), it can be tempting to set the corresponding entries of Υ_{0} to 0, resulting in a singular matrix. However, a reasonable problem setting would exclude such entries from the unknown vector c, restoring the invertibility of Υ_{0}.
We just saw that the linear approaches to reconstruction can be derived from the solution of some optimization problems. The corresponding functionals were quadratic, yielding closedform solutions. In this section, we consider other approaches that are popular in MRI and which involve nonquadratic regularization terms.
The solution c^{⋆} is defined as the minimizer of a cost function that involves two terms: the data fidelity F(b) and the regularization R(c) that penalizes undesirable solutions. This is summarized as
c^{⋆}= arg min_{c} F(m − Ec) + λ R(c), (23)
where the regularization parameter λ≥ 0 balances the two constraints. In MRI, the noise term b=m − Ec is usually assumed to be the realization of a Gaussian process with normalized covariance matrix Ψ_{0}. From a Bayesian point of view, this justifies the choice F(b)=‖b‖_{Ψ_{0}^{†}}^{2}=b^{H}Ψ_{0}^{†}b as a proper log-likelihood term. A more practical motivation for this choice is that a quadratic fidelity term yields a simple closed-form gradient that greatly facilitates the design and performance of reconstruction algorithms.
When the k-space sampling is dense enough and the signal-to-noise ratio is high, the quadratic regularization terms presented in the previous section yield satisfying reconstructions. However, the constraints that reduce the scan duration favor setups with reduced SNR and k-space trajectories that present regions of low sampling density. In these situations, where the reconstruction problem is more challenging, the reconstructed image can often be enhanced by the use of a more suitable regularization term R(c).
Total Variation (TV) was introduced as an edgepreserving denoising method by Rudin et al. [47]. It is now a very popular approach to tackle image enhancement problems.
The TV regularization term corresponds to the sum of the Euclidean norms of the gradient of the object. In practice, it is defined as R(c)=‖∇c‖_{ℓ1}, where the operator ∇ returns, pixelwise, the ℓ_{2}-norm of the finite differences. The use of TV regularization is particularly appropriate for piecewise-constant objects such as the Shepp-Logan (SL) phantom used for simulations in tomography and MRI. Textured and noisy images exhibit a much larger total variation.
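The following sketch evaluates an isotropic TV on a piecewise-constant disk (a crude stand-in for the Shepp-Logan phantom) and on a noisy version of it; the noise inflates the total variation considerably:

```python
import numpy as np

def total_variation(img):
    """Isotropic TV: sum over pixels of the l2-norm of the forward differences."""
    dx = np.diff(img, axis=0, append=img[-1:, :])   # replicate boundary
    dy = np.diff(img, axis=1, append=img[:, -1:])
    return np.sum(np.sqrt(dx**2 + dy**2))

# Piecewise-constant disk
n = 64
y, x = np.mgrid[:n, :n]
disk = ((x - n / 2)**2 + (y - n / 2)**2 < (n / 4)**2).astype(float)

rng = np.random.default_rng(6)
noisy = disk + 0.2 * rng.standard_normal((n, n))

print(total_variation(disk) < total_variation(noisy))  # noise inflates TV
```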
Another popular idea is to exploit the fact that the object can be well represented by few nonzero coefficients (sparse representation) in an orthonormal basis of M functions φ_{p}. Formally, we write that ρ = ∑_{p} w[p]φ_{p}, with only a small fraction of the coefficients w[p] being significantly nonzero.
It is well-documented that typical MRI images admit a sparse representation in bases such as wavelets or block DCT [6]. We illustrate this property in Figure 3.1.
The ℓ_{1}norm is a good measure of sparsity with interesting mathematical properties (e.g., convexity). Thus, among the candidates that are consistent with the measurements, we favor a solution whose wavelet coefficients have a small ℓ_{1}norm. Specifically, the solution is formulated as
w^{⋆}= arg min_{w} C(w), (24)
with
C(w) = ‖m − Mw‖_{ℓ2}^{2}+λ‖w‖_{ℓ1}. (25)
This is the general solution for waveletregularized inverse problems considered by [19] as well as by many other authors.
MRI gives rise to a large-scale inverse problem in the sense that the number of degrees of freedom—that is to say, the unknown pixel values—is large. Consequently, the matrices are generally too large to be stored in memory, not to mention the fact that direct matrix multiplication involves too many operations.^{4} We summarize in this section the strategies that make reconstruction in MRI feasible with reasonable computer requirements and acceptable computation times.
The matrix-vector multiplications y=E_{0}x and y=E_{0}^{H}x are two basic operations in MRI reconstruction. They can be implemented efficiently using the FFT algorithm. For non-Cartesian samples k_{n}, the gridding method, based on FFT and interpolation, provides accurate computations (see [48] for instance). Algorithms 1 and 2 describe the implementation of the operations y=E_{0}x and y=E_{0}^{H}x, respectively.
Algorithm 1 computes y=E_{0}x as Gridding(x, p_{1},…, p_{M}, k_{1},…, k_{N}) (going to the k-space domain); Algorithm 2 computes y=E_{0}^{H}x as Gridding(x, −k_{1},…, −k_{N}, p_{1},…, p_{M}) (going back to the spatial domain).
An interesting work by Wajer [49] identifies E_{0}^{H}E_{0} as a convolution matrix associated with the kernel
G[p] = ∑_{n=1}^{N} |ϕ^{^}(2πk_{n})|^{2} exp(2jπ k_{n}·p). (26)
When the kernel is precomputed for the lattice points belonging to the set S={ p−q ∣ p∈FOV, q∈FOV}, one can avoid the use of Algorithms 1 and 2. An efficient implementation of the operation y=E_{0}^{H}E_{0}x, which uses zeropadded multidimensional FFTs, is described in Algorithm 3.
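The convolution structure of E_{0}^{H}E_{0} can be verified in 1D with Dirac's ϕ (so |ϕ^{^}| = 1): the kernel (26) is precomputed on the difference set and applied with a zero-padded FFT, in the spirit of Algorithm 3 (a sketch; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(7)
M, N = 16, 40
p = np.arange(M)
k = rng.uniform(-0.5, 0.5, N)            # non-Cartesian 1D k-space samples

# Explicit encoding matrix for Dirac's phi (phi_hat = 1)
E0 = np.exp(-2j * np.pi * np.outer(k, p))

# Kernel G on the difference set S = {-(M-1), ..., M-1}
d = np.arange(-(M - 1), M)
G = np.sum(np.exp(2j * np.pi * np.outer(k, d)), axis=0)

x = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# Zero-padded FFT convolution with G, then crop back to the FOV
L = 2 * M - 1
Xf = np.fft.fft(x, L)
Gf = np.fft.fft(np.roll(G, -(M - 1)), L)  # shift so that G[0] sits at index 0
y_conv = np.fft.ifft(Xf * Gf)[:M]

print(np.allclose(E0.conj().T @ E0 @ x, y_conv))
```

Once Gf is precomputed, each application costs two length-(2M−1) FFTs instead of an N-point gridding and inverse gridding.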
FFT
(G) (DFT coefficients)
ZPAD
(x,S) (zeropadding x to the dimensions of G)
FFT
(x) (computing DFT coefficients)
IFFT
(x) (inverse DFT)
Most of the time, in parallel MRI, the covariance matrices are block-diagonal. In that case, they are sparse matrices and one can benefit from the related efficient memory storage and matrix operations. As already mentioned, Ψ_{0} is fully characterized by the channel cross-correlation matrix Θ_{0}=Θ/v_{n}, such that Ψ_{0}=Θ_{0}⊗I_{N}. Its pseudoinverse or inverse is then given by Ψ_{0}^{†}=Θ_{0}^{†}⊗I_{N}. The matrix-vector multiplications with E^{H}Ψ_{0}^{†} and E^{H}Ψ_{0}^{†}E are implemented as described in Algorithms 4 and 5, respectively.
The conjugate-gradient method (CG) [50] is an iterative algorithm that is among the most efficient for solving large-scale linear problems Ac=b characterized by symmetric positive-definite matrices A. The only operations involving the matrix A are matrix-vector multiplications Ax. In parallel MRI, it is the reference method [37] for performing linear reconstructions. The quadratic-regularized solution characterized by the reconstruction matrix in (15) is computed with CG by solving the linear problem defined by the matrix A=E^{H}XE+λR^{H}R and the vector b=E^{H}Xm.
The idea of the method is to decompose the solution in a basis of mutually conjugate vectors; that is to say, c=∑_{i} α_{i}p_{i}, with p_{i}^{H}Ap_{j}=0 for i≠ j. At iteration i, the estimate is c_{i}=∑_{j≤ i}α_{j}p_{j} and the corresponding residual writes r_{i}=b−Ac_{i}. For the next direction, the choice p_{i+1}=r_{i}−∑_{j≤ i}(p_{j}^{H}Ar_{i})p_{j}/‖p_{j}‖_{A}^{2} ensures the conjugacy constraint. In this direction, the coefficient α_{i+1}=Re(p_{i+1}^{H}r_{i})/‖p_{i+1}‖_{A}^{2} is optimal with respect to the cost C(c)=c^{H}Ac−c^{H}b−b^{H}c. An efficient implementation of the method is described in Algorithm 6.
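A minimal CG implementation for a Hermitian positive-definite system, accessing A only through matrix-vector products, might look as follows (a sketch in the spirit of, but not identical to, Algorithm 6):

```python
import numpy as np

def conjugate_gradient(A, b, x0, n_iter=50, tol=1e-20):
    """Solve Ax = b for Hermitian positive-definite A.
    A is only accessed through matrix-vector products."""
    x = x0.astype(complex)
    r = b - A @ x                        # residual
    p = r.copy()                         # first search direction
    rs = np.real(r.conj() @ r)
    for _ in range(n_iter):
        Ap = A @ p
        alpha = rs / np.real(p.conj() @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.real(r.conj() @ r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p        # conjugate to all previous directions
        rs = rs_new
    return x

rng = np.random.default_rng(8)
M = 10
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = B.conj().T @ B + np.eye(M)           # Hermitian positive definite
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

x = conjugate_gradient(A, b, np.zeros(M))
print(np.allclose(A @ x, b))
```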
The CG algorithm theoretically converges within a finite number of iterations. In practice, this result is compromised by the propagation of round-off errors. In the context of MRI, the property of practical interest is the linear convergence rate achieved by CG. Indeed, the distance to the desired solution decreases as a power of the iteration number, with the convergence rate
0≤ r(A)=(√κ(A)−1)/(√κ(A)+1)<1.
When the condition number κ(A) is large, the rate r(A) approaches unity, characterizing a slower convergence. Using the weighted norm ‖x‖_{A}=√(x^{H}Ax), the distance is upper-bounded by
‖c_{i}−c^{⋆}‖_{A}≤ 2‖c_{0}−c^{⋆}‖_{A} r(A)^{i}. (27)
With the regular Euclidean distance, the bound is looser
‖c_{i}−c^{⋆}‖_{2} ≤ 2κ(A)‖c_{0}−c^{⋆}‖_{2} r(A)^{i}. (28)
The Iteratively Reweighted Least-Squares (IRLS) algorithm, which is also known as the positive form of half-quadratic minimization [51], can be used to compute the solutions defined as
c^{⋆}= arg min_{c} ‖m−Ec‖_{X}^{2} + λ‖Rc‖_{ℓp}^{p}. (29)
In this context, the functional is strictly convex for p>1. This condition ensures the uniqueness of the minimizer.
The principle of IRLS is to design an upper-bounding quadratic proxy for the regularization term, tailored to the neighborhood of the current estimate c_{i}. In practice, one chooses the functional
Z_{i}(c)=(p/2)‖Rc‖_{D_{i}}^{2} + (1−p/2)‖Rc_{i}‖_{ℓp}^{p}, (30)
where D_{i} is a diagonal matrix with entries |(Rc_{i})_{n}|^{p−2}. It has the following desirable properties: it is quadratic in c, it upper-bounds the regularization term, with Z_{i}(c)≥‖Rc‖_{ℓp}^{p} for every c, and it coincides with it at the current estimate, with Z_{i}(c_{i})=‖Rc_{i}‖_{ℓp}^{p}.
An implementation of the IRLS is described in Algorithm 7.
Algorithm 7 updates the estimate as c_{i+1} = CG(A_{i}, a, c_{i}) (using Algorithm 6).
Let us remember that, for p≤ 1, the minimization problem might not admit a unique solution. When the minimizer c^{⋆} is unique, it is also the unique fixed point of the algorithm. As long as p<2, the sequence of functional values C(c_{i})=‖m−Ec_{i}‖_{X}^{2} + λ‖Rc_{i}‖_{ℓp}^{p} generated by the IRLS is monotonically decreasing. This guarantees convergence since the sequence is lower-bounded by the finite quantity C^{⋆}=min_{c}‖m−Ec‖_{X}^{2} + λ‖Rc‖_{ℓp}^{p}.
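A bare-bones IRLS for (29), restricted to the real case with X = I and with a small floor eps keeping the weights finite (both simplifications are ours), can be sketched as:

```python
import numpy as np

def irls(E, m, R, lam, p=1.2, n_iter=30, eps=1e-8):
    """IRLS sketch for min_c ||m - Ec||_2^2 + lam*||Rc||_p^p (real case, X = I).
    Each iteration exactly minimizes the quadratic majorizer built at c_i."""
    c = np.linalg.lstsq(E, m, rcond=None)[0]           # warm start: least squares
    for _ in range(n_iter):
        d = np.maximum(np.abs(R @ c), eps) ** (p - 2)  # weights |(Rc_i)_n|^(p-2)
        A = E.T @ E + lam * (p / 2) * R.T @ (d[:, None] * R)
        c = np.linalg.solve(A, E.T @ m)                # normal equations of proxy
    return c

rng = np.random.default_rng(9)
N, M = 30, 15
E = rng.standard_normal((N, M))
m = rng.standard_normal(N)
R = np.eye(M)
lam, p = 0.5, 1.2

cost = lambda c: np.sum((m - E @ c)**2) + lam * np.sum(np.abs(R @ c)**p)
c0 = np.linalg.lstsq(E, m, rcond=None)[0]
c_star = irls(E, m, R, lam, p)
print(cost(c_star) <= cost(c0))   # majorize-minimize decreases the cost
```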
The IRLS algorithm can be simply adapted in order to solve the minimization with mixednorm regularization terms. A particular case is the total variation penalty which corresponds to the ℓ_{1}norm of the pixelwise ℓ_{2}norm of the spatial gradient [52]. The IRLS algorithm for TV regularization was first proposed by Wohlberg and Rodríguez [53]. It is described in Algorithm 8.
Algorithm 8 likewise updates the estimate as c_{i+1} = CG(A_{i}, a, c_{i}) (using Algorithm 6).
Dualitybased algorithms proved to be an efficient alternative to achieve TV regularization [54,55].
The Iterative Shrinkage/Thresholding Algorithm (ISTA) [17,18,19], also known as thresholded Landweber (TL), aims at minimizing the functional
C(w)=‖m−Mw‖_{X}^{2} + λ‖w‖_{ℓ1}. (31)
Here, we use the notation w because ISTA is often applied on wavelet coefficients.
An important observation to understand ISTA is that the nonlinear shrinkage operation, sometimes called soft-thresholding, solves a scalar minimization problem [56], with
T_{λ}(u) = arg min_{w∈ℂ} |u−w|^{2} + λ|w| = (u/|u|) max(|u|−λ/2, 0). (32)
By separability of norms, this applies componentwise to vectors of ℂ^{N}:
T_{λ}(u) = arg min_{w} ‖u−w‖_{ℓ2}^{2} + λ‖w‖_{ℓ1}.
This means that the ℓ_{1}regularized denoising problem (i.e., when M and X are identity matrices) is precisely solved by a shrinkage operation.
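This claim can be checked numerically: the shrunk value of a complex sample (threshold λ/2 on the magnitude) attains a cost no larger than any candidate on a dense grid (a sketch; names are ours):

```python
import numpy as np

def soft_threshold(u, lam):
    """T_lambda applied componentwise: solves min_w |u - w|^2 + lam*|w|."""
    mag = np.abs(u)
    return u * np.maximum(mag - lam / 2, 0) / np.where(mag > 0, mag, 1)

rng = np.random.default_rng(10)
u0 = rng.standard_normal() + 1j * rng.standard_normal()
lam = 0.7

w_star = soft_threshold(np.array([u0]), lam)[0]

# brute-force check: the shrunk value beats every candidate on a complex grid
g = np.linspace(-3, 3, 301)
W = g[:, None] + 1j * g[None, :]
grid_best = np.min(np.abs(u0 - W)**2 + lam * np.abs(W))
print(abs(u0 - w_star)**2 + lam * abs(w_star) <= grid_best)
```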
The ISTA generates a sequence of estimates w_{i} that converges to the minimizer w^{⋆} of (31) when it is unique. The idea is to define at each step a new functional C′(w,w_{i}) whose minimizer w_{i+1} will be the next estimate
w_{i+1} = arg min_{w} C′(w,w_{i}). (33)
Two constraints must be considered for the definition of C′: first, it must upper-bound the original cost, with C′(w,w_{i}) ≥ C(w) and equality at w=w_{i}, so that each update decreases C; second, its minimization (33) must be easy to perform.
In accordance with Constraint 1, C′ can take the generic quadratically augmented form
C′(w,w_{i}) = C(w) + ‖w−w_{i}‖_{Λ−M^{H}XM}^{2}, (34)
with the constraint that (Λ−M^{H}XM) is positive definite, where the weighting matrix Λ plays the role of a tuning parameter.
Then, ISTA corresponds to the trivial choice Λ=(L/2)I, with the value of L chosen to be greater than or equal to the Lipschitz constant of the gradient of ‖m−Mw‖_{X}^{2}, so that L≥ 2λ_{max}(M^{H}XM).
Let us define a = M^{H}Xm, A = M^{H}XM, and
z_{i}=w_{i}+2(a−Aw_{i})/L. (35) 
Then, using standard linear algebra, we can write
C′(w,w_{i}) = (L/2)‖w−z_{i}‖_{ℓ2}^{2} + λ‖w‖_{ℓ1} + K_{i},
where the constant K_{i} does not depend on w, so that the minimizer of C′ is given by the shrinkage w_{i+1}=T_{2λ/L}(z_{i}).
This shows that Constraint 2 is automatically satisfied.
Note that both the intermediate variable z_{i} in (35) and the threshold values will vary depending on L.
Beck and Teboulle [20, Thm. 3.1] showed that this algorithm decreases the cost function in inverse proportion to the number of iterations i:
C(w_{i})− C(w^{⋆})≤ L‖w_{0}−w^{⋆}‖_{ℓ2}^{2}/(2i). (36)
Selecting L as small as possible clearly favors the speed of convergence. The bound also highlights the importance of a “warm” starting point w_{0}.
Among the variants of ISTA, FISTA, proposed by Beck and Teboulle [20], ensures state-of-the-art convergence properties while preserving a comparable computational cost. Thanks to a controlled over-relaxation at each step, FISTA decreases the cost function in inverse proportion to the square of the number of iterations, with
C(w_{i})− C(w^{⋆}) ≤ 2L‖w_{0}−w^{⋆}‖_{ℓ2}^{2}/(i+1)^{2}. (37)
More details on FISTA, seen as a particular case of FWISTA with the trivial choice Λ=(L/2)I, can be found in Section 5.2.3.
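Both algorithms can be sketched in a few lines. The version below (our simplification, with X = I) uses the same gradient-plus-shrinkage step for ISTA and FISTA and differs only in the over-relaxation:

```python
import numpy as np

def soft(u, t):
    """Componentwise complex soft-thresholding with threshold t."""
    mag = np.abs(u)
    return u * np.maximum(mag - t, 0) / np.where(mag > 0, mag, 1)

def ista(Mmat, m, lam, n_iter, fast=False):
    """(F)ISTA sketch for min_w ||m - Mw||_2^2 + lam*||w||_1 (X = I).
    fast=True adds FISTA's over-relaxation at the same per-iteration cost."""
    A = Mmat.conj().T @ Mmat
    a = Mmat.conj().T @ m
    L = 2 * np.linalg.eigvalsh(A).max()       # Lipschitz constant of the gradient
    w = np.zeros(Mmat.shape[1], dtype=complex)
    y, t = w.copy(), 1.0
    for _ in range(n_iter):
        w_new = soft(y + 2 * (a - A @ y) / L, lam / L)  # gradient step + shrinkage
        if fast:
            t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2    # over-relaxation schedule
            y = w_new + ((t - 1) / t_new) * (w_new - w)
            t = t_new
        else:
            y = w_new
        w = w_new
    return w

rng = np.random.default_rng(11)
Mmat = rng.standard_normal((40, 20)) + 1j * rng.standard_normal((40, 20))
m = rng.standard_normal(40) + 1j * rng.standard_normal(40)
lam = 1.0

cost = lambda w: np.sum(np.abs(m - Mmat @ w)**2) + lam * np.sum(np.abs(w))
print(cost(ista(Mmat, m, lam, 50, fast=True)) <= cost(ista(Mmat, m, lam, 50)))
```

On this small problem, the over-relaxed variant reaches a lower cost after the same number of iterations, in line with the rates (36) and (37).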
An implementation of FISTA is given in Algorithm 10.