In this chapter,1 we exploit the fact that wavelets can represent magnetic resonance images well, with relatively few coefficients. We use this property to improve MRI reconstructions from undersampled data with arbitrary k-space trajectories. Reconstruction is posed as an optimization problem that could be solved with the iterative shrinkage/ thresholding algorithm (ISTA) [17,18,19] which, unfortunately, converges slowly. To make the approach more practical, we propose a variant that combines recent improvements in convex optimization and that can be tuned to a given specific k-space trajectory. We present a mathematical analysis that explains the performance of the algorithms. Using simulated and in vivo data, we show that our nonlinear method is fast, as it accelerates ISTA by almost two orders of magnitude. We also show that it remains competitive with TV regularization in terms of image quality.
MRI scanners provide data that are samples of the spatial Fourier transform (also known as k-space) of the object under investigation. The Shannon-Nyquist sampling theory in both spatial and k-space domains suggests that the sampling density should correspond to the field of view (FOV) and that the highest sampled frequency is related to the pixel width of the reconstructed images. However, constraints in the implementation of the k-space trajectory that controls the sampling pattern (e.g., acquisition duration, scheme, smoothness of gradients) may impose locally reduced sampling densities. Insufficient sampling results in reconstructed images with increased noise and artifacts, particularly when applying gridding methods.
The common and generic approach to alleviate the reconstruction problem is to treat the task as an inverse problem . In this framework, ill-posedness due to a reduced sampling density is overcome by introducing proper regularization constraints. They assume and exploit additional knowledge about the object under investigation to robustify the reconstruction.
Earlier techniques used a quadratic regularization term, leading to solutions that exhibit a linear dependence upon the measurements. Unfortunately, in the case of severe undersampling (i.e., locally low sampling density) and depending on the strength of regularization, the reconstructed images still suffer from noise propagation, blurring, ringing, or aliasing errors. It is well known in signal processing that the blurring of edges can be reduced via the use of nonquadratic regularization. In particular, ℓ1-wavelet regularization has been found to outperform classical linear algorithms such as Wiener filtering in the deconvolution task .
Indicative of this trend as well is the recent advent of Compressed Sensing (CS) techniques in MRI [6,9]. These let us draw two important conclusions.
Many recent works in MRI have focused on nonlinear reconstruction via Total Variation (TV) regularization, choosing finite differences as a sparsifying transform [6,8,10,12]. Nonquadratic wavelet regularization has also received some attention [6,73,11,13,74,14,16], but we are not aware of a study that compares the performance of TV against ℓ1-wavelet regularization.
Various algorithms have been recently proposed for solving general linear inverse problems subject to ℓ1-regularization. Some of them deal with an approximate reformulation of the ℓ1 regularization term. This approximation facilitates reconstruction sacrificing some accuracy and introducing extra degrees of freedom that make the tuning task laborious. Instead, the iterative shrinkage/ thresholding algorithm [17,18,19] (ISTA) is an elegant and nonparametric method that is mathematically proven to converge. A potential difficulty that needs to be overcome is the slow convergence of the method when the forward model is poorly conditioned (e.g., low sampling density in MRI). This has prompted research in large-scale convex optimization on ways to accelerate ISTA. The efforts so far have followed two main directions:
In this work, we exploit the possibility of combining and tailoring the two generic types of accelerating strategies to come up with a new algorithm that can speedup the convergence of the reconstruction and that can accommodate for every given k-space trajectory. Here, we first consider single-coil reconstructions that do not use sensitivity knowledge. In a second time, we confirm the results with SENSE reconstructions .
We propose a practical reconstruction method that turns out to sensibly outperform linear reconstruction methods in terms of reconstruction quality, without incurring the protracted reconstruction times associated with nonlinear methods. This is a crucial step in the practical development of nonlinear algorithms for undersampled MRI, as the problem of fixing the regularization parameter is still open. We also provide a mathematical analysis that justifies our algorithm and facilitates the tuning of the underlying parameters.
This chapter is structured as follows: In Section 5.2, we propose a fast algorithm for solving the nonlinear reconstruction problem and present theoretical arguments to explain its superior speed of convergence. Finally, we present in Section 5.3 an experimental protocol to validate and compare our practical method with existing ones. We focus mainly on reconstruction time and signal-to-error ratio (SER) with respect to the reference image.
In this section, we present reconstruction algorithms that handle constraints expressed in the wavelet domain while solving the classical ℓ1-regularized minimization problem
|m − M w||⎪⎪|
that is justified in Section 3.3.2.
By introducing weighted norms instead of simple Lipschitz constants, we revisit the principle of ISTA algorithm and simplify the derivation and analysis of this class of algorithms. We end up with a novel algorithm that combines different acceleration strategies and we provide a convergence analysis. Finally, we propose an adaptation of the fast algorithm to implement the random-shifting technique that is commonly used to improve results in image restoration.
The standard algorithm ISTA is particularly well-suited for this minimization task. Let us recall the key properties of ISTA that are detailed in Section 3.4.4.
In ISTA, each iteration aims at minimizing in a simple shrinkage step a surrogate functional that is locally tailored to the objective. The functional
|C′(w,wi) = C(w) +||⎪⎪|
satisfies the constraints provided that:
ISTA corresponds to the trivial choice Λ=L/2I with L≥ 2λmax( MHXM).
SISTA is an extension of the ISTA that was introduced by Bayram and Selesnick . Here, we propose an interpretation of SISTA as a particular case of (3) with a weighting matrix that replaces advantageously the step size 2/L. The idea is to use a diagonal weighting matrix Λ−1=diag(τ) with coefficients that are constant within a wavelet subband.
Accordingly, SISTA is described in Algorithm 1.
By considering the weighted scalar product ⟨x , y ⟩Λ instead of L/2⟨x , y ⟩, we can adapt the convergence proof of ISTA by Beck and Teboulle (see Proposition 1). This result is new, to the best of our knowledge.
Therefore, by comparing Propositions 1 and 1, an improved convergence is expected. The main point is that, for a “warm” starting point w0 or after few iterations (i0), the weighted norm in (4) can yield significantly smaller values than the one weighted by L/2 in (36). The proof is provided in Appendix B.1.
Bayram and Selesnick  provide a method to select the values of τ for SISTA. To present this result, let us introduce some notations. We denote by s an index that scans all the S wavelet subbands, coarse scale included, by τs the corresponding weight constant, and by Ms the corresponding block of M. We also define γs1,s2=√λmax(Ms2HMs1Ms1HMs2). The authors of  show that, for each subband, the condition
is sufficient to impose the positive definiteness of (Λ−MHM) that is required in Equation 3. In the present context, we propose to compute the values γs,s′ by using the power iteration method, once for a given wavelet family and k-space sampling strategy.
Taking advantage of the ideas developed previously, we derive an algorithm that corresponds to the subband adaptive version of FISTA. In the light of the minimization problem (2), FWISTA generalizes the FISTA algorithm using a parametric weighted norm. We give its detailed description in Algorithm 2, where the difference with respect to FISTA resides in using the SISTA step.
In the same fashion as for SISTA, we revisit the convergence results of FISTA [20, Thm. 4.4] for FWISTA.
The proof is provided in Appendix B.2.
This result shows the clear advantage of FWISTA compared to ISTA (Proposition 1) and SISTA (Proposition 1). Moreover, we note that FWISTA can be simply adapted in order to impose a monotonic decrease of the cost functional value, in the same fashion as MFISTA. The same convergence properties apply [79, Thm. 5.1].
Wavelet bases perform well the compression of signals but can introduce artifacts that can be attributed to their relative lack of shift-invariance. In the case of regularization, this can be avoided by switching to a redundant dictionary. The downside, however, is a significant increase in computational cost. Alternatively, the practical technique referred to as random shifting (RS)  can be used. Applying random shifting is much simpler and computationally more efficient than considering redundant transforms and leads to sensibly improved reconstruction.
Here, we propose a variational interpretation that motivates our implementation of FWISTA with RS (see Algorithm 3). We consider the DWT [W1⋯WNs]H, with Ws = SsW, where Ss represent the different shifting operations required to get a translation-invariant DWT. The desired reconstruction would be defined as the minimizer of
|m − E c||⎪⎪|
In 1-D , this formulation includes TV regularization, that is the ℓ1-norm of a single-level undecimated Haar wavelet transform without coarse-scale.
Rewriting (8) in terms of wavelet coefficients, we get
|Ns C(c) =|
|m − MSs−1ws||⎪⎪|
For a current estimate, we select a transform Wi and perform a step towards the minimization of the cost with respect to ws while keeping ws′, for s′≠ s fixed. As the minimization subproblem (10) takes the form (2), a SISTA iteration is appropriate. It is expected to have the minimizers of the functionals Cs and the minimizer of (8) look very similar. In the first iterations of the algorithm, the minimization steps with respect to any ws are functionally equivalent (i.e., the modification is mostly explained by the gradient step). This is why FWISTA can be used to speedup the first iterations. When the solution gets close to the solution ISTA steps are used.
As the scheme is intrinsically greedy, we do not have a theoretical guarantee of convergence. Yet, in practice, we have observed that the SER stabilizes at a much higher value than it does when using ISTA schemes without RS (see Figure 5.4).
Our method is described in Algorithm 3. Note that it has no more matrix-vector multiplications per iteration than ISTA, SISTA and FISTA. Therefore, the computational cost of an iteration is expected to be equivalent.
Our implementation uses Matlab 7.9 (Mathworks, Natick). The reconstructions run on a 64-bit 8-core computer, clock rate 2.8 GHz, 8 GiB RAM (DDR2 at 800 MHz), Mac OS X 10.6.5. For all iterative algorithms, a key point is that matrices are not stored in memory. They only represent operations that are performed on vectors (images). In particular, a=MHm is computed once per dataset. Matrix-to-vector multiplication with A=MHM, specifically, EHE, have an efficient implementation thanks to the convolution structure of the problem [49,82]. For these Fourier precomputations, we used the NUFFT algorithm . For wavelet transforms, we used the code provided online by the authors of . This Fourier-domain implementation proved to be faster than Matlab’s when considering reconstructed images smaller than 256× 256 and the Haar wavelet. It must also be noted that the 2-D DFT was performed using the FFTW library which efficiently parallelizes computations.
For Tikhonov regularizations, we implemented the classical conjugate gradient (CG) algorithm, with the identity as the regularization matrix. For TV regularizations, we considered the iteratively reweighted least-squares algorithm (IRLS), which corresponds to the additive form of half-quadratic minimization [51,70]. We used 15 iterations of CG to solve the linear inner problems, always starting from the current estimate, which is crucial for efficiency. For the weights that permit the quadratic approximation of the TV term, we stabilized the inversion of very small values.
We implemented ISTA, SISTA, and FWISTA as described in Section 5.2, with the additional possibility to use random shifting (see Section 5.3.3). For the considered reconstructions using our method, described in Algorithm 3, K=30 was a reasonable choice. The Haar wavelet transform was used, with 3 decomposition levels when no other values are mentioned. As is usual for wavelet-based reconstructions, the regularization was not applied to the coarse-level coefficients.
Reconstructions were limited to the pixels of the ROI for all algorithms. The regularization parameter λ was systematically adjusted such that the reconstruction mean-squared error (MSE) inside the ROI was minimal. For practical situations where the ground-truth reference is not available, it is possible to adjust λ by considering well-established techniques such as the discrepancy principle, generalized cross validation, or L-curve method .
In this section, we focused on the problem of reconstructing images of objects weighted by the receiving channel sensitivity, given undersampled measurements. This problem, which involves single-channel data and hence differs from SENSE, is challenging for classical linear reconstructions as it generates artifacts and propagates noise. We considered spiral trajectories with 50 interleaves, with an interleave sampling density reduced by a factor R=1.8 compared to Nyquist for the highest frequencies and an oversampling factor 3.5 along the trajectory. Spiral acquisition schemes are attractive because of their versatility and the fact that they can be implemented with smooth gradient switching [84,85].
We validate the results with the three sets of data that we present below. The corresponding reference images are shown in Figure 5.1.
The data were collected on a 3T Achieva system (Philips Medical Systems, Best, The Netherlands). A field camera with 12 probes was used to monitor the actual k-space trajectory . An array of 8 head coils provided the measurements. We acquired in vivo brain data from a healthy volunteer with parameters TR = 1000 ms and TE = 30 ms. The excitation slice thickness was 3 mm with a flip angle of 30∘. The trajectory was designed for a FOV of 25 cm with a pixel size 1.5 mm. It was composed of 100 spiral interleaves. The interleaf distance for the highest frequencies sampled defined a fraction of the Nyquist sampling density (R=0.9).
The subset used for reconstruction corresponds to half of the 100 interleaves. The corresponding reduction factor, defined as the ratio of the distance of neighboring interleaves with the Nyquist distance, is R=1.8.
We used analytical simulations of the Shepp-Logan (SL) brain phantom with a similar coil sensitivity, following the method described in . The values of these simulated data were scaled to have the same mean spatial value (i.e., the same central k-space peak) as the brain reference image. A realization of complex Gaussian noise was added to this synthetic k-space data, with a variance corresponding to 40 dB SNR. The 176× 176 rasterization of the analytical object provided a reliable reference for comparisons.
A second simulation was considered with an object that is more realistic than the SL phantom. We chose a 512× 512 MR image of a wrist that showed little noise and interesting textures. We simulated acquisitions with the same coil profile and spiral trajectory (176× 176 reconstruction matrix), in presence of a 40 dB SNR Gaussian complex noise. The height of the central peak was also adjusted to correspond to that of the brain data. The reference image was obtained by sinc-interpolation, by extracting the lowest frequencies in the DFT.
In this section, we present the different experiments we conducted. The two main reconstruction performance measures that we considered are
In this first experiment, we compared the convergence properties of the different ISTA-type algorithms, as presented in Section 5.2, with the Haar wavelet transform. The data we considered are those of the MR wrist image. The regularization parameter was adjusted to maximize the reconstruction SER with respect to the ground-truth data. The actual minimizer of the cost functional, which is the common fixed-point of this family of algorithms, was estimated by iterating FWISTA 100 000 times.
The convergence results are shown in Figures 5.2 and 5.3 for the simulation of the MR wrist image. Similar graphs are obtained using the other sets of data.
For a fixed number of iterations, FISTA schemes (FISTA and FWISTA) require roughly 10% additional time compared to ISTA and SISTA. In spite of this fact, their asymptotic superiority appears clearly in both figures. The slope of the decrease of the cost functional in the log-log plot of Figure 5.3 reflects the convergence properties in Propositions 1, 1, and 2. When considering the first iterations, which are of greatest practical interest, the algorithms with optimized parameters (SISTA and FWISTA) perform better than ISTA and FISTA (see Figure 5.3). The times required by each algorithm to reach a 30 dB SER (considered as a threshold value to perceived changes) are 415 s (ISTA), 53 s (SISTA), 12.7 s (FISTA), and 4.4 s (FWISTA). With respect to this criterion, SISTA presents an 8-fold speedup over ISTA, while FWISTA presents a 12-fold speedup over SISTA and nearly a 3-fold speedup over FISTA. It follows that FWISTA is practically close to two orders of magnitude faster than ISTA.
The algorithms presented in Section 5.2 apply for any orthogonal wavelet basis. For the considered application, we want to study the influence of the basis on performance. In this experiment we considered the Battle-Lemarié spline wavelets  with increasing degrees, taking into account the necessary postfilter mentioned in (5).
We compared the best results for several bases. They were obtained with FWISTA after practical convergence and are reported in Table 5.1. Figure 5.4 illustrates the time evolution of the SER using ISTA and FWISTA in the case of the SL reconstruction. Similar graphs are obtained with the other sets of data.
It is known that the Haar wavelet basis efficiently approximates piecewise-constant objects like the SL phantom, which is consistent with our results. On the other hand, splines of higher degree, which have additional vanishing moments, perform better on the textured images (upper part of Table 5.1).
We present in the lower part of Table 5.1 the performances observed when using ISTA with RS. We conclude that, in the case of realistic data, it is crucial to use RS as it improves results by at least 0.7 dB, whatever the wavelet basis is. The remarkable aspect there is that the Haar wavelet transform with RS consistently performs best. Two important things can be seen in Figure 5.4: FWISTA is particularly efficient during the very first iterations, while SISTA with RS yields the best asymptotic results in terms of SER and stability. Our method combines both advantages.
In Table 5.2, we present the results obtained using different depths of the wavelet decompositions. Our reconstruction method is used together with the Haar wavelet transform and RS. The performances are similar but there seems to be an advantage in using several decomposition levels both in terms of SER and reconstruction speed. The FWISTA scheme seems to recoup the cost of the wavelet transform operations associated to an increase in the depth of decomposition.
We report in Table 5.3 the results obtained for different reconstruction experiments using state-of-the-art linear reconstruction, TV regularization, and our method. The images obtained when running the different algorithms after approximately 5 s, and after practical convergence as well, are shown in Figure 5.5. We display in Figure 5.6 the time evolution of the SER for the different experiments. In each case, we emphasize the time required to reach −0.5 dB of the asymptotic value of SER. Finally, we present in Figure 5.7 the reconstructions and in Figure 5.8 the error maps of the different IST-algorithms at different moments of reconstruction. This was done with the wrist simulated experiment using the Haar wavelet basis and RS.
Firstly, we observe that TV and our method achieve similar SER (Table 5.3) and image quality (Figure 5.5). They both clearly outperform linear reconstruction, with a SER improvement from 1.5 to 3 dB, depending on the degree of texture in the original data. Moreover, the pointwise maximal reconstruction error appears to always be smaller with nonlinear reconstructions. Due to the challenging reconstruction task, which significantly undersamples of the k-space, residual artifacts remain in the linear reconstructions and at early stages of the nonlinear ones. Although the k-space trajectory is exactly the same in the three cases, artifacts are less perceived in the in vivo reconstructions, while they stand out for the synthetic experiments.
Secondly, it clearly appears that the linear reconstruction, implemented with CG, leads to the fastest convergence, unfortunately with suboptimal quality. For a reconstruction time one order of magnitude longer, our accelerated method provides better reconstructions. This is illustrated in Figure 5.5 for reconstruction times of the order of five seconds (rows 1, 2, and 4).
Finally, we observe in Figures 5.7 and 5.8 the superiority of the proposed FWISTA over the other IST-algorithms. For the given reconstruction times, it consistently exhibits better image quality as can be seen in both reconstructions and error maps.
Our reconstruction method is applicable to linear MR imaging modalities. In this section, we report results obtained on a SENSE reconstruction problem.
For illustration, let us consider a simple MRI problem with one receiving channel and a full k-space Cartesian sampling. In such a case, the measurements are expressed as m=Ecorig+b. The system matrix is E = FS where F corresponds to a unitary Fourier matrix and S is the diagonal receiving sensitivity matrix.
Applying FISTA with L = 2τ−1||S ||∞2 and τ≈ 1 results in a gradient step of the form ci+τ||s ||∞−2SHS(corig−ci) + τ||s ||∞−2SHFHb. The latter tends to be ineffective for the pixels whose locations correspond to a relatively small sensitivity, this fact being measured by the condition number of S; i.e., the product ||S ||∞||S−1 ||∞. In regard to the convergence result in Proposition 2, we see that the constant term of the upper bound for FISTA is proportional to ||S ||∞2 as it amounts to ||S ||∞2||c0−c⋆ ||22.
FWISTA, by contrast, can take advantage of step sizes (gradient and threshold) that are tailored to the individual pixel sensitivities. In our simple example, we apply FWISTA with Λ=τ−1SHS. This results into the simplified gradient step ci+τ (corig−ci) + τSFHb. For the convergence bound, we achieve a constant that is τ−1||c0−c⋆ ||SHS2 and tends be significantly smaller than the one obtained with FISTA, especially if the range of sensitivity values is large.
That is why we propose, in this section, to adapt the threshold and step-sizes in FWISTA depending both on the spatial localization and the wavelet subband of the coefficients.
We first considered a 2-D brain imaging setup. The data are recorded by four head coils with known sensitivity maps distributed around the sample. Meanwhile, a radial k-space trajectory with 90 lines that supports a 176× 176 reconstruction is imposed. Our simulation is achieved with a 704× 704 rasterized version of the SL phantom. We are also using realistic coil sensitivities computed using Biot-Savart’s law. The 2-levels Haar wavelet basis is chosen to impose the sparsity constraints. The sum-of-squares sensitivity is considered for the weights of FWISTA. We exploit the localization properties of the wavelets to impose the weights on the wavelet coefficients. The wavelet regularization parameter is the same across the wavelet subbands except for the coarse coefficients where the value 0 is imposed. The value λ is optimized for MSE performance. The initial estimate is c0 = 0.
We compare the convergence speed of ISTA, FISTA, and FWISTA in the reconstruction task described above. The reference minimizer was determined by running FISTA for 100 000 iterations. Results are shown in Figures 5.9 and 5.10.
We observe that FWISTA yields nearly 3-fold acceleration over FISTA in terms of cost functional. The asymptotic rates of FISTA and FWISTA both on the cost functional value and the distance to the minimizer are similar and the speedup of FWISTA seems primarily due to a better constant, which is consistent with our theoretical prediction.
The data were acquired with the same scanner setup as in Section 5.3.2. This time, the data from the 8 receiving channels were used for reconstruction, as well as an estimation of the sensitivity maps. An in vivo gradient echo EPI sequence of a brain was performed with T2* contrast. The data was acquired with the following parameters: excitation slice thickness of 4 mm, TE=35 ms, TR=900 ms, flip angle of 80∘, and trajectory composed of 13 interleaves, supporting a 200× 200 reconstruction matrix with pixel resolution 1.18 mm× 1.18 mm. The oversampling ratio along the readout direction was 1.62.
The reference image was obtained using the complete set of data and performing an unregularized CG-SENSE reconstruction. The reconstruction involved 3 of the 13 interleaves, representing a significant undersampling ratio R=4.33.
The images obtained using regularized linear reconstruction (CG), TV (IRLS), and our method are presented in Figure 5.11. In Figure 5.12, the SER evolution with respect to time is shown for the three methods. The times to reach −0.5 dB of the asymptotic SER value are 5.9 s (CG), 49.1 s (IRLS), and 22.8 s (our method).
With this high undersampling, the errors maps show that reconstructions suffer from noise propagation mostly in the center of the image. It appears that TV and our method improve qualitatively and quantitatively image quality over linear reconstruction (see Figure 5.11). As it was observed in Section 5.3.2 with the spiral MRI reconstructions, this in vivo SENSE experiment confirms that our method is competitive with TV. In terms of reconstruction duration, our method proves to converge in a time that is of the same order of magnitude as CG (see Figure 5.12).
We proposed an accelerated algorithm for nonlinear wavelet-regularized reconstruction that is based on two complementary acceleration strategies: use of adaptive subband thresholds plus multistep update rule. We provided theoretical evidence that this algorithm leads to faster convergence than when using the accelerating techniques independently. In the context of MRI, the proposed strategy can accelerate the reference algorithm up to two orders of magnitude. Moreover, we demonstrated that, by using the Haar wavelet transform with random shifting, we are able to boost the performance of wavelet methods to make them competitive with TV regularization. Using different simulations and in vivo data, we compared the practical performance of our reconstruction method with other linear and nonlinear ones.
The proposed method is proved to be competitive with TV regularization in terms of image quality. It typically converges within five seconds for the single channel problems considered. This brings nonlinear reconstruction forward to an order of magnitude of the time required by the state-of-the-art linear reconstructions, while providing much better quality.