Accelerated Alternating Minimization for X-ray Tomographic Reconstruction
Peijian Ding
August 2, 2021

Abstract. While Computerized Tomography (CT) images can help detect diseases such as Covid-19, regular CT machines are large and expensive. Cheaper and more portable machines suffer from errors in geometry acquisition that degrade CT image quality. The errors in geometry can be represented with parameters in the mathematical model for image reconstruction. To obtain a good image, we formulate a nonlinear least squares problem that simultaneously reconstructs the image and corrects for errors in the geometry parameters. We develop an accelerated alternating minimization scheme to reconstruct the image and geometry parameters.

1. Introduction. Tomography is a technique for displaying a representation of a cross section through an object using a penetrating wave such as X-rays or ultrasound. In simple words, it allows us to see the inside of an object without breaking it. Thus, tomography is widely used in medical imaging, seismic exploration, and material science. In medical imaging, a Computerized Tomography (CT) scan creates a cross-sectional image of the human body by combining X-ray images taken from different angles. During a CT scan, the patient lies on a bed that slowly moves through the gantry while the X-ray tube rotates around the patient and shoots X-ray beams through the body, which are received by a detector. See Figure 1. Then, an image of the cross section of the human body is reconstructed following a mathematical procedure.

Although CT images can help doctors diagnose and monitor diseases such as Covid-19, regular CT machines are heavy and expensive and not widely available in less developed areas. The goal of this paper is to compensate for cheaper and more portable machines by solving for geometry parameters, such as the source-to-object distance, that may not be calibrated precisely during the imaging process. The source-to-object distance measures how far the center of the object is from the X-ray source. Since the source-to-object distance may vary from angle to angle, the reconstructed image will be corrupted if incorrect values are used, as illustrated in Figure 2. Similar conclusions can be drawn if incorrect angles are used to reconstruct the image. Thus, our algorithm will significantly improve image quality by taking into account the variation in the geometry parameters.

For the purposes of this paper, we focus primarily on 2D computed tomography to simplify notation and to reduce the computational cost required to perform a substantial number of numerical experiments. However, the methods discussed in this paper can be extended to 3D problems.

An X-ray imaging problem can be formulated as an inverse problem in linear algebra,

$Ax = b,$

where $A$ is an $m \times n$ forward operator, $b$ is the projection data, also referred to as a sinogram, and $x$ is the solution vector that represents the image. Since the forward operator $A$ is determined by geometry parameters $r$, we form the following nonlinear least squares problem:

(1.1)   $\arg\min_{x,\,r} \; \|A(r)x - b\|_2^2$

where $x$ represents the image vector, and $A$ is the forward operator that is a function of $r$ and maps the true image $x$ to the sinogram $b$. In this paper, we denote by $r$ a general geometry parameter vector that can represent both the source-to-object distance $d$ and the errors in angles $\delta\theta$. Thus, the conclusions we draw about $r$ apply to both $d$ and $\delta\theta$.
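To make the coupled structure of (1.1) concrete, the following MATLAB sketch evaluates the residual norm for a given image x and geometry parameter vector r. The helper build_forward_operator is a hypothetical stand-in for whatever routine assembles A(r) (for example, a fan-beam projection matrix from a tomography package); it is not part of any specific library.

% Minimal sketch of the coupled objective in (1.1).
% build_forward_operator is a hypothetical helper that assembles A(r);
% in practice it could wrap a fan-beam projector from a tomography toolbox.
function val = coupled_residual(x, r, b)
    A   = build_forward_operator(r);   % m-by-n matrix, depends on the geometry r
    val = norm(A*x - b)^2;             % squared 2-norm residual of (1.1)
end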
In this paper, we show that Equation 1.1 can be solved using an accelerated Block Coordinate Descent method. In addition to incorporating an acceleration scheme, we also exploit separability to further reduce the computational cost.

2. Alternating Minimization Scheme. In order to solve the nonlinear least squares problem, an intuitive approach to consider is the Block Coordinate Descent (BCD) algorithm. In this section, we discuss the linear and nonlinear least squares problems involved in BCD. We also briefly discuss the variable projection approach and argue the advantage of BCD over variable projection for our problem.

2.1. Block Coordinate Descent. Block Coordinate Descent is a simple approach to solving an optimization problem. Its idea is based on the general Coordinate Descent (CD) algorithm. Because of its lack of sophistication, most optimization researchers did not focus on this approach until recently, when CD approaches were found to be computationally competitive with other reputable alternatives in various applications such as machine learning [15]. Since we are able to exploit separability in the geometry parameters, the BCD method, given in Algorithm 2.1, is worth investigating for tomographic reconstruction.

Algorithm 2.1 BCD to Reconstruct Geometry and Image Parameters
1: Input: $r_0 \in \mathbb{R}^{N_A}$, $x_0 \in \mathbb{R}^n$
2: for $k = 1, 2, \dots$ until a stopping criterion holds do
3:   $r_k = \arg\min_{r} \|A(r)\,x_{k-1} - b\|_2^2$
4:   $x_k = \arg\min_{x} \|A(r_k)\,x - b\|_2^2$
5: end for

Here $N_A$ denotes the number of angles, and $n$ is the length of the image vector. Note that in practice an initial estimate $r_0$ is given. With this information, we can easily obtain $x_0$ by solving the linear least squares problem $x_0 = \arg\min_x \|A(r_0)x - b\|_2^2$. We remark that for ill-posed problems, computing $x_0$ and Step 4 of Algorithm 2.1 typically require incorporating regularization to avoid amplifying noise when solving the linear least squares problems. This is discussed further in Section 2.2.1.

There is another property of our matrix that makes this alternating minimization scheme favorable. The matrix $A$ and the vector $b$ can be partitioned into blocks of rows as

$A(r) = \begin{bmatrix} A_1(r^{(1)}) \\ A_2(r^{(2)}) \\ \vdots \\ A_{N_A}(r^{(N_A)}) \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{N_A} \end{bmatrix},$

where the errors in the geometry parameters associated with the rows in each block $[A_i(r^{(i)}),\, b_i]$ are assumed constant. For example, if we collect projections at $0, 1, 2, \dots, 359$ degrees around the center of the object and $N_A = 10$, then an error is introduced into the geometry parameters once every 36 degrees. In reality $N_A = 360$, but if some of the errors are small enough to ignore, using a smaller $N_A$ can reduce the computational cost. The choice of $N_A$ is discussed further in Section 4.2.

The separability of the geometry parameters allows us to solve a set of much smaller systems at Step 3 of Algorithm 2.1, namely

$r_k^{(i)} = \arg\min_{r^{(i)}} \|A_i(r^{(i)})\,x_{k-1} - b_i\|_2^2, \qquad i = 1, \dots, N_A,$

and then assemble $r_k = (r_k^{(1)}, \dots, r_k^{(N_A)})$. Note that the parameters in the vector $r_k$ are completely independent, so they can be updated simultaneously on a parallel computing architecture, which could dramatically lower the computing time. In fact, if we only consider the source-to-object distance as a geometry parameter, the dimension of each parameter $r^{(i)}$ is one. In contrast to this alternating minimization scheme, Variable Projection [9][10] cannot exploit this matrix property because its matrix multiplication step destroys the separability. This makes the alternating minimization scheme more worthwhile to investigate; a MATLAB sketch of the resulting loop is given at the end of this subsection.
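The sketch below is a minimal MATLAB rendering of Algorithm 2.1 that exploits the block-row structure. The helpers build_forward_operator, build_block_operator (which returns the rows and data belonging to the i-th set of angles), and reg_lsq (any regularized linear least squares solver, e.g. a hybrid LSQR routine) are assumptions; fminbnd is used here for the one-dimensional per-block solve, although the experiments in this paper mainly use imfil.

% Sketch of BCD with separable geometry blocks (BCDS).
% Hypothetical helpers: build_forward_operator(r), build_block_operator(ri, i),
% and reg_lsq(A, b), a regularized solver for min ||A x - b||^2.
function [x, r] = bcds(r0, x0, b, NA, maxit, lb, ub)
    r = r0;  x = x0;
    for k = 1:maxit
        % Step 3: update each geometry block independently (parallelizable).
        for i = 1:NA
            fi   = @(ri) block_residual(ri, i, x);   % scalar objective for block i
            r(i) = fminbnd(fi, lb, ub);              % bound-constrained 1-D solve
        end
        % Step 4: update the image with the new geometry.
        A = build_forward_operator(r);
        x = reg_lsq(A, b);                           % regularized linear solve
    end
end

function val = block_residual(ri, i, x)
    [Ai, bi] = build_block_operator(ri, i);          % rows/data for the i-th angle set
    val = norm(Ai*x - bi)^2;
end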
Next, we discuss linear least squares solvers and nonlinear least squares solvers, respectively.

2.2. Linear Least Squares. In this subsection, we consider the linear least squares problem in Step 4 of Algorithm 2.1, $\min_x \|Ax - b\|_2^2$, where we assume the matrix $A$ is fixed, e.g., $A = A(r_k)$. Typically in inverse problems, the data we obtain are not exact. This is also the case in our X-ray tomography problem, even when the geometry parameters, and hence the matrix $A$, are known exactly. Ideally, we want to solve $Ax_{true} = b_{true}$, but the linear system we actually solve is

$Ax = b = b_{true} + \eta,$

where $\eta$ is the noise in our measurement data, $b_{true}$ is the noise-free data, $A \in \mathbb{R}^{m \times n}$, and $b \in \mathbb{R}^m$. For simplicity, we assume $A$ has full rank and $m = n$. Our discussion also holds for more general cases where we compute a pseudo-inverse instead of an exact inverse. Using the SVD $A = U\Sigma V^T$, where $\sigma_i$ is the $i$-th singular value and $u_i$, $v_i$ are the $i$-th columns of the orthogonal matrices $U$ and $V$, we first recall that $Av_i = \sigma_i u_i$, and we note that we can find scalars $\alpha_i$ and $\eta_i$ such that

$b_{true} = \sum_{i=1}^{n} \alpha_i u_i, \qquad \eta = \sum_{i=1}^{n} \eta_i u_i.$

Using these relations, we obtain

$x = A^{-1} b = \sum_{i=1}^{n} \frac{u_i^T b}{\sigma_i} v_i = \sum_{i=1}^{n} \frac{\alpha_i + \eta_i}{\sigma_i} v_i.$

Then, notice that $x_{true} = A^{-1} b_{true} = \sum_{i=1}^{n} \frac{\alpha_i}{\sigma_i} v_i$. Thus,

$x = x_{true} + \sum_{i=1}^{n} \frac{\eta_i}{\sigma_i} v_i.$

From this result we observe that the noise in the data is magnified by the small singular values. Since $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$, larger indices correspond to smaller singular values. The computed solution is dominated by noise amplified by division by small singular values. Thus, we need regularization schemes to filter out this noise. In particular, a regularized solution can be written as

$x_{reg} = \sum_{i=1}^{n} \phi_i \, \frac{u_i^T b}{\sigma_i} v_i,$

where the scalar $0 \le \phi_i \le 1$ is called a filter factor. As $\sigma_i$ decreases, the filter factors should approach zero so that the noise contributed by the small singular values is filtered out.

Truncated Singular Value Decomposition. Since the noise is magnified by small singular values, the most intuitive approach is to cut off the small singular values by setting them to zero. This is called Truncated SVD (TSVD) regularization. The TSVD solution to the inverse problem is given by

$x_k = \sum_{i=1}^{k} \frac{u_i^T b}{\sigma_i} v_i,$

where $k \le n$. The critical part of TSVD is identifying the truncation index $k$. One approach is choosing $k$ at a significant drop-off of the singular values, as illustrated in Figure 3; in that case, $k = 9$ can easily be identified as the threshold for the TSVD approach. However, typically in inverse problems the singular values decay smoothly, as illustrated by Figure 4 (the singular values of the 256 × 256 test problem heat generated from Regularization Tools in MATLAB). In cases like Figure 4, a reliable cut-off threshold for the singular values is hard to find. Note that $k = 250$ is not a good choice for the cut-off because the singular values are already very close to zero for smaller indices (e.g., $\sigma_{200} \approx 10^{-8}$). Therefore, we need more advanced regularization techniques to deal with smoothly decaying singular values.

Tikhonov Regularization. Classical Tikhonov regularization can be used to solve ill-posed problems. The regularized solution $x_{reg}$ is the unique solution to

(2.7)   $\min_x \; \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2,$

where $\lambda$ is the regularization parameter that controls the smoothness of the regularized solution. The above problem is equivalent to

$\min_x \; \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2.$

Then the normal equations for this least squares problem can be written as

$(A^T A + \lambda^2 I)\, x = A^T b.$

From the normal equations, we obtain

(2.10)   $x_{reg} = \sum_{i=1}^{n} \phi_i \, \frac{u_i^T b}{\sigma_i} v_i, \qquad \phi_i = \frac{\sigma_i^2}{\sigma_i^2 + \lambda^2}.$

For simplicity, we assumed $A$ has full rank. The pseudo-inverse can be used in the rank-deficient case, and Equation 2.10 would then sum from $i = 1$ to $\operatorname{rank}(A)$ instead of to the full column rank $n$.
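For small test problems where the SVD can be computed explicitly, both filtered solutions above can be formed directly from the factors, as in the sketch below. The truncation index k and the value of lambda are illustrative; this is meant for small dense A, not for large-scale reconstruction.

% Filtered SVD solutions for a small dense problem: TSVD and Tikhonov.
[U, S, V] = svd(A, 'econ');
s    = diag(S);                 % singular values, s(1) >= ... >= s(n)
beta = U' * b;                  % SVD coefficients of the data

% TSVD: keep the k largest singular values, zero out the rest.
k      = 9;                                      % truncation index (problem dependent)
x_tsvd = V(:,1:k) * (beta(1:k) ./ s(1:k));

% Tikhonov: smooth filter factors phi_i = s_i^2 / (s_i^2 + lambda^2), as in (2.10).
lambda = 1e-2;                                   % regularization parameter
phi    = s.^2 ./ (s.^2 + lambda^2);
x_tik  = V * (phi .* beta ./ s);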
The modified Tikhonov regularization is not much different from classical Tikhonov except that it uses the 2-norm of $Lx$ instead of $x$ in Equation 2.7, where $L$ is a $p \times n$ matrix with $p \le n$. In contrast to TSVD, where the singular values after $\sigma_k$ are cut off, Tikhonov regularization applies a smoother filter to all the singular values. Given a good regularization parameter, the large singular values are not affected much, whereas the small ones are filtered progressively more as they approach zero. The quality of the filtering depends on how we choose the regularization parameter. If $\lambda = 0$, then all $\phi_i = 1$ and we are directly computing the inverse (or pseudo-inverse) solution. If we select a very large $\lambda \gg \sigma_1$, all $\phi_i$ approach zero and we over-smooth the solution.

2.2.2. Parameter Choice Methods. The choice of the regularization parameter is critical to the quality of the regularized solution. Parameter choice methods can usually be divided into two classes depending on their assumptions about the error norm $\|\eta\|_2 = \|b - b_{true}\|_2$, where $b$ is the measured data and $b_{true}$ is the noise-free data [6]. The first class contains methods based on a good estimate of $\|\eta\|_2$. The second class includes methods that are not based on a good estimate of $\|\eta\|_2$ but seek to extract this information from the given right-hand side $b$. In this section, we introduce the Generalized Cross-Validation (GCV) method, which is a popular method in the second class.

The underlying idea of GCV is that a good regularization parameter should predict missing values. For example, if some data point $b_i$ is left out of the right-hand side, the regularized solution computed from the remaining data should predict this missing value well. The GCV function can be written as

(2.12)   $G(\lambda) = \dfrac{\|A\,x_{reg}(\lambda) - b\|_2^2}{\big(\operatorname{trace}(I - A A_{reg}(\lambda))\big)^2},$

where $A_{reg}(\lambda)$ denotes the matrix that maps the right-hand side $b$ to the regularized solution $x_{reg}(\lambda)$. The goal is to find $\lambda$ such that $G(\lambda)$ is minimized. Expressing the numerator and denominator in terms of the SVD of $A$, we can write $G(\lambda)$ in the case of Tikhonov regularization as

$G(\lambda) = \dfrac{\sum_{i=1}^{n} \left( \dfrac{\lambda^2\, u_i^T b}{\sigma_i^2 + \lambda^2} \right)^2}{\left( m - \sum_{i=1}^{n} \dfrac{\sigma_i^2}{\sigma_i^2 + \lambda^2} \right)^2}.$

We can solve for $\lambda$ using the function fminbnd in MATLAB, which is based on golden section search and parabolic interpolation. In practical applications, several studies have found that GCV occasionally undersmooths the solution drastically by choosing the regularization parameter too small [3][4][13]. In [1], GCV has been found to over-smooth the solution in Lanczos-hybrid methods, discussed in the next subsection. To alleviate this difficulty, the weighted GCV method was proposed:

$G(w, \lambda) = \dfrac{\|A\,x_{reg}(\lambda) - b\|_2^2}{\big(\operatorname{trace}(I - w\, A A_{reg}(\lambda))\big)^2},$

where $w$ is the weight parameter that determines the function $G(w, \lambda)$ along with $\lambda$. When $w = 1$, we recover the non-weighted version of GCV in Equation 2.12. When $w > 1$, the solution is smoother; when $w < 1$, the solution is less smooth. So far in the weighted-GCV literature, only experimental approaches have been used to determine the value of $w$.
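For a small square, full-rank problem where the SVD is available, the (weighted) GCV function can be evaluated directly from the singular values and minimized with fminbnd, as in the sketch below. The SVD-based expression follows the Tikhonov form given above; the weight value and the search interval are illustrative assumptions, and setting w = 1 recovers ordinary GCV.

% Sketch: choose the Tikhonov parameter by minimizing (weighted) GCV.
% Uses s = diag(S) and beta = U'*b from the SVD of a square, full-rank A.
m = size(A, 1);
w = 0.8;                                           % weight; w = 1 gives standard GCV
gcv = @(lam) ...
    sum(((lam^2 ./ (s.^2 + lam^2)) .* beta).^2) / ...   % residual of the filtered solution
    (m - w * sum(s.^2 ./ (s.^2 + lam^2)))^2;             % (weighted) trace term
lambda = fminbnd(gcv, 1e-8, s(1));                 % search between ~0 and the largest sigma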
Hybrid LSQR. We have so far discussed regularization and parameter choice methods based on the SVD. For large-scale problems, such as image reconstruction, directly applying SVD-based methods is not computationally feasible. In this subsection we describe a hybrid LSQR scheme that combines an efficient iterative method with SVD-based approaches that enforce regularization on small projected subproblems. The standard LSQR algorithm projects the linear least squares problem onto a sequence of Krylov subspaces of small and increasing dimension. The best approximation $x_k$ to the least squares problem in the Krylov subspace $\mathcal{K}_k$ is given by $x_k = V_k y_k$, where $V_k$ is an $n \times k$ matrix with orthonormal columns generated at the $k$-th step of the Golub-Kahan bidiagonalization process [11]. The vector $y_k$ can be computed by solving

$y_k = \arg\min_y \; \|B_k y - \beta e_1\|_2^2,$

where $B_k$ is the bidiagonal matrix at the $k$-th step of the Golub-Kahan bidiagonalization process, $\beta$ is a scalar, and $e_1$ is the first standard basis vector. When applied to ill-posed problems, LSQR exhibits a semi-convergence behavior: early iterations construct information related to the solution, while later iterations construct information related to the noise [1]. This can be compensated for by applying a direct regularization method such as Tikhonov or TSVD, which can be solved cheaply on the small reduced linear least squares problem in the Krylov subspace. So, we can write the hybrid LSQR step with Tikhonov regularization as

$y_k = \arg\min_y \; \|B_k y - \beta e_1\|_2^2 + \lambda_k^2 \|y\|_2^2,$

where $\lambda_k$ is a regularization parameter chosen at the $k$-th iteration using the weighted GCV method discussed in Section 2.2.2. A method like GCV can also be used to choose a stopping iteration so that $k$ does not become too large; details can be found in [1]. Compared to LSQR, this hybrid method can effectively stabilize the iterations [1]. Although a new regularization parameter must be chosen at each iteration, this is not computationally expensive for the projected problem. To summarize, the hybrid LSQR method projects the large-scale linear least squares problem onto a low-dimensional Krylov subspace, where we can inexpensively apply a direct regularization method with the adaptive weighted GCV.

2.3. Nonlinear Least Squares. In our alternating minimization scheme, we iteratively solve for the image and the geometry parameters. While we have discussed methods for the linear least squares problem in the previous section, we need other tools to solve the nonlinear least squares problem

$\min_r \; \|A(r)x - b\|_2^2,$

where $x$ is approximated by the linear least squares solution obtained with hybrid LSQR in Section 2.2. We use the implicit filtering method, which solves bound-constrained optimization problems for which derivative information is not available [8]. Since we do not have derivative information for our objective function, and a reasonable bound can be established for the geometry parameters in our tomographic reconstruction problem, implicit filtering serves as a good tool for our problem. Implicit filtering builds a local model of the objective function using a quasi-Newton method. In our numerical experiments, we compare implicit filtering to the MATLAB function fminbnd.

3. Acceleration Algorithms. In the previous section, we introduced the alternating minimization scheme and the methods we use to solve the least squares problems. In this section, we introduce methods that accelerate the convergence of the minimization scheme.

3.1. Accelerated Block Coordinate Descent. Since we can divide the variables in our least squares problem into two blocks, the geometry parameters $r$ and the image $x$, it makes sense to directly investigate methods that accelerate the BCD algorithm. We implemented Accelerated Block Coordinate Descent (ABCD). This method can be applied to a four-block problem by dividing it into two larger blocks, but in our problem we do not have to do so. Theoretically, the acceleration method proposed in [2] has a complexity of $O(1/k^2)$. In our implementation, we simplify the algorithm as Algorithm 3.1: the two least squares solves of Algorithm 2.1 are augmented with an extrapolation step applied to the combined iterate $w_k = (x_k, r_k)$, and the accelerated iterates $\tilde{x}_k$ and $\tilde{r}_k$ are used in the subsequent subproblems. As before, $N_A$ denotes the number of angles, and $n$ is the length of the image vector. We remark that we can exploit separability when solving for $\tilde{r}_k$, as discussed in Section 2.1. A sketch of one such extrapolation rule is given below.
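The precise extrapolation rule of Algorithm 3.1 follows [2]. As one concrete illustration of how a Nesterov/FISTA-type momentum step can be wrapped around the two block updates, consider the MATLAB sketch below; the specific momentum sequence t_k, the choice to accelerate x only (as in ABCDS-1), and the helpers update_geometry, build_forward_operator, and reg_lsq are assumptions for illustration, not necessarily the exact scheme of [2].

% Sketch: BCD steps wrapped with a FISTA-type extrapolation on the image block.
% The momentum weights and the decision to accelerate x only are illustrative.
t = 1;  x = x0;  r = r0;  x_tilde = x0;
for k = 1:maxit
    r = update_geometry(r, x_tilde, b);        % blockwise nonlinear solves (Section 2.1)
    A = build_forward_operator(r);             % hypothetical forward-operator assembly
    x_prev = x;
    x = reg_lsq(A, b);                         % regularized linear solve (Section 2.2)
    t_new   = (1 + sqrt(1 + 4*t^2)) / 2;       % Nesterov momentum weights
    x_tilde = x + ((t - 1)/t_new) * (x - x_prev);
    t       = t_new;
end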
As mentioned for Algorithm 2.1, since $r_0$ is given, $x_0$ is computed from the corresponding regularized linear least squares problem. The Tikhonov-regularized least squares problems for $x_0$ and $\tilde{x}_k$ are solved using the hybrid scheme IRhybrid_lsqr provided in IRTools [5] in MATLAB. The nonlinear least squares problem for $\tilde{r}_k$ is solved using the MATLAB package imfil [8]. We have found that accelerating the solution vector $x$ alone yields more stable results with slightly better accuracy than performing the acceleration on both $x$ and $r$. We show the numerical experiments in Section 4.

3.2. Anderson Acceleration. Anderson Acceleration, also called Anderson mixing, is a method used to accelerate the convergence of a fixed point iteration. Note that we can write our alternating minimization scheme as a fixed point iteration $w_{k+1} = g(w_k)$, where one application of $g$ performs the two least squares solves of Algorithm 2.1 (Algorithm 3.2). Let $f_i = g(w_i) - w_i$ denote the fixed point residual. At iteration $k$, Anderson acceleration with memory $m$ (Algorithm 3.3) determines coefficients $\alpha^{(k)} = (\alpha_0^{(k)}, \dots, \alpha_{m_k}^{(k)})$ that minimize $\|\sum_{i=0}^{m_k} \alpha_i f_{k-m_k+i}\|_2$ subject to $\sum_{i=0}^{m_k} \alpha_i = 1$, where $m_k = \min(m, k)$, and sets $w_{k+1} = \sum_{i=0}^{m_k} \alpha_i^{(k)} g(w_{k-m_k+i})$.

We can cast the linearly constrained optimization problem in Step 7 of Algorithm 3.3 into an unconstrained form that is straightforward to solve and convenient for efficient implementation [14]. We define $\Delta f_i = f_{i+1} - f_i$ for each $i$ and set $F_k = (\Delta f_{k-m_k}, \dots, \Delta f_{k-1})$. Then the constrained least squares problem is equivalent to

$\min_{\gamma} \; \|f_k - F_k \gamma\|_2,$

and the next iterate becomes $w_{k+1} = g(w_k) - G_k \gamma^{(k)}$, where $G_k = (\Delta g_{k-m_k}, \dots, \Delta g_{k-1})$ with $\Delta g_i = g(w_{i+1}) - g(w_i)$. This unconstrained least squares problem leads to a modified version of Anderson acceleration, Algorithm 3.4, in which $g(w_k)$ comes from Algorithm 3.2 and for which Homer Walker proposed an implementation that efficiently updates the QR factors in the decomposition $F_k = Q_k R_k$ [14]. The basic logic is the following: every $F_k$ is obtained from $F_{k-1}$ with a column added on the right; if the resulting matrix has more than $m$ columns, one is deleted from the left. The column addition can be achieved by a modified Gram-Schmidt process. The deletion process is a little more complicated: we delete the first column on the left when $m_{k-1} = m$. If $F_{k-1} = QR$, then removing the first column leaves a factorization whose triangular factor must be restored, for example by Givens rotations, to update $Q$ and $R$ efficiently.
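A compact MATLAB sketch of Anderson acceleration applied to the fixed point map g (one full BCD sweep) in the unconstrained least squares form is shown below. For clarity it re-solves the small least squares problem with backslash at every iteration instead of updating QR factors as in Walker's implementation [14]; the function handle g and the initial iterate w0 are assumed to be supplied by the caller.

% Sketch: Anderson acceleration of the fixed point w = g(w),
% where g performs one BCD sweep (solve for r, then for x).
m = 5;                                   % memory size (see Section 4.3)
w = w0;                                  % w0 = (x0, r0) stacked into one vector
W_hist = w;  G_hist = g(w);              % histories of iterates and g-values
w = G_hist(:, end);                      % first step: w1 = g(w0)
for k = 1:maxit
    gw = g(w);  f_new = gw - w;          % current fixed point residual f_k
    W_hist = [W_hist, w];  G_hist = [G_hist, gw];
    if size(W_hist, 2) > m + 1           % keep at most m difference columns
        W_hist(:,1) = [];  G_hist(:,1) = [];
    end
    dF = diff(G_hist - W_hist, 1, 2);    % columns Delta f_i
    dG = diff(G_hist, 1, 2);             % columns Delta g_i
    gamma = dF \ f_new;                  % small unconstrained least squares solve
    w = gw - dG * gamma;                 % Anderson update in the Walker-Ni form
end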
4. Numerical Experiments. In this section, we make several comparisons of different methods for solving the X-ray tomography problem. First, we compare the speed of BCD exploiting the separability of the geometry with the speed of BCD without this property. Second, we compare results produced with different numbers of angles. Third, we compare all the acceleration schemes. Fourth, different regularization parameter choices in the linear least squares solver are compared. Fifth, we compare implicit filtering with the MATLAB function fminbnd as the nonlinear least squares solver. Lastly, we show that our algorithm works for both geometry parameters $d$ and $\delta\theta$.

For experiments 1-5, we solve problems with only an unknown source-to-object distance $d$, so the geometry parameters are $r = d$. For the last experiment, $r = (d, \delta\theta)$. Comparisons in all experiments are made in terms of geometry errors and reconstruction errors. Geometry errors are defined as the relative errors of the geometry parameters, $\|r - r_{true}\|_2 / \|r_{true}\|_2$. For experiments 1-5, where $r = d$, the geometry errors are $\|d - d_{true}\|_2 / \|d_{true}\|_2$. For experiment 6, where $r = (d, \delta\theta)$, the geometry errors are represented by both $\|d - d_{true}\|_2 / \|d_{true}\|_2$ and $\|\delta\theta - \delta\theta_{true}\|_2 / \|\delta\theta_{true}\|_2$. The reconstruction errors are defined as the relative errors of the image, $\|x - x_{true}\|_2 / \|x_{true}\|_2$. We use fan-beam projection for all our tomography problems for the sake of consistency. Note that we can also easily adapt our code to solve parallel-beam projection problems by using the IRtools [5] and AIR Tools [7] MATLAB packages. In practice, good initial guesses of the geometry parameters $r$ are available, and prior knowledge can help us set bounds for them.

For the source-to-object distance $d$, we generate a test problem where the true values $d_{true}$ are random numbers (chosen from a uniform distribution) between 1.5 and 2.5. We use a constant initial guess of $d = 2$ for all angles and set the bounds for $d$ from 1.5 to 2.5. For the errors in angles $\delta\theta$, we assume for each set of angles that the error bound is from $-0.5$ to $0.5$. The test image we use is the Shepp-Logan phantom [12], with image size $n = 32 \times 32$. The noise level is $\|\eta\|_2 / \|b\|_2 = 0.01$, where $\eta$ is a vector with random entries chosen from a normal distribution. Budget is a hyper-parameter in imfil that stands for the maximum number of function evaluations in the nonlinear least squares solver. $N_A$ stands for the number of angles. Moreover, the 0-th iteration is included in the relative error figures below; it represents the relative error of the initial guesses with respect to the true solution.

4.1. Separability. Let BCD that exploits separability be called BCDS. In this subsection, we compare the running time of BCD with that of BCDS. In these numerical experiments, budget = 1000 (used in imfil to put a limit on the number of function evaluations) and $N_A = 10$. As we see in Figure 5, BCDS dramatically speeds up the convergence because the separability allows us to solve a much smaller problem independently for one $r^{(i)}$ at a time. The average running time of the image reconstruction using BCD is 12.8 s, about the same as BCDS's 12.6 s. The time of the geometry reconstruction using BCD is 1857.5 s, more than ten times longer than BCDS's 157.1 s. This is also the reason BCDS is discussed first in this section. In the remaining experiments, we always use BCDS to reduce the running time. Moreover, the geometry errors and reconstruction errors of BCDS are both better than those of BCD. In Figure 6, the phantom reconstructed by BCDS is much closer to the true image than the one from BCD.

4.2. Number of Angles. A typical CT image using fan-beam X-ray sources collects data at angles in one-degree increments, from 0 to 359 degrees. In a perfect machine, the geometry parameters are precisely known each time the source is rotated to a new angle. In our experimental scenario, we assume the geometry parameters are only known approximately. To experiment with various scenarios, we assume that errors are introduced into the geometry parameters once every $360/N_A$ degrees. That is, with $N_A = 10$, errors occur once every 36 degrees.

Fig. 7. Comparison of number of angles: geometry errors (left), reconstruction errors (right).

In part, $N_A$ depends on the precision of the machine calibration: for a good machine, $N_A$ may be small. $N_A$ may also depend on how precisely we measure the data. For example, two geometry parameters that differ in double precision may be rounded to the same number in single precision. In this scenario, the difference could be small enough that we can treat the two geometry parameters as equal. If the assumed number of angles is larger than the true number, we may end up solving a larger problem than necessary. For example, if only one geometry error is introduced for every 36 degrees, then $N_{A,true} = 10$. If we assume $N_A = 20$, the average of the two geometry errors per 36 degrees would approximate the one true geometry error introduced in that set of angles. If the assumed number of angles is smaller than the true number, we end up solving a low-dimensional approximation.
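The grouping of projection angles into geometry blocks can be made explicit as below: with 360 one-degree projections and N_A blocks, each geometry parameter governs 360/N_A consecutive angles. The index map itself is the only assumption here; how the corresponding rows of A and entries of b are extracted depends on the projector used.

% Sketch: map each of the 360 projection angles to one of NA geometry blocks.
angles   = 0:359;                          % one-degree increments
NA       = 10;                             % number of geometry blocks
block_of = floor(angles / (360/NA)) + 1;   % block index for each angle (1..NA)
% Rows of A and entries of b belonging to block i are those with block_of == i,
% so the i-th subproblem uses roughly 1/NA of the measurements.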
We do not seek to solve for the image exactly but aim to compute good approximations that yield much better results than not accounting for the geometry parameters at all. In practice, the number of angles can be treated as a hyper-parameter that practitioners set based on their expertise and knowledge of the machine calibration. In our problems, we assume the number of angles is known.

We compare the cases where the number of angles is 5, 10, and 20, respectively. We want to explore the differences in the relative error of $r$, the relative error of $x$, the convergence, and the image quality. In this comparison, budget = 10. The reason we can use such a small budget is that we essentially solve for one scalar of $r$ at a time. The recommended formula [8] for a good budget is $10 N^2$, where $N$ is the length of the solution vector. Since we solve for each component of the solution vector separately, the budget is just 10 based on this formula. Adjusting the budget for each particular problem as a hyper-parameter could further improve the quality of the result. Since we aim for an apples-to-apples comparison, we keep the budget the same for all three cases.

Both the geometry and image parameters converge for all three numbers of angles, as shown in Figure 7. Although the errors for $N_A = 10$ are not as low as for $N_A = 5$, the reconstruction errors for $N_A = 10$ are still reduced by 40% compared with the initial error level. When $N_A = 20$, the geometry error has a large spike at the first iteration. Although our algorithm gradually decreases the geometry error after the first iteration, the resulting geometry error is almost the same as that of the initial guess. The reconstruction error drops from more than 100% at the initial guess to below 85%, but the error climbs back up in the next few iterations. In the end, there is a 10% drop in reconstruction error compared with the initial guess. One possible explanation for this unusual behavior when $N_A = 20$ is the following. Let $r_k$ and $x_k$ represent the values of $r$ and $x$ at the $k$-th BCDS iteration. Given $x_0$, $r_1$ is computed from the nonlinear least squares problem in Algorithm 2.1, and $r_1$ is particularly bad because of how bad $x_0$ is. Despite this $x_0$, BCDS still improves the geometry parameters after the first iteration.

4.3. Acceleration Schemes. In this subsection, we compare BCDS, ABCDS, and BCDS with Anderson acceleration, where ABCDS denotes the accelerated block coordinate descent that exploits the separability of the geometry parameters. The number of columns of the matrix $F_k$ in Algorithm 3.4, also referred to as the memory size in this paper, is $m = 5$. Since the linear least squares problem at Step 5 of Algorithm 3.4 is relatively small, we solve it directly with the MATLAB backslash operator. There are two implementations of ABCDS that we compare before comparing Anderson acceleration, ABCDS, and the BCDS method. Since we can choose to apply the acceleration scheme to the image vector $x$ only, or to both the geometry parameters $r$ and the image vector $x$, we compare the two versions of ABCDS. We denote the former version ABCDS-1 and the latter version, which applies the acceleration to both the image and the geometry parameters, ABCDS-b. The experimental setup for this subsection is budget = 100 and $N_A = 10$. From Figure 8, we can tell that although both approaches converge and produce similar results at the end of the iterations, the geometry errors of ABCDS-1 are slightly lower and more stable than those of ABCDS-b.
Thus, applying the acceleration scheme to the image vector $x$ alone is enough to produce stable and convergent results. For simplicity, when we mention ABCDS in the rest of the paper, it always refers to ABCDS-1.

As shown in Figure 9, the geometry errors and reconstruction errors converge for all three methods. The geometry and reconstruction errors of BCDS converge around the 15th iteration. Both the geometry and reconstruction errors of ABCDS converge around the 10th iteration. Anderson acceleration converges slowest, with spikes around the 15th iteration in both the geometry and reconstruction errors. There are safeguards that can ensure global convergence of type-I Anderson acceleration [16]. In our problem, however, Anderson acceleration does converge but does not accelerate the convergence. This could be an interesting challenge to explore in the future. For our problem, ABCDS is the better acceleration scheme in terms of both its convergence and its acceleration effect.

4.4. Regularization Parameters. In this subsection, we compare BCDS with different regularization parameter choices: no regularization, GCV, and weighted GCV, as shown in Figure 10. The reason we use BCDS without the acceleration methods is that we want to see the direct impact of regularization on the alternating minimization scheme. In this experiment, $N_A = 10$ and budget = 100. When there is no regularization, IRhybrid_lsqr is essentially the LSQR algorithm, which exhibits semi-convergence behavior. When GCV is used instead of the weighted GCV, the geometry and image parameter errors also exhibit semi-convergence behavior. The weighted GCV helps stabilize LSQR's convergence and thus produces the best result.

4.5. Imfil Budget. We explore the effect of the evaluation budget in the nonlinear least squares solver on the geometry and reconstruction errors. We set $N_A = 10$ and use budget = 10, 100, 1000, 10000. Since the budget size may greatly affect the nonlinear least squares solutions, we explore its effects on BCDS without any acceleration. As we can see in Figure 11, budgets of 100, 1000, and 10000 are equally good: both the geometry and reconstruction errors are very small in these cases. Thus, 100 is the best budget of the four, because evaluations beyond 100 do not improve the solution; with budgets of 1000 and 10000, many evaluations are wasted without making progress. When a budget of 10 is used, even though we get convergent results earlier, the small budget causes the algorithm to terminate before it finds a better solution.

4.6. Nonlinear Least Squares Solver. In this subsection, we compare BCDS with different nonlinear least squares solvers, fminbnd and imfil. We set $N_A = 10$ and budget = 100. In Figure 12, fminbnd exhibits semi-convergence behavior, whereas imfil converges. Under our problem settings, fminbnd reconstructs the geometry parameters around 10 s faster than imfil on average. However, fminbnd tends to exhibit semi-convergence behavior, whereas the imfil iterations are more stable. Since in practical cases it is impossible to know the exact iteration at which the geometry errors are lowest in the semi-convergent case, imfil is more reliable for solving the nonlinear least squares problem.

4.7. Geometry Parameters: d and δθ. We show that our alternating scheme, as well as the accelerated version, can also be applied to the case when the geometry parameters are $r = (d, \delta\theta)$. The nonlinear least squares problem becomes

$\arg\min_{d,\,\delta\theta} \; \|A(d, \delta\theta)\,x - b\|_2^2.$

The per-block problem now has two unknowns for each set of angles, as sketched below.
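Because each block subproblem is now two-dimensional, a derivative-free bound-constrained solver such as imfil is used in place of the scalar fminbnd. The sketch below assumes the basic calling form imfil(x0, f, budget, bounds) suggested by the package documentation [8]; the exact interface should be checked against the imfil distribution before use, and block_residual is the hypothetical helper from the earlier BCDS sketch.

% Sketch: per-angle solve for r_i = (d_i, dtheta_i) with implicit filtering.
% The imfil call signature below is an assumption based on its documentation [8].
budget = 100;
bounds = [1.5, 2.5;      % bounds for d_i
         -0.5, 0.5];     % bounds for dtheta_i
for i = 1:NA
    fi      = @(ri) block_residual(ri, i, x);   % residual over the i-th angle block
    ri0     = [2; 0];                           % initial guess (d = 2, dtheta = 0)
    r(:, i) = imfil(ri0, fi, budget, bounds);   % bound-constrained derivative-free solve
end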
In all previous experiments we assumed there was no error in the angles. If the angles contain errors as well, the image quality is also degraded. As shown in Figure 14, both ABCDS and BCDS converge for all three sets of parameters $d$, $\delta\theta$, and $x$. ABCDS accelerates the convergence for all of them, especially for the reconstruction errors. The only difference between this problem, with $r = (d, \delta\theta)$, and the previous problem, with $r = d$, is that we solve for two geometry parameters in each nonlinear least squares problem instead of one.

5. Conclusions. In this paper, we propose an accelerated alternating minimization scheme to solve X-ray tomography problems. The linear least squares problem is solved by a weighted hybrid LSQR algorithm with Tikhonov regularization. The nonlinear least squares problem is solved by implicit filtering. We also investigated ABCDS and Anderson mixing to accelerate convergence. We have the following findings:

1. BCDS runs faster than the standard BCD because the separability of the parameters allows us to solve for each entry of $r$ independently. Also, since solving for each parameter separately makes the nonlinear least squares problem much easier than solving for all parameters at once, BCDS is faster at finding a good solution and converges faster than BCD.

2. ABCDS is much less computationally costly than BCDS with Anderson acceleration, because Anderson acceleration requires solving a linear least squares problem in each iteration when choosing the weights. Although both methods converge, ABCDS has better acceleration effects than Anderson acceleration.

3. The weighted hybrid LSQR algorithm with Tikhonov regularization stabilizes the convergence more than applying an unweighted GCV to the LSQR algorithm. When no regularization method is used, the LSQR algorithm exhibits semi-convergence behavior: it reconstructs the solution at earlier iterations but noise at later iterations.

4. Choosing an appropriate budget for implicit filtering is important. When the budget is chosen too small, better solutions are not explored. When the budget is chosen too big, many function evaluations are wasted without making progress. The number suggested in the imfil documentation is $10 N^2$, where $N$ is the length of $r$. We found that this formula does not always give the appropriate budget. Since we solve for each geometry parameter using separability, the dimension of each small problem is one, yet we have found 100 to be the best budget for the 32 × 32 test problem with $N_A = 10$.

5. imfil is a better nonlinear least squares solver for our algorithm than MATLAB's fminbnd, because the latter tends to exhibit semi-convergence behavior.

The ABCDS method has shown its success in our tomographic reconstruction problems. We believe this algorithm can be used to effectively solve X-ray tomography problems that have variations in the geometry parameters. A future direction for further improvement would be adapting, applying, and advancing this algorithm on X-ray images produced in clinical trials.
References
[1] A weighted GCV method for Lanczos hybrid regularization.
[2] Computing the best approximation over the intersection of a polyhedral set and the doubly nonnegative cone.
[3] Confidence intervals for nonparametric curve estimates: Toward more uniform pointwise coverage.
[4] Flexible parsimonious smoothing and additive modeling.
[5] IR Tools: A MATLAB package of iterative regularization methods and large-scale test problems.
[6] Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion.
[7] AIR Tools: A MATLAB package of algebraic iterative reconstruction methods.
[8] Implicit Filtering.
[9] Variable projection for nonlinear least squares problems.
[10] Joint calibration and tomography based on separable least squares approach with constraints on linear and non-linear parameters.
[11] Numerical Methods for Least Squares Problems.
[12] The Fourier reconstruction of a head section.
[13] Estimation of regularization parameters in multiple-image deblurring.
[14] Anderson acceleration: Algorithms and implementations.
[15] Coordinate descent algorithms.
[16] Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations.

Acknowledgments. This work was done as part of an undergraduate honors thesis project at Emory University and was partially supported by the U.S. National Science Foundation under Grant DMS-1819042. The author would like to thank Chang Meng and James Nagy for their guidance and mentorship.