key: cord-0335346-dg57t5j7
authors: Cascarano, Pasquale; Sebastiani, Andrea; Comes, Maria Colomba; Franchini, Giorgia; Porta, Federica
title: Combining Weighted Total Variation and Deep Image Prior for natural and medical image restoration via ADMM
date: 2020-09-23
journal: nan
DOI: 10.1109/iccsa54496.2021.00016
sha: 83772b8c151534eda46816fe2c64347691d47fa2
doc_id: 335346
cord_uid: dg57t5j7

In the last decades, unsupervised deep learning based methods have caught researchers attention, since in many real applications, such as medical imaging, collecting a great amount of training examples is not always feasible. Moreover, the construction of a good training set is time consuming and hard because the selected data have to be enough representative for the task. In this paper, we focus on the Deep Image Prior (DIP) framework and we propose to combine it with a space-variant Total Variation regularizer with an automatic estimation of the local regularization parameters. Differently from other existing approaches, we solve the arising minimization problem via the flexible Alternating Direction Method of Multipliers (ADMM). Furthermore, we provide a specific implementation also for the standard isotropic Total Variation. The promising performances of the proposed approach, in terms of PSNR and SSIM values, are addressed through several experiments on simulated as well as real natural and medical corrupted images.

The task of image restoration aims to recover a well-looking image, that is clean and sharp, from a blurred and noisy observation. Mathematically, for a given blurred and noisy image g ∈ R n , the problem can be re-written as an inverse problem of the following form:

where H ∈ R n×n is a known operator which models the blur, η ∈ R n is a realization of the random white Gaussian noise affecting g. Problems of the form (1) are well-known to be illposed problems [1] . Therefore, it is impossible to invert the operator H for finding u from (1) due to the lack of stability and/or uniqueness properties.

In the field of image restoration, different approaches have been proposed in order to provide an estimate u * of the desired solution. The most famous and promising methods can be mainly divided in two categories: regularized reconstructionbased and learning-based methods.

The regularized reconstruction-based approaches convert the problem into an optimization problem which reads as:

where the first and the second term are referred to as fidelity and regularization terms, respectively. The fidelity term models the noise affecting g and, upon zero-mean Gaussian noise assumptions, it is usually defined as an L 2 -norm functional.

The regularization term encodes prior information on the solution, such as its sparsity or regularity [2] . A popular choice for R is a Total Variation (TV) based functional [3] which, in a discrete setting, is defined as follows:

where (Du) i := ((D h u) i , (D v u) i ) for i = 1 . . . n, is the discrete gradient of u computed at pixel i and D h , D v are the first order finite difference discrete operators along the horizontal and vertical axes, respectively. The positive scalar parameter µ balances the strength of the regularization. Its choice is crucial to obtain good quality restorations and, in the literature, many methods have been proposed for its selection [4] . Very recently, in order to extend the TV regularizer and to provide a local regularization adapted to the underlying image patterns, a space-variant total variation has been proposed [5] . The idea is to weight at any pixel the amount of regularisation by considering the following regularizer:

which is more flexible than the standard TV in (3) .

Recently, the learning-based approaches have become popular, in the image restoration field, due to their outstanding performances [6] . In particular, the supervised learning-based arXiv:2009.11380v2 [eess.IV] 24 Mar 2021 methods [7] make use of Deep Neural Network (DNN) architectures to learn the correlation between the degraded images and their cleaned counterparts from a set of example pairs. In mathematical terms, they attempt to solve the following minimization problem:

where f θ is a fixed DNN architecture with weights θ, L is a fixed loss function and {(G, U )} is a training set of degradedcleaned example pairs (with G and U we mean the set of degraded images in input and the cleaned target, respectively).

Once (5) is solved by means of standard stochastic optimization algorithms, e.g., Adam [8] or Stochastic Gradient Descent (SGD) [9] , for a given degraded image g, an approximation u * of the desired solution u is obtained as u * = f θ * (g).

The success of this supervised framework is strictly related to the fixed training set of examples. However, in some real applications, such as medical imaging, this is also the main weakness of supervised techniques, since it is practically impossible to collect ground truth data [10] . For this reason, unsupervised deep learning based methods have caught researchers attention. One of the most famous unsupervised approach is called Deep Image Prior (DIP), introduced by Ulyanov et al. in [11] , which is the combination of the following minimization problem:

and a regularization by early-stopping procedure. In particular, f θ is a fixed Convolutional Neural Network (CNN) generator whose weights are θ, and z is a random input vector usually sampled from a uniform distribution. The CNN generator is initialized with a set of random weights that are iteratively optimized by means of standard gradient-based algorithms solving (6) . Then, an approximation u * of the target solution u is computed as f θ * (z), where θ * is an early-stopped solution obtained by applying the early-stopping procedure to the involved iterative optimization scheme solving (6) . In his pioneering work, Ulyanov has empirically shown how the architecture of a Deep CNN is able to represent more easily natural images than random noise, without the need of a fixed set of training examples. So far, the researchers have mostly worked both on a theoretical analysis of DIP [12; 13; 14] and on an improvement of its performances. A widely used approach is to adjust the objective in (6) in order to constrain the solution to satisfy a given prior. In particular, in [15] , the standard DIP objective is combined with regularization by denoising. In [16] , the authors add an explicit anisotropic TV term to (6) . Furthermore, the idea of combining DIP with TV-based terms has shown good performances when dealing with X-ray images [17] and computed tomography reconstructions [18] .

In this work, we propose to improve the performances of DIP by adding explicit priors to (6) . Differently from [16] , we consider both the standard isotropic TV and the weighted TV (WTV) defined in (3) and (4), respectively. In particular, we exploit an automatic estimation for all the local regularization parameters when dealing with WTV, thus allowing the implemented algorithm to be less dependent from a good choice of these hyperparameters. More specifically, the regularization parameter µ i are iteratively updated accordin to the rule proposed in [19] for the Tikhonov regularization. We remark that the WTV functional is more flexible than the standard isotropic TV, since its definition is tailored to better recover the local image patterns.

Furthermore, we make use of the ADMM framework instead of a standard sub-gradient method as in [16] . The ADMM framework is more flexible and due to its modular structure allows us to embed any prior (explicit or implicit) information by simply modifying the regularizer-related substep. In the following, we will refer to our proposals as ADMM DIP-TV and ADMM DIP-WTV, to denote the the ADMM implementation when the standard isotropic TV and WTV are used as regularizers, respectively.

The paper is organized as follows: in the second section we introduce our ADMM DIP-TV and ADMM DIP-WTV methods and we show how the resulting ADMM substeps can be efficiently solved. In the third section we present some numerical experiments on synthetic as well as real noisy natural and medical images and we compare the results with the standard DIP [11] .

The main goal of the proposed ADMM DIP-TV and ADMM DIP-WTV methods is to boost the performances of the DIP framework adding the isotropic TV regularizer in (3) and the space-variant WTV in (4), respectively. We now describe in details the ADMM DIP-WTV method. Concerning the ADMM DIP-TV method we point out that it is immediate to observe that WTV in (4) reads as the standard TV in (3) assuming all µ i = µ, ∀ i = 1 . . . n. Therefore, it is worth saying that ADMM DIP-TV can be seen as a particular case of ADMM DIP-WTV.

Let us consider the following minimization problem:

arg min

and its constrained counterpart, which reads as:

According to the standard DIP framework, we need an iterative process solving (7) . Therefore, we attempt to solve the minimization problem (8) by means of the ADMM algorithm, which has been recently deeply investigated and applied in a non-convex image restoration framework [20; 21; 22] . The augmented Lagrangian function with respect to the problem (8) reads:

where β t is a positive scalar, called penalty parameter, λ t is the Lagrangian parameter associated with the constraint Df θ (z) = t. According to the ADMM framework, we seek its saddle point by minimizing with respect to the primal variable θ and t, alternatively, and by maximizing with respect to the dual variable λ t . Upon suitable initialization of the variables involved, the k-th iteration of the ADMM iterative algorithm reads as follows:

The first problem (10) is solved inexactly by applying a prefixed number of iterations of a gradient-based method. In particular we use the Adam iterative scheme [8] . The numerical gradient is computed by means of automatic differentiation provided by Pytorch with respect to the variable θ [23] . We observe that this optimization problem is very similar to the one solved in the classical DIP framework (6) . In this particular case, we force Df θ k+1 (z) to be close to t k − λ k t βt . From a numerical point of view, this squared L 2 -norm term provides a stabilizing and robustifying effect to the DIP minimization.

The second problem (11) is separable and can be easily solved in a closed form by applying the 2D L 2 -norm proximity operator to the n components of Df θ k+1 (z) + λ k t βt . In our implementation we chose to vary the regularization parameters µ i along the iterations. In particular, their formulation is inspired by [19] and reads:

This entails that the smaller is the gradient magnitude the greater is the regularization provided at pixel i. Therefore, we regularize more on constant patches and less on patches with complex texture. Finally, we remark that concerning the ADMM DIP-TV implementation, we set µ k i = µ in problem (11), where µ is a fixed positive scalar.

In this section, we present the results of some numerical experiments aimed at evaluating the effectiveness of the proposed ADMM DIP-TV and ADMM DIP-WTV methods on image denoising problems, that is, on problems of the form (1) where the operator H is fixed as the identity operator. For all the experiments, we have used the CNN architecture represented in Figure 1 which is an adaptation of the U-net architecture proposed in [11] .

We consider a dataset of four images composed by two natural RGB images, one synthetic grayscale image and one real medical chest CT image of a patient affected by COVID-19 [24] . Both the ground truths related to the first three images and the acquired CT image are shown in Figure 2 .

The starting degraded images corresponding to the natural and the synthetic data are created by applying the image formation model (1) to the related ground truths. The codes and the images used for these numerical experiments are available online 1 .

As a first test, we take into account the denoising task of the geometric grayscale tomo image. This image has some low contrast and high contrast patches with big and small objects. We corrupt the ground truth by adding a white Gaussian noise with standard deviation equals to 30. The starting noisy image is reported in the first panel of Figure 3 with two different close-ups highlighting small and big details. Since the tomo image is an example of a piecewise constant image, it is well known that a good reconstruction can be obtained by employing, as a regularization term, the TV functional. For this reason, this test problem fits well as a first benchmark to highlight the benefits of applying a TV-based constraint to the DIP algorithm. In the second and third panels of Figure 3 we report the reconstructed images provided by DIP and ADMM DIP-TV. For the ADMM DIP-TV method we perform 5 Adam iterations at each stage of the iterative procedure to update θ k+1 . The image obtained by applying the DIP approach shows some issues in reconstructing the low contrast patches in the image: the edges are not sharp and look out of focus, and the small details are not perfectly retrieved. Moreover, the two close-ups reveal the presence of artifacts over the edges and the noise seems to not be perfectly removed. The addition of TV to the standard DIP framework seems to solve all the aforementioned issues. In the reconstruction provided by the ADMM DIP-TV scheme, the low contrast small circles are perfectly retrieved while the higher contrast details show sharp edges.

In order to analyse in details the results on this synthetic image over low and high contrast details, we represent in Butterfly ground truth Barbara ground truth

Tomo ground truth CT acquired data blue. It is evident that the standard DIP is insufficient to get rid of the noise since both the low and high contrast line profiles look swinging. The low contrast line profile obtained by DIP misses the small low contrast peak, which corresponds to the small low contrast circle in the solution by DIP reported in Figure 3 . As expected, this simple test shows how a TV-based method is more efficient in well recovering piecewise constant images with respect to the standard DIP. However, we remark that the good ADMM DIP-TV reconstruction have been obtained by properly tuning the regularization parameter µ in (3). Not suitable choices of µ can lead to bad reconstructed images. This well known strong dependency of the results from the regularization parameter can be overcome by using the WTV regularizer, since its weights are adaptively estimated during the iterative process. For the other test problems considered in this section, we compare the standard DIP method with both the ADMM DIP-TV and ADMM DIP-WTV algorithms. We firstly report the results on the butterfly test problem, whose ground truth has been corrupted by Gaussian noise with standard deviation equals to 20. The simulated acquisition is depicted in the first panel of Figure 5 . Moreover, the other three panels of Figure 5 show the best restored images, in terms of PSNR, provided by DIP, ADMM DIP-TV and It is quite evident how both the reconstructions obtained by employing ADMM DIP-TV and ADMM DIP-WTV are sharper and more faithful to the original image than that achieved by DIP. However, in the two close-ups it is possible to appreciate the effect of the WTV in better removing the noise presence and efficiently recovering the image discontinuites. As reported in the caption of Figure 5 , by adding the TV-based regularizers we observe that the PSNR metric increases. In particular ADMM DIP-WTV outperforms ADMM DIP-TV in terms of both PSNR and SSIM. As mentioned before, the reconstructed images, offered in Figure 5 , are related to the best PSNR values achieved by the different methods during the iterative process. Of course, such values are not known when we consider imaging problems with real data. For this reason we analyze the PSNR trend for the three approaches under consideration. Particularly, we investigate how the addition of the handcrafted TV acts on the typical semiconvergence behaviour of the DIP framework [11] . In Figure 6 we report the PSNR values along the outer iterations. We observe that DIP performances rapidly degrades while both ADMM DIP-TV and ADMM DIP-WTV show a more stable behaviour. In particular, the PSNR curve corresponding to ADMM DIP-WTV not only does not semiconverge but also does not present significant oscillations. The ADMM DIP-WTV algorithm seems to be the most robust one in terms of good restored images. To summarize, in real applications, since the best PSNR is not available and, hence, it is not possible to know when the iterative process should be stopped, it is preferable to use the TV-based ADMM approaches. However, even if both ADMM DIP-TV and ADMM DIP-WTV manage to prevent the semiconvergence effect, we remark again that for the ADMM DIP-TV method a pre-processing step is required to select a good regularization parameter while the ADMM DIP-WTV algorithm is not influenced by the choice of the initial value of such parameter.

Concerning the Barbara test problem, the corrupted image (reported in the first panel of Figure 7 ) has been generated by adding to the original one Gaussian noise with standard deviation equals to 30. ADMM DIP-TV and ADMM DIP-WTV solve problem (10) with 50 Adam iterations. Figure 7 show the reconstructed images obtained by the three approaches at the best PSNR achieved. Similar considerations to those made for the results of the butterfly test problem also hold in this case. The benefits in employing ADMM DIP-TV and ADMM DIP-WTV are clear. Moreover, the two close-usp of Figure 7 show that ADMM DIP-WTV also outperforms ADMM DIP-TV in recovering the details of the dress texture and the edges of the bookcase on the background. The different PSNR behaviour provided by the three methods observed in Figure 6 for the butterfly test problem can be also observed for the Barbara test problem.

As last problem, we consider a real chest CT medical image representing a section of two lungs of a patient affected by COVID-19 [24] . The first column of Figure 8 reports the acquired data together with the close-ups of two details (inflammation zones) in the lungs backside where are visible the effects of the interstitial pneumonia caused by COVID-19 disease. From these panels the standard artifacts related to the discrete angles sampling typical of the CT application are clearly visible. In the second column of Figure 8 we show the best denoised images provided by the original DIP approach. In order to obtain the best recovered image, we printed the reconstructed images at each DIP iteration and we stopped the algorithm when we visually found a good one, before DIP started to re-introduce the noisy artifacts in the reconstructions. A total of 300 iterations have been carried out for the DIP method to achieve the images in Figure 8 . The third and the fourth column of Figure 8 report the best denoised images attained by ADMM DIP-TV and ADMM DIP-WTV, respectively. The best images have been obtained as explained before for the DIP case, after 10 iterations. However, we remark that, for each of the outer iterations, 500 Adam iterations have been performed to inexactly solve the sub-problem for θ. We also observe that by increasing the outer iterations, both ADMM DIP-TV and ADMM DIP-WTV do not recover the CT artifacts as strongly as the DIP method does, but the recovered images are quite stable in terms of details detection and noise reduction. This aspect allows easier stopping techniques for both the ADMM approaches than those for the DIP one. From Figure 8 , it is possible to conclude that the best recovered images have been provided by the ADMM DIP-WTV method: the edges are sharper and all finer structures are sufficiently well reconstructed. The yellow arrows in the close-ups highlight the inflammation details, alveoli and bronchioles. It is evident that the restoration provided by ADMM DIP-WTV is more reliable than the ones provided by DIP and ADMM DIP-TV.

In this paper we have presented a new algorithm which extends the classical DIP framework by adding either the standard isotropic TV or the space-variant TV. The latter admits an automatic strategy for the estimation of the local regularization parameters, thus resulting more flexible with respect to the TV functional and providing more reliable restorations. The arising optimization problems have been solved in a flexible ADMM framework. The usage of the ADMM splitting ensures stability to the algorithm as the noise increases and allows to add different priors to the standard DIP framework. Numerical experiments have shown that our approach reaches better performances than the standard DIP for the denoising of both synthetic and real natural and medical images.

Introduction to inverse problems in imaging

Regularization in image restoration and reconstruction

Nonlinear total variation based noise removal algorithms

Methods for choosing the regularization parameter and estimating the noise variance in image restoration and their relation

Adaptive parameter selection for weighted-TV image reconstruction problems

Convolutional neural networks for inverse problems in imaging: A review

ADMM DIP-WTV (fourth column) for the real CT problem. The first column reports the acquired data. The blue squares highlight different close-up of various patches

Deep learning

ADAM: A method for stochastic optimization

Stochastic gradient descent tricks

Preparing medical imaging data for machine learning

Deep Image Prior

Solving inverse problems using datadriven models

Regularization by architecture: A deep prior approach for inverse problems

A bayesian perspective on the deep image prior

Deepred: Deep image prior powered by red

Image restoration using Total Variation regularized Deep Image Prior

Compressed sensing with deep image prior and learned regularization

Computed tomography reconstruction using Deep Image Prior and learned reconstruction methods

Uniform Penalty inversion of twodimensional NMR relaxation data

Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems

Global convergence of ADMM in nonconvex nonsmooth optimization

On the inverse Potts functional for single-image super-resolution problems

Autograd: Effortless gradients in numpy

Automatic distinction between COVID-19 and common pneumonia using multi-scale convolutional neural network on chest CT scans

ACKNOWLEDGMENT This work has been partially supported by GNCS-INDAM grant 2020 Ottimizzazione per l'apprendimento automatico e apprendimento automatico per l'ottimizzazione and POR-FSE 2014-2020 funds of Emilia-Romagna region (Deliberazione di Giunta Regionale n. 255-30/03/2020).