title: Convergence Guarantees for Non-Convex Optimisation with Cauchy-Based Penalties
authors: Karakus, Oktay; Mayo, Perla; Achim, Alin
date: 2020-03-10

In this paper, we propose a convex proximal splitting methodology with a non-convex penalty function based on the heavy-tailed Cauchy distribution. We first suggest a closed-form expression for calculating the proximal operator of the Cauchy prior, which then makes it applicable in generic proximal splitting algorithms. We further derive the condition required for minimisation problems with the Cauchy-based penalty function to guarantee convergence to the global minimum, even though the penalty is non-convex. Setting the system parameters so as to satisfy the proposed condition keeps the overall cost function convex, and it can then be minimised via the forward-backward (FB) algorithm. The proposed method based on Cauchy regularisation is evaluated by solving two generic signal processing examples: 1D signal denoising in the frequency domain, and two image reconstruction tasks, namely de-blurring and denoising. We experimentally verify the proposed convexity conditions for various cases, and show the effectiveness of the proposed Cauchy-based non-convex penalty function over state-of-the-art penalty functions such as the L1 and total variation (TV) norms.

The problem of estimating unknown physical properties directly from observations (e.g. measurements, data) arises in almost all signal/image processing applications. Problems of this kind are referred to as inverse problems, since having the observations and the forward model between the observations and the sources is generally not enough to obtain solutions directly, due to their ill-posed nature. Indeed, unlike the forward model, which is always well-posed (cf. Hadamard [1]), inverse problems are generally ill-posed [2]. Therefore, exploiting prior knowledge about the object of interest plays a crucial role in reaching a stable/unique solution. This leads to regularisation-based methods, which have hitherto received great attention in the literature [3]-[10]. In most of these examples, the common choice of regularisation function is based on the L1 norm, due to its convexity and its capability to induce sparsity effectively. Another important example of a convex regularisation function is the total variation (TV) norm, which constitutes the state of the art in denoising applications due to its efficiency in smoothing. Despite their common usage, the L1 norm penalty tends to underestimate high-amplitude/intensity values, whilst TV tends to over-smooth the data and may lead to a loss of details. Non-convex penalty functions can generally lead to better and more accurate estimates [11]-[13] when compared to the L1, TV, or other convex penalty functions. Notwithstanding this, due to the non-convexity of the penalty functions, the overall cost function becomes non-convex, which implies a multitude of sub-optimal local minima. Convexity-preserving non-convex penalty functions are thus essential, an idea successfully applied by Blake and Zisserman [14] and by Nikolova [15], and further developed in [6], [11], [16]-[20].
Specifically, a convex denoising scheme with tight-frame regularisation is proposed in [16], whilst [17] proposes the use of parameterised non-convex regularisers to effectively induce sparsity of the gradient magnitudes. In [6], the Moreau envelope is used for TV denoising in order to preserve the convexity of the TV denoising cost function. The non-convex generalised minimax-concave (GMC) penalty function is proposed in [11] for convex optimisation problems.

Another important reason behind the appeal of the aforementioned penalty functions in applications is the existence of closed-form expressions for their proximal operators. The proximal operator of a regularisation function has been introduced in conjunction with inverse problems, to help solve various signal processing tasks. Proximal operators are powerful and flexible tools with attractive properties, which enable solutions to non-differentiable optimisation problems and make them suitable for iterative minimisation algorithms [21], such as forward-backward (FB) or the alternating direction method of multipliers (ADMM). Remarkably, many widespread regularisation functions have proximal operators available in closed form, or at least numerical methods exist to calculate them. For example, the soft thresholding function is the proximal operator for the L1 norm, whereas generalised soft thresholding (GST) [8] is the proximal operator for the Lp norm penalty. The proximal operator for the TV norm is efficiently computed using Chambolle's method [22], whilst the GMC penalty only necessitates the use of soft thresholding, or firm thresholding in the case of a diagonal forward operator A, as shown in [11].

The quest for finding the most appropriate penalty function, ideally in relation to an explicit prior distribution characterising the data statistics, is far from over. In this work, we consider the Cauchy distribution, a special member of the α-stable distribution family, which is known for its ability to model heavy-tailed data in various signal processing applications. As a prior in image processing applications, it behaves as a sparsity-enforcing one, similar to the L1 and Lp norms [9]. It has already been used in denoising applications by modelling sub-band coefficients in transform domains [23]-[27]. Moreover, the Cauchy distribution has also been used as a noise model in image processing applications, by employing it in the data fidelity term in combination with quadratic [28] and TV norm [29] based penalty terms. Hitherto, the general approach to solving Cauchy-regularised inverse problems has been to use a variational Bayesian methodology, owing to the lack of a closed-form proximal operator for the Cauchy prior. This has prevented the Cauchy prior from being used in proximal splitting algorithms such as FB and ADMM. Moreover, having a proximal operator would also make the Cauchy-based regularisation function applicable in advanced Bayesian signal/image processing methods, such as uncertainty quantification (UQ) via proximal Markov chain Monte Carlo (p-MCMC) algorithms [30], [31].

In this paper, we propose a convex proximal splitting methodology for solving inverse problems of the form

    y = Ax + n,    (1)

where y ∈ R^M denotes the observation (either an image or another kind of signal), x ∈ R^N is the unknown signal, also referred to as the target data (either enhanced or raw data), A ∈ R^(M×N) is the forward model operator, and n ∈ R^M represents the additive noise.
Specifically, we propose a number of original contributions, which include:
1) the use of a non-convex penalty function based on the Cauchy distribution, in order to capture the heavy-tailed and/or sparse characteristics of the target x;
2) the derivation of a closed-form expression for the Cauchy proximal operator, inspired by [32], which makes Cauchy regularisation applicable in proximal splitting algorithms;
3) the derivation of the condition that guarantees convergence of the Cauchy proximal operator to the global minimum. Even though the proposed Cauchy-based penalty function is non-convex, satisfying the proposed condition keeps the overall problem strictly convex, either (i) through the use of proximal splitting algorithms, or (ii) through convexity of the cost function itself when the forward operator A satisfies the assumption of orthogonality or of being an over-complete tight frame;
4) an investigation of the performance of the proposed Cauchy-based penalty function in comparison to the L1 and TV norm penalty functions in two examples: 1D signal denoising and 2D image restoration, including de-blurring and denoising. Furthermore, we study the effect of following/violating the proposed convexity conditions in the same examples.

The rest of the paper is organised as follows: Section II presents the proposed Cauchy proximal operator. The convergence analysis of the proposed method is given in Section III, along with the corresponding Cauchy proximal splitting method. In Section IV, the experimental validation of the proposed conditions and an analysis of 1D and 2D inverse problems are presented. We conclude our study and describe future work directions in Section V.

Recalling the generic signal model in (1), a stable solution to this ill-posed inverse problem is obtained through an optimisation of the form

    x̂ = arg min_x F(x) = arg min_x { Ψ(x) + ψ(x) },    (2)

where F: R^N → R is the cost function to be minimised, Ψ: R^N → R is a function which represents the data fidelity term, and ψ: R^N → R is the regularisation function (the penalty term). Under the assumption of independent and identically distributed (iid) Gaussian noise, the data fidelity term can be expressed as

    Ψ(x) = ‖y − Ax‖₂²/(2σ²),    (3)

where σ refers to the standard deviation of the noise. Based on a prior probability density function (pdf) p(x), the problem of estimating x from the noisy observation y by using the signal model in (1) turns into the following minimisation problem in a variational framework:

    x̂ = arg min_x { ‖y − Ax‖₂²/(2σ²) − log p(x) },    (4)

where we define the penalty function ψ(x) as the negative logarithm of the prior knowledge, −log p(x). The selection of ψ(x) (or equivalently p(x)) plays a crucial role in estimating x, in order to overcome the ill-posedness of the problem and to obtain a stable/unique solution. In the literature, depending on the application, the penalty term ψ(x) takes various forms, such as the L1, L2, TV or Lp norms, to name but a few possible choices.

In this study, we propose the use of a penalty function which is based on the Cauchy distribution. This is a special member of the α-stable family of distributions, which is known to be heavy-tailed and to promote sparsity in various applications. Contrary to the general α-stable family, it has a closed-form probability density function, which is defined as [32]

    p(x) = (1/π) · γ/(γ² + x²),    (5)

where γ is the dispersion (scale) parameter, which controls the spread of the distribution. By replacing p(x) in (4) with the Cauchy prior given in (5), we obtain the following optimisation problem:

    x̂ = arg min_x { ‖y − Ax‖₂²/(2σ²) − Σᵢ log( γ/(γ² + xᵢ²) ) }.    (6)
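To make the objective in (6) concrete, the following minimal numpy sketch (our illustration, not code from the paper; the names cauchy_penalty and cauchy_cost are hypothetical) evaluates the two terms of the cost function:

```python
import numpy as np

def cauchy_penalty(x, gamma):
    # Negative log of the Cauchy prior, -sum log(gamma / (gamma^2 + x_i^2)),
    # dropping the constant 1/pi, which does not affect the minimiser.
    return -np.sum(np.log(gamma / (gamma**2 + x**2)))

def cauchy_cost(x, y, A, sigma, gamma):
    # Cost function F(x) in (6): quadratic data fidelity plus Cauchy penalty.
    r = y - A @ x
    return np.sum(r**2) / (2 * sigma**2) + cauchy_penalty(x, gamma)
```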
Using proximal splitting methods has numerous advantages when compared to classical methods. In particular, they (i) work under general conditions, e.g. for functions which are non-smooth and extended real-valued; (ii) generally have simple forms, so they are easy to derive and implement; and (iii) can be used in large-scale problems. In addition, most proximal splitting algorithms are generalisations of classical approaches such as the projected gradient algorithm [33].

In order to solve the minimisation problem in (6) through efficient proximal algorithms such as forward-backward (FB) or the alternating direction method of multipliers (ADMM), the proximal operator of the Cauchy regularisation function should be defined. Proximal operators have been extensively used in solving inverse problems, and they can generally be computed efficiently by various algorithms for a given regularisation function, e.g. the soft thresholding function for the L1 norm, or Chambolle's method for the TV norm [22]. Moreover, prox_h^µ has properties similar to those of gradient mapping operators, which point in the direction of the minimum of h. Thus, for any function h(·) and µ > 0, the proximal operator prox_h^µ: R → R is defined as [21], [33]

    prox_h^µ(x) = arg min_u { h(u) + (x − u)²/(2µ) }.    (7)

For a Cauchy-based penalty function, we recall that the function h(·) is given by

    h(u) = −log( γ/(γ² + u²) ),    (8)

which implies that the Cauchy proximal operator is

    prox_Cauchy^µ(x) = arg min_u { (x − u)²/(2µ) − log( γ/(γ² + u²) ) }.    (9)

The solution to this minimisation problem can be obtained by taking the first derivative of (9) with respect to u and setting it to zero. Hence we have the cubic equation

    u³ − xu² + (γ² + 2µ)u − xγ² = 0.    (10)

Wan et al. [32] proposed a Bayesian maximum a posteriori (MAP) solution to the problem of denoising a Cauchy signal in Gaussian noise, and referred to this solution as "Cauchy shrinkage". Similarly, the minimisation problem in (9) can be solved with the same approach as in [32], albeit with a different parameterisation. Hence, following [32], the solution to the cubic equation in (10) can be obtained through Cardano's method, as given in Algorithm 1.

In order to analyse the convergence properties of the proposed method, we start from the minimisation problem in (6). Since we have a quadratic data fidelity term and a non-convex penalty function, the overall cost function in (6) will, in general, be non-convex. To benefit from convex optimisation principles in solving (6), we seek to ensure that the cost function in (6) is convex by controlling the general system parameters, e.g. σ and γ. For this purpose, we start with the following lemma.

Lemma 1. The function h in (8) is twice continuously differentiable, but non-convex.

Proof. In order to prove that the function h is twice continuously differentiable, we need to show that its first two derivatives exist and are continuous everywhere. These are

    h′(x) = 2x/(γ² + x²)  and  h″(x) = (2γ² − 2x²)/(x⁴ + 2γ²x² + γ⁴),

both of which are well defined and continuous for every x, since γ > 0. Thus, the function h is twice continuously differentiable. The function h is convex if h″(x) ≥ 0 for all x. However, the second derivative h″(x) = 2(γ² − x²)/(x² + γ²)² satisfies h″(x) ≥ 0 only for −γ ≤ x ≤ γ, and thus h is not convex. Since γ generally takes relatively small values when compared to x, it is not practical to enforce this condition for convexity. Therefore, we assume that the function h is non-convex almost everywhere on the support of x. ∎

Figure 2 offers a graphical confirmation of the proof of Lemma 1. Specifically, the red and magenta dots in Figure 2 show limit values for the first and second derivatives, respectively, whilst the horizontal dashed line marks a derivative value of zero. The second derivative takes negative values outside the interval −γ ≤ x ≤ γ, which demonstrates the non-convexity of the function h.
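Since Algorithm 1 itself is not reproduced in this text, the sketch below implements the closed-form Cardano solution of the cubic (10) that it describes, element-wise and vectorised. The function name prox_cauchy is our own, and the code assumes the convexity condition γ ≥ √µ/2 derived in Lemma 2 below, under which the cubic has a unique real root:

```python
import numpy as np

def prox_cauchy(x, gamma, mu):
    """Proximal operator of h(u) = -log(gamma / (gamma^2 + u^2)) with step mu:
    the real root of u^3 - x u^2 + (gamma^2 + 2 mu) u - x gamma^2 = 0, cf. (10),
    found element-wise via Cardano's method (assumes gamma >= sqrt(mu)/2)."""
    x = np.asarray(x, dtype=float)
    # Substitute u = t + x/3 to obtain the depressed cubic t^3 + p t + q = 0
    p = gamma**2 + 2.0 * mu - x**2 / 3.0
    q = -2.0 * x**3 / 27.0 + 2.0 * mu * x / 3.0 - 2.0 * gamma**2 * x / 3.0
    # Discriminant is non-negative when the proximal cost is strictly convex
    s = np.sqrt(np.maximum((q / 2.0)**2 + (p / 3.0)**3, 0.0))
    t = np.cbrt(-q / 2.0 + s) + np.cbrt(-q / 2.0 - s)
    return t + x / 3.0
```

As a quick sanity check, prox_cauchy(0, gamma, mu) returns 0, and as µ → 0 the operator approaches the identity, as expected of any proximal operator.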
We now state the following theorem, which establishes the condition for preserving the convexity of the cost function in (6).

Theorem 1. Let h be the twice continuously differentiable, non-convex penalty function in (8) with γ > 0, and let the forward operator A be either orthogonal, satisfying AᵀA = I, or an over-complete tight frame, satisfying AᵀA ≈ rI with r > 0, where I is the identity matrix. Then, the cost function F: R^N → R in (6) is strictly convex if

    γ ≥ σ/(2√r).

Proof. By Lemma 1, the function F is twice continuously differentiable, and we further express the Hessian of F as

    ∇²F(x) = AᵀA/σ² + diag( h″(x₁), …, h″(x_N) ).

This must be positive definite in order for the cost function F to be convex. Recalling that AᵀA ≈ rI, it suffices that, component-wise,

    r/σ² + (2γ² − 2x²)/(x² + γ²)² > 0,

or, multiplying through by σ²(x² + γ²)²,

    r(x² + γ²)² + 2σ²(γ² − x²) > 0.

To complete the square on the left-hand side, we add and subtract σ⁴/r and 4σ²γ². Then, we have

    r( x² + γ² − σ²/r )² + 4σ²γ² − σ⁴/r > 0.

It can easily be seen that the term r(x² + γ² − σ²/r)² is always non-negative, as is the noise standard deviation σ. Thus, for the inequality above to hold, the simplified condition

    4σ²γ² − σ⁴/r ≥ 0

should be satisfied. This leads to the condition required to ensure (strict) convexity of the function F,

    γ ≥ σ/(2√r),

and the existence of a unique solution for the given cost function. ∎

Theorem 1 provides the critical value for the scale parameter of the non-convex Cauchy-based penalty that ensures that the whole cost function remains convex. As noted, this condition depends on the value of the noise standard deviation σ and on the parameter r, which follows from the assumption that AᵀA has a diagonal form. In the following, we make another remark.

Remark 2. When A is the Fourier or an orthogonal wavelet transform, convergence is guaranteed according to Theorem 1. However, in cases where the forward model does not satisfy the relation AᵀA ≈ rI, or where estimating r is challenging, the condition given in Theorem 1 will not be suitable for ensuring convergence.

For more general situations, which include the assumptions in Theorem 1 and beyond, we propose another solution, which guarantees convergence provided that the solution is obtained via a proximal splitting algorithm, even when AᵀA ≠ rI. We start with another lemma, which states a condition ensuring that the Cauchy proximal operator cost function is convex and converges to a global minimum, even though it corresponds to a non-convex penalty function.

Lemma 2. The cost function of the Cauchy proximal operator in (9),

    J(u) = (x − u)²/(2µ) − log( γ/(γ² + u²) ),

with γ > 0 and µ > 0, is strictly convex if the following condition is obeyed:

    γ ≥ √µ/2.

Proof. We first express the second derivative of J as

    J″(u) = 1/µ + (2γ² − 2u²)/(u² + γ²)².

Then, akin to the proof of Theorem 1, we continue with the convexity condition

    (u² + γ²)² + 2µ(γ² − u²) ≥ 0.

To complete the square on the left-hand side, we add and subtract µ² and 4γ²µ, which gives

    ( u² − (µ − γ²) )² + 4γ²µ − µ² ≥ 0.

Since the term (u² − (µ − γ²))² is always non-negative, as is the step size µ, for the inequality above to hold the condition

    4γ²µ − µ² ≥ 0

should be satisfied. Hence, the cost function J in the Cauchy proximal operator becomes strictly convex if

    γ ≥ √µ/2.    ∎

In Figure 3, we demonstrate the effect of the relationship between µ and γ on J(u) and its second derivative J″(u). Both sub-figures in Figure 3 clearly show that violating the convexity condition given in Lemma 2 makes the cost function non-convex.

Remark 3. Instead of providing a condition ensuring that the Cauchy-based penalty function itself remains convex, Lemma 2 provides a condition which preserves the convexity of the Cauchy proximal operator cost function. Please note that a solution to the proximal operator prox_Cauchy^µ can always be computed, since it has the explicit expression given in Algorithm 1.
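The bound in Lemma 2 is easy to probe numerically. The short check below (our illustration, not from the paper) evaluates J″(u) = 1/µ + 2(γ² − u²)/(γ² + u²)² on a dense grid for a γ just below and just above the critical value √µ/2:

```python
import numpy as np

def J_second_derivative(u, gamma, mu):
    # Second derivative of the proximal cost J(u) in Lemma 2
    return 1.0 / mu + 2.0 * (gamma**2 - u**2) / (gamma**2 + u**2)**2

mu = 1.0
u = np.linspace(-10.0, 10.0, 100001)
for gamma in (0.9 * np.sqrt(mu) / 2, 1.1 * np.sqrt(mu) / 2):
    convex = bool(np.all(J_second_derivative(u, gamma, mu) >= 0))
    print(f"gamma = {gamma:.3f}: J'' >= 0 everywhere? {convex}")
# Just below sqrt(mu)/2 the check fails; just above it holds, matching Lemma 2.
```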
The convexity condition given in Lemma 2 leads to the theorem below, which provides the condition required to guarantee convergence for the cost function in (6) when relaxing the assumption of orthogonality or over-completeness of the forward operator A made in Theorem 1.

There are several proximal splitting algorithms that can be used to solve the optimisation problem in (4), including forward-backward (FB) splitting, Douglas-Rachford (DR) splitting, and the alternating direction method of multipliers (ADMM) [21], to name but a few. In this paper, we focus on the forward-backward algorithm to obtain efficient solutions to the inverse problem in (1). Indeed, an optimisation problem of the form

    arg min_x { f₁(x) + f₂(x) }    (37)

can be solved via the FB algorithm. Provided that f₂: R^N → R is L-Lipschitz differentiable with Lipschitz constant L, and f₁: R^N → R, then (37) is solved iteratively as [21]

    x⁽ⁿ⁺¹⁾ = prox_{f₁}^µ ( x⁽ⁿ⁾ − µ∇f₂(x⁽ⁿ⁾) ),    (39)

where the step size µ is set within the interval (0, 2/L). In this paper, the function f₂ is the data fidelity term and takes the form ‖y − Ax‖₂²/(2σ²) from (6), whilst the function f₁ is the Cauchy-based penalty function h. Following these preliminaries, we can now state the following theorem.

Theorem 2. The FB iteration in (39) for the cost function in (6) converges to the global minimum provided that

    γ ≥ √µ/2.

Proof. At each iteration n, in order to obtain the iterative estimate x⁽ⁿ⁺¹⁾, comparing (7) and (39), we solve

    x⁽ⁿ⁺¹⁾ = arg min_u G(u),

where the function G: R^N → R is

    G(u) = ‖u − ( x⁽ⁿ⁾ − µ∇f₂(x⁽ⁿ⁾) )‖₂²/(2µ) + h(u).

Guaranteeing a convex minimisation problem at each FB iteration will make the whole process convex. As a result, the iterative procedure in (39) converges to the global minimum of G. Thus, for the cost function G to be convex, the condition ∇²G(u) ≽ 0 should be satisfied. Calculating the Hessian of G, we have

    ∇²G(u) = I/µ + diag( h″(u₁), …, h″(u_N) ).

It is straightforward to show that the condition required for this to hold can be obtained in the same way as in the proof of Lemma 2. Hence, the rest of the proof follows that of Lemma 2. Consequently, despite having a non-convex penalty function, the FB sub-problem corresponding to the cost function G is strictly convex and converges to the global minimum under the condition

    γ ≥ √µ/2.    (44) ∎

Remark 4. Note that satisfying the convexity condition for the Cauchy proximal operator via Lemma 2 guarantees the convexity of the general solution obtained via the iterative algorithm (39). To this end, either the step size µ can be set based on a γ value estimated directly from the observations, or, alternatively, γ can be set from µ in cases where the Lipschitz constant L has been computed and/or estimating γ is ill-posed.

Remark 5. Since the data fidelity function f₂ is convex and L-Lipschitz differentiable, using the ADMM or DR algorithms instead of FB to solve the minimisation problem in (37) with the non-convex Cauchy-based penalty function, whilst satisfying condition (44), changes nothing; their solutions likewise converge to the global minimum. Thus, the FB-based approach considered in this paper can be replaced with other splitting algorithms.

Remark 6. The non-convex Cauchy penalty function proposed in this paper guarantees convergence to a minimum by satisfying either (i) AᵀA ≈ rI (including r = 1) along with the condition in Theorem 1, or (ii) just the condition from Theorem 2, via a proximal splitting method such as the FB algorithm.

The FB-based convex proximal splitting algorithm for the Cauchy-based penalty function is given in Algorithm 2.

Algorithm 2 Cauchy-based forward-backward algorithm
1: Input: observed data y and MaxIter
2: Input: µ ∈ (0, 2/L) and γ ≥ √µ/2
3: Set i ← 0 and initialise x⁽⁰⁾
4: do
5:   u⁽ⁱ⁾ ← x⁽ⁱ⁾ − µ∇f₂(x⁽ⁱ⁾)
6:   x⁽ⁱ⁺¹⁾ ← PROXCAUCHY(u⁽ⁱ⁾, γ, µ) via Algorithm 1
7:   i ← i + 1
8: while ‖x⁽ⁱ⁾ − x⁽ⁱ⁻¹⁾‖₂/‖x⁽ⁱ⁻¹⁾‖₂ ≥ ε and i < MaxIter
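A minimal numpy sketch of Algorithm 2 follows, assuming the prox_cauchy helper above; the initialisation and the choice µ = 1/L inside (0, 2/L) are ours, and for the Gaussian data fidelity in (6) the Lipschitz constant is L = ‖AᵀA‖₂/σ²:

```python
import numpy as np

def fb_cauchy(y, A, sigma, max_iter=500, eps=1e-3):
    """Forward-backward splitting with the Cauchy penalty (Algorithm 2 sketch)."""
    L = np.linalg.norm(A.T @ A, 2) / sigma**2   # Lipschitz constant of grad f2
    mu = 1.0 / L                                # any step size in (0, 2/L)
    gamma = np.sqrt(mu) / 2.0                   # smallest gamma satisfying (44)
    x = A.T @ y                                 # simple initialisation
    for _ in range(max_iter):
        grad = A.T @ (A @ x - y) / sigma**2     # gradient of the data fidelity
        x_new = prox_cauchy(x - mu * grad, gamma, mu)  # forward, then backward step
        if np.linalg.norm(x_new - x) <= eps * np.linalg.norm(x):
            return x_new
        x = x_new
    return x
```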
We focus the experimental part of this paper on two separate applications. First, we evaluate the proposed approach on 1D signal denoising in the frequency domain. Secondly, we investigate it when applied to two classical image processing tasks, i.e. denoising and de-blurring.

The first example demonstrates the use of the non-convex Cauchy-based penalty function in a 1D signal denoising application. In particular, we consider the classical sinusoidal signal "Heavy Sine", containing 128 samples and included in Matlab distributions. This signal was analysed in additive white Gaussian noise (AWGN) of several levels, with signal-to-noise ratio (SNR) values between 2 and 12 decibels (dB). We synthesised the signal y ∈ R^M via an over-sampled discrete inverse Fourier transform operator F⁻¹ as y = F⁻¹x + n, where x ∈ C^N and the number of points in the frequency domain was N = 512 > M = 128. The operator F is a normalised tight frame with FᴴF = I. We compared the performance of the Cauchy-based penalty function with the L1 and TV norm penalty functions, using the root mean square error (RMSE) as the evaluation metric.

The first experiment is depicted in Figure 4, which shows the effect of the scale parameter γ on the denoising results, both when violating and when satisfying the conditions proposed for convexity. Specifically, the vertical red and black dotted lines show the scale parameter values γ = σ/(2√r) from Theorem 1 and γ = √µ/2 from Theorem 2, respectively. A range of values for γ between 10⁻² and 10² was set, and denoised signals were obtained for each γ value by using Algorithm 2. The error term ε was set to 10⁻³, whilst the maximum number of iterations MaxIter was set to 500. We follow [21] for the selection of the step size µ, and then use Theorems 1 and 2 to decide the minimum value of γ that preserves convexity. From the definition [21], the data fidelity term ‖y − F⁻¹x‖₂² is convex and differentiable with an L-Lipschitz continuous gradient, where L is the Lipschitz constant. Thus, we can select the step size µ within the range (0, 2/L). There is no strict rule for choosing µ, but the literature suggests that choosing µ close to 2/L is more efficient. Hence, for this example, we set µ = 3/(2L).

On examining Figure 4, it is clear that the lowest RMSE value is achieved for a γ value higher than the critical values shown with the red and black dotted lines. It can also be seen that γ values 2-3 times higher than both critical values give relatively good results when compared to γ values which are 20 times higher. In order to further compare the performance of the proposed Cauchy denoiser, we calculated RMSE values for initial SNR values between 2 and 12 dB. For each noise level, simulations were repeated 100 times, and the corresponding average RMSE values for each penalty function and SNR value are presented in Figure 5. It can be seen that the lowest RMSE values are obtained when employing the Cauchy-based penalty function, for all SNR values. The TV denoising performance gets closer to that of the proposed penalty function as the noise level increases. For visual assessment, Figure 6 shows denoising results corresponding to the L1, TV and Cauchy-based penalty functions for an SNR of 7 dB. For all the penalty functions tested, the denoising effect can clearly be seen, but the proposed penalty function leads to the lowest RMSE.
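For reference, the 1D experiment can be mimicked along the following lines. This is a sketch under our own simplifications: the "Heavy Sine" signal is generated from its WaveLab definition, and an orthonormal inverse DCT (for which AᵀA = I) stands in for the paper's over-sampled inverse Fourier frame so that everything stays real-valued; it reuses the fb_cauchy sketch above:

```python
import numpy as np
from scipy.fft import idct

# "Heavy Sine" test signal (WaveLab definition), 128 samples
M = 128
t = np.arange(M) / M
clean = 4 * np.sin(4 * np.pi * t) - np.sign(t - 0.3) - np.sign(0.72 - t)

# Orthonormal inverse-DCT synthesis operator: a real-valued stand-in
# for the over-sampled inverse Fourier frame (A^T A = I, hence r = 1)
A = idct(np.eye(M), norm="ortho", axis=0)

snr_db = 7.0
sigma = np.linalg.norm(clean) / np.sqrt(M) * 10 ** (-snr_db / 20)
y = clean + sigma * np.random.randn(M)

x_hat = fb_cauchy(y, A, sigma)       # sparse transform-domain estimate
denoised = A @ x_hat                 # back to the signal domain
print("RMSE =", np.sqrt(np.mean((denoised - clean) ** 2)))
```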
In the second set of experiments, we investigated the influence of the proposed Cauchy-based regularisation on the classical 2D image reconstruction tasks of denoising and de-blurring. Specifically, we start by discussing the effects of the scale parameter γ on the reconstruction results, depending on whether the conditions in Theorems 1 and 2 are violated or satisfied. For the image de-blurring example, the forward operator A was selected as a 5×5 Gaussian point spread function (PSF) with a standard deviation of 1, and the noise is AWGN with a blurred-signal-to-noise ratio (BSNR = 10 log₁₀{var(Ax)/σ²}) of 40 dB. For the denoising example, the forward operator A is the identity matrix I, and the additive noise corresponds to an SNR of 20 dB. We used the standard cameraman image for benchmarking in both examples. The analysis was performed in terms of the peak signal-to-noise ratio (PSNR) and the RMSE. A range of values for γ between 10⁻⁴ and 10⁴ was set, and the reconstructed images were obtained for each γ value by using Algorithm 2. The error term ε was set to 10⁻³, whilst the maximum number of iterations MaxIter was set to 250. The step size µ was set to 3/(2L) for this example.

Figure 7 shows the effect of the γ values on the reconstruction results. The left y-axes in both sub-figures show RMSE values, whilst the right y-axes represent the PSNR values, for different values of γ on the x-axes. As can clearly be seen from both sub-figures, the reconstruction results are poor when the conditions in Theorems 1 and 2 (left side of the vertical dotted lines) are violated. However, starting from either condition and for higher values of γ, we obtained better reconstruction results, with an important reconstruction gain of around 16 dB for denoising and 2 dB for de-blurring in terms of PSNR. This experimentally confirms the correctness of the convexity conditions derived in Theorems 1 and 2. Unlike in the 1D case, for image reconstruction we observe a similar performance for higher values of γ. We conclude that there is no strict rule for choosing the optimum value of γ, but we noticed that the best performance is generally achieved within a specific interval, and hence we recommend using γ ∈ [√µ, 20√µ]. Please also note that we do not compare the two conditions proposed in Theorems 1 and 2. They are not antagonistic, but rather conditions that together provide solutions in various situations. Their usage depends on the problem at hand (cf. Remark 6), and both guarantee convergence in specific circumstances.

It can be seen that the Cauchy-based penalty function has a poor denoising performance when the convexity condition is violated, as in Figure 7(e). The TV, L1 and Cauchy-based results are visibly similar, but the Cauchy penalty yields the highest PSNR value.
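The de-blurring configuration can be sketched in the same spirit; here a 5×5 Gaussian PSF with unit standard deviation is applied by FFT-based convolution, and σ is recovered from the stated BSNR of 40 dB (the helper names are ours):

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_psf(size=5, std=1.0):
    # 5x5 Gaussian point spread function with standard deviation 1
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * std**2))
    return psf / psf.sum()

def blur_and_noise(image, bsnr_db=40.0, seed=0):
    # Blur, then add white Gaussian noise at the requested BSNR,
    # where BSNR = 10 log10(var(Ax) / sigma^2)
    blurred = fftconvolve(image, gaussian_psf(), mode="same")
    sigma = np.sqrt(blurred.var() / 10 ** (bsnr_db / 10))
    rng = np.random.default_rng(seed)
    return blurred + sigma * rng.normal(size=image.shape), sigma
```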
In this paper, we investigated a non-convex penalty function based on the Cauchy distribution. We proposed an FB proximal splitting methodology that employs the Cauchy proximal operator, for which we derived a closed-form expression. In order to guarantee the convexity of the overall cost function in spite of the non-convexity of the penalty term, we derived a condition relating the Cauchy scale parameter γ and the step size µ of the FB algorithm. Moreover, in the special cases where the forward operator is orthogonal (AᵀA = I) or an over-complete tight frame (AᵀA ≈ rI with r > 0), we derived another convexity condition that is independent of the proximal splitting algorithm employed. In order to demonstrate the effectiveness of the proposed penalty function, we tested its performance in generic denoising and de-convolution examples in comparison to the L1 and TV norm penalty functions. The Cauchy-based penalty achieved better reconstruction results than both. We further showed the effect of violating the proposed convexity conditions in both examples, and concluded that the best parameter set always lies on the correct side of the derived critical value (i.e. γ ≥ √µ/2). Our current work is focussed on applications of the proposed penalty function to SAR imaging inverse problems and will be reported in a future communication. In addition, the existence of a closed-form expression for the Cauchy proximal operator makes it suitable for advanced Bayesian inference, such as uncertainty quantification, e.g. via p-MCMC methods, which is another of our current endeavours.

REFERENCES
[1] Lectures on Cauchy's Problem in Linear Partial Differential Equations. Courier Corporation.
[2] Inverse problems arising in different synthetic aperture radar imaging systems and a general Bayesian approach for them.
[3] Line detection in images through regularized Hough transform.
[4] Sparsity-driven synthetic aperture radar imaging: Reconstruction, autofocusing, moving targets, and compressed sensing.
[5] Sparse signal approximation via nonseparable regularization.
[6] Total variation denoising via the Moreau envelope.
[7] Ship wake detection in SAR images via sparse regularization.
[8] A generalized iterated shrinkage algorithm for non-convex sparse coding.
[9] Bayesian approach with prior models which enforce sparsity in signal and image processing.
[10] Line detection as an inverse problem: application to lung ultrasound imaging.
[11] Sparse regularization via convex analysis.
[12] Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares.
[13] The convergence guarantees of a non-convex approach for sparse recovery.
[14] Visual Reconstruction.
[15] Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction.
[16] Convex denoising using non-convex tight frame regularization.
[17] Convex image denoising via non-convex regularization with parameter selection.
[18] A class of nonconvex penalties preserving overall convexity in optimization-based mean filtering.
[19] Non-convex total variation regularization for convex denoising of signals.
[20] Image fusion via sparse regularization with non-convex penalties.
[21] Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering.
[22] An algorithm for total variation minimization and applications.
[23] Image denoising using bivariate α-stable distributions in the complex wavelet domain.
[24] Spatially adaptive wavelet-based method using the Cauchy prior for denoising the SAR images.
[25] Wavelet-based SAR image despeckling using Cauchy pdf modeling.
[26] Dual-tree complex wavelet transform based SAR despeckling using interscale dependence.
[27] Directionlet-based denoising of SAR images using a Cauchy model.
[28] Variational approach for restoring blurred images with Cauchy noise.
[29] Cauchy noise removal by nonconvex ADMM with convergence guarantees.
[30] Uncertainty quantification for radio interferometric imaging-I: Proximal MCMC methods.
[31] Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau.
[32] Segmentation of noisy colour images using Cauchy distribution in the complex wavelet domain.
[33] Proximal algorithms.