key: cord-0045677-mikuptli
authors: Malyshev, Alexander
title: Depth Map Estimation with Consistent Normals from Stereo Images
date: 2020-05-25
journal: Computational Science - ICCS 2020
DOI: 10.1007/978-3-030-50426-7_41
sha: 2e68931c0652e984229570669926d83cae48beff
doc_id: 45677
cord_uid: mikuptli

The total variation regularization of non-convex data terms in continuous variational models can be convexified by the so-called functional lifting, which may be considered a continuous counterpart of Ishikawa's method for multi-label discrete variational problems. We solve the resulting convex continuous variational problem by the augmented Lagrangian method. Applying this method to dense depth map estimation allows us to obtain a consistent normal field to the depth surface as a byproduct. We illustrate the method with numerical examples of depth map estimation for rectified stereo image pairs.

Estimation of the depth map of a three-dimensional (3D) scene is a major task of computer vision [17, 18]. For example, depth maps can be generated from several two-dimensional (2D) color or grayscale images of a 3D scene taken by one or more cameras positioned at different locations in space. More specifically, given a set of 2D images consisting of rectified stereo pairs taken by a stereo pair of cameras, we have to estimate a depth map for each stereo pair and merge these maps into one 3D point cloud. The point cloud is used afterwards to reconstruct a 2D surface of the scene. Efficient surface reconstruction methods [4, 8, 10, 11], which produce 2D surfaces from 3D point clouds, most often use approximate normal vectors to the reconstructed surfaces. Good surface reconstructions are obtained by means of consistent normal maps [2, 7, 9], which respect edges and other feature points on the reconstructed surface, i.e.
the normal vectors are smooth within the smooth parts of the surface but discontinuous across the edges and corners of the surface.

We estimate the depth map as the global minimum of suitable functionals, which are non-convex in general. Global variational approaches to depth estimation can be more attractive than the faster local procedures because they are more robust to various image corruptions and admit a wider range of regularization conditions. However, finding global minima of non-convex functionals may be very difficult or even infeasible in practice. Our estimation of the depth map uses the continuous variational model

arg min_u ∫_Ω ( α |u_x(x)| + ρ(u(x), x) ) dx,   (1)

where the first term is the total variation (TV) regularization introduced in [16]. The value |u_x(x)| = (u_{x_1}^2 + ... + u_{x_n}^2)^{1/2} denotes the Euclidean length of the gradient vector u_x = [u_{x_1}, ..., u_{x_n}]. The function ρ(t, x) is in general not convex in the variable t. Since the point-wise global minima of ρ(u(x), x) form a very noisy function, the total variation multiplied by a well-chosen parameter α > 0 enforces the necessary smoothness in the solution u(x) of (1).

The authors of [13] have proposed an elegant convexification procedure called functional lifting, which reduces the non-convex model (1) to a convex variational model, albeit with one extra dimension. Further developments and extensions are found in [12, 14]. A similar method for discrete variational models was proposed earlier by H. Ishikawa [5, 6]. The paper [13] also contains a detailed comparison of continuous functional lifting with Ishikawa's method. The present paper contributes to the estimation of depth maps by extending the results of [13]. The numerical method of [13] is replaced by a faster one, a variant of the augmented Lagrangian method (ALM); cf. [19]. We point out that the convexification from [13] is not directly suitable for ALM and must be refined as in Theorem 1.
A consistent normal field to the depth map surface is obtained as a byproduct of ALM; see formula (20). Similar to [13], the theoretical arguments below are not fully rigorous but rather informal. For instance, the expression of the total variation, formally valid for functions from the Sobolev space H^1, is applied to functions that are not necessarily in H^1, etc.

The variational model (1) defines an unknown real-valued function u : Ω → [a, b] on a domain Ω ⊂ R^n. Note that the regularization parameter α > 0 is introduced for convenience only. The function ρ(t, x) can be non-convex in the variable t, which creates serious difficulties when developing reliable numerical methods for solving (1). Fortunately, the non-convex variational model (1) can be reformulated in a convex form by adding an extra dimension, say t, to the available dimensions x_1, x_2, ..., x_n. Such convexification uses special binary functions called the indicator functions of superlevel sets. Namely, for a given function u(x), the indicator function of superlevel sets φ : [a, b] × Ω → {0, 1} is defined as

φ(t, x) = 1 if t < u(x), φ(t, x) = 0 otherwise.   (2)

The function φ(t, x) is binary and monotonically non-increasing with respect to the variable t. Owing to monotonicity, the original function u(x) is reconstructed from φ(t, x) via the formula

u(x) = a + ∫_a^b φ(t, x) dt.   (3)

Theorem 1 (Functional lifting). If u(x) is a solution of the minimization problem (1), then the indicator function of superlevel sets φ(t, x), constructed in (2), is a solution to the following minimization problem:

min_φ ∫_a^b ∫_Ω ( α |φ_x(t, x)| + ρ(t, x) |φ_t(t, x)| ) dx dt   (4)

over the set of binary functions

φ : [a, b] × Ω → {0, 1},  φ(a, x) = 1,  φ(b, x) = 0.   (5)

The converse is also true: if φ is the indicator function of superlevel sets associated with u and solves the minimization problem (4)-(5), then u solves (1).

Proof. Following [13], the co-area formula from [3] gives ∫_Ω |u_x| dx = ∫_a^b ∫_Ω |φ_x(t, x)| dx dt, while the layer-cake representation gives ∫_Ω ρ(u(x), x) dx = ∫_a^b ∫_Ω ρ(t, x) |φ_t(t, x)| dx dt, so the energies in (1) and (4) coincide.

The binary constraint in (5) makes the problem (4)-(5) non-convex. It is relaxed to the convex problem

min_{φ ∈ Φ} E(φ),  E(φ) = ∫_a^b ∫_Ω ( α |φ_x(t, x)| + ρ(t, x) |φ_t(t, x)| ) dx dt,   (6)

over the convex set

Φ = { φ : [a, b] × Ω → [0, 1] :  φ(a, x) = 1,  φ(b, x) = 0 }.   (7)

Theorem 2 (Thresholding). If φ is a solution of the relaxed problem (6)-(7), then the binary function φ_θ(t, x) = 1 if φ(t, x) > θ, 0 otherwise, is a solution of (4)-(5) for all threshold values θ ∈ (0, 1).

Proof. The co-area formula allows us to derive that the energy functional satisfies the identity E(φ) = ∫_0^1 E(φ_θ) dθ for all φ ∈ Φ. Since every E(φ_θ) is bounded below by the minimum of E and the average of E(φ_θ) over θ equals the minimum E(φ), the set of thresholds θ with E(φ_θ) > E(φ) has measure zero.
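On a discrete grid, the indicator function of superlevel sets and the reconstruction formula u(x) = a + ∫ φ dt can be sketched in a few lines. This is a minimal Python illustration; the function names and grid conventions are ours, not the paper's:

```python
import numpy as np

def lift(u, levels):
    # Indicator of superlevel sets: phi(t, x) = 1 where t < u(x), else 0.
    # levels is a 1D array of t-samples on [a, b]; the result has one extra
    # leading axis of length len(levels).
    return (levels.reshape(-1, *([1] * u.ndim)) < u).astype(float)

def reconstruct(phi, levels):
    # Recover u(x) = a + integral_a^b phi(t, x) dt by a Riemann sum.
    dt = (levels[-1] - levels[0]) / (len(levels) - 1)
    return levels[0] + phi.sum(axis=0) * dt
```

For a smooth u the round trip lift → reconstruct reproduces u up to the level spacing, and the lifted phi is binary and non-increasing in t, as required by the lifted problem.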
An argument similar to those in [1, 14] can be used to include the thresholds θ belonging to the exceptional set of zero measure.

Let us introduce a dual function p(t, x) such that p = (p_0, p_1) = ∇φ, where ∇φ = (φ_t, φ_x) is the full gradient of φ(t, x) and |p_1| = ‖p_1‖_2 is the Euclidean norm of p_1. Then the problem (6)-(7) is equivalent to the following constrained convex minimization problem:

min_{φ, p} ∫_a^b ∫_Ω ( α ‖p_1‖_2 − ρ(t, x) p_0 ) dx dt   (9)

subject to  p = ∇φ,  p_0 ≤ 0,  φ ∈ Φ.   (10)

The convex variational problem (9)-(10) can be solved by means of the augmented Lagrangian method (cf. [19]) with the augmented Lagrangian

L_c(φ, p, λ) = ∫_a^b ∫_Ω ( α ‖p_1‖_2 − ρ p_0 + ⟨λ, p − ∇φ⟩ + (c/2) ‖p − ∇φ‖_2^2 ) dx dt,   (11)

where p = (p_0, p_1) and λ = (λ_0, λ_1). The inner product ⟨λ, p − ∇φ⟩ is Euclidean. The constant c > 0 is sufficiently large but is not required to tend to infinity. Each iteration of the ALM consists of alternating minimizations with respect to φ and p and a special update of λ:

1. Set k = 0 and initialize p^0 = 0 and λ^0 = 0.
2. Find the solution φ^{k+1} of the minimization problem min_{φ ∈ Φ} L_c(φ, p^k, λ^k).
3. Find the dual variable p^{k+1} by solving the minimization problem min_{p : p_0 ≤ 0} L_c(φ^{k+1}, p, λ^k).
4. Update the multiplier in accordance with the augmented Lagrangian method: λ^{k+1} = λ^k + c (p^{k+1} − ∇φ^{k+1}).
5. Set k = k + 1 and go to step 2.

Minimization with Respect to φ. Step 2 of the ALM algorithm concerns the optimization problem

min_φ ∫_a^b ∫_Ω ( −⟨λ^k, ∇φ⟩ + (c/2) ‖p^k − ∇φ‖_2^2 ) dx dt,

where φ(t, x) satisfies the boundary conditions φ(a, x) = 1, φ(b, x) = 0. Standard arguments from the variational calculus yield the Poisson equation

Δφ = div( p^k + λ^k / c )   (12)

with the Dirichlet boundary conditions

φ(a, x) = 1,  φ(b, x) = 0,   (13)

and the Neumann boundary conditions

∂φ/∂ν = ( p_1^k + λ_1^k / c ) · ν  on (a, b) × ∂Ω.   (14)

Minimization with Respect to p. Step 3 of the ALM algorithm requires the solution of the pointwise minimization problem

arg min_{p_0 ≤ 0, p_1} ( α ‖p_1‖_2 − ρ p_0 + ⟨λ^k, p⟩ + (c/2) ‖p − ∇φ^{k+1}‖_2^2 )
 = arg min_{p_0 ≤ 0, p_1} ( −(2ρ/c) p_0 + |p_0 − q_0|^2 + (2α/c) ‖p_1‖_2 + ‖p_1 − q_1‖_2^2 ).   (15)

Let us denote q = (q_0, q_1) = ∇φ^{k+1} − λ^k / c. The solution to min_{p_0 ≤ 0} ( −(2ρ/c) p_0 + |p_0 − q_0|^2 ) is p_0 = min(q_0 + ρ/c, 0). A simple geometric argument reveals that the minimum point of the function (2α/c) ‖p_1‖_2 + ‖p_1 − q_1‖_2^2 must have the form p_1 = r q_1 with 0 ≤ r ≤ 1. For r ∈ (0, 1) the necessary condition for extremal points, α/c + (r − 1) ‖q_1‖_2 = 0, is satisfied when r = 1 − α/(c ‖q_1‖_2). Hence r = 1 − α/(c ‖q_1‖_2) if ‖q_1‖_2 − α/c > 0. Otherwise, r = 0.
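The pointwise updates derived above are what make each ALM iteration cheap: besides one Poisson solve, step 3 reduces to closed-form formulas applied independently at every grid point. A minimal Python sketch of these updates (the stacking of the components of q_1 along the first array axis is our own convention):

```python
import numpy as np

def update_p(q0, q1, rho, alpha, c):
    # Pointwise minimizers of the p-subproblem of the ALM:
    #   p0 = min(q0 + rho/c, 0)                    (constrained 1D quadratic)
    #   p1 = max(1 - alpha/(c*||q1||_2), 0) * q1   (soft shrinkage of q1)
    # q1 has shape (n, ...): n spatial derivative components per grid point.
    p0 = np.minimum(q0 + rho / c, 0.0)
    norm = np.sqrt((q1 ** 2).sum(axis=0))
    r = np.maximum(1.0 - alpha / (c * np.maximum(norm, 1e-12)), 0.0)
    return p0, r * q1
```

The small floor 1e-12 only guards the division; wherever ‖q_1‖_2 ≤ α/c the factor r is zero, matching the case analysis in the text.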
As a result, the solution to (15) is given by the pointwise formulas

p_0 = min(q_0 + ρ/c, 0),   (16)
p_1 = max(1 − α/(c ‖q_1‖_2), 0) q_1.   (17)

By Theorem 2, with the thresholded function

φ_{1/2}(t, x) = 1 if φ(t, x) > 1/2, 0 otherwise,   (18)

an approximation to u(x) can be obtained in the form

u(x) ≈ a + ∫_a^b φ_{1/2}(t, x) dt.   (19)

The normal field to the surface t − u(x) = 0 in the space with coordinates (t, x) is the set of the gradient vectors n(t, x) = [1, −u_{x_1}, ..., −u_{x_n}]^T. We use formula (19) to express the derivatives of u(x) in terms of φ_{1/2}(t, x): u_{x_i}(x) = ∫_a^b (φ_{1/2})_{x_i}(t, x) dt. Since the derivatives φ_{x_i}(t, x) in the ALM are approximated by (p_1)_i, the normal field can be approximated as

n(x) ≈ [ 1, −∫_a^b (p_1)_1(t, x) dt, ..., −∫_a^b (p_1)_n(t, x) dt ]^T.   (20)

Formula (20) is one of the main contributions of the paper.

The meaning of the variable x changes in Sect. 4. Instead of the points x = (x_1, ..., x_n) ∈ R^n, we deal with the pixels (x, y) ∈ R^2. Our numerical experiments deal with estimation of the dense depth map; see, e.g., [17, 18] for more details about depth estimation. Suppose that two functions L(x, y) and R(x, y) represent a pair of left and right two-dimensional grayscale images with the horizontal coordinates 0 ≤ x ≤ L_x and vertical coordinates 0 ≤ y ≤ L_y. We assume that the images L and R are rectified, so that one can define the disparity map t = t(x, y) as the parallel translation along the horizontal direction for which a point (x, y) in the left image L matches the point (x − t(x, y), y) in the right image R. The disparity map is converted into the depth map via the simple formula depth = focus · baseline/disparity, where baseline is the distance between the two camera positions from which the images L and R have been taken. We assume that both cameras are identical, have focal length focus, and are equally oriented in 3D space. In order to estimate the disparity map for grayscale images, we use the simplest dissimilarity function

ρ(t, x, y) = |L(x, y) − R(x − t, y)|.   (21)

The range of the variable t in (21) for fixed variables x and y is the interval of disparities for which the point (x − t, y) lies inside the image R. The function ρ(t, x, y) equals zero outside this range. More sophisticated dissimilarity functions can be found in [17, 18].
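For integer disparities on a pixel grid, the absolute-difference dissimilarity and the disparity-to-depth conversion described above can be sketched as follows. The function names and the zero-outside-range convention follow the text; the guard against zero disparity is our own addition:

```python
import numpy as np

def dissimilarity(L, R, t):
    # rho(t, x, y) = |L(x, y) - R(x - t, y)| for an integer disparity t.
    # Images are indexed as [y, x]; entries whose match x - t falls outside R
    # are set to zero, following the convention in the text.
    rho = np.zeros_like(L, dtype=float)
    if 0 <= t < L.shape[1]:
        rho[:, t:] = np.abs(L[:, t:] - R[:, :L.shape[1] - t])
    return rho

def disparity_to_depth(disparity, focus, baseline):
    # depth = focus * baseline / disparity, guarding against zero disparity.
    return focus * baseline / np.maximum(disparity, 1e-12)
```

Stacking dissimilarity(L, R, t) over a range of t values yields the sampled data term ρ(t, x, y) that enters the lifted energy.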
The dissimilarity function for RGB images can be constructed as the sum of the absolute differences of intensities over all three color channels:

ρ(t, x, y) = Σ_{c ∈ {R,G,B}} |L_c(x, y) − R_c(x − t, y)|.   (22)

The Poisson Eq. (12) with the Dirichlet boundary conditions (13) and the Neumann boundary conditions (14) can be solved by several numerical methods. When the size N of the discretized function φ(t, x, y) is not very large, the Poisson equation can be solved by means of the fast discrete sine and cosine transforms. The arithmetic complexity of this fast Poisson solver is O(N log N). For very large N, the Poisson equation can be efficiently solved by the multigrid method, which has linear arithmetic complexity O(N).

In order to demonstrate the behaviour of the proposed method, we use a synthetic dataset in which all functions are constant along the axis y. This dataset is convenient for visualization because it suffices to plot only sections of such functions for a fixed y. These sections are the horizontal lines of the images displaying the functions. The depth function d(x, y) = 10 + sin(mod(x, π)) of the synthetic dataset is defined for |x| ≤ 5π and defines the surface z = d(x, y). Two identical cameras with focus equal to 1 are located at the points (x, y) = (−2, 0) and (x, y) = (2, 0). The coordinate axes x, y, and z of both cameras are oriented in parallel to the axes x, y, and z of the surface z = d(x, y). The left and right digital images L(x, y) and R(x, y) have size 128 × 10. The horizontal lines of L and R for a fixed y are shown in Fig. 1. Note that the leftmost 15 pixels of L and the rightmost 15 pixels of R on each horizontal line are outside the scene. The horizontal lines of the true disparity are shown on the right side of Fig. 2. The augmented Lagrangian method has been implemented with the fast Poisson solver, which is based on the fast Fourier transform. We have run 100 iterations of the augmented Lagrangian method.
However, sufficiently good convergence is achieved much earlier, say after 30 iterations. Figure 3 displays the computed disparity versus the true disparity. The computed disparity is plotted with a blue solid line, and the true disparity with a red dash-dot line. We recall that the displayed disparity is defined in the coordinate system of the left image L. Note also that the leftmost 15 pixels lie outside the scene, i.e. the parts of both lines for x = 1, 2, ..., 15 are fictitious.

This section of the paper aims to justify numerically that the augmented Lagrangian method produces consistent normals to the surfaces. The surface of the synthetic example is the disparity map d(x, y), which is constant along y. The computed solution u(x, y) turns out to be also constant along y, and the computed function (p_1)_2(x, y) is zero everywhere. We recall that the normal field n(x, y) is computed by formula (20), i.e. n(x, y) = [1, −∫_a^b (p_1)_1(t, x, y) dt, −∫_a^b (p_1)_2(t, x, y) dt]^T.

Apart from introducing the functional lifting for convexification, the authors of [13] propose an efficient numerical method for solving the resulting convex problem (6)-(7), which is equivalently transformed into the primal-dual formulation

min_{φ ∈ D} max_{p ∈ C} ∫_a^b ∫_Ω ⟨p, ∇φ⟩ dx dy dt,   (24)

where D is the set of functions φ(t, x, y) with values in [0, 1] satisfying the boundary conditions φ(a, x, y) = 1 and φ(b, x, y) = 0, and C constrains the dual variable p = [p_1, p_2, p_3] pointwise so that the inner maximization reproduces the energy (6). Such a transformation is obtained by using the vectors [p_1(t, x, y), p_2(t, x, y)] coinciding with the normalized vectors [φ_x, φ_y]/(φ_x^2 + φ_y^2)^{1/2}. Apparently, this representation does not lead to formula (20) for a consistent normal field.

The numerical algorithm for solving (24) proposed in [13] is called a primal-dual proximal point (PDPP) method; it iteratively minimizes the functional in (24) with respect to the primal variable φ and then maximizes the same functional with respect to the dual variable p. Each iteration consists of the following two steps:

φ^{k+1} = P_D( φ^k + τ_p div p^k ),
p^{k+1} = P_C( p^k + τ_d ∇φ^{k+1} ).

The operator P_D denotes the projection onto the set D, which can be computed by a simple truncation of φ^{k+1} to the interval [0, 1] and setting φ(a, x, y) = 1 and φ(b, x, y) = 0.
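The two projections of the PDPP iteration can be sketched in a few lines of Python. The exact constraint set C is not fully spelled out in the recovered text, so we assume the standard choice for relaxations of this type: |p_3| ≤ ρ for the data component and ‖(p_1, p_2)‖_2 ≤ α for the spatial components.

```python
import numpy as np

def project_D(phi):
    # P_D: clip phi to [0, 1] and restore the boundary conditions in t
    # (the first array axis is the label/disparity axis t).
    phi = np.clip(phi, 0.0, 1.0)
    phi[0], phi[-1] = 1.0, 0.0
    return phi

def project_C(p1, p2, p3, rho, alpha):
    # P_C under the assumed constraints ||(p1, p2)||_2 <= alpha, |p3| <= rho:
    # radially rescale the spatial components and truncate the t-component.
    scale = np.maximum(np.sqrt(p1 ** 2 + p2 ** 2) / alpha, 1.0)
    return p1 / scale, p2 / scale, np.clip(p3, -rho, rho)
```

Both operations are pointwise, so one PDPP iteration costs O(N) arithmetic per grid point on top of the divergence and gradient evaluations.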
The operator P_C denotes the projection onto the set C, which can be computed pointwise by truncating p_3 to the interval allowed by ρ and rescaling [p_1, p_2] onto the ball of radius α. The parameters τ_p and τ_d must guarantee stability of the method. The authors of [13] propose the choice τ_p = τ_d = 1/√3. For the synthetic dataset, sufficient convergence is observed after 1000 iterations (Fig. 5). The comparison of the normal fields in Fig. 6 demonstrates that both the augmented Lagrangian method and the PDPP method produce normal fields of similar visual quality.

Let us use the dataset Tsukuba from http://vision.middlebury.edu/stereo in order to demonstrate the working capacity of the ALM for the calculation of a consistent normal field. The rectified stereo image pair Tsukuba of the smallest size 384 × 288 consists of two RGB images. The image from the left camera is shown as the left image in Fig. 7. The corresponding image of the true disparity is also available at the Middlebury repository [15]; see the right image in Fig. 7. The disparity map has 256 gray levels, but it can be scaled to the interval [0, 15] of integer numbers without any loss, so that the scaled image contains only a small number of integer gray values (labels). Figure 8 shows the computed disparity map and consistent normals for the Tsukuba rectified stereo pair.

We have developed a variant of the augmented Lagrangian method for a convex relaxation of the continuous variational problem with the total variation regularization. The most time-consuming part of this method is solving a boundary value problem for the 3D Poisson equation, which we do with the fast Poisson solver. The augmented Lagrangian method has significantly faster convergence than the primal-dual proximal point method from [13]. An additional benefit of the augmented Lagrangian method is that it produces a consistent normal vector field to the solution surface as a byproduct. Numerical differentiation of the solution computed by the method from [13] seems to produce a normal vector field of similar quality.
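The fast Poisson solver used above rests on the fact that discrete sine and cosine transforms diagonalize the discretized Laplacian. A one-dimensional numpy sketch of the Dirichlet case follows (our own discretization; in the depth-map setting the same idea is applied dimension by dimension, with cosine transforms handling the Neumann conditions (14)):

```python
import numpy as np

def dst1(x):
    # DST-I computed via the FFT of an odd extension: O(N log N).
    n = len(x)
    v = np.concatenate(([0.0], x, [0.0], -x[::-1]))
    return -np.fft.fft(v).imag[1:n + 1] / 2.0

def poisson_dirichlet_1d(f, h):
    # Solve -u'' = f on a uniform interior grid with zero Dirichlet boundary
    # values. DST-I diagonalizes the tridiagonal (-1, 2, -1)/h^2 matrix, with
    # eigenvalues lam_k = (2 - 2*cos(pi*k/(n + 1)))/h^2.
    n = len(f)
    k = np.arange(1, n + 1)
    lam = (2.0 - 2.0 * np.cos(np.pi * k / (n + 1))) / h ** 2
    # DST-I is its own inverse up to the factor 2/(n + 1).
    return dst1(dst1(f) / lam) * 2.0 / (n + 1)
```

Applying such a transform solve along each of the three axes of the discretized φ(t, x, y) yields the O(N log N) complexity quoted in the text.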
However, thorough verification of these properties requires further investigation.

References

An unconstrained multiphase thresholding approach for image segmentation
Point cloud segmentation via constrained nonlinear least squares surface normal estimates
An integral formula for total gradient variation
Surface reconstruction from unorganized points
Exact optimization for Markov random fields with convex priors
Graph cuts-combinatorial optimization in vision
Parallel globally consistent normal orientation of raw unorganized point clouds
Poisson surface reconstruction
Consistent propagation of normal orientations in point clouds
Global optimization for shape fitting
Robust and efficient implicit surface reconstruction for point clouds based on convexified image segmentation
Sublabel-accurate relaxation of nonconvex energies
A convex formulation of continuous multi-label problems
Global solutions of variational models with convex regularization
Nonlinear total variation based noise removal algorithms
A taxonomy and evaluation of dense two-frame stereo correspondence algorithms
Computer Vision: Algorithms and Applications
Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models