Compressing Flow Fields with Edge-aware Homogeneous Diffusion Inpainting
Authors: Ferdinand Jost, Pascal Peter, Joachim Weickert
Date: 2019-06-28

Although efficient compression methods for dense two-dimensional flow fields would be very useful for modern video codecs, hardly any research has been performed in this area so far. Our paper addresses this problem by proposing the first lossy diffusion-based codec for this purpose. It keeps only a few flow vectors on a coarse grid. Additionally stored edge locations ensure the accurate representation of discontinuities. In the decoding step, the missing information is recovered by homogeneous diffusion inpainting that incorporates the stored edges as reflecting boundary conditions. Despite the simple nature of this codec, our experiments show that it achieves remarkable quality for compression ratios up to 800 : 1.

Motion estimation has many practical applications such as traffic surveillance, object tracking in driver assistance systems and robotics, or prediction in video compression. In video compression in particular, many scientific contributions to this problem have been made. This is due to an important trade-off: on the one hand, the stored motion fields should be as accurate as possible for a good prediction; on the other hand, they have to be represented compactly. Therefore, it is crucial for this application to encode motion fields efficiently and accurately.

Most modern video codecs like HEVC [1] use block matching algorithms to compute motion vectors for coarse blocks of pixels. The resulting piecewise constant motion fields can be encoded very efficiently, but introduce block artefacts. Alternatively, optical flow methods have been used to compute dense flow fields [2], [3]. These describe the motion between frames more accurately, but are harder to encode efficiently.
Accurate flow fields of natural image sequences are usually piecewise smooth. Compression methods with diffusion-based interpolation work well for this kind of data. In contrast to classical transform-based codecs such as JPEG2000 [4] or BPG (Better Portable Graphics) [5], they exploit sparsity in the spatial domain instead of a transform domain. They only store a few selected pixels and reconstruct missing data by interpolation, also called inpainting. Specialised approaches that use additional edge information excel in particular for piecewise smooth images [6], [7].

We present a framework for the compression of flow fields based on edge-aware homogeneous diffusion inpainting. Our method benefits from the piecewise smooth structure of motion fields by storing additional edge information for the inpainting process. We propose an example method with established, easy-to-implement components to show that our framework can achieve favourable quality in this setting.

Our codec relies on three major components for encoding: edge detection, selection of mask pixels, and quantisation. We use the Marr-Hildreth edge detector [8] together with hysteresis thresholding to detect boundaries between smooth regions of the flow field. This gives us connected edges that we can store efficiently with chain codes. As our framework uses inpainting for the reconstruction, we additionally have to select mask pixels with known flow vectors. We choose a regular grid that allows us to encode the position of mask pixels very efficiently. Additionally, our codec quantises the flow values of stored mask pixels with a simple uniform quantisation. The decoder reconstructs the missing data using homogeneous diffusion inpainting that incorporates edge structures. Compared to classical homogeneous diffusion, our newly proposed edge-aware operator ensures the preservation of discontinuities in the motion field.

Compression methods with diffusion-based inpainting were introduced by Galić et al.
[9] in 2005. Their method stores only a few selected pixels of an image and reconstructs missing data with edge-enhancing anisotropic diffusion [10]. The R-EED algorithm of Schmaltz et al. [11] improves this idea with an efficient tree structure to adaptively encode mask pixels and can beat the quality of JPEG2000.

The concept of diffusion-based inpainting has also been extended to video compression. Peter et al. [12] developed a method based on R-EED that allows decoding in real time. However, this approach compresses each frame individually and does not exploit temporal redundancies. Andris et al. [13] proposed a proof-of-concept video codec that additionally uses optical flow methods for inter-frame prediction. The motion fields are compressed with a simple subsampling, resulting again in block artefacts. Ottaviano and Kohli [14] developed a motion estimation algorithm that incorporates coding costs for a wavelet-based compression of the resulting flow field.

Our method is related to codecs for the compression of depth maps, as they also have a piecewise smooth structure. The approach of Gautier et al. [15] stores mask pixels on both sides of edges and uses homogeneous diffusion inpainting to reconstruct smooth regions in between. A similar approach by Mainberger et al. [6] for cartoon-like images also selects mask pixels along edges. Hoffmann et al. [7] extended this idea by explicitly storing segment boundaries with chain codes and selecting mask pixels on a hexagonal grid.

In Section II we introduce an edge-aware inpainting operator based on homogeneous diffusion. Using this concept we propose the encoding step of our method in Section III and the corresponding decoding in Section IV. We compare our approach with JPEG2000 and BPG in Section V and present our conclusions in Section VI.

The centrepiece of our codec is the edge-aware homogeneous diffusion that is used to reconstruct a flow field with only a small amount of known data.
This method relies on additional edge information to preserve discontinuities of the original flow field.

Let us consider a flow field $\boldsymbol{f}(\boldsymbol{x}) : \Omega \to \mathbb{R}^2$, where $\boldsymbol{x} := (x, y)$ denotes the position in a rectangular domain $\Omega \subset \mathbb{R}^2$. We assume that flow vectors are only known at mask points $K \subset \Omega$. Furthermore, we assume that a set of edges $E$ of $\boldsymbol{f}$ is known. The reconstructed flow field $\boldsymbol{u}(\boldsymbol{x})$ can then be computed using homogeneous diffusion [16] by solving the Laplace equation with reflecting boundary conditions

$$\Delta \boldsymbol{u}(\boldsymbol{x}) = \boldsymbol{0} \quad \text{for } \boldsymbol{x} \in \Omega \setminus K, \qquad \partial_{\boldsymbol{n}} \boldsymbol{u}(\boldsymbol{x}) = \boldsymbol{0} \quad \text{for } \boldsymbol{x} \in \partial\Omega,$$

where $\boldsymbol{n}$ denotes the normal vector to the boundary $\partial\Omega$. The values of known mask points are not altered:

$$\boldsymbol{u}(\boldsymbol{x}) = \boldsymbol{f}(\boldsymbol{x}) \quad \text{for } \boldsymbol{x} \in K.$$

As an extension to classical homogeneous diffusion we prevent any diffusion across the edge set $E$ by introducing additional reflecting boundary conditions across the boundaries of $E$:

$$\partial_{\boldsymbol{n}} \boldsymbol{u}(\boldsymbol{x}) = \boldsymbol{0} \quad \text{for } \boldsymbol{x} \in \partial E,$$

where $\boldsymbol{n}$ is now the normal vector to the edge. This problem can be discretised with finite differences where edges are given at between-pixel locations. We solve the resulting linear system of equations with a conjugate gradient solver [17].

The result of this edge-aware homogeneous diffusion inpainting is a flow field with discontinuities along the edges $E$ and smooth transitions in between. In contrast to the segment-based homogeneous diffusion of Hoffmann et al. [7], our method allows arbitrary edge structures and does not require a segmentation with closed contours.

In this section we describe the encoder of our framework. It gathers and stores all information for the proposed edge-aware homogeneous diffusion inpainting. As a first step, our framework collects edge information at between-pixel locations that act as boundaries for the homogeneous diffusion process. Then it selects the pixel mask needed for inpainting and quantises the selected pixel values for a more compact representation.

Our approach uses the Marr-Hildreth operator combined with hysteresis thresholding similar to Canny [18] as an edge detector.
This method detects zero-crossings of the Laplacian of an image that has been smoothed with a Gaussian of standard deviation $\sigma$. As the Laplacian is a second-order derivative operator, its zero-crossings indicate extrema of the gradient. We apply an additional hysteresis thresholding on the gradient magnitude of all edges detected by the Marr-Hildreth operator to keep only important structures. Edges where the gradient magnitude exceeds a threshold $T_1$ become seed points. We then recursively add all candidates that are adjacent to seed pixels and exceed a threshold $T_2 < T_1$ to the set of edges. For high quality flow fields with sharp discontinuities this method gives well-localised and connected edges.

The resulting relevant edges in x- and y-direction lie at between-pixel locations on two different grids. This gives us two binary images to encode (Fig. 1). An alternative to storing edges as binary images is to use chain codes as done by Hoffmann et al. [7]. These are very efficient in our setting as there are only three possible directions to follow. We first extract different kinds of T-junctions as starting points for our chains (Fig. 2). As there can also be isolated edges without such T-junctions, we add two other types of starting elements to cover the remaining edges (types 5 and 6). For each starting element we have to store a reference point and a type. We then obtain the chain code by following the contours from the starting elements until there is no more edge to encode. The chain code is a sequence of four different symbols: one symbol for each of the three possible directions and a terminating symbol indicating the end of an edge.

Our edge-aware homogeneous inpainting algorithm also requires a set of mask points to be able to reconstruct a flow field. To reduce the coding cost of our method we choose mask pixels on a regular grid instead of arbitrary positions. Mask positions are uniquely determined by the image dimensions and a density parameter d.
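To illustrate why a regular mask costs almost nothing to store, the following minimal Python sketch rebuilds a mask from the image dimensions and a density value alone. The text does not specify how d maps to grid positions, so the conventions here are assumptions for illustration only: `density` is interpreted as the approximate fraction of pixels kept, the grid spacing is `round(1 / sqrt(density))`, and the grid is offset by half a spacing. The function name `regular_mask` is hypothetical.

```python
import numpy as np


def regular_mask(height, width, density):
    """Return a boolean mask with True at regularly spaced mask pixels.

    Assumption (not from the paper): `density` is the approximate
    fraction of pixels kept, so the grid spacing is 1 / sqrt(density).
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must lie in (0, 1]")
    # Same spacing in both directions, at least one pixel.
    spacing = max(1, int(round(1.0 / np.sqrt(density))))
    mask = np.zeros((height, width), dtype=bool)
    # Offset by half a spacing so grid points are roughly centred
    # within their cells (an illustrative choice).
    offset = spacing // 2
    mask[offset::spacing, offset::spacing] = True
    return mask
```

Since encoder and decoder can both evaluate such a function, only the header values need to be transmitted; no per-pixel position data is required.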
Adaptive approaches like the rectangular subdivision of Schmaltz et al. [11] usually give better results for diffusion processes. They accumulate mask pixels around edges to compensate for the inability of homogeneous diffusion to preserve edges. Our codec, however, already stores edge information explicitly. Therefore, adaptive schemes do not provide a large benefit over a regular mask: the small improvement in quality that they offer cannot compensate for the overhead of storing a more complex mask. As edges can form isolated segments that do not contain a point of the regular grid, we additionally store the average flow value of such segments. This is equivalent to storing a single mask pixel at an arbitrary position in this segment.

We further reduce the amount of stored data by quantising the values of mask pixels. Flow vectors are usually given as two 32-bit floating-point numbers. This representation is not very convenient for compression purposes, as optical flow fields do not use the entire range of possible values. We want to represent the range of actually occurring flow values as integers $q \in \{0, \ldots, k-1\}$. One of the simplest methods to achieve this is uniform quantisation, which we apply to each channel of the flow field individually. Let $\min$ be the minimal and $\max$ the maximal value occurring in one channel of the flow field, and let $x \in [\min, \max]$ be a flow value. We also define the length of a quantisation interval as $a := \frac{\max - \min}{k - 1}$. The quantised value $q$ can then be computed as

$$q = \left\lfloor \frac{x - \min}{a} + \frac{1}{2} \right\rfloor.$$

In order to reconstruct a flow value $x_q$ we can simply compute $x_q = \min + a \cdot q$. Note that the first and last interval only have a width of $\frac{a}{2}$. This is necessary to ensure that the values of $\min$ and $\max$ are preserved. As we have to store the minimal and maximal value of each channel explicitly, this ensures that the range of values does not shrink after multiple quantisations.
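The uniform quantisation of one flow channel can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the rounding rule q = floor((x - min) / a + 1/2) is an assumption chosen so that the first and last intervals have width a/2 and the values min and max are reproduced exactly, as the text requires. The function names `quantise_channel` and `dequantise_channel` are hypothetical.

```python
import numpy as np


def quantise_channel(channel, k):
    """Map one flow channel to integers in {0, ..., k-1}.

    Returns the quantised values together with the channel minimum and
    maximum, which the decoder needs for reconstruction.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = float(np.min(channel)), float(np.max(channel))
    if hi == lo:
        # Constant channel: every value maps to level 0.
        return np.zeros(np.shape(channel), dtype=np.int64), lo, hi
    a = (hi - lo) / (k - 1)  # length of a quantisation interval
    # Assumed rounding rule: round to the nearest level.
    q = np.floor((np.asarray(channel) - lo) / a + 0.5).astype(np.int64)
    # Guard against floating-point overshoot at the upper end.
    return np.clip(q, 0, k - 1), lo, hi


def dequantise_channel(q, lo, hi, k):
    """Reconstruct flow values x_q = min + a * q from quantised levels."""
    a = (hi - lo) / (k - 1)
    return lo + a * np.asarray(q)
```

Because min and max map to the levels 0 and k - 1 and are restored exactly, quantising an already reconstructed channel again with the same k yields the same levels, which reflects the statement that the value range does not shrink after multiple quantisations.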
After the quantisation step we have all data necessary to represent a flow field. Our file structure consists of a header and a data part. The header contains the image dimensions, the density d needed to reconstruct the inpainting mask, the quantisation parameter k, and the minimum and maximum flow values for each channel. We also have to store the sizes of the starting-element data and of the chain code in order to split the different types of data. The data part of our encoded file contains the edge information in the form of starting elements and chain codes, and the quantised flow values. As this type of image has a high spatial correlation, we store flow values channel-wise. The entire file is then entropy encoded using lpaq2 [19].

The decoding step of our codec is a straightforward process. We first read the header information and split the data into T-junctions, chain codes, and flow values. The header information then allows us to reconstruct the regular inpainting mask. Next, to reconstruct the edges, we follow the chain codes starting at the T-junctions in each direction until we encounter a terminating symbol. Finally, we restore flow vectors from the stored quantised values and place them on their corresponding mask positions. The flow field is then reconstructed by solving the edge-aware homogeneous diffusion inpainting problem proposed in Section II. We use a conjugate gradient solver with a relative residual norm decay of $10^{-5}$ as a stopping criterion.

In this section we show the potential of our method by comparing it to JPEG2000 and BPG, an adaptation of HEVC's intra-coding mode for the compression of still images [20]. As both methods are not designed to handle float-valued pixel data, we quantise the input flow fields to integer values in the range {0, ..., 255} with the uniform quantisation described in Section III. We also use these quantised flow fields as input to our method.
As an implementation of JPEG2000 we choose Kakadu [21], and for BPG we use the original implementation of Fabrice Bellard [5].

We select two ground truth flow fields of the MPI Sintel Flow Dataset [22] as test images. This provides us with high quality flow fields of realistic complexity. We choose a simple flow field of the scene alley1 and a more complex one of the scene market2, shown in Fig. 3. To optimise the parameters of our approach we perform a simple grid search. We fix the standard deviation σ in the edge detection step to 0.5, as this gives good results for sharp edges.

Fig. 4 shows the results for JPEG2000, BPG, and our edge-aware approach for a compression ratio of 400 : 1. Note that the compression ratio is computed relative to the quantised input flow fields. Transform-based codecs like JPEG2000 have problems preserving sharp edges. This results in unpleasant wave-like artefacts around object boundaries. BPG performs much better than JPEG2000 in preserving edges, but still blurs boundaries slightly. In contrast, our edge-aware homogeneous diffusion inpainting reconstructs all important edges very well.

In a quantitative comparison we measure the peak signal-to-noise ratio (PSNR) of the different approaches. Fig. 5 shows that our method outperforms both JPEG2000 and BPG for almost all compression ratios.

VI. CONCLUSIONS AND OUTLOOK

Our new framework for flow field compression combines homogeneous diffusion with explicitly stored edge data. A concrete implementation with straightforward edge detection achieves a remarkable quality and consistently outperforms state-of-the-art competitors such as JPEG2000 and BPG. This shows the significant potential of inpainting-based compression for motion data.

The modular nature of our framework allows us to adapt and improve our method by simply changing the different components. For example, more advanced edge detection or even segmentation might improve the quality of our method for flow fields with blurry edges.
Future work should evaluate such methods for actual video compression. To this end the residual signal of hybrid video coding should be further analysed. In this context it is also interesting to investigate optical flow methods that give suitable results for edge-aware approaches like ours.

Fig. 4. Comparison of JPEG2000, BPG, and our edge-aware approach for two flow fields and a compression ratio of 400 : 1. Both JPEG2000 and BPG show artefacts around edges

Fig. 5. Comparison of our method with JPEG2000 and BPG in terms of PSNR (higher is better). Axes: compression ratio (X : 1) versus PSNR (dB). Our codec outperforms JPEG2000 and BPG consistently on both alley (left) and market

REFERENCES

[1] "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 2: High efficiency video coding"
[2] "Optical flow techniques applied to video coding"
[3] "Video compression with dense motion fields"
[4] "JPEG 2000: Image Compression Fundamentals, Standards and Practice"
[5] "Better portable graphics (BPG)"
[6] "Edge-based compression of cartoon-like images with homogeneous diffusion"
[7] "Compression of depth maps with segment-based homogeneous diffusion," in Scale Space and Variational Methods in Computer Vision, ser. Lecture Notes in Computer Science
[8] "Theory of edge detection"
[9] [...] in Geometric and Level-Set Methods in Computer Vision, ser. Lecture Notes in Computer Science
[10] "Theoretical foundations of anisotropic diffusion in image processing"
[11] "Beating the quality of JPEG 2000 with anisotropic diffusion," in Pattern Recognition, ser. Lecture Notes in Computer Science
[12] "Beyond pure quality: Progressive mode, region of interest coding and real time video decoding in PDE-based image compression"
[13] "A proof-of-concept framework for PDE-based video compression"
[14] "Compressible motion fields"
[15] "Efficient depth map compression based on lossless edge coding and diffusion," in Picture Coding Symposium
[16] "Basic theory on normalization of pattern (in case of typical one-dimensional pattern)"
[17] "Iterative Methods for Sparse Linear Systems"
[18] "A computational approach to edge detection"
[19] "Adaptive weighing of context models for lossless data compression"
[20] "Performance analysis of HEVC-based intra coding for still image compression"
[21] "Kakadu JPEG2000"
[22] "A naturalistic open source movie for optical flow evaluation"