key: cord-0044893-p99w96ir
authors: Destercke, Sébastien; Rico, Agnès; Strauss, Olivier
title: Approximating General Kernels by Extended Fuzzy Measures: Application to Filtering
date: 2020-05-15
journal: Information Processing and Management of Uncertainty in Knowledge-Based Systems
DOI: 10.1007/978-3-030-50143-3_9
sha: 6b8a708df714396c84ad6a27fb0c7b4d20b1e97e
doc_id: 44893
cord_uid: p99w96ir

Convolution kernels are essential tools in signal processing: they are used to filter noisy signals, interpolate discrete signals, etc. However, in a given application, it is often hard to select an optimal shape of the kernel. This is why, in practice, it may be useful to possess efficient tools to perform a robustness analysis, taking the form in our case of an imprecise convolution. When convolution kernels are positive, their formal equivalence with probability distributions allows one to use imprecise probability theory to achieve such an imprecise convolution. However, many kernels can have negative values, in which case the previous equivalence does not hold anymore. Yet, we show mathematically in this paper that, while the formal equivalence is lost, the computational tools used to describe sets of probabilities by intervals on the singletons still retain their key properties when used to approximate sets of (possibly) non-positive kernels. We then illustrate their use on a single application that consists of filtering a human electrocardiogram signal by using a low-pass filter whose order is imprecisely known. We show, in this experiment, that the proposed approach leads to tighter bounds than previously proposed approaches.

Filtering a signal aims at removing some unwanted components or features, e.g., other signals or measurement noise. It is a common problem in both digital analysis and signal processing [5]. In this context, kernels are used for impulse response modelling, interpolation, linear and non-linear transformations, stochastic or band-pass filtering, etc. However, how to choose a particular kernel and its parameters to filter a given signal is often a tricky question.

A way to circumvent this difficulty is to filter the signal with a convex set of kernels, thus ending up with a set-valued signal, often summarized by lower/upper bounds. The set of kernels has to be convex because, if two kernels are suitable to achieve the filtering, any convex combination of those two kernels should be suitable too. A key problem is then to propose a model that approximates this convex set of kernels in a reasonable way (i.e., without losing too much information) and performs the set-valued filtering in an algorithmically efficient way while guaranteeing the provided bounds (in the sense that the set of signals obtained by filtering with each kernel of the set is contained within the bounds).

In the case of positive summative kernels, i.e., positive functions summing up to one, previous works used the formal equivalence between such kernels and probabilities to their advantage, and proposed well-known probability set models as approximation tools. For example, maxitive [6] and cloudy [4] kernels respectively use possibility distributions and generalized p-boxes to model sets of kernels. They exploit the properties of the induced lower measure on events to obtain filtering solutions that are efficient from a computational standpoint.

However, when the kernel set to approximate contains functions that are not positive everywhere (but still sum up to one), this formal analogy is lost, and imprecise probabilistic tools cannot be used straightforwardly to model the set of kernels. Yet, recent works [9] have shown that in some cases such imprecise models can be meaningfully extended to accommodate negative values. These extensions preserve the properties that make such models interesting for signal filtering (i.e., the guarantee of the obtained bounds and the algorithmic efficiency). More formally, this means that we have to study the extension of Choquet-integral-based digital filtering to the situation where kernels κ : X → [−A, B] ⊆ ℝ can be any (bounded) function.

In this work, we show that this is also true for another popular model, namely probability intervals [1], which consist in providing lower/upper bounds on singleton probabilities. Applying this principle to sets of additive but possibly negative measures leads to a model inducing a non-monotone set function, called a signed measure. We show that the Choquet integral with respect to such a measure still leads to interesting bounds for the filtered signal, in the sense that these bounds are guaranteed and are attained by specific additive measures dominating the signed measure. Let us call this new kind of interval-valued kernels imprecise kernels.

The paper is structured as follows. Section 2 recalls the setting we consider, as well as a few preliminaries. We demonstrate in Sect. 3 that probability intervals can be meaningfully extended to accommodate sets of additive but non-positive measures. Section 4 shows how these results can be applied to numerical signal filtering.

We assume a finite space X = {x_1, ..., x_n} of n points, which is a subset of an infinite discrete space Ω that may be a discretization of a continuous space (e.g., the real line). We also assume an observed signal whose values at these points are f(x_1), ..., f(x_n), which can represent a time- or space-dependent record (EEG signal, sound, etc.). A kernel is here a bounded discrete function η : Ω → [−A, B], often computed from a continuous kernel (corresponding, for example, to an assumed filter impulse response). This kernel is such that ∑_{x∈Ω} η(x) = 1. For a given kernel, we denote by b_η and a_η the sums of the positive and negative parts of the kernel, respectively. That is:

$$b_\eta = \sum_{x \in \Omega} \max(0, \eta(x)), \qquad a_\eta = \sum_{x \in \Omega} \max(0, -\eta(x)),$$

so that b_η − a_η = 1.

Filtering the signal f by using the kernel η consists of estimating the filtered signal f̂ at each point x of X by:

$$\hat{f}(x) = \sum_{y \in X} \eta_x(y) \, f(y),$$

where ∀y ∈ X, η_x(y) = η(y − x). Filtering f amounts to computing the value of f̂ at each point of X. Let us therefore simplify the previous formal statements by assuming that, at each point x of X, we use a kernel κ = η_x. Note that the domain of κ can be restricted to X without any loss of generality. Moreover, b_κ = ∑_{x∈Ω} max(0, κ(x)) = b_η and a_κ = ∑_{x∈Ω} max(0, −κ(x)) = a_η.
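The filtering step and the quantities b_η and a_η can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The kernel is stored as a list whose entry `center` holds η(0). Samples that would fall outside the finite domain X are ignored, which is one boundary convention among several. The function names and toy values are hypothetical.

```python
def positive_negative_parts(kernel):
    """Return (b, a): sums of the positive and of the negative parts of a kernel."""
    b = sum(max(0.0, v) for v in kernel)
    a = sum(max(0.0, -v) for v in kernel)
    return b, a


def filter_signal(f, eta, center):
    """Compute f_hat(x) = sum over y in X of eta(y - x) * f(y).

    eta is a list of kernel samples and eta[center] is the value at offset 0
    (hypothetical layout). Kernel samples that would fall outside the finite
    domain X are ignored, which is only one possible boundary convention.
    """
    n = len(f)
    f_hat = []
    for x in range(n):
        acc = 0.0
        for y in range(n):
            k = y - x + center  # position of eta(y - x) in the list
            if 0 <= k < len(eta):
                acc += eta[k] * f[y]
        f_hat.append(acc)
    return f_hat


# Toy kernel summing to one, with negative side lobes (illustrative values).
eta = [-0.1, 0.3, 0.6, 0.3, -0.1]
b, a = positive_negative_parts(eta)
print(b, a, b - a)  # b - a equals 1 up to rounding
f_hat = filter_signal([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], eta, center=2)
print(f_hat)
```

This toy kernel is symmetric and sums to one, so a linear signal is reproduced exactly at interior points. The two points at each boundary are affected by the truncation convention.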

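From any kernel κ, we can build a set function μ_κ:

$$\forall A \subseteq X, \quad \mu_\kappa(A) = \sum_{x \in A} \kappa(x).$$

We have μ_κ(∅) = 0, μ_κ(X) = 1 and μ_κ(A ∪ B) = μ_κ(A) + μ_κ(B) whenever A ∩ B = ∅, hence the additivity of the measure.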

Estimating the value of the signal f̂ at a given point x ∈ X requires computing a value C_κ that can be written as a weighted sum

$$C_\kappa(f) = \sum_{x \in X} \kappa(x) \, f(x).$$

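If we order and rank the values of f such that f(x_(1)) ≤ f(x_(2)) ≤ ... ≤ f(x_(n)), this sum can be rewritten as

$$C_\kappa(f) = \sum_{i=1}^{n} \left( f(x_{(i)}) - f(x_{(i-1)}) \right) \mu_\kappa(A_{(i)}), \tag{2}$$

where A_(i) = {x_(i), ..., x_(n)}, f(x_(0)) = 0 and A_(1) = X. One can already notice the similarity with the usual Choquet integral.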
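The following sketch checks that Eq. (2) is only a reordering of the weighted sum (an Abel summation). It evaluates both forms on a toy signed kernel. Names and values are illustrative assumptions. The ordered form recomputes each μ_κ(A_(i)) for readability rather than speed.

```python
def mu_additive(kappa, subset):
    """mu_kappa(A): sum of the kernel values over the index set A."""
    return sum(kappa[i] for i in subset)


def weighted_sum(kappa, f):
    """C_kappa(f) = sum over x of kappa(x) * f(x)."""
    return sum(k * v for k, v in zip(kappa, f))


def ordered_sum(kappa, f):
    """Eq. (2): sum over i of (f(x_(i)) - f(x_(i-1))) * mu_kappa(A_(i)).

    Written for readability: each mu_kappa(A_(i)) is recomputed, so this is
    quadratic in the number of points.
    """
    order = sorted(range(len(f)), key=lambda i: f[i])  # f(x_(1)) <= ... <= f(x_(n))
    total, prev = 0.0, 0.0  # prev starts at f(x_(0)) = 0
    for rank, idx in enumerate(order):
        a_i = order[rank:]  # A_(i) = {x_(i), ..., x_(n)}
        total += (f[idx] - prev) * mu_additive(kappa, a_i)
        prev = f[idx]
    return total


kappa = [-0.1, 0.3, 0.6, 0.3, -0.1]  # illustrative signed kernel
f = [3.0, -1.0, 2.5, 0.0, 4.0]       # illustrative signal values on X
print(weighted_sum(kappa, f), ordered_sum(kappa, f))  # equal up to rounding
```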

Example 1. Consider a kernel η that is a Hermite polynomial of degree 3, and its sampled version pictured in Fig. 1, with a = −2. As the figure shows, some of the sampled values are negative.

In this paper, we consider the case where the ideal η is ill-known. That is, we only know that, for each x ∈ X, η belongs to a convex set N, which entails that κ belongs to a convex set K. This set N (and K) can reflect, for example, our uncertainty about which kernel should ideally be used (the kernels can vary in shape, bandwidth, etc.). In the following section, we propose to approximate this set K of kernels, which we deem suitable to filter f at point x, by a measure μ_K that is close to a fuzzy measure. It shares with such measures the fact that μ_K(∅) = 0 and μ_K(X) = 1, and it is a standard fuzzy measure in the case where all κ ∈ K are positive. However, in the case of negative κ ∈ K, we may have μ_K(A) < 0 for some A, as well as μ_K(A) > μ_K(B) for some A ⊂ B.

We now consider a set K of kernels κ defined on X, which can be discretized versions of a set of continuous kernels. We make no assumptions about this set, except for the fact that each kernel κ ∈ K is bounded and such that ∑_{x∈X} κ(x) = 1. If K contains a large number of kernels, filtering the signal with each of them and getting the set answer {C_κ | κ ∈ K} for each possible point of the signal can be intractable. Rather than doing so, we may look for an efficient way to find lower and upper bounds of {C_κ | κ ∈ K} that are not too wide. To achieve this, we propose to use tools inspired by the imprecise probability literature, namely non-additive set functions and the Choquet integral. To use such tools, we must first build a set function approximating the set K. To do so, we will extend probability intervals [1], which associate lower and upper bounds [l(x), u(x)] with each atom, given a set of probabilities. In our case, for each element x ∈ X, we consider the interval-valued kernel

$$\rho(x) = [\underline{\rho}(x), \overline{\rho}(x)]$$

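such that the bounds are given, for each x ∈ X, as

$$\underline{\rho}(x) = \inf_{\kappa \in \mathcal{K}} \kappa(x), \qquad \overline{\rho}(x) = \sup_{\kappa \in \mathcal{K}} \kappa(x).$$

Clearly, both bounds can be negative, so they are not classical probability intervals. Nevertheless, we will show that set functions induced by these bounds enjoy properties similar to those of standard probability intervals. They can therefore be used to efficiently approximate {C_κ | κ ∈ K}. From the imprecise kernel ρ, we propose to build the set function μ = μ_K:

$$\mu_{\mathcal{K}}(A) = \max\left( \sum_{x \in A} \underline{\rho}(x), \; 1 - \sum_{x \in A^c} \overline{\rho}(x) \right), \tag{5}$$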

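which is formally the same equation as the one used for probability intervals. We still have μ(∅) = 0 and μ(X) = 1: since ρ̲(x) ≤ κ(x) ≤ ρ̄(x) for any κ ∈ K, we necessarily have ∑_{x∈X} ρ̲(x) ≤ 1 ≤ ∑_{x∈X} ρ̄(x). However, we can have μ(A) > μ(B) with A ⊂ B. This means that μ is neither a classical fuzzy measure nor a so-called coherent lower probability. It is also non-additive, as in general μ(A ∪ B) ≠ μ(A) + μ(B) for A ∩ B = ∅. Simply replacing μ_κ by μ_K in Eq. (2) gives us

$$C_{\mathcal{K}}(f) = \sum_{i=1}^{n} \left( f(x_{(i)}) - f(x_{(i-1)}) \right) \mu_{\mathcal{K}}(A_{(i)}) = \sum_{i=1}^{n} f(x_{(i)}) \left( \mu_{\mathcal{K}}(A_{(i)}) - \mu_{\mathcal{K}}(A_{(i+1)}) \right), \tag{6}$$

where A_(n+1) = ∅ and f(x_(0)) = 0. In the positive case, μ_K is a 2-monotone lower probability [1], and Eq. (6) is known to give exactly the lower bound inf_{μ_κ ≥ μ_K} C_κ(f).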
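Equations (5) and (6) translate directly into code. The sketch below is a hypothetical, deliberately naive Python version. The finite set K is given explicitly, ρ̲ and ρ̄ are its pointwise minimum and maximum, and μ_K is re-evaluated on each nested event A_(i). On this toy example, the resulting value does not exceed C_κ(f) for any κ ∈ K.

```python
def interval_kernel(kernels):
    """Pointwise bounds over a finite set K: lower = min, upper = max."""
    rho_low = [min(col) for col in zip(*kernels)]
    rho_up = [max(col) for col in zip(*kernels)]
    return rho_low, rho_up


def mu_lower(rho_low, rho_up, subset):
    """Eq. (5): max(sum of rho_low over A, 1 - sum of rho_up over the complement)."""
    inside = set(subset)
    s_low = sum(rho_low[i] for i in inside)
    s_up_out = sum(u for i, u in enumerate(rho_up) if i not in inside)
    return max(s_low, 1.0 - s_up_out)


def lower_filtered_value(rho_low, rho_up, f):
    """Eq. (6): sum over i of (f(x_(i)) - f(x_(i-1))) * mu_K(A_(i)), f(x_(0)) = 0."""
    order = sorted(range(len(f)), key=lambda i: f[i])
    total, prev = 0.0, 0.0
    for rank, idx in enumerate(order):
        total += (f[idx] - prev) * mu_lower(rho_low, rho_up, order[rank:])
        prev = f[idx]
    return total


# Three illustrative kernels on a 5-point domain, each summing to one.
K = [[-0.1, 0.3, 0.6, 0.3, -0.1],
     [0.0, 0.25, 0.5, 0.25, 0.0],
     [-0.05, 0.2, 0.7, 0.2, -0.05]]
f = [3.0, -1.0, 2.5, 0.0, 4.0]
lo, up = interval_kernel(K)
c_low = lower_filtered_value(lo, up, f)
c_each = [sum(k * v for k, v in zip(kappa, f)) for kappa in K]
print(c_low, c_each)  # c_low does not exceed any entry of c_each
```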

In the rest of this section, we show that this is also true when the set K contains non-positive functions. In particular, we will show:

1. that Eq. (6) still provides a lower bound of inf_{μ_κ ≥ μ_K} C_κ(f); and
2. that this lower bound is the infimum itself, meaning that it is reached by a particular additive measure of K.

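To show the first point, we will first prove the following proposition concerning μ:

Proposition 1. Given a set K of kernels, for any A ⊆ X and any κ ∈ K, we have μ_K(A) ≤ μ_κ(A).

Proof. To prove it, consider a given κ ∈ K. Since ρ̲(x) ≤ κ(x) ≤ ρ̄(x) for every x ∈ X and ∑_{x∈X} κ(x) = 1, we have

$$\mu_\kappa(A) = \sum_{x \in A} \kappa(x) \ge \sum_{x \in A} \underline{\rho}(x) \quad \text{and} \quad \mu_\kappa(A) = 1 - \sum_{x \in A^c} \kappa(x) \ge 1 - \sum_{x \in A^c} \overline{\rho}(x),$$

hence μ_κ(A) ≥ μ_K(A) by Eq. (5). Since this is true for any κ ∈ K, we have the inequality.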
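For small domains, Proposition 1 and the equality μ_K({x}) = ρ̲(x) can be checked exhaustively by enumerating all 2^n events. The following Python sketch does so for a hypothetical three-kernel set. This brute-force enumeration is only a sanity check, and it is precisely what the Choquet-based computation avoids.

```python
from itertools import combinations


def mu_lower(rho_low, rho_up, subset):
    """Eq. (5) set function built from the bounds rho_low, rho_up."""
    inside = set(subset)
    s_low = sum(rho_low[i] for i in inside)
    s_up_out = sum(u for i, u in enumerate(rho_up) if i not in inside)
    return max(s_low, 1.0 - s_up_out)


def check_proposition_1(kernels, tol=1e-12):
    """Exhaustively check mu_K(A) <= mu_kappa(A) for all A subset of X, kappa in K."""
    n = len(kernels[0])
    rho_low = [min(col) for col in zip(*kernels)]
    rho_up = [max(col) for col in zip(*kernels)]
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            bound = mu_lower(rho_low, rho_up, subset)
            if any(bound > sum(kappa[i] for i in subset) + tol for kappa in kernels):
                return False
    return True


K = [[-0.1, 0.3, 0.6, 0.3, -0.1],
     [0.0, 0.25, 0.5, 0.25, 0.0],
     [-0.05, 0.2, 0.7, 0.2, -0.05]]
lo = [min(col) for col in zip(*K)]
up = [max(col) for col in zip(*K)]
print(check_proposition_1(K))
# Tightness on singletons: mu_K({x}) coincides with the lower bound rho_low(x).
print([mu_lower(lo, up, [i]) for i in range(5)], lo)
```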

Note that μ is tight on singletons, since μ({x}) = ρ̲(x) = inf_{κ∈K} κ(x). Hence any set function higher than μ on singletons would not be a lower envelope of K. The fact that C_K(f) ≤ C_κ(f) for any κ ∈ K then simply follows from Proposition 1, since f(x_(i)) − f(x_(i−1)) ≥ 0 for i ≥ 2 and μ_K(X) = μ_κ(X) = 1. If K reduces to a single kernel κ, then we recover the classical filtering result. Note that computing C_K(f) amounts to filtering f, that is, computing C_κ(f), with the specific kernel κ(x_(i)) = μ(A_(i)) − μ(A_(i+1)). To prove that C_K is a tight lower bound, it remains to show that such a kernel is within K.

To show that the bound obtained by Eq. (6) is actually reached by an additive measure dominating μ_K, we will first show that μ_K still satisfies a convexity property.

Proposition 2. Given a set K of kernels, the measure μ_K is 2-monotone and convex, as for every pair A, B ⊆ X we have

$$\mu_{\mathcal{K}}(A \cup B) + \mu_{\mathcal{K}}(A \cap B) \ge \mu_{\mathcal{K}}(A) + \mu_{\mathcal{K}}(B). \tag{7}$$

Proof. We will mainly adapt the proof from [1, Proposition 5] to the case of nonpositive kernels and signed measures, as its mechanism still works in this case.

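A key element will be to show that for any two subsets C, D with C ∩ D = ∅, there exists a single additive measure μ_κ with κ ∈ K such that

$$\mu_\kappa(C) = \mu_{\mathcal{K}}(C) \quad \text{and} \quad \mu_\kappa(C \cup D) = \mu_{\mathcal{K}}(C \cup D). \tag{8}$$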

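Indeed, take C = A ∩ B and D = (A ∪ B) \ (A ∩ B), and choose κ so that it coincides with μ_K on the events C and C ∪ D. We then have

$$\mu_{\mathcal{K}}(A \cup B) + \mu_{\mathcal{K}}(A \cap B) = \mu_\kappa(A \cup B) + \mu_\kappa(A \cap B) = \mu_\kappa(A) + \mu_\kappa(B) \ge \mu_{\mathcal{K}}(A) + \mu_{\mathcal{K}}(B),$$

where the last inequality follows from Proposition 1.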

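By Eq. (5), we know that

$$\mu_{\mathcal{K}}(A) = \max\left( \sum_{x \in A} \underline{\rho}(x), \; 1 - \sum_{x \in A^c} \overline{\rho}(x) \right),$$

which means that for any event A, we have two possibilities (the two terms of the max). This means four possibilities when considering C and C ∪ D together. Here, we will only show that Eq. (8) is true for one of those cases, as the proofs for the other cases follow similar reasoning. So let us consider the case where

$$\mu_{\mathcal{K}}(C) = \sum_{x \in C} \underline{\rho}(x) \quad \text{and} \quad \mu_{\mathcal{K}}(C \cup D) = 1 - \sum_{x \in (C \cup D)^c} \overline{\rho}(x).$$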

Let us now consider the kernel κ such that

$$\kappa(x) = \underline{\rho}(x) \text{ for } x \in C, \qquad \kappa(x) = \overline{\rho}(x) \text{ for } x \in (C \cup D)^c,$$

which fits the requirements of Eq. (8) and so far satisfies the constraints on K. To get an additive kernel whose weights sum up to one, we must still assign a mass λ = 1 − ∑_{x∈C} ρ̲(x) − ∑_{x∈(C∪D)^c} ρ̄(x) over the singletons composing D. One can see that ∑_{x∈D} ρ̲(x) ≤ λ ≤ ∑_{x∈D} ρ̄(x). The first inequality immediately follows from the fact that, in this sub-case, 1 − ∑_{x∈(C∪D)^c} ρ̄(x) ≥ ∑_{x∈C∪D} ρ̲(x). The second follows from ∑_{x∈C} ρ̲(x) ≥ 1 − ∑_{x∈C^c} ρ̄(x). This means that one can choose values κ(x) ∈ [ρ̲(x), ρ̄(x)] for each x ∈ D such that ∑_{x∈D} κ(x) = λ. So in this case we can build an additive κ ∈ K with ∑_{x∈X} κ(x) = 1. In the other sub-cases, a single additive κ ∈ K reaching the bounds μ_K(C) and μ_K(C ∪ D) can be built similarly (we refer to [1, Proposition 5], as the proofs are analogous).
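The inequality of Proposition 2 can likewise be verified by brute force on a small example. This hypothetical Python sketch uses toy bounds and is exhaustive over all pairs of events, so it only works for small X. The same bounds also exhibit the non-monotonicity discussed earlier: an event can receive a larger value than one of its supersets.

```python
from itertools import combinations


def mu_lower(rho_low, rho_up, subset):
    """Eq. (5) set function built from the bounds rho_low, rho_up."""
    inside = set(subset)
    s_low = sum(rho_low[i] for i in inside)
    s_up_out = sum(u for i, u in enumerate(rho_up) if i not in inside)
    return max(s_low, 1.0 - s_up_out)


def is_two_monotone(rho_low, rho_up, tol=1e-12):
    """Check mu(A | B) + mu(A & B) >= mu(A) + mu(B) for every pair of events."""
    n = len(rho_low)
    events = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
    mu = {e: mu_lower(rho_low, rho_up, e) for e in events}
    return all(mu[a | b] + mu[a & b] >= mu[a] + mu[b] - tol
               for a in events for b in events)


rho_low = [-0.1, 0.2, 0.5, 0.2, -0.1]
rho_up = [0.0, 0.3, 0.7, 0.3, 0.0]
print(is_two_monotone(rho_low, rho_up))
# Non-monotonicity: {4} is included in {0, 4}, yet it gets a larger value.
print(mu_lower(rho_low, rho_up, {4}), mu_lower(rho_low, rho_up, {0, 4}))
```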

Hence C_K is a signed Choquet integral with respect to the convex capacity μ_K. We have μ_K(∅) = 0, so according to [10, Theorem 3], C_K is the minimum of the integrals or expectations taken with respect to the additive measures dominating μ_K. That is, C_K(f) = min{C_μ(f) | μ ∈ core(μ_K)}, where the core of a capacity is the set of additive set functions that lie above the capacity everywhere. This allows us to state the following property.

We have therefore shown that, to approximate the result of filtering with any set of kernels (bounded and with unit gain, i.e., summing to one), it is still possible to use tools from the imprecise probability literature. However, it is even clearer in this case that such tools should not be interpreted straightforwardly as uncertainty models, since the set functions are not monotone, a property satisfied by standard fuzzy measures and coherent lower probabilities. They should rather be seen as convenient and efficient tools to perform robust filtering.
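The kernel κ*(x_(i)) = μ_K(A_(i)) − μ_K(A_(i+1)), singled out after Proposition 1, can be computed directly. In the sketch below (Python, hypothetical helper names and toy bounds), this kernel sums to one and stays within [ρ̲, ρ̄]. Its weighted sum also reproduces C_K(f). Staying within the pointwise bounds while summing to one is equivalent to dominating μ_K on every event, because singletons and their complements already yield those bounds.

```python
def mu_lower(rho_low, rho_up, subset):
    """Eq. (5) set function built from the bounds rho_low, rho_up."""
    inside = set(subset)
    s_low = sum(rho_low[i] for i in inside)
    s_up_out = sum(u for i, u in enumerate(rho_up) if i not in inside)
    return max(s_low, 1.0 - s_up_out)


def extremal_kernel(rho_low, rho_up, f):
    """kappa*(x_(i)) = mu_K(A_(i)) - mu_K(A_(i+1)), with A_(n+1) empty."""
    order = sorted(range(len(f)), key=lambda i: f[i])
    kappa = [0.0] * len(f)
    for rank, idx in enumerate(order):
        kappa[idx] = (mu_lower(rho_low, rho_up, order[rank:])
                      - mu_lower(rho_low, rho_up, order[rank + 1:]))
    return kappa


rho_low = [-0.1, 0.2, 0.5, 0.2, -0.1]
rho_up = [0.0, 0.3, 0.7, 0.3, 0.0]
f = [3.0, -1.0, 2.5, 0.0, 4.0]
k_star = extremal_kernel(rho_low, rho_up, f)
print(k_star, sum(k_star))
print(sum(k * v for k, v in zip(k_star, f)))  # equals C_K(f) from Eq. (6)
```

Note that κ* depends on the ranking of f, so a different signal window generally selects a different extremal kernel.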

An upper bound C̄_K(f) can be obtained by using the conjugate capacity μ̄_K(A) = 1 − μ_K(A^c) in Eq. (6). As μ̄_K is a concave capacity, we also have C̄_K(f) = max{C_μ(f) | μ ∈ anticore(μ̄_K)}, where the anticore of a capacity is the set of additive set functions that lie below the capacity everywhere. Note that μ ∈ core(μ_K) is equivalent to μ ∈ anticore(μ̄_K). Finally, applying Eq. (6) does not require evaluating our lower measure on every possible event, but only on a linear number of them (once the function values have been ordered). Moreover, evaluating this lower measure on any event is straightforward given Eq. (5). So, even though the measure is non-additive (and not necessarily monotonic), the filtered values can be computed efficiently.
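The efficiency argument can be made concrete. Once the values in the filtering window are sorted, every A_(i) is a suffix of the ranking and its complement is a prefix. Both μ_K(A_(i)) and the conjugate μ̄_K(A_(i)) = min(1 − ∑_{x∈A_(i)^c} ρ̲(x), ∑_{x∈A_(i)} ρ̄(x)) therefore come from running prefix sums. The Python sketch below assumes this layout, uses hypothetical names and toy bounds, and simply skips edge points where the kernel does not fit. Each output value costs O(m log m) for a kernel of m samples.

```python
def choquet_bounds(f, rho_low, rho_up):
    """Lower and upper filtered values (Eq. (6) with mu_K and with its conjugate).

    After sorting, each A_(i) is a suffix of the order and its complement a
    prefix, so the sums of Eq. (5) come from running prefix sums.
    """
    order = sorted(range(len(f)), key=lambda i: f[i])
    n = len(order)
    pre_low = [0.0] * (n + 1)  # pre_low[i]: sum of rho_low over the first i ranked points
    pre_up = [0.0] * (n + 1)   # pre_up[i]: same for rho_up
    for i, idx in enumerate(order):
        pre_low[i + 1] = pre_low[i] + rho_low[idx]
        pre_up[i + 1] = pre_up[i] + rho_up[idx]
    tot_low, tot_up = pre_low[n], pre_up[n]
    lower = upper = 0.0
    prev = 0.0  # f(x_(0)) = 0
    for i, idx in enumerate(order):
        # Current event is order[i:], its complement is order[:i].
        mu = max(tot_low - pre_low[i], 1.0 - pre_up[i])     # Eq. (5)
        mu_bar = min(1.0 - pre_low[i], tot_up - pre_up[i])  # 1 - mu(complement)
        step = f[idx] - prev
        lower += step * mu
        upper += step * mu_bar
        prev = f[idx]
    return lower, upper


def imprecise_filter(signal, rho_low, rho_up, center):
    """Interval-valued filtering at the interior points where the kernel fits.

    rho_low[k], rho_up[k] bound the kernel at offset k - center (hypothetical
    layout); edge points are skipped rather than handled by a boundary rule.
    """
    m = len(rho_low)
    out = []
    for x in range(center, len(signal) - (m - 1 - center)):
        window = signal[x - center:x - center + m]
        out.append(choquet_bounds(window, rho_low, rho_up))
    return out


rho_low = [-0.1, 0.2, 0.5, 0.2, -0.1]
rho_up = [0.0, 0.3, 0.7, 0.3, 0.0]
signal = [0.0, 2.0, -1.0, 3.0, 5.0, 4.0, -2.0, 1.0]
for lower, upper in imprecise_filter(signal, rho_low, rho_up, center=2):
    print(f"[{lower:.3f}, {upper:.3f}]")
```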

We now illustrate the use of our method on a real case scenario involving the filtering of human electrocardiogram (ECG) signals, using data initially collected to detect heart conditions under different settings [7]. ECG signals contain many types of noise, e.g., baseline wander, power-line interference, electromyographic (EMG) noise, electrode motion artifact noise, etc. Baseline wander is a low-frequency noise of around 0.5 to 0.6 Hz that is usually removed during the recording by high-pass filtering with a cut-off frequency of 0.5 to 0.6 Hz. EMG noise, which is a high-frequency noise above 100 Hz, may be removed by a digital low-pass filter with an appropriate cut-off frequency. The authors of [7] propose to use a cut-off frequency of 45 Hz to preprocess the ECG signals. The noisy ECG signal to be filtered is presented in Fig. 2. To prevent phase distortion in the passband, a Butterworth kernel should preferably be chosen. Moreover, since the signal does not have to be processed online, a symmetric Butterworth filter can be used, that is, the combination of a causal and an anti-causal Butterworth kernel. Using such an even kernel prevents phase delay.

In this experiment, we suppose that the 45 Hz cut-off frequency proposed in [7] is appropriate, while the suitable order of the kernel is imprecisely known. Figure 3.a presents the superposition of 13 kernels, namely the impulse responses of the symmetric low-pass Butterworth filters of orders 1 to 13. These kernels constitute the set N we have to approximate.

Applying our approximation to N provides the upper (ρ̄) and lower (ρ̲) bounds of the imprecise kernel pictured in Fig. 3.c, with the lower bound in red and the upper bound in blue. These bounds are simply obtained by computing ρ̲ = min_{n=1...13} η_n and ρ̄ = max_{n=1...13} η_n, where η_n is the impulse response of the symmetric low-pass Butterworth kernel of order n with a cut-off frequency of 45 Hz. As a comparison point, we also apply to the same signal the maxitive approach proposed in [9], where a signed kernel is approximated by a couple of extended possibility distributions (π⁻, π⁺). This couple of functions is computed as follows. First, π⁺ = max_{n=1...13} π⁺_n, where π⁺_n is the most specific maxitive kernel that dominates η⁺_n = max(0, η_n). Second, π⁻ = max_{n=1...13} π⁻_n, where π⁻_n is the most specific maxitive kernel that dominates η⁻_n = max(0, −η_n) (see [9, Equation (4)]). Figure 3.b plots π⁺ (in blue) and −π⁻ (in red). One can readily notice that, although their shapes are similar, their boundary values are quite different: the imprecise maxitive kernel varies between −0.5 and 1.5, and our imprecise kernel between −0.01 and 0.07.
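The kernel set N and the bounds ρ̲, ρ̄ of this experiment can be reproduced in spirit with SciPy. The sketch uses forward-backward filtering (`scipy.signal.sosfiltfilt`) of a unit impulse as one standard way to combine a causal and an anti-causal Butterworth kernel. The following choices are assumptions for illustration, since the text does not specify them: a sampling frequency of 500 Hz, truncation to 801 samples, and renormalisation to unit sum. The numerical range of the bounds therefore need not match Fig. 3.c.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 500.0     # sampling frequency in Hz: an assumption, the text does not give it
CUTOFF = 45.0  # cut-off frequency used in the text
HALF = 400     # kernel half-length in samples: an assumed truncation


def symmetric_butterworth_kernel(order, fs=FS, cutoff=CUTOFF, half=HALF):
    """Impulse response of a forward-backward (zero-phase) Butterworth low-pass."""
    sos = butter(order, cutoff, btype="low", output="sos", fs=fs)
    impulse = np.zeros(2 * half + 1)
    impulse[half] = 1.0
    # padtype=None disables edge extension, so the output is the causal
    # response combined with its time-reversed (anti-causal) counterpart.
    kernel = sosfiltfilt(sos, impulse, padtype=None)
    return kernel / kernel.sum()  # restore unit gain lost to truncation


# Set N: orders 1 to 13, one kernel per row.
kernels = np.array([symmetric_butterworth_kernel(n) for n in range(1, 14)])
rho_low = kernels.min(axis=0)  # lower bound of the imprecise kernel
rho_up = kernels.max(axis=0)   # upper bound of the imprecise kernel
print(rho_low.min(), rho_up.max())
```

Second-order sections are used instead of transfer-function coefficients because high orders such as 13 are numerically fragile in polynomial form.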

In Figs. 5 and 4, respectively, we have plotted the ECG signal of Fig. 2 filtered by the 13 kernels of Fig. 3.a. Figure 5 also shows the imprecise signal obtained by using the most specific signed maxitive kernel defined by the couple of functions (π⁻, π⁺). Figure 4 also shows the imprecise signal obtained by using the imprecise kernel plotted in Fig. 3.c. The upper bounds of the imprecise filtered signals are plotted in blue and their lower bounds in red. As can be seen in Fig. 4, the imprecise signal obtained by this new approach reaches our goal, i.e., it contains all the signals that would have been obtained by using the conventional approach. Moreover, the bounds are reasonably tight, which means that the core of the imprecise kernel is specific enough as an approximation. This matters because a non-parametric imprecise representation of kernels always includes unwanted kernels, and may therefore lead to over-conservative bounds.

This is even more evident if we compare it to the signal bounds obtained with the maxitive approach, as the latter leads to a less specific interval-valued signal. For instance, the values spanned by the interval-valued signal range from −500 to 200 in our approach, and from −800 to 500 for the maxitive approach. Another possible advantage of our approach is that the Kronecker impulse is not necessarily included in the described set of kernels, while it is systematically included in a maxitive kernel. In the latter case, the interval-valued signal therefore always includes the noisy original signal itself.
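By Eq. (5), an additive μ_κ dominates μ_K on every event exactly when ρ̲(x) ≤ κ(x) ≤ ρ̄(x) for all x and ∑_{x∈X} κ(x) = 1. Singletons and their complements yield the pointwise bounds, and these bounds in turn imply the inequality on every event. Assuming this characterisation of the core, whether the Kronecker impulse belongs to it can be tested directly, as in the hypothetical Python check below (toy bounds, illustrative names).

```python
def in_core(kernel, rho_low, rho_up, tol=1e-9):
    """True if the additive kernel lies in the core of mu_K (assumed
    characterisation: pointwise within [rho_low, rho_up] and summing to one)."""
    within = all(lo - tol <= k <= up + tol
                 for k, lo, up in zip(kernel, rho_low, rho_up))
    return within and abs(sum(kernel) - 1.0) <= tol


def kronecker(n, center):
    """Kronecker impulse: 1 at the centre sample, 0 elsewhere."""
    return [1.0 if i == center else 0.0 for i in range(n)]


rho_low = [-0.1, 0.2, 0.5, 0.2, -0.1]
rho_up = [0.0, 0.3, 0.7, 0.3, 0.0]
# False for these bounds: rho_up at the centre is 0.7 < 1.
print(in_core(kronecker(5, 2), rho_low, rho_up))
print(in_core([-0.1, 0.3, 0.6, 0.3, -0.1], rho_low, rho_up))
```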

In this paper, we have explored to what extent some of the tools usually used to model and reason with sets of probabilities can still be used when considering sets of additive measures that can be negative and fail the monotonicity condition. Such a situation happens, for instance, when filtering a signal with kernels that take negative values.

We have proved that approximating such sets with interval-valued bounds on singletons, by extending probability intervals, still provides useful tools. On the one hand, these tools allow the use of efficient algorithms. On the other hand, they yield tight bounds, in the sense that the obtained bounds are reached by specific additive measures. We have provided some preliminary experiments showing how our results could be used in filtering problems.

Future work could include the investigation of other imprecise probabilistic models that also offer computational advantages in the case of positive measures, such as using lower and upper bounds over sequences of nested events [3, 8]. Complementarily, we could investigate whether computations with some parametric sets of signed kernels can be achieved exactly and efficiently without resorting to an approximation, as can sometimes be done for positive kernels [2].

[1] Probability intervals: a tool for uncertain reasoning
[2] Density estimation with imprecise kernels: application to classification
[3] Unifying practical uncertainty representations: I. Generalized p-boxes
[4] Filtering with clouds
[5] Digital Signal Filtering, Analyses and Restoration
[6] On the granularity of summative kernels
[7] Stability analysis of the 12-lead ECG morphology in different physiological conditions of interest for biometric applications
[8] F-boxes for filtering
[9] Where the domination of maxitive kernels is extended to signed values
[10] Nonmonotonic Choquet integrals