Truth Selection for Truth Discovery Models Exploiting Ordering Relationship Among Values

Valentina Beretta a,∗, Sébastien Harispe a, Sylvie Ranwez a, Isabelle Mougenot b

a LGI2P, IMT Mines Ales, Univ Montpellier, Ales, France.
b UMR 228 Espace Dev UM, Maison de la Télédétection, 500 rue JF Breton, 34093 Montpellier Cedex 5, France.

Abstract

Data veracity is one of the main issues regarding Web data. Truth Discovery models can be used to assess it by estimating value confidence and source trustworthiness through the analysis of claims that different sources provide about the same real-world entities. Many studies have been conducted in this domain. Most models select as true the values with the highest confidence estimation. This naive strategy cannot be applied to identify true values when a partial order among values is exploited to enhance the final performance. Indeed, in this case, the resulting estimations monotonically increase with respect to the partial order of values.
The highest confidence is always assigned to the most general value, which is implicitly supported by all the others. Thus, using the highest confidence as the criterion to select true values is not appropriate because it will always return the most general values. To address this problem, we propose a post-processing procedure that, leveraging the partial order among values and their monotonic confidence estimations, is able to identify the expected true value. Experimental results on synthetic datasets show the effectiveness of our approach.

Keywords: Truth Identification, Truth Discovery, Conflicting values, Value Relationships, Ontology

∗Corresponding author
Email addresses: valentina.beretta@mines-ales.fr (Valentina Beretta), sebastien.harispe@mines-ales.fr (Sébastien Harispe), sylvie.ranwez@mines-ales.fr (Sylvie Ranwez), isabelle.mougenot@umontpellier.fr (Isabelle Mougenot)

1. Introduction

Developing systems able to automatically evaluate the veracity of the avalanche of data produced by the modern information society is of critical importance. Data veracity can be determined by comparing information provided by multiple sources on the same subject [1–3]. Numerous scientific communities contribute to studying this complex issue, most notably with respect to (w.r.t.) data integration in information systems and databases. Among the several difficult tasks that data integration addresses and the different approaches that can be used to solve them [4–7], this paper focuses on automatic truth discovery for solving situations in which different sources provide potentially conflicting data about a specific property of an entity of interest, e.g. the place of birth of a person. Truth Discovery (TD) considers the accuracy associated with the data sources as an important factor to assess data veracity [6, 7].

The main aim of TD models is to identify true information. They intend to automatically solve, in an unsupervised manner, conflicts that may occur among claims. They leverage both the redundancy of the data and the information that can be derived about the sources (particularly their reliability). More precisely, the backbone of TD is the postulate that reliable sources provide true information and that, conversely, true information is given by reliable sources [3]. To identify reliable sources and true information, TD approaches estimate both source trustworthiness and value confidence; the true value is then considered to be the one with the highest confidence. Note that approaches that leverage information about data sources to check data veracity are currently the focus of a lot of attention in several domains such as social sensing [7] and question answering [8].

Here we address the problem of selecting the truth for functional predicates when a priori knowledge in the form of a partial order of values (e.g. subsumption relationships in an ontology) is considered to improve value confidence and source trustworthiness estimations. A partial order highlights when different values are not conflicting but represent the same concept with different levels of granularity. Indeed, conflict and granularity are two different aspects to consider when identifying the most reliable information. While conflicting values produce inconsistency, different granularities only indicate imperfection in data [9].
In formal logic, a predicate p is consid- 2 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT ered functional if for any subject there is a unique value v ∈ V for which p(subject,v) is true1 – birthplace is an example of a functional predicate. Note that this definition does not take subsumption relationships among val- ues into account. This is in accordance to the fact that, for instance, everyone was born in a specific location, but it does not consider that this place can be described using different levels of precision, e.g. district or region. In this case, multiple values can be true given the same subject. Thus, consider- ing partial order of values and closed world assumption (our knowledge of the world is complete), a predicate is functional if for any subject there is a unique value v ∈ V , for which p(subject,v) is true, such that there is not another value v′ ∈ V subsumed by v for which p(subject,v′) is true. As far as we know, we are the first to propose to take ontologies (as a priori knowl- edge) into account [10]. In this situation, the traditional final value selection step in the majority of TD approaches cannot be applied. Indeed, in this case, since more abstract values will de facto be associated with a higher confidence value in accordance with the partial ordering of values modeling implications among them, the true value cannot be defined as that with the highest confidence. This is due to the hypotheses used by approaches that leverage information related to the structure that may exist among values. Briefly, sources that explicitly claim a value implicitly support all of its gen- eralizations. Therefore, if a source claims that ”Pablo Picasso was born in Malaga”, it also implicitly supports the assertion that he was born in Spain, Europe, etc (considering that it is in agreement with the ontology). Thus the most general value is implicitly supported by all the others. Hence, its confidence will always be the highest. However, considering the most gen- eral value to be the only truth is not trustworthy (since it is a tautology), e.g. stating that Pablo Picasso was born in a Location is not meaningful. This paper proposes to overcome this problem by studying a solution able to identify more specific true answers (than the most general one) that may exist. Our contribution consists of: • proposing a post-processing approach able to identify the truth given the confidence estimations returned by any TD model that considers structured values; 1Elements of V are here considered to be independent. 3 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT • performing empirical experiments on synthetic datasets – this evalua- tion uses estimations returned by an adaptation of Sums able to take prior knowledge in the form of a partial order among values into ac- count [10] – and comparing the proposed approach with existing ones evaluating identified true values. The rest of the paper is structured as follows. Section 2 presents an overview of TD approaches taking advantage of potential relationships among sources or claims. This section ends with a discussion about the consequences of using a partial order among values as relationship information. In Section 3, notations are introduced and the problem is formalized. The solution strategy we propose is detailed in Section 4. The model is assessed via sev- eral experiments reported in Section 5 and discussed in Section 6. 
Section 7 summarizes the main findings of the study and the results that have been obtained; while the perspectives opened by our contribution are finally dis- cussed. 2. Related work Truth discovery aims to solve conflicts among data provided by several sources. The data treated in this domain consists of claims specifying the values that sources associate with certain data items (i.e. a data item repre- sents a particular aspect of a real-world entity). Values can be numerical or categorical/strings. The main assumption of TD is that true information is provided by reliable sources and reliable sources provide true information ( [3, 11]). This rationale can be modeled by defining the value confidence and source trustworthiness. Many studies have been proposed in this field [3]. The baseline model con- sists of a voting strategy. For each data item it regards the value which is the most frequently claimed as truth. All sources are therefore implicitly considered similarly in this model. Otherwise TD estimates for each source a different trustworthiness level based on the claims it provides. Some models deal with numerical values [12], others with categorical claims [7, 13–15] and others with both [11, 16, 17]. While basic TD models limit their complexity to correctly estimate confidence and trustworthiness with different formulas, other approaches incorporate additional information to improve the overall performances. The latter group of approaches is the most relevant for our study and is detailed hereafter. 4 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Among all of these methods, we focus our attention on those that use, as additional knowledge, correlations2 that may exist among sources ([17–19]), data items ([7, 20, 21]) or values ([11, 22]). The first class is related to source interdependences. These models consider source relationships mainly by analyzing the pattern of similar claims with correlated accuracy estimations. They also usually assume that sources shar- ing common false values are more likely to be dependent than sources sharing common true values. Indeed, it is difficult to identify dependencies between sources stating different false values [23]. Most of these studies only analyze static correlations. To the best of our knowledge, time-course dependency relationship patterns has been only considered in [24]. In this case, depen- dency among sources is captured by studying the similarity between patterns of updates associated with sources [23]. Several methods take advantage of dependency in terms of the copying relationship [17–19, 22]. Other corre- lations among sources may also occur, such as the common errors made by different extractors that use the same extraction rules or the common values identified by the extractor that use different rules, and so on[17]. Moreover, the dependence relationship is often considered between source pairs, but dependencies may also occur at the group level [18]. The second correlation class is related to data items. The first body of works in this context proposed to deal with the social sensing problem. In crowd sensing, humans coupled with their smartphones become sensors that ex- plicitly or implicitly provide observations about their physical environment. Then it becomes necessary to understand the validity of data sent by sensors. TD models applied to this domain take advantage of both physical [25] and temporal [26] correlations as well as causal relationships [21]. 
For physical correlations, they assume that co-located data items should have similar val- ues. For instance, gas stations located in the same area should have similar gas prices. For temporal correlations, the assumption is that two temporally close observations cannot have very different values. This kind of correla- tion is especially useful when analyzed data has a long-tail characteristic, i.e. many data items observed by few sources and few data items observed by many sources. Indeed, in this case the estimations can easily deteriorate if the few sources that provide claims for a data item are also unreliable. Using 2We mean by correlation the interdependences between entities, the relationships that may influence them. 5 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT correlations, information associated with data items having a high number of observations provided by reliable sources can be propagated to data items having only a few claims associated with them. The findings of the two studies [25] and [26] permit to partition data items into small groups with- out considering any dependency among groups, but the complexity of their solutions is exponential w.r.t. the maximum group size. Alternative models have been proposed to overcome this limitation to be able to deal with a large number of dependencies, e.g. [20] [21]. The former classifies the prob- lem as an optimization problem, and the latter, modelling the problem as a Bayesian network, leverages potential conditional independences among data items. Moreover, in this study they also take into account a second kind of correlation related to the data items: i.e. the category. In this case, a trust- worthiness level may be attributed to each source w.r.t. the category a data item belongs to. The main limitation of this approach is that the Bayesian network has to be known or empirically learned from historical data by spe- cific algorithms. The third type of correlation regards the values. The basic idea is that two correlated values support each other. If one of them is considered true, then the other has a high probability to be true. In order to evaluate value corre- lations, previous studies such as [11], [22], and [16] use value similarity. For instance, they compute the edit distance of strings, similarity among sets, and difference among numerical values. Otherwise, in our previous study we took advantage of value correlations in the form of partial ordering that may exist among the provided values [10]. An example of partial order is shown in Fig. 1. Given this partial order, we adapted the Sums approach to incorporate this new information. The rationale is that if a source associates the value Spain to an aspect of a real-world entity, then it also implicitly supports all more general values, e.g. Europe. Thus we modified the formula to estimate the confidence of a value changing the set of sources used for the calculus. In Sums, confidence is computed by considering all sources that pro- vide an analyzed value, while in the adapted model (AdaptedSums) sources that explicitly claim a more specific value than the one being analyzed are also taken into account as well. Indeed, these sources implicitly support all more general values than the one they provide. Using the adapted approach, at the end of the iterative procedure, the value confidence estimations mono- tonically decrease w.r.t. the partial order among values. 
In other words, for each value more specific than another, the confidence of the former is lower than or equal to the confidence of the latter. As a result, when considering 6 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Figure 1: Example of a partial order that may exist among values. the partial order of values, the highest confidence score will be always as- signed to the most general value. In this context, using the usual strategy adopted by the existing models is not worthy. Indeed, selecting as truth the value having the highest confidence, they will always return as true value the most general one. In the rest of the paper, we describe a refined post- processing strategy able to select the true value leveraging these monotonic estimations. 3. Problem Formulation Let’s consider a set of data items D such that each d ∈ D is composed of a pair (subject,predicate) where the subject represents a real-world entity and the predicate represents its aspect of interest, e.g. (picasso,bornIn). S denotes the set of sources, V the set of values, V s ⊆ V the set of values provided by s ∈ S (for each data item for which s provided values), and Vd ⊆ V the set of values associated with data item d ∈ D. Formally, TD models first aim to identify the set of true values V ∗d ⊆ V for each data item d ∈ D. In the case of a data item d characterized by a functional predicate, we have |V ∗d | = 1 if elements from V are disjoint, i.e. ∀(v,v′) ∈ V 2,¬(v =⇒ v′) ∧¬(v′ =⇒ v). Note that a value v implies 7 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT ( =⇒ ) a value v′ when v is subsumed by v′. To ease the formal introduction, and in accordance with the literature, we will as often as possible consider the special case of data items associated with the functional predicate. Dealing with data items composed of functional predicates TD identifies the true values v∗d ∈ V for each data item d estimating value confidences c : V → [0, 1] – how an information is likely to be true – and source trust- worthiness t : S → [0, 1] – how reliable is a source. This is done through an iterative procedure that alternatively estimates them. The execution of the model finishes when the stopping criteria is verified, e.g. convergence of estimations, maximum number of iterations, and so on. Hence, each value vd ∈ V (w.r.t. a data item) is associated with a confidence level c(vd) and each source s ∈ S with a trustworthiness level t(s). Existing approaches usually assume that for a specific data item d, elements of Vd are disjoint/independent and they therefore recognize the true value of d is that with the highest confidence score. This straightforward procedure cannot be applied using adapted models that consider ordering among val- ues. Incorporating this information into the model, each value more general than a true value can only be considered as true as well3. Therefore, consid- ering all values associated with a data item, the estimated confidence scores monotonically increase w.r.t. the partial ordering of values, i.e. ∀v,v′ ∈ V : if (v =⇒ v′), then c(v) ≤ c(v′). Consequently, the highest confidence score is always assigned to the most general value (that is implicitly supported by all provided claims). To solve this problem, we propose a post-processing procedure able to select the true value for each data item given the estimated confidence scores and the relationships that may exist among values. 
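For illustration purposes, the following Python sketch (not part of the paper's implementation; the toy partial order, trustworthiness scores and claims are hypothetical) shows why the usual selection rule fails in this setting: when the confidence of a value aggregates, as in AdaptedSums, the trustworthiness of every source whose claim implies that value, the most general value always reaches the maximum confidence.

```python
# Illustrative sketch (hypothetical values, not the paper's code): with
# confidence estimations that monotonically increase along the partial order,
# selecting the value with the highest confidence always returns the most
# general value. Toy partial order inspired by Fig. 1 (child -> parents).
parents = {
    "Malaga": ["Spain"], "Madrid": ["Spain"], "Spain": ["Europe"],
    "France": ["Europe"], "Brazil": ["America"],
    "Europe": ["Location"], "America": ["Location"], "Location": [],
}

def inclusive_ancestors(value):
    """Return value together with all of its generalizations."""
    result, stack = set(), [value]
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents[v])
    return result

# Hypothetical trustworthiness scores and claims for one data item (picasso, bornIn).
trust = {"s1": 0.9, "s2": 0.8, "s3": 0.3}
claims = {"s1": "Malaga", "s2": "Spain", "s3": "Brazil"}

# AdaptedSums-like confidence: sum the trustworthiness of every source whose
# claim is the value itself or one of its specializations.
confidence = {v: sum(t for s, t in trust.items()
                     if v in inclusive_ancestors(claims[s]))
              for v in parents}

print(max(confidence, key=confidence.get))
# -> 'Location' (confidence 2.0), although the expected answer is 'Malaga' (0.9).
```

The post-processing procedure described in Section 4 exploits precisely these monotonic scores, together with the partial order, to descend from the root towards more specific candidate values.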
We assume that the value dependencies are known a priori in the form of a partial order modelled by an ontology O = (�,V ). Note that even if the domain knowledge is not available, partial order can be automatically con- structed [27]. The partial order � can be represented by a Directed Acyclic Graph (DAG), GO = (V,E), where V = {v0,v1, · · · ,vm} is the set of values representing our vocabulary according to our knowledge of the world (all pos- sible values that can be claimed by sources), and E = {(x,y) ∈ V ×V |x � y} is the set of edges specifying the partial ordering that exists between values. Specifically x � y when there is a directed path from x to y in the DAG; i.e. when y is reachable from x [28]. Note that a path from x to y is defined 3Assuming that the value ordering is consensual. 8 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT as a non-empty sequence of n different nodes 〈v0,v1, · · · ,vn−1〉 with x = v0, y = vn−1 and for which ∀i ∈ [0,n− 2] (vi,vi+1) ∈ E. An important charac- teristic of the graph GO is that it has to be transitively reduced. This is not a problem because by considering any DAG its transitive reduction can be obtained [29]. Here we introduce several functions that will be useful for manipulating the graphs (with G a set of graphs): • ancestors: G×V →P(V ) such that ancestors(GO,x) = {y|x � y}. • children : G×V →P(V ) designed as children(GO,x) = {z|(z,x) ∈ E} • root : G → V such that as root(GO) = {x|∀y ∈ V,y � x} These properties enable us to easily explain our procedure to traverse the partial value ordering graph in the next section. Further important information that can be derived from any ontology is the Information Contents (IC) of its concepts (e.g. [30]). This quantity, related to the concept specificity (see Section 3.3 in [31]) represents the degree of abstraction/concreteness of a concept w.r.t an ontology. One of main IC property is that the IC score monotonically decreases from the root to the leaves, i.e. if x � y, then IC(x) ≥ IC(y) (IC(root) = 0). This score will help us to discriminate between different values w.r.t. their granularity. All of the elements presented in this section will help define the approach used to select the true values that is described in Section 4. 4. Proposed Approach The entire truth-discovery procedure, from the input consisting of a set of claims to the output consisting of the true values and the degree of reliability associated with each source, is presented in Fig. 2. In this section, we propose a post-processing procedure that selects the true values given the estimations obtained by TD models that relax the assumption related to the disjointness of values. It involves three steps: (i) selection of the best true value candidates; (ii) ranking of selected values; and (iii) filtering of ranked values w.r.t. defined desirable properties. For instance it may be useful to return a set of solutions that share an ordering relationship or, on the contrary, to return a value set composed only of “alternatives” that are not ordered. The choice related to 9 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT the appropriate features of the solution set depends mainly on the application scenario. The first step of the process: (i) permits to retrieve the most specific possible true value(s) and all of its ancestors using available information, such as confidence scores and partial ordering of values. The second step: (ii) orders the selected values based on predefined criteria. 
The third step: (iii) is required to filter the top k results. For TD, the final aim k should be equal to 1, but in cases where there is uncertainty it may be useful to return a set of values, even if the predicate is functional. Moreover, answers that do not have defined desirable properties (see Section 4.3 for further details) are removed from the result list. Those three steps are detailed hereafter. 4.1. True value selection The first part of the post-processing procedure concerns the selection of the promising candidate(s) as the most expected value(s) for each data item. We have defined a selection strategy that takes advantage of the partial order of values and step by step refines the granularity of the correct value asso- ciated with each data item. Now we will give an overview of the approach followed by Algorithm 1. Figure 2: Diagram of the overall TD procedure. 10 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Starting from the most general value (implicitly supported by all provided values and surely true), the process aims to detect the most specific expected value(s). A traversing procedure was thus applied on the graph that repre- sents the partial order of values. It starts from the root, it selects the best alternatives among the children of the considered node and moves forward through the selected values. Our assumption is that values with the highest confidence locally should be the most likely to be true. Therefore the choice of the best alternative(s) is done by comparing the confidence scores associ- ated with the children of the previously selected node. In the case of functional predicate, the values can be partially ordered by their granularity, see Fig. 1. Therefore the selection procedure refines, at each step, the level of precision used to describe the single true value associ- ated with a data item. The semantics of each selected node expresses the fact that the node subsumes the correct solution (i.e. the expected true value). The last selected nodes should correspond to the most specific answers that can be identified through the selection process. The selection process has to handle two main undesirable situations that may occur: (1) selection of values with a confidence score too low to be considered as true, and (2) difficulty in discriminating the best alternative(s) among the children of a node since their confidence scores are not significantly different. As a solution, two thresholds have been defined: θ and δ. The threshold θ ∈ (0, 1] enables us to specify the confidence lower bound required for a value to be part of the set of true values. Note that the value 0 is not included in the θ interval. Indeed, considering claims with confidence scores equal to 0 makes no sense because it would mean considering, as truth, values provided by totally unreliable sources (all with trustworthiness equal to 0). The confidence score that is compared to θ has to be previously nor- malized w.r.t. each data item, i.e. the confidence score associated with the most general value of each data item always has to be equal to 1. This nor- malization step is required to avoid the definition of an inconsistent threshold w.r.t. the different data items. The threshold δ ∈ [0, 1] represents the minimum difference that has to exist among values with the highest confidence and all the others so that one pre- vails over the others. In particular, if the difference between the confidences of two values is less than or equal to δ, then it is hard to make a choice among them. 
This comparison is done among values that are children of the same father to select the best alternatives. The definition of different parameter settings produces different behaviours 11 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Table 1: Interesting settings for the selection procedure. Config θ δ Selection procedure behaviour 1 α 0 Naive greedy procedure that maximizes the confidence score at each step. 2 α 1 Greedy procedure that selects all values greater than α. Since δ is equal to 1, all values with confidence higher than α are selected. 3 α β At each iteration a value is collected only if the difference of its confidence and the highest confidence at the current step is lower or equal to β. All values in the returned set have confidence that is greater than α. of the selection phase ending in the possibility of obtaining different kinds of solution sets. The main parameter settings are summarized in Table 1. Configuration 1 (θ = α,δ = 0) reproduces a naive greedy algorithm that, at each step, selects values with the highest confidence greater than α without performing any other control. Configuration 2 (θ = α,δ = 1) is able to return all claimed values with confi- dence higher than α. It may seem useless, but it is a selection configuration necessary to obtain a particular set of values at the end of the post-processing procedure. A set composed of “promising” alternatives that allow to report values that are, as much as possible, fine-grained and semantically different. In this way, we increase the probability of finding the correct value since we increase the number of different concepts that are considered. Therefore this strategy is useful to deal with cases in which there is a lot of uncertainty. The idea is to return all claimed values and their ancestors and then, using the ranking phase to position in the first places, the most promising alternatives with the properties we have just explained. Configuration 3 (θ = α,δ = β) is a generalization of the two previous con- figurations. It selects the set of values that are greater than a threshold θ and they differ, at each step, more than δ from the confidence of the other alternatives. Algorithm 1 reports the pseudo-code of the selection procedure. The algo- rithm starts performing a transitive reduction of the graph representation of the partial ordering (line 2). We thus ensure that the choice of the best al- ternative is done among a set of children that do not share ordered relations. Moreover, this avoids useless comparison of a large number of confidence 12 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Algorithm 1 True value set computation for any d ∈ D considering a partial order of values represented as a DAG GO = (V,E), a threshold θ ∈ (0, 1], a threshold δ ∈ [0, 1], and a function c : V → [0, 1], i.e. confidence of each value 1: procedure SelectionTrueValues(d,GO,c,θ,δ) 2: G ← transitive reduction(GO) 3: V ∗visited ←{ } 4: queue ← list(root(G)) 5: while !(queue.isEmpty()) do 6: v ← queue.pop() 7: V ∗visited ← V ∗visited ∪{v} 8: Vch ← children(G,v) 9: confmax ← max child∈Vch (c(child)) 10: Vch∗ = {v ′ ∈ Vch : c(v′) ≥ θ ∧ (confmax − c(v′)) ≤ δ} 11: queue.addAll(Vch∗ \V ∗visited) 12: return ⋃ v∈V ∗ visited ancestors(G,v) scores. Then, at each iteration, the algorithm applies a greedy search by maximizing the confidence of the values (lines 5 – 9). It selects all values having confidence higher than or equal to θ whose scores are not significantly different from the highest confidence (line 10). 
Then, it adds them to the queue (line 11). Note that the confidence scores were computed applying AdaptedSums. This model computes the value confidences by summing up the trustworthiness scores of the sources providing the considered value or one of its possible specializations. The procedure stops when the last selected value has no more specific values to be visited. In order to be in accordance with our assumption and problem settings, all values that are more general than those selected will compose the set of true values – due to multiple inheritance, some of those values may not have been visited by the greedy procedure (line 12). The fact that the confidence score monotonically increases w.r.t. the partial order ensures that the scores related to ancestors of the visited values are higher than or equal to θ.

The termination of Algorithm 1 is ensured by line 6 and line 11. The complexity of the true value selection algorithm is related to the number of comparisons required to find the maximum value confidence while traversing graph GO. Therefore, the complexity of the algorithm is O(|E|), which in turn is O(|V|²). At each step, a number of comparisons equal to the number of node children is required. The worst case scenario occurs when the following conditions hold at the same time: (i) graph GO has depth 2, (ii) its nodes are uniformly distributed between levels 2 and 3, (iii) nodes at the same level have the same fathers and the same children and, moreover, (iv) they have equal or not significantly different confidence scores. The conditions related to the morphology of the DAG ensure that the number of comparisons is maximum, and the condition on the confidence scores guarantees that the procedure traverses all nodes.

All of the configurations of the algorithm input parameters enable us to select a set of possible true values. Since the aim of TD is to find the most expected solution, a method able to choose it is required. The ranking phase described in the next section is devoted to this.

4.2. True value ranking

Given the true value set selected in the previous step, we have to define a ranking method in order to select the k ∈ N+ most expected values, where k is a fixed number. In our investigations, k is experimentally set, at most, to 5. The solution set of most expected true values is indicated as V∗ ⊆ P(V∗candidates), where V∗candidates is the value set returned by the selection phase. We propose to rank the values based on either:

• their IC. This method is useful for situations in which specific answers are expected and when there is not much uncertainty on the data item under consideration. Note that in the following experiments IC is computed according to the definition provided by Seco, based on the analysis of the partial ordering topology [30]. In particular, it takes advantage of the number of descendants of a value:

ICSeco(v) = 1 − log(|descendants(v)| + 1) / log(|V|)   (1)

where V is a non-empty set, since an ontology is considered to have at least one concept, i.e. the root value. IC has been proposed because the user generally expects very precise answers. Often, general true values for a data item are well known a priori, i.e. it is well known that a person is born in a place. If two or more true values have the same IC, then a random selection can be made or, alternatively, another criterion can be used to rank this value subset.
• their source average trustworthiness, denoted WAtrust(v). The rationale is that if a lot of unreliable sources support a false value A (increasing its confidence score – Sums does not normalize based on the number of sources claiming a value, therefore its confidence estimation is biased), and there are only a few reliable sources that support a true value B, then the sources providing B should have a higher average trustworthiness score. This measure is obtained by computing the average trustworthiness of the sources that explicitly or implicitly claim a particular value v, weighted by a normalization factor:

WAtrust(v) = (1 − 1 / (η + |Sv+|)) ∗ avgtrust(v)   (2)

where avgtrust(v) is the average trustworthiness of the sources in Sv+, Sv+ is the set of sources that implicitly or explicitly provide the value v, and η is a small number used to avoid that WAtrust(v) = 0 when v is provided by only one source. The first factor, i.e. the normalization, was introduced in order to tune the average w.r.t. the number of sources providing the value. Indeed, inspired by the study presented in [32], the higher the number of sources providing a value, the higher our confidence in the computed average should be. Moreover, in this case, if two values have the same WAtrust, then another criterion can be used to rank them.

Once the values are ranked, the next and final step of the post-processing procedure can be performed.

4.3. Filtering of top-k true values

The filtering phase collects the top k values in the rank and returns them to end-users. Before performing the selection of the top k values, all the ranked ones have to be checked. This is necessary because truth discovery models can be applied to different scenarios: high or low uncertainty situations, and high or low risk cases in which making an error is, respectively, very dangerous or not. For instance, if truth discovery models are used to populate a medical knowledge base containing, for each symptom, all possible correlated diseases, then the end-users want to be really careful in accepting a value as true. Therefore, based on the possible application contexts, different properties that the solution set V∗ has to respect can be defined. In this way various true value sets with different characteristics can be identified:

• the solution set V∗ord contains only values sharing partial ordering relationships; formally ∀(x,y) ∈ V∗ord × V∗ord, x ⪯ y ∨ y ⪯ x. The procedure to create a set containing values that respect this property is as follows: it iteratively selects and removes the first element in the ranked list. Then it adds this value to the solution set only if it is an ancestor or descendant of all elements that are already present in it.

• the solution set V∗disj is composed of values that do not share any partial ordering relationship; formally ∀(x,y) ∈ V∗disj × V∗disj, ¬(x ⪯ y) ∧ ¬(y ⪯ x) ∧ ∄ w,z ∈ V∗candidates : w ⪯ x ∨ z ⪯ y. This means that all values in the solution set are the most specific among those returned by the selection phase. Indeed, only values that do not have descendants in the returned true value set belong to the solution. The procedure returns a set of alternatives that are as specific and as different as possible. In other words, this set of values consists of elements that do not have any of their exclusive descendants in the sorted list.
For example, if the values returned by the previous step are Europe, Continent, Country, City, Location, then, in accordance with the partial order in Fig. 1, the V ∗disj is composed only of Europe, Country and City. The first kind of solution can be desirable when there is not much uncer- tainty (end-users expect to easily find the true answers) or the end-users do not want to deal with potentially different values in a domain where they are not experts. The second property can be adopted when there is a lot of un- certainty and especially when the application context could result in making errors without dangerous consequences. Indeed, when there is uncertainty, to postpone the selection of true values to the end-users, avoiding to auto- matically select only a specific value and its ancestor, may be useful. In order to support the end-users final choice, returning a set of values composed of the most promising alternatives is important. Obtaining V ∗ord is suitable when δ = 0. Indeed, taking the value with the highest confidence at each step, the process ends with the selection of only one specific true value (and its ancestors). Considering this set of returned values, the first property is often verified without filtering any value out. In 16 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT any case, very general values often are not returned since only the top-k val- ues are selected after the verification of the property. Otherwise, the second property, i.e. V ∗disj, is not useful considering δ = 0. Only the single most specific value contained in the set of returned values is selected when this property must hold. Indeed, all of the others share partial ordering relation- ships. This corresponds to consider that δ = 0, k = 1 and a solution set V ∗ord. Obtaining V ∗disj is preferable when δ = 1. Indeed, in those cases all values having confidence higher than θ are returned by the selection phase, but for the final aim of truth discovery (finding the truth) it is suitable to only keep the set of “promising” alternatives that correspond to a set of values that are different and specific as much as possible. 5. Experiments In this section we describe all experiments performed on synthetic datasets to confirm the validity of our approach. First, we focus on the synthetic datasets: how we generated them and what are the parameter settings we tested. Then, we present the evaluation methodology we used to evaluate the approach and to compare it with existing models. 5.1. Datasets In order to assess the behaviour of our approach w.r.t. the ontology used to derive the relationship among values, we integrate preliminary experiments carried out using the synthetic birthPlace datasets (see [10] for further details related to their generation) with additional ones performed using different partial ordering structures. Each synthetic dataset contains a set of claims concerning a specific predi- cate, a set of sources and the subset of claims provided by each source. We Table 2: Features of the different partial order structures. Features CC MF BP genre birthPlace Values 3984 10243 28822 1838 682658 � depth (max depth) 12 15 16 8 14 Average depth 5,223 5,610 6,906 3,93 5,424 Average # of children 1,451 1,196 1,898 1,041 1,535 Max # of children 466 291 451 824 160194 Leaves 3016 8192 14797 1563 663373 17 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT (a) # of sources per data item (b) # of dataitems per source Figure 3: Statistics of sources-data items for the CC datasets. 
generate 5 different datasets considering the predicates birthPlace and genre from DBpedia [33], and the predicates Cellular Component(CC ), Molecular Function (MF ) and Biological Process (BP) from Gene Ontology [34]. All the datasets are randomly generated based on a ground truth (containing a set of true claims, for each predicate), a partial order among the values contained in the ground truth (an example is shown in Fig. 1) and a set of factitious sources. Note that Table 2 reports the features associated to the different partial ordering structures that we use. Given these elements the generation process can start. First, a trustworthiness level is associated with each source. We assume that the majority of sources are sufficiently reliable and only a few of them are al- ways or never correct. A Gaussian distribution with an average and standard deviation equal to, respectively, 0.6 and 0.4 was used to model the described behaviour. Second, we reproduce the long-tail phenomena [35] for which many sources provide values for a few data items and a few sources provide values for many data items. This is modelled using a simple exponential distribution. It associates, with each source, the number of data items on which it has to provide a value. The statistic that confirm that this behaviour is respected by the datasets that were generated are reported in Fig. 3. In Fig. 3a we observe that approximatively 80% of data items are claimed by less than 500 sources. Fig. 3b shows that most of sources have provided at least 1000 data items. Third, each source claims a true or false value for a specific data item w.r.t. 18 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT its trustworthiness. In case of true claims, the value is selected among the inclusive ancestors of the value contained in the data item. In the case of false claims, it is selected from the set of values that are neither inclusive an- cestors nor descendants of the true one denoted v∗d. In both cases the values are selected w.r.t. a similarity measure between the values and v∗d. For the selection of the true values, three different strategies were adopted: EXP, LOW E, UNI (Fig. 4). EXP simulates cases in which sources are quite sure about the true values, so they tend to claim values similar to the ex- pected one (contained in the ground truth) when they have to provide a true value. UNI reproduces a world where there is a lot of uncertainty, then the sources tend to indiscriminately select the value from the entire set of possible true values. LOW E is a trade-off between the previous two types. Sources uniformly select the value from the set of possibilities, but there is a slightly higher probability of choosing values similar to the expected one. For instance, Fig. 4 reports, on the x axis, the values of Fig. 1 ordered according to their similarity measures w.r.t. the true value Malaga. Considering that v∗d is Malaga and the EXP law, sources will more often provide values such as Malaga, Spain and City than values like Continent or Location. Otherwise, considering the UNI distribution, the probability of claiming these values will be the same. For the selection of the false values, only a single strategy was considered. A source that has to provide a false claim tends to provide a false value that is Figure 4: Distributions used to select true values. 19 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Table 3: Set of experiments performed for each predicate. Conf. 
θ δ Rank Filter Name 1st 2nd TSbCtrust 0,..., 0.5 0 WAtrust ICSeco V ∗ ord TSbCIC 0,..., 0.5 0 ICSeco WAtrust V ∗ ord TSaCtrust 0,..., 0.5 1 WAtrust ICSeco V ∗ disj TSaCIC 0,..., 0.5 1 ICSeco WAtrust V ∗ disj similar to the expected true value. For instance, if the true value is Malaga, then a source provides the value Portugal with an higher probability than the value Brazil. Moreover, sources tend to claim the same false values. There- fore, the probability of a value to be selected as false one increases w.r.t. the number of sources that previously claimed it. For each predicate, 20 datasets were produced w.r.t. the different laws that can be used to select the true values provided by sources. Further details on the generation of the ground truth and how the partial order of values has been derived are provided with the datasets at www.github.com/lgi2p/TDSelection. 5.2. Experiment settings In order to provide robust results, considering each predicate, we gener- ated 60 synthetic datasets (20 for each different distribution used to select the true values). Several experiments were conducted on them. Table 3 reports all of the experimental settings in which the datasets were tested. The name associated with each configuration indicates the delta setting. When δ = 0 the approach is called TSbC (Truth Selection of the Best Child). Indeed, the selection algorithm chooses, at each step, the value with the highest con- fidence. In other words, it selects the best node among the children of the considered one. Otherwise, when δ = 1 the approach is called TSaC (Truth Selection of all Children). Indeed, using this configuration the algorithm se- lects, at each step, all the children of the considered nodes. Moreover, the subscript specifies the first ranking criteria used, i.e. TSbCIC means that IC is used for the ranking phase as first criteria to order the values. For all the experiments, different threshold θ values were used: 0, 0.1, 0.2, 0.3, 0.4, 0.5. Note that when δ is equal to 0, we test only the property of the solution set indicating that its values share ordering relationships. Indeed, the selection procedure in this case chooses, at each step, only values having the highest confidence and therefore only a single most specific value and its ancestors 20 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT can be returned. No alternatives to the most specific value can be selected. When δ is equal to 1, we test only the property indicating that the values in the solution set do not share a partial order. The procedure may select more than one branch. In this situation, if we force the returned true values to share an ordering relationship, we oblige the algorithm to select only one path. Thus, the main advantage of this configuration, i.e. to propose a set of alternatives, is wasted. For the confidence and trustworthiness estimations, we initialize the confi- dence value at 0.5 in order to start the iterative procedure, i.e. Adapted- Sums. The stopping criteria used for the iterative procedure is the same as that used in the original paper of Sums[36]: the procedure was stopped after 20 iterations. The algorithms were implemented in Python 3.4. The experiments were performed on a PC with an Intel Core 2 Duo processor (2.93GHz×8GB). To give an idea, using the codebase developed for these experiments4, memory consumption varies from 1.6 to 4.3 GB depending on the number of values composing the partial order. 
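As an illustration of the generation scheme described in Section 5.1, the sketch below draws source trustworthiness from the stated Gaussian distribution and reproduces the long-tail distribution of claims per source; the exponential scale, the clipping step and all names are our own assumptions, not the released generator.

```python
# Sketch of the synthetic-source generation of Section 5.1 (our reading, not the
# released generator): trustworthiness drawn from a Gaussian (mean 0.6, std 0.4)
# and clipped to [0, 1]; a long-tail number of data items per source obtained
# from an exponential law. The scale parameter is an assumption.
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_items = 1000, 5000

# Most sources are fairly reliable; a few are always or never correct.
trust = np.clip(rng.normal(loc=0.6, scale=0.4, size=n_sources), 0.0, 1.0)

# Long tail: few sources claim many data items, many sources claim few.
items_per_source = np.minimum(
    rng.exponential(scale=200, size=n_sources).astype(int) + 1, n_items)

print(trust[:5], items_per_source[:5])
```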
Using TSbC and TSaC, running times were, respectively, around 0.24 and 1.7 milliseconds per data item. Note that running times may increase significantly when partial or- ders have specific topological properties. In particular, optimizations have to be studied when dealing with partial orders having values with numerous children (hubs). For instance, the partial order used for the birthPlace pred- icate contains the value ”Settlement” with 160 thousands children; running times using this partial order were 0.02 and 0.8 seconds per data item for TSbC and TSaC respectively. The source code and datasets associated with this study are open-sourced and published on the Web at the following link www.github.com/lgi2p/TDSelection. Note that for Sums and AdaptedSums, we used the code developed by [10]. Otherwise, for the experiments related to the other existing models, we used the DAFNA-EA5 implementation [37]. This API provides the source code for the main existing models. 5.3. Evaluation The evaluation of the model we proposed to select the true values was carried out using both traditional and hierarchical performance measures of classification problems. 4Note that this codebase has been developed for experimental purpose and was not optimized to lower memory consumption and running time. 5www.github.com/daqcri/DAFNA-EA 21 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Among traditional metrics, precision and recall were mainly used to compare our approach with the existing models that do not consider the partial order. Our positive class consists of all pairs (d ∈ D,v∗d ∈ Vd) where v∗d is the value contained in the ground truth for the data item d, and the negative class is composed of all pairs (d ∈ D,vd ∈ Vd − v∗d). Therefore, the precision is the proportion of pairs (d,v∗d) returned by the approach among all the pairs it returns. The recall is the proportion of pairs (d,v∗d) returned by the approach among all pairs contained in the ground truth. The hierarchical evaluation measures (HEM) were used to analyse the be- haviour obtained by different parameter settings of our approach. Indeed, hierarchical metrics distinguish the severity of different errors taking the hierarchy of classes into account. Reasonably if Malaga is the true value, then an approach that returns Portugal should be less penalized than an- other that returns Brazil. Indeed Portugal is in the same continent than Malaga, i.e. Europe, while Brasil is in a different continent, i.e. America. A detailed study related to hierarchical measures was presented in [38]. They distinguish the main dimensions that characterize hierarchical classification problems and suggest, for each possible combination, which are the best evaluation metrics to use. They recommend FLCA,PLCA,RLCA and MGIA when dealing with single-label problem and DAG hierarchy. This situation corresponds to our initial problem settings: for each data item there is a single expected true value and our partial order among values is represented using a DAG. FLCA,PLCA and RLCA are set-based measures. They use hi- erarchical relations to augment the sets of returned and true values and to compute precision and recall. Since adding ancestors over-penalize errors that occur to nodes with many of them, FLCA,PLCA,RLCA use the notion of the Lowest Common Ancestor to limit this undesirable effect. MGIA is a pair-based metric that uses graph distance measures to compare returned and true values. Its limitation is that it does not change with depth. 
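To make the contrast with the flat metrics concrete, the sketch below computes a simplified ancestor-based hierarchical precision and recall; it omits the Lowest Common Ancestor restriction that FLCA, PLCA and RLCA apply to avoid over-penalization, so it is only an approximation of the measures used here.

```python
# Simplified hierarchical precision/recall (illustrative approximation): the
# returned and expected values are augmented with their inclusive ancestors
# before computing the usual set-based precision and recall. P_LCA and R_LCA
# additionally restrict this augmentation using the Lowest Common Ancestor [38].
def hierarchical_pr(returned, expected, inclusive_ancestors):
    """inclusive_ancestors(v) must return v together with all its generalizations."""
    aug_ret = set().union(*(inclusive_ancestors(v) for v in returned))
    aug_exp = set().union(*(inclusive_ancestors(v) for v in expected))
    overlap = len(aug_ret & aug_exp)
    return overlap / len(aug_ret), overlap / len(aug_exp)

# With the partial order of Fig. 1, returning 'Spain' when 'Malaga' is expected
# keeps precision at 1.0 (every ancestor of Spain is also an ancestor of Malaga)
# while recall drops because Malaga itself is missed, mirroring the behaviour
# of P_LCA and R_LCA illustrated in Table 4.
```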
For further details related to the computation of these measure please refer to [38]. Now, we briefly describe the main characteristics of these hierarchical measures through an illustrative example. This enable the reader to better understand the result discussion in the next section. Considering the DAG in Fig. 1 and Malaga as the true value, the HEMs related to several returned values are reported in Table 4. As shown, if the returned value is more gen- eral than the expected one, then PLCA is not affected, while RLCA decreases when increasing the distance from the expected value. Otherwise, if the re- turned value is an error (neither the expected value nor more general one), 22 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT Table 4: Example of HEM considering the DAG in Fig. 1 and Malaga as the true value. Returned value PLCA RLCA FLCA MGIA Malaga 1 1 1 1 Spain 1 0.5 0.7 0.9 Country 1 0.3 0.5 0.8 Madrid 0.5 0.5 0.5 0.8 France 0.5 0.3 0.4 0.7 Figure 5: Recall obtained by applying our approaches TSbCIC (dotted line) and TSaCtrust (dashed line), both with k = 1 and θ = 0, and the models provided by DAFNA API (solid lines) on the synthetic datasets. then PLCA and RLCA decrease w.r.t. the position of the returned value in the partial order. MGIA indicates the distance among the returned value and the expected one without considering if one value is more general or specific than the other. Figure 6: Recall obtained by applying our approach and the proposed models (with θ = 0) on the synthetic datasets w.r.t. the dataset type and number of returned values. 23 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT 6. Results All of the experimental settings presented in Section 5 were tested. Here, the results are presented and discussed. Note that a robust analysis were conducted given the artificial nature of the synthetic datasets. Results show that our approach enables successfully addressing the problem of selecting true values. Recall that our study considers a setting where value confidence estimations w.r.t. the partial order of values monotonically increases. The most effective configuration settings of our selection proce- dure were TSaCtrust and TSbCIC as shown in Fig. 6. These settings coupled with the AdaptedSums model were able to outperform, in terms of recall, ex- isting truth discovery methods on the different datasets and predicates that were used for the experiments, see Fig. 5. Note that in these experiments we compared our post-processing strategies considering k = 1 with the other models. Indeed, the general aim of TD is to return a single answer for each data item. In the rest of this section we detail the comparison of the proposed approach with existing truth discovery models and we study different configuration settings of the post-processing procedure analysing its behaviour considering different k, δ and θ values. Both TSaCtrust and TSbCIC obtained good performance, but TSaCtrust was the most robust approach independently of the predicate and dataset type, as shown in Fig. 5. It resulted to be only slightly influenced by source dis- agreement increase (UNI dataset case). Indeed, TSaCtrust aimed to analyse and compare the trustworthiness of sources providing the most specific val- ues that do not share partial order relationships. This was done selecting and returning all provided values higher than θ, i.e. δ = 1. Then ranking the values according to the weighted average trustworthiness of sources claiming them. 
Finally, filtering the first k values that did not share ordering rela- tionships. Following this post-processing procedure, TSaCtrust performance was not affected when the number of sources providing true general val- ues increased (UNI dataset). Precisely, analysing the recall obtained by the different models from EXP to UNI dataset types, we observed that, when increasing sources that provided general true values, TSaCtrust had a recall drop equal to 0.073 against a recall drop around 0.528 obtained by exist- ing truth discovery models. Indeed, the average recall, over the different predicates, obtained by TSaCtrust was 0.954, 0.912 and 0.881 respectively for EXP, LOW E and UNI dataset types. The average recall achieved by 24 ACCEPTED MANUSCRIPT A CC EP TE D M A N U SC RI PT existing truth discovery models was 0.595, 0.243 and 0.067 respectively for EXP, LOW E and UNI dataset types. On the contrary TSbCIC performance was more influenced by source dis- agreement increase than TSaCtrust performance. It is the post-processing strategy that employed the greedy algorithm to select the true value, i.e. at each step the selection phase chooses the values with the highest confidence. Then it ordered them w.r.t. their IC. Finally, it kept only values that shared a partial order. Therefore, it used as selection criterion, at each step, the value confidence. When sources provided more general true values the infor- mation associated to these claims were propagated to less values. Thus the confidence estimations were less informative in the last steps of the proce- dure. Anyway, also TSbCIC outperformed existing methods obtaining recall levels that were equal to 0.889, 0.670 and 0.531 for EXP, LOW E and UNI dataset respectively (thus with a recall drop of 0.358). Observing Fig. 5 we analysed for which predicates our approaches, TSaCtrust and TSbCIC, obtained slightly lower performances. Even in these cases our models stills outperformed existing ones. Considering TSaCtrust, the worst recall performance were achieved for birth- Place and BP predicate. Analysing the features shown in Table 2 related to the different predicate partial order, it is clear that this configuration setting was influenced by the average number of children in the partial order. Indeed birthPlace and BP were the two predicates with the highest children average number. Moreover, the ranking of predicates w.r.t. their recall corresponded to the predicate ranking w.r.t. the children average number in decreasing order. Otherwise, when considering TSbCIC approach the worst performance in terms of recall were obtained considering genre and BP predicate. We found out that TSbCIC performance depended both on the children average num- ber and the average depth of expected solutions w.r.t. the maximum depth. Indeed, at each step of TSbCIC the probability of error is related to the number of alternatives among which the procedure can select a value. More- over, it also related to the percentage of the partial order that the selection procedure has to traverse in order to reach the expected solutions w.r.t. the maximum depth. The probability of error increased when the part of the graph to traverse augmented. For instance genre predicate had the lowest children average number, but it obtained performance lower than MF, CC and birthPlace predicate. This because its expected values had a depth that required to traverse a bigger part of the partial order than in the other cases. 
To better understand the best parametrization of the post-processing procedure, several experiments were conducted w.r.t. the different settings reported in Table 3. First of all, we compared the different post-processing strategies we proposed, evaluating the recall at different levels of k. The results are reported in Fig. 6. Note that we show the results for the predicates genre and MF, but a similar behaviour was obtained for all the others. We observe that the best results were obtained by TSaCtrust for any value of k. It took advantage of the fact that it returned a set of alternatives that are as different as possible from each other and, at the same time, as specific as possible. TSbCIC also usually outperformed the baseline model (Sums), but for higher values of k it was worse than Sums. This is because we forced the results of TSbCIC to share ordering relationships, while in the case of Sums the k values with the highest confidence were returned (no additional filter was applied to these values). Note that the recall of TSbCIC did not improve when increasing the value of k. This means that a situation in which a returned value is more specific than the expected one never occurs, in accordance with the policy we adopted to generate the synthetic datasets: given the expected value, we cannot say anything about its descendants, since each of them may or may not be a true specification of the expected truth. Consequently, we removed all of the descendants from the set of possible true and false values; in other words, no source provides a claim containing a descendant of the expected value associated with the considered data item. In all the other configurations, increasing the number of returned values (k) enhanced the recall.

The TSaCIC and TSbCtrust configurations were, in the majority of cases, worse than the baseline approaches. TSaCIC consists of the selection strategy with δ = 1, i.e. all provided values having confidence higher than θ are selected, combined with the use of IC as first ranking criterion. It obtained low performance because ICSeco (recalled below) was not a good discriminator among values that did not share ordering relationships. Indeed, ICSeco is based on the number of descendant values; consider a situation in which x is the expected value and y has the same father as x. If x has descendants while y has none, y will be preferred by the ICSeco-based ranking even though it is not a true value. Thus, the WAtrust ranking is more suitable in these cases. TSbCtrust, in turn, is a post-processing strategy with δ = 0, i.e. at each step of the selection process only one value is selected, and the average trustworthiness of sources is used as ranking criterion. The low recall obtained by this model means that WAtrust was not a good discriminator for ranking the values, sharing partial order relationships, returned by the selection phase.
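Returning to the ICSeco criterion discussed above, the intrinsic information content of Seco et al. [30], on which it is based, can be written as follows (the exact normalisation used in our implementation may differ slightly, so this is given only as a reminder):

IC_Seco(v) = 1 − log(|desc(v)| + 1) / log(|V|)

where desc(v) is the set of descendants of v in the partial order and |V| is the total number of values. A leaf value therefore obtains the maximal IC of 1, which explains why a childless sibling y can be ranked above an expected value x that has descendants.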
Moreover, Fig. 6 shows that, when the disagreement among sources providing true values increased, these two latter approaches (TSaCIC and TSbCtrust) could still be useful: the recall they obtained for k = 1 was higher than the recall of the Sums model. Therefore, in case of a high level of disagreement, even a non-optimal procedure can be advantageous. As expected, in all cases the precision always decreased when increasing k. Moreover, comparing the different settings of the proposed approach, we observed that their ranking based on precision was the same as the one obtained w.r.t. their recall; we therefore omit these repetitive results.

Our further analysis focused on TSaCtrust and TSbCIC, since they were the proposed models that achieved the best performance. We examined the impact of different threshold values, setting k = 1, w.r.t. the hierarchical evaluation metrics FLCA, PLCA, RLCA and MGIA. The results are reported in Table 5.

Table 5: HEM obtained for the different predicates w.r.t. the model and the threshold θ considered.

Model:                    TSbCIC                                        TSaCtrust
Predicate    HEM    θ=0    0.1    0.2    0.3    0.4    0.5        θ=0    0.1    0.2    0.3    0.4    0.5
CC           FLCA   0.836  0.826  0.770  0.694  0.617  0.561      0.958  0.890  0.783  0.690  0.613  0.560
CC           PLCA   0.824  0.874  0.943  0.986  0.991  0.988      0.959  0.954  0.967  0.985  0.989  0.989
CC           RLCA   0.862  0.812  0.693  0.568  0.469  0.407      0.959  0.861  0.701  0.563  0.465  0.406
CC           MGIA   0.879  0.910  0.907  0.890  0.855  0.818      0.963  0.945  0.917  0.887  0.851  0.816
MF           FLCA   0.878  0.865  0.800  0.697  0.637  0.572      0.962  0.914  0.807  0.695  0.636  0.572
MF           PLCA   0.870  0.907  0.960  0.990  0.994  0.994      0.964  0.965  0.971  0.989  0.994  0.994
MF           RLCA   0.898  0.850  0.729  0.568  0.492  0.414      0.963  0.893  0.734  0.567  0.491  0.414
MF           MGIA   0.909  0.937  0.926  0.892  0.862  0.824      0.966  0.958  0.928  0.890  0.860  0.824
BP           FLCA   0.745  0.689  0.620  0.540  0.484  0.438      0.881  0.725  0.607  0.527  0.477  0.436
BP           PLCA   0.732  0.859  0.957  0.979  0.976  0.968      0.886  0.935  0.963  0.976  0.974  0.967
BP           RLCA   0.783  0.624  0.494  0.391  0.335  0.293      0.882  0.642  0.481  0.379  0.329  0.291
BP           MGIA   0.792  0.853  0.836  0.774  0.707  0.641      0.881  0.865  0.815  0.754  0.696  0.635
birthPlace   FLCA   0.791  0.773  0.709  0.640  0.587  0.532      0.946  0.855  0.713  0.627  0.576  0.530
birthPlace   PLCA   0.788  0.841  0.936  0.988  0.993  0.990      0.948  0.941  0.953  0.988  0.991  0.989
birthPlace   RLCA   0.800  0.744  0.601  0.483  0.424  0.372      0.946  0.813  0.602  0.469  0.414  0.369
birthPlace   MGIA   0.909  0.912  0.897  0.877  0.845  0.807      0.968  0.948  0.900  0.869  0.838  0.805
genre        FLCA   0.784  0.775  0.708  0.657  0.617  0.571      0.963  0.930  0.729  0.657  0.617  0.571
genre        PLCA   0.781  0.791  0.855  0.979  0.995  0.997      0.966  0.952  0.878  0.980  0.994  0.997
genre        RLCA   0.793  0.774  0.641  0.505  0.454  0.409      0.962  0.920  0.660  0.505  0.454  0.409
genre        MGIA   0.903  0.904  0.889  0.887  0.867  0.833      0.974  0.967  0.897  0.887  0.867  0.833
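As a reminder of how these hierarchical scores behave, the sketch below computes LCA-style hierarchical precision, recall and F-measure from ancestor sets. It follows the general scheme of [38] but simplifies the LCA-restricted augmentation to full ancestor sets, omits MGIA, and the ancestors() helper is an assumed function, so it should be read as an illustration rather than the exact evaluation code.

# Minimal sketch of LCA-style hierarchical precision/recall/F-measure
# (illustrative; the exact augmentation defined in [38] may differ).
# Assumed input: ancestors(v) returns the set containing v and all of its
# ancestors in the partial order.
def hierarchical_scores(predicted, expected, ancestors):
    pred_aug = ancestors(predicted)   # predicted value plus its ancestors
    true_aug = ancestors(expected)    # expected value plus its ancestors
    overlap = pred_aug & true_aug
    p_lca = len(overlap) / len(pred_aug)   # hierarchical precision
    r_lca = len(overlap) / len(true_aug)   # hierarchical recall
    f_lca = (2 * p_lca * r_lca / (p_lca + r_lca)) if (p_lca + r_lca) else 0.0
    return p_lca, r_lca, f_lca

# Under this scheme, returning a value more general than the expected one
# keeps the precision at 1 (all of its ancestors are shared) while the recall
# drops as the returned value gets further from the expected one, which is
# the behaviour illustrated in Table 4.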
Considering TSbCIC, we noticed that MGIA increased in the majority of cases when θ was slightly increased. This occurred because some expected values (supported by few reliable sources) have a confidence lower than false ones (supported by many unreliable sources), even though the former have a higher WAtrust than the latter. Thus, using TSbCIC with θ = 0, these false values were selected as true values. Increasing θ allows the procedure to avoid part of these errors: eliminating the values with a very low confidence score enables the procedure to return, with high probability, the father of the expected value. However, further increasing the threshold caused a loss of MGIA because the returned values became very general. This does not happen with TSaCtrust, since this kind of error is already overcome by considering WAtrust as the first ranking criterion. Moreover, we observed that, in the majority of cases, RLCA decreased and PLCA increased when θ increased. More precisely, the highest RLCA for both TSaCtrust and TSbCIC was obtained with θ = 0, while the highest PLCA was obtained, for both approaches, with different θ values depending on the predicate, as shown in Table 5.

Summarising, the most effective configuration settings were TSaCtrust and TSbCIC; both obtained better performance than existing truth discovery models. We noted that increasing the number of values returned for each data item improves performance. Nevertheless, this can be applied only when a group of experts can select the true value among the ones proposed by our approach for each data item; otherwise, the parametrization k = 1 has to be enforced. Regarding the threshold θ, a high θ value is recommended when the application scenario does not allow taking many risks. In this case it is important to have high precision; in other words, obtaining a general true value is preferred to obtaining a specific false one. Therefore, the different parameter settings of the proposed post-processing procedure allow dealing with different application scenarios, taking their requirements into account.

7. Conclusion

In this paper, we have presented a post-processing procedure able to select true values after estimation of the value confidences using the AdaptedSums approach we proposed in our previous work. This general procedure can be used with any TD approach when a partial order of values is taken into account as a priori knowledge. The post-processing procedure involves three main steps. The first one is the selection procedure: it aims to identify the set of possible true values using the relationships among them and includes two parameters (δ and θ); based on their tuning, different behaviours of the selection process can be obtained. The second step ranks the values returned by the selection phase. Finally, the third step filters the top k values and ensures desirable properties (values that do or do not share relationships). The results confirmed our preliminary finding: using the partial ordering of values helps to improve both source trustworthiness estimation, as already demonstrated by our preliminary study [10], and true value identification.
More precisely, the best results are obtained with the configuration of the algorithm that selects a set of alternatives not sharing ordering relationships and ranks them through the average trustworthiness of the sources claiming those values. The results showed a similar behaviour on the datasets obtained from the two different ontologies (DBpedia and Gene Ontology). As future work, we envisage adapting our framework to another existing model, in order to show the flexibility of our approach and further enhance the results. Moreover, we intend to explore other kinds of additional information, such as correlations among data items and values. For instance, the birth place of a person is usually correlated with the language they speak; therefore, if we know that a person speaks Italian, we can increase the confidence of claims that contain Italy as the value of the bornIn predicate. More precisely, we plan to design a model that integrates, into existing approaches, information extracted from external knowledge bases in the form of rules. The idea is to add, in the confidence formula, a boosting factor indicating the confidence level of each claim according to the external knowledge base.

References

[1] J. Bleiholder, F. Naumann, Data fusion, ACM Computing Surveys (CSUR) 41 (1) (2009) 1.

[2] C. Li, V. S. Sheng, L. Jiang, H. Li, Noise filtering to improve data and model quality for crowdsourcing, Knowledge-Based Systems 107 (Supplement C) (2016) 96–103. doi:10.1016/j.knosys.2016.06.003.

[3] Y. Li, J. Gao, C. Meng, Q. Li, L. Su, B. Zhao, W. Fan, J. Han, A survey on truth discovery, SIGKDD Explorations 17 (2) (2015) 1–16.

[4] T. Knap, J. Michelfeit, M. Necaský, Linked open data aggregation: Conflict resolution and aggregate quality, Computer Software and Applications Conference Workshops (COMPSACW), 2012 IEEE 36th Annual (2012) 106–111.

[5] A. Guzman-Arenas, A.-D. Cuevas, A. Jimenez, The centroid or consensus of a set of objects with qualitative attributes, Expert Systems with Applications 38 (5) (2011) 4908–4919. doi:10.1016/j.eswa.2010.09.169.

[6] L. Berti-Equille, J. Borge-Holthoefer, Veracity of data: From truth discovery computation algorithms to models of misinformation dynamics, Synthesis Lectures on Data Management 7 (3) (2015) 1–155.

[7] D. Wang, T. Abdelzaher, L. Kaplan, Social sensing: building reliable systems on unreliable data, Morgan Kaufmann, 2015.

[8] G. Zhou, J. Zhao, T. He, W. Wu, An empirical study of topic-sensitive probabilistic model for expert finding in question answer communities, Knowledge-Based Systems 66 (2014) 136–145. doi:10.1016/j.knosys.2014.04.032.

[9] B. Khaleghi, A. Khamis, F. O. Karray, S. N. Razavi, Multisensor data fusion: A review of the state-of-the-art, Information Fusion 14 (1) (2013) 28–44. doi:10.1016/j.inffus.2011.08.001.

[10] V. Beretta, S. Harispe, S. Ranwez, I. Mougenot, How can ontologies give you clue for truth-discovery?
An exploratory study, in: Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, WIMS 2016, Nîmes, France, June 13-15, 2016.

[11] X. Yin, J. Han, P. S. Yu, Truth discovery with multiple conflicting information providers on the web, IEEE Transactions on Knowledge and Data Engineering 20 (6) (2008) 796–808.

[12] R. W. Ouyang, L. M. Kaplan, A. Toniolo, M. Srivastava, T. J. Norman, Aggregating crowdsourced quantitative claims: Additive and multiplicative models, IEEE Transactions on Knowledge and Data Engineering 28 (7) (2016) 1621–1634.

[13] R. W. Ouyang, M. Srivastava, A. Toniolo, T. J. Norman, Truth discovery in crowdsourced detection of spatial events, IEEE Transactions on Knowledge and Data Engineering 28 (4) (2016) 1047–1060.

[14] J. Pasternack, D. Roth, Latent credibility analysis, in: Proceedings of the 22nd international conference on World Wide Web, ACM, 2013, pp. 1009–1020.

[15] B. Zhao, B. I. Rubinstein, J. Gemmell, J. Han, A bayesian approach to discovering truth from conflicting sources for data integration, Proceedings of the VLDB Endowment 5 (6) (2012) 550–561.

[16] X. Yin, W. Tan, Semi-supervised truth discovery, Proceedings of the 20th international conference on World wide web (2011) 217–226.

[17] R. Pochampally, A. Das Sarma, X. L. Dong, A. Meliou, D. Srivastava, Fusing data with correlations, Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14 (2014) 433–444. arXiv:1503.00306, doi:10.1145/2588555.2593674.

[18] G.-J. Qi, C. C. Aggarwal, J. Han, T. Huang, Mining collective intelligence in diverse groups, Proceedings of the 22nd International Conference on World Wide Web, WWW '13 (2013) 1041–1052.

[19] X. Wang, Q. Z. Sheng, X. S. Fang, L. Yao, X. Xu, X. Li, An integrated bayesian approach for effective multi-truth discovery, Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM '15 (2015) 493–502.

[20] C. Meng, W. Jiang, Y. Li, J. Gao, L. Su, H. Ding, Y. Cheng, Truth discovery on crowd sensing of correlated entities, Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys '15 (2015) 169–182.

[21] S. Wang, L. Su, S. Li, S. Hu, T. Amin, H. Wang, S. Yao, L. Kaplan, T. Abdelzaher, Scalable social sensing of interdependent phenomena, Proceedings of the 14th International Conference on Information Processing in Sensor Networks, IPSN '15 (2015) 202–213.

[22] X. L. Dong, L. Berti-Equille, D. Srivastava, Integrating conflicting data: the role of source dependence, Proceedings of the VLDB Endowment 2 (1) (2009) 550–561.

[23] L. Berti-Equille, A. Das Sarma, X. L. Dong, A. Marian, D. Srivastava, Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence, CIDR, 2009. arXiv:0909.1776.

[24] X. L. Dong, L. Berti-Equille, D. Srivastava, Truth Discovery and Copying Detection in a Dynamic World, Proceedings of the VLDB Endowment 2 (1) (2009) 562–573.

[25] D. Wang, T. Abdelzaher, L. Kaplan, R. Ganti, S. Hu, H. Liu, Exploitation of physical constraints for reliable social sensing, Proceedings of the 2013 IEEE 34th Real-Time Systems Symposium, RTSS '13 (2013) 212–223.

[26] S. Wang, D. Wang, L. Su, L. Kaplan, T. F. Abdelzaher, Towards cyber-physical systems in social spaces: The data reliability challenge, Real-Time Systems Symposium (RTSS), 2014 IEEE (2014) 74–85.

[27] A. Bronselaer, M. Szymczak, S. Zadrożny, G. De Tré,
Dynamical order construction in data fusion, Information Fusion 27 (Supplement C) (2016) 1–18. doi:10.1016/j.inffus.2015.05.001.

[28] D. C. Kozen, The Design and Analysis of Algorithms, Springer-Verlag New York, Inc., 1992.

[29] A. V. Aho, M. R. Garey, J. D. Ullman, The transitive reduction of a directed graph, SIAM J. Comput. 1 (2) (1972) 131–137.

[30] N. Seco, T. Veale, J. Hayes, An intrinsic information content metric for semantic similarity in wordnet, Proceedings of the 16th European conference on artificial intelligence (2004) 1089–1090.

[31] S. Harispe, S. Ranwez, S. Janaqi, J. Montmain, Semantic similarity from natural language and ontology analysis, Synthesis Lectures on Human Language Technologies 8 (1) (2015) 1–254.

[32] P.-A. Jean, S. Harispe, S. Ranwez, P. Bellot, J. Montmain, Uncertainty detection in natural language: a probabilistic model, in: Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, WIMS 2016, Nîmes, France, June 13-15, 2016.

[33] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, Dbpedia: A nucleus for a web of open data, in: Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference, ISWC'07/ASWC'07, Springer-Verlag, Berlin, Heidelberg, 2007.

[34] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, et al., Gene ontology: tool for the unification of biology, Nature genetics 25 (1) (2000) 25–29.

[35] Q. Li, Y. Li, J. Gao, L. Su, B. Zhao, M. Demirbas, W. Fan, J. Han, A confidence-aware approach for truth discovery on long-tail data, Proceedings of the VLDB Endowment 8 (4) (2014) 425–436.

[36] J. Pasternack, D. Roth, Knowing what to believe (when you already know something), Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10 (2010) 877–885.

[37] D. A. Waguih, L. Berti-Equille, Truth discovery algorithms: An experimental evaluation, CoRR abs/1409.6428.

[38] A. Kosmopoulos, I. Partalas, E. Gaussier, G. Paliouras, I. Androutsopoulos, Evaluation measures for hierarchical classification: a unified view and novel approaches, Data Mining and Knowledge Discovery 29 (3) (2015) 820–865.