A proposed validation framework for expert elicited Bayesian Networks

Jegar Pitchforth*, Kerrie Mengersen

Queensland University of Technology

Abstract

The popularity of Bayesian Network modelling of complex domains using expert elicitation has raised questions of how one might validate such a model given that no objective dataset exists for the model. Past attempts at delineating a set of tests for establishing confidence in an entirely expert-elicited model have focused on single types of validity stemming from individual sources of uncertainty within the model. This paper seeks to extend the frameworks proposed by earlier researchers by drawing upon other disciplines where measuring latent variables is also an issue. We demonstrate that even in cases where no data exist at all there is a broad range of validity tests that can be used to establish confidence in the validity of a Bayesian Belief Network.

Keywords: expert, validation, Bayesian network, sensitivity

*Corresponding author. Phone: +61 403 961 878. Email: jegar.pitchforth@qut.edu.au. Lvl 11, 126 Margaret St., Brisbane 4000.

Preprint submitted to Expert Systems with Applications, May 26, 2012

1. Introduction

Bayesian Networks (BNs) are an increasingly popular tool for modelling complex systems, particularly in the absence of easily accessed data. A BN describes the joint probability distribution of a network of factors using a Directed Acyclic Graph (Pearl, 1988). Factors that influence the likelihood of the outcome node being in any given state are represented as nodes on the graph. If the state of one model factor influences the state of another, a directional arc is drawn between the two nodes representing these factors in the model. The combination of the nodes and their relationships is the BN structure.
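For concreteness, the structural ideas above can be expressed in code. The sketch below uses the open-source pgmpy library; this is an editorial illustration rather than tooling used in the paper, the node names merely echo the passenger processing example introduced later, and the class name varies across pgmpy versions.

```python
# A BN structure is a Directed Acyclic Graph: nodes are model factors and
# each directed arc records that the parent influences the child.
from pgmpy.models import BayesianNetwork  # DiscreteBayesianNetwork in pgmpy >= 1.0

# Each (parent, child) tuple is one directional arc.
structure = BayesianNetwork([
    ("FlightsArriving", "PassengerVolume"),
    ("PassengerVolume", "QueueLength"),
    ("StaffOnDuty", "QueueLength"),
    ("QueueLength", "ProcessingTime"),
])

print(sorted(structure.nodes()))  # the model factors
print(sorted(structure.edges()))  # the arcs between them
```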
Each node in the graph can adopt any one of a finite set of states. For example, a factor representing magnitude could be classified as 'high' or 'low'. While nodes do not strictly have to be discretised, the practice is far more common than not due to its computational convenience, and as such we do not discuss models that include non-discretised nodes in this paper. Finally, each node and relationship between nodes is quantified according to the likelihood of the node adopting a given state. In the case of input nodes these probabilities are seen as unconditional, whereas nodes internal to the model are dependent upon the states of the preceding nodes. The strength and direction of the relationship between model factors is defined in the conditional probability table (CPT) associated with the child node.

BNs are often created through a process of expert elicitation, in which experts are asked to create a complex systems model by giving their opinions on the model structure, discretisation, and parameterisation. The validity of these models is generally tested through one of two procedures: by comparing the model predictions to data available for the subject matter, or by asking the experts who contributed to the model creation to comment on its accuracy. This paper argues that these tests are limited in their ability to accurately test the validity of BNs, and presents a framework for more thorough validity testing. The work presented here stems from questions raised during the creation of a BN from expert elicitation to model the inbound passenger processing time at Australian airports. The network was elicited in collaboration with managerial and operational experts from the Australian Customs and Border Protection Service (ACBPS) for the purpose of gaining more informative reporting of key performance indicators. In particular, the modelling of critical infrastructure underlined the importance of establishing that both experts and modellers have confidence in the final model produced.

The paper is structured as follows. First, the concept of validation as it applies to BNs is introduced in section 1.1. Second, the sources of confidence in BN validity are discussed, including network structure, discretisation, and parameterisation, in section 1.2. Third, prior approaches to validating latent and expert elicited scales and models are introduced in section 2, drawing from psychometrics, system dynamics and other BN research. These principles are then applied to BNs in section 3, with examples from the airport inbound passenger processing model.

1.1. Confidence in Bayesian Belief Network validity

Model validity is often conceptualised as a simple test of a model's fit with a set of data. However, validity is a much broader construct: in essence, validity is the ability of a model to describe the system that it is intended to describe, both in the output and in the mechanism by which that output is generated. In this paper we consider this broader definition of validity.

The need for an explicit set of validity tests for BNs, over and above comparisons with data, is clear. In current practice, where data are available on the phenomenon of interest, these data may be used to validate model predictions. Several tests of this nature exist, such as a variety of Normalized Maximum Likelihood model selection criteria (Silander et al., 2009).
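As an illustration of such data-based validation, the sketch below scores two rival structures against a toy dataset. Since we are not aware of an off-the-shelf NML implementation in pgmpy, a BIC score stands in for the criteria cited above; the data and structures are invented for illustration.

```python
# Score-based model selection: given data, compare how well rival
# structures explain it. Higher (less negative) scores are better.
import pandas as pd
from pgmpy.estimators import BicScore
from pgmpy.models import BayesianNetwork

# Toy observations of two binary factors (illustrative only).
data = pd.DataFrame({
    "QueueLength":    ["long", "long", "short", "short", "long", "short"],
    "ProcessingTime": ["slow", "slow", "fast", "fast", "fast", "fast"],
})

candidate = BayesianNetwork([("QueueLength", "ProcessingTime")])
rival = BayesianNetwork([("ProcessingTime", "QueueLength")])

scorer = BicScore(data)
# The score difference is the evidence the data give one structure
# over the other.
print(scorer.score(candidate), scorer.score(rival))
```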
However, a common reason for using BN models is a lack of available data. Examples of phenomena for which data are scarce include population characteristics in many developing countries (Shakoor et al., 1997), global epidemiological phenomena (Masoli et al., 2004), organised crime (Sobel and Osoba, 2009), conservation (Johnson, 2009) and biosecurity risk analysis (Barrett et al., 2010). In such cases, expert opinion can be elicited to create a Bayesian Belief Network (BBN). A common technique for validating BBNs based on expert opinion in the absence of data is simply to ask the experts whether they agree with the model structure, discretisation, and parameterisation (see Korb and Nicholson (2010) for an excellent overview of BN applications and methods). This simple test is necessary, but not sufficient, to independently verify the validity of a complex model. Even where data are available, model fit is only a part of the model's overall validity. These considerations lead to this paper's proposition of a general validity framework for BNs.

1.2. Sources of confidence in Bayesian Network validity

In order to approach a validation framework for BNs, a short discussion of the background assumptions of this framework is required. First, we assume there exists a latent, unobservable 'true' model (or set of acceptable 'true' models) for the phenomenon of interest against which the expert elicited model can be compared. Second, for the purposes of the validity framework presented in this paper, we consider a BN model to consist of four elements: model structure (section 1.2.1), node discretisation (section 1.2.2), discrete state parameterisation (section 1.2.3), and model behaviour (section 1.2.4). Each of these elements has been raised as a source of uncertainty in BN modelling. We provide a discussion of each element and consider the importance of validity within each model element, and within the model as a whole. The model elements are summarised in figure 1.

[Figure 1: Sources of confidence in Bayesian Network validity]

1.2.1. Structure

There are a number of questions to address when creating the structure of a BN. The first is the appropriate number of nodes to include, which is a question of the modelling domain, level and scope. It is widely acknowledged that networks with a large number of nodes can easily become computationally intractable, as can networks with a large number of arcs between nodes (Koller and Pfeffer, 1997). The BN creator should ensure that the model is neither too simple nor too complex in its explanation of the system.

1.2.2. Discretisation

The discretisation process allows us to model systems probabilistically by taking continuous factors and assigning them intervals, ordinal states or categories, then modelling over the discrete domain. Uusitalo (2007) pointed out that such discretisation, where it is necessary for the model, is a major disadvantage of BN modelling, and Myllymaki et al. (2002) outline how the process has the potential to destroy useful information. Given the information loss inherent in the discretisation process, ensuring that the states are a valid interpretation of the state space of the node is critical for a defensible network.

1.2.3. Parameterisation

Parameterisation refers to adding the values elicited from experts to the belief network (Woodberry et al., 2005).
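To make the product of parameterisation concrete, the sketch below encodes hypothetical elicited probabilities as a conditional probability table in pgmpy. The numbers are placeholders, not values elicited from the ACBPS experts.

```python
# Elicited beliefs enter the network as the child node's CPT:
# one column of probabilities per combination of parent states.
from pgmpy.factors.discrete import TabularCPD

cpd = TabularCPD(
    variable="ProcessingTime", variable_card=2,
    values=[[0.9, 0.3],   # P(fast | QueueLength = short), P(fast | long)
            [0.1, 0.7]],  # P(slow | QueueLength = short), P(slow | long)
    evidence=["QueueLength"], evidence_card=[2],
    state_names={"ProcessingTime": ["fast", "slow"],
                 "QueueLength": ["short", "long"]},
)
print(cpd)  # renders the table, one column per parent state
```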
Much work has been conducted on controlling this stage of the process (Renooij, 2001), but little has been written about how to validate expert responses post-elicitation.

1.2.4. Model Behaviour

Finally, the behaviour of the model can be seen as the joint likelihood of the entire network as well as its sub-networks and relationships; hence confidence in model behaviour is founded upon the validity of the other three dimensions of the model. It is important to note that in the case of BNs, we are not only interested in whether the model can tell us what a system is doing under certain conditions, but also in the factors and relationships that bring about this behaviour. This makes the problem of validating the model incredibly complex when attempted wholesale, and justifies the need for partitioning the dimensions of uncertainty for BNs. As such, it is recommended that the structure, discretisation and parameterisation are tested for validity before any model behaviour tests are run.

2. Previous approaches to validity

2.1. Psychometrics

The discipline of psychometrics arose as a counterpart to the field of psychology, which at its foundation attempts to measure latent, unobserved, 'true' variables such as intelligence. Due to this rich tradition, the foundations of measurement validation in psychometrics are particularly solid, and serve as a useful base from which to begin discussion of a similar framework for BNs. Psychometrics first identified four types of validity (Cronbach and Meehl, 1955); more recent research has reclassified and added dimensions of validity to establish a full validation framework (Trochim, 2001). Based on the framework depicted in figure 2, a psychometric test can pass all these tests of validity to varying degrees, providing a multidimensional measure of how well a particular test measures a latent variable. In psychometric testing there are seven commonly tested dimensions of validity: nomological validity, face validity, content validity, concurrent validity, predictive validity, convergent validity, and discriminant validity.

[Figure 2: The psychometric validity testing framework, adapted from Trochim (2001).]

In psychometrics, before any other tests of validity can be undertaken, the nomological validity of the validity domain should be established. High nomological validity indicates that the measurement sits well within current academic thought on the subject. Face validity refers to the heuristic interpretation of a measure as a valid representation of the underlying psychometric construct. Content validity describes both the inclusion of all variables believed to be within a domain and the relevance of the factors included in the scale. Concurrent validity refers to the behaviour of a measurement scale; specifically, that the measure varies at the same point in time as another theoretically related measure taken on the same sample. Convergent validity refers to the criterion that scores on the measure to be validated (e.g. intelligence) should match scores on another, theoretically related measure (e.g. school grades) in the same sample. Finally, discriminant validity refers to the criterion that scores on the measure to be validated should be different from scores on tests that measure constructs that are theoretically unrelated.
While this is a useful paradigm upon which to base our exploration, the differences between judging the validity of a complex model and judging the validity of a score on a single construct are significant enough to necessitate further exploration of other approaches.

The parameterisation process is the most similar to the psychometric discipline, as the parameters can be treated as scores denoting a given belief about the behaviour of that node. Using this approach, we can use the extensive literature on psychometrics and group behaviour to help validate the parameters we elicit from our experts.

2.2. System Dynamics

In his review of system dynamics validation tests, Barlas (1996) describes a series of eight tests to validate system dynamics models: parameter confirmation, dimensional consistency, modified behaviour prediction, Turing tests, qualitative features analysis, extreme conditions testing, behaviour sensitivity tests and structure confirmation. Each of the tests can be classified in terms of the psychometric validity framework, but can also be directly applied to specific sources of BN model uncertainty. For example, parameter confirmation can be seen as a special test of concurrent validity applied specifically to model parameterisation. The tests introduced by Barlas (1996) are described in more depth in the following section, with specific reference to BN modelling.

2.3. Machine Learning

It is worth mentioning the significant research that has been conducted in the field of machine learning, particularly regarding content validity of the network structure. Machine learning researchers often use BNs and Bayesian Belief Networks to discover true networks using full datasets (Heckerman et al. (1995) is a strong and widely cited example of this method). While this work is outside the scope of this paper, it is worth mentioning due to the minimalist approach used by machine learning researchers. In particular, the discipline is concerned with finding methods of excluding as many nodes and relationships from a BN as possible without losing explanatory power.

2.4. Bayesian Network specific tests

There are very few validity tests specific to BN modelling, but the few that exist are commonly used. Pollino et al. (2007) refer to the concepts of 'sensitivity to findings' and 'sensitivity to parameters' as methods of testing the predictive validity of expert-elicited networks. Other tests that have been introduced, such as d-separation analysis (Geiger et al., 1990) and causal independence-based tests (Cheng et al., 1997), are structural tests only, and are often used to establish internal consistency, which is more elegantly defined as a reliability criterion.

2.5. Problem Statement

Unlike areas in which objective data are available, BNs built from expert elicitation cannot be validated using complete test datasets. As such, the concept of validity is not absolute but a question of additive strength. Often we cannot say whether a test has been conclusively passed or not, only take the weight of evidence over all the tests that have been applied. With this in mind we can begin to move toward a framework for validating all sources of uncertainty within the BN.
While there are some tests introduced in previous research, these only test individual aspects of the network and can often only reflect the reliability rather than the validity of the model. For BNs based either entirely upon expert elicitation, or upon a combination of data and expert elicitation, to be judged as valid assessments of the knowledge around a domain, a more comprehensive and robust framework of validity measures needs to be established.

3. A validity testing framework for expert-elicited Bayesian Networks

The prior approaches to test and model validation are discussed and related to BNs in the following section, with examples from the airport inbound passenger processing network. When applying this validity testing framework to BNs, model structure, node discretisation, and overall model behaviour must be considered in addition to parameterisation. For this reason, in the following framework we consider the seven types of validity from psychometrics (including their special tests from the system dynamics and BN modelling disciplines), and their application to the four sources of BN model uncertainty.

3.1. Nomological validity

In terms of an expert elicited BN, building nomological validity means establishing confidence that the model domain fits within a wider domain as established by the literature. For example, the passenger processing BN for ACBPS should sit within the literature on airport terminals, wayfinding and security, as well as other types of complex systems models and spatio-temporal modelling methods. If this test cannot be passed by the network, an argument must be made for why this model sits outside all current known research. This is very unusual, but may occur in fields such as advanced physics, where new information is regularly shifting the entire paradigm of the discipline. If this is the case, there may be an argument for a network having low nomological validity. Nomological validity is generally applied to the whole domain, but the nomological map serves as a reference for finding appropriate comparison models in later tests of specific sources of uncertainty. Given the power of nomological validity to place the research in a wider context, we begin the validation process with the questions:

• Can we establish that the BN model fits within an appropriate context in the literature?

• Which themes and ideas are nomologically adjacent to the BN model, and which are nomologically distant?

3.2. Face validity

Face validity is one of the most commonly used tests for expert-elicited BNs. For example, we can look at our passenger processing BN and check that baggage delivery time is part of the model and that it is related to the time spent picking up baggage to approximately the right level. However, despite the ease of establishing face validity, it is considered the weakest form of validity within the psychometric framework. One of the primary dangers in establishing face validity is criterion contamination, an issue that arises when the test dataset is the same as the validation set (Darkes et al., 1998). In our case, we might ask our set of experts whether they think the network looks the same as expected. Unsurprisingly, there are very few cases where the experts disagree with their own judgment. A more robust way of establishing face validity would be to split the population of experts into test and validation groups, and ask the validation group only about the face validity of the network (Johnson et al., 2010); a minimal sketch of such a split follows.
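The split itself can be as simple as an unbiased random partition of the expert pool; in the sketch below the roster and group sizes are invented.

```python
# Randomly partition the expert pool so that the validation group has
# played no part in building the model, avoiding criterion contamination.
import random

experts = ["expert_%d" % i for i in range(1, 11)]  # illustrative roster
random.seed(42)  # make the split reproducible
random.shuffle(experts)

elicitation_group = experts[:7]  # these experts build the model
validation_group = experts[7:]   # only these judge its face validity
print(validation_group)
```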
In cases where few experts are available, we can undertake a number of other strategies normally used for elicitation, such as using different experts for different parts of the BN, asking experts to assess their answers from a rival's perspective, asking experts whether the model is applicable outside their domain, and many others (Low Choy et al., 2009; James et al., 2010). In addition, often the entire model is tested at once (Korb and Nicholson, 2010). In order to learn as much as possible about the model through the validation process, it is worthwhile to assess the face validity of the structure (including sub-networks), discretisation and parameterisation independently. We therefore suggest the second set of questions in this validation stage:

• Does the model structure (the number of nodes, node labels and arcs between them) look the same as the experts and/or literature predict?

• Is each node of the network discretised into sets that reflect expert knowledge?

• Are the parameters of each node similar to what the experts would expect?

3.3. Content Validity

To test for content validity of the structure we can check that all noted factors and relationships from the literature are included in the model, and discover which relationships are novel to the BN model. For example, in the passenger processing BN we could ensure that all the factors considered to be important by the regulating bodies are included. To check the content validity of the discretisation of nodes within the model, we can ensure that all intervals implicated in the literature are included in the network. For example, if we were to discover that a node is generally classified at three levels in the literature, then a node with binary states would have low content validity. From a system dynamics perspective, Barlas (1996) describes a dimensional consistency test which, when applied to a BN paradigm, could be defined as ensuring that all possible states of the node are included in the discrete states. For example, if a node were to include binary states of above twelve people and below twelve people, then the node would lack dimensional consistency, as the possibility of there being exactly twelve people has been excluded (a mechanical check of this criterion is sketched after the questions below). Finally, the content validity of the parameterisation can be checked by comparing expert elicited probabilities and relationships to analogous relationships in the literature. If parameters in the expert elicited model are significantly different, an argument should be made for the difference. To assess the content validity of a BN model, the following questions are suggested:

• Does the model structure contain all and only the factors and relationships relevant to the model output?

• Does each node of the network contain all and only the relevant states the node can possibly adopt?

• Are the discrete states of the nodes dimensionally consistent?

• Do the parameters of the input nodes and CPTs reflect all the known possibilities from expert knowledge and domain literature?
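The dimensional consistency criterion lends itself to a mechanical check. Below is a plain-Python sketch for interval-valued states, treating each state as a half-open interval; the bounds are illustrative.

```python
# Dimensional consistency (Barlas, 1996): the discrete states must cover
# the node's whole domain with no gaps and no overlaps.
def dimensionally_consistent(intervals, lower, upper):
    """intervals: (low, high) pairs, each state covering [low, high)."""
    intervals = sorted(intervals)
    if intervals[0][0] != lower or intervals[-1][1] != upper:
        return False  # the domain edges are not covered
    # Adjacent states must abut exactly: no gap, no overlap.
    return all(a[1] == b[0] for a, b in zip(intervals, intervals[1:]))

# 'below twelve' / 'above twelve' excludes exactly twelve: inconsistent.
print(dimensionally_consistent([(0, 12), (13, 100)], 0, 100))  # False
print(dimensionally_consistent([(0, 12), (12, 100)], 0, 100))  # True
```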
3.4. Concurrent Validity

In the context of BNs, concurrent validity can refer to the possibility that a network or section of a network behaves identically to a section of another network, preferably one driven by data. While this seems improbable, the nature of BN modelling lends itself well to concurrent validity. For example, the passenger processing BN shares some sub-networks and nodes with the customer satisfaction model for the same airport. In their introduction to Object Oriented Bayesian Networks, Koller and Pfeffer (1997) describe the technique as a way of capitalising on this high concurrent validity by building networks from instances, or nodes representing sub-networks, that can be easily transposed to other networks. This method allows large and highly complex BNs to be built without the researcher repeating modelling work performed by other researchers in the same domain. To test the concurrent validity of the structure of a BN, we can check other networks in related domains for sub-networks that are similar to sub-networks in the network of interest. A model with high concurrent validity would have sub-networks in common with networks that are theoretically related, with the same number of nodes and relationships, and with the relationships in the same direction. Similarly, when similar sub-networks from theoretically related networks are identified, we can judge the validity of the discretisation of nodes and their parameterisation against the intervals of nodes and probabilities supplied in the comparison network. In the Barlas (1996) review of system dynamics tests, the application of concurrent validity criteria specifically to the parameters of the model factors is known as 'parameter confirmation'. Given these approaches, the following questions are suggested as tests of a BN's concurrent validity:

• Does the model structure or its sub-networks act identically to a network or sub-network modelling a theoretically related construct?

• In identical sub-networks, are the included factors discretised in the same way as in the comparison model?

• Do the parameters of the input nodes and CPTs in the network of interest match the parameters of the sub-network in the comparison model?

3.5. Convergent Validity

Convergent and discriminant validity are usually considered together, as they both reflect the relationship the BN has with other models. Convergent validity in BNs refers to how similar the model structure, discretisation, and parameterisation are to other models that are intended to describe a similar system. For example, we would expect our passenger processing BN to look similar to a network describing the processing of cargo at a seaport. The selection of comparison models is dependent upon the literature and knowledge of the domain at hand, but the original nomological map created in the first step of validation can be used as a reference for which sources may be of use. In particular, the comparison model for establishing convergent validity should be taken from an area as nomologically proximal as possible. In practice this could mean using a comparison model drawn from another complex systems discipline applied to the same domain, or alternatively using a BN drawn from a theoretically similar domain. One rough structural indicator of such similarity is sketched below.
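Assuming node labels in the two models can be matched up, the proportion of shared directed arcs gives a crude similarity score; this Jaccard-style measure is our illustration, not an established statistic, and the models are invented.

```python
# Jaccard similarity on directed arcs as a rough convergent-validity
# indicator between a model and a nomologically proximal comparison.
def arc_overlap(edges_a, edges_b):
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b)

passenger_model = [("Volume", "Queue"), ("Staff", "Queue"),
                   ("Queue", "ProcessTime")]
cargo_model = [("Volume", "Queue"), ("Staff", "Queue"),
               ("Queue", "ClearanceTime")]

print(arc_overlap(passenger_model, cargo_model))  # 0.5
```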
As with the other types of validity, we can test the expert elicited BN regarding the convergent and discriminant validity of the structure, discretisation and parameterisation in isolation, using the following questions:

• How similar is the model structure to other models that are nomologically proximal?

• How similar is the discretisation of each node to the discretisation of nodes that are nomologically proximal, independent of their network domain?

• Are the parameters of nodes that have analogues in comparison models assigned similar conditional probabilities?

3.6. Discriminant Validity

The counterpart to convergent validity is discriminant validity, defined in this framework as the degree to which a model is different from models that should be describing a different system. For example, we would expect our passenger processing BN to look different from a model describing students' progression through school. As in the case of convergent validity, the comparison model can be chosen using the nomological map as a reference guide for useful sources. The ideal method for establishing good discriminant validity would be to select models from nomologically distal disciplines and work toward the construct of interest. Given that convergent validity has already been established, the ideal model would be one that is similar in most respects to the convergent comparison model, but dissimilar in all respects to the discriminant comparison model, which would be drawn from an area of research very close to the convergent validity comparison model.

A system dynamics test of experts' judgement of the discriminant validity of any source of uncertainty in a BN model is known as a Simulation Turing test (Schruben, 1980). The test requires many versions of the model to be shown to the expert, only one of which is the expert-elicited model in every respect. Experts can be asked to choose the correct structure, discretisation or parameterisation from either a set of models or through binary choice experiments in which every model is compared to every other model. As in the case of face validity, the Turing test is ideally carried out on a separate set of experts from the set that originally created the model, to avoid criterion contamination. The fewer the differences between the final model chosen and the expert-elicited network, the higher the discriminant validity of that source of uncertainty. For this framework, the following questions are suggested as tests of the discriminant validity of the BN model, with a sketch of the Turing test set-up following the questions:

• How different is the model structure from other models that are nomologically distal?

• How different is the discretisation of each node from the discretisation of nodes that are nomologically distal, independent of their network domain?

• Are the parameters of nodes in the comparison models that have oppositional definitions to the node in question parameterised differently?

• When presented with a range of plausible models, can experts choose the 'correct' model or set of models?
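A sketch of such a set-up, generating distractor structures by randomly perturbing the arcs of the elicited model; cycle and duplicate-arc checks are omitted for brevity, and all names are illustrative.

```python
# Simulation Turing test (Schruben, 1980): validators are shown the
# elicited structure among perturbed distractors, in random order.
import random

def perturb(edges, nodes, rng):
    """Return a copy of `edges` with one arc dropped and one added."""
    edges = list(edges)
    edges.pop(rng.randrange(len(edges)))       # drop a random arc
    edges.append(tuple(rng.sample(nodes, 2)))  # add a random arc
    return edges

rng = random.Random(1)
nodes = ["Volume", "Staff", "Queue", "ProcessTime"]
elicited = [("Volume", "Queue"), ("Staff", "Queue"),
            ("Queue", "ProcessTime")]

candidates = [elicited] + [perturb(elicited, nodes, rng) for _ in range(4)]
rng.shuffle(candidates)  # validators must not know which is the original
for c in candidates:
    print(c)
```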
3.7. Predictive Validity

In BNs, predictive validity can be considered to encompass both the model behaviour and the model output. This is the type of validity covered by traditional model and data fitting techniques. When applying predictive validity tests within a complex systems, and specifically a BN, paradigm, the comparison model can be an alternative hypothesised model rather than a data-driven model. Such hypothesised models could be elicited using a number of techniques, such as case studies or formal walkthroughs (Barlas, 1996; Pollino et al., 2007). Luu et al. (2009) used case studies to formulate alternative hypothetical networks against which to compare the predictive validity of their BN model. While they did not specifically apply the tests presented in this paper, their work represents one of the few papers to attempt to establish confidence in the predictive validity of an expert-elicited BN.

Half of the special tests of system dynamics model validity presented by Barlas (1996) refer to the predictive validity of the model, in that they test the model behaviour specifically. Of particular relevance to establishing confidence in the predictive validity of BNs are behaviour sensitivity tests, qualitative features analysis and extreme conditions tests. When applied within a BN paradigm, the behaviour sensitivity test can be applied to the model structure and parameters by determining to which factors and relationships the model is sensitive, and comparing this to hypothetical models or alternative empirical models. The terms 'sensitivity to parameters' and 'sensitivity to findings' are used by Pollino et al. (2007) to describe the application of behaviour sensitivity tests to the parameters and model behaviour specifically; however, it should be noted that this test can be just as easily applied to the structure and discretisation of nodes in the model as well. These tests are commonly used, and various versions of them can be executed using the GeNIe 2.0 (DSL, 2007), Hugin Expert (Andersen et al., 1989) or Netica (Norsys, 2007) software packages, among others.

Qualitative features analysis (Carson and Flood, 1990) is a case of predictive validity testing where behaviour in a hypothetical model is compared to the behaviour of individual pairs of nodes, sub-networks and the entire model. As in the other cases of predictive validity testing, the hypothetical models can be arrived at through a number of formal strategies; however in this case, we are interested in the comparison of simulation output rather than the comparison of model features directly. It is for this reason that model behaviour is outlined as the fourth source of model uncertainty. While this area is the product of the uncertainty of its component features, predictive validity requires that model behaviour be simulated from the model for tests to occur. For this reason, predictive validity should be the final type of validity to be tested.

Finally, the extreme conditions test can be seen as a special case of qualitative features analysis, as it sets the hypothetical model to extreme conditions where the behaviour of the model is more predictable (Forrester and Senge, 1980). For example, if the number of passengers is set to 0, then the model should reflect that there is a probability of 1 that 0 passengers are processed within the time range of interest.
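This check can be run end to end on a toy model, as in the sketch below; the structure, states and probabilities are invented solely to show the mechanics of clamping an input to its extreme state and inspecting the output posterior.

```python
# Extreme conditions test: with zero passengers, the model should put
# probability 1 on zero passengers being processed.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("Passengers", "Processed")])
model.add_cpds(
    TabularCPD("Passengers", 2, [[0.5], [0.5]],
               state_names={"Passengers": ["zero", "some"]}),
    TabularCPD("Processed", 2,
               [[1.0, 0.2],   # P(zero processed | zero, some passengers)
                [0.0, 0.8]],  # P(some processed | zero, some passengers)
               evidence=["Passengers"], evidence_card=[2],
               state_names={"Processed": ["zero", "some"],
                            "Passengers": ["zero", "some"]}),
)
assert model.check_model()

# Clamp the input node to its extreme state and query the output node.
posterior = VariableElimination(model).query(
    variables=["Processed"], evidence={"Passengers": "zero"})
print(posterior)  # all probability mass should sit on Processed = 'zero'
```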
The direct extreme conditions test examines the behaviour of individual pairs of nodes and sub-networks under such extreme conditions, while the indirect extreme conditions test examines the behaviour of the entire network against such hypotheses. The range of tests available to establish confidence in the predictive validity of a model is notable considering the issue at hand, namely that true objective data on the model are not available, and suggests that the lack of available data does not preclude predictive validity testing, as hypothesis-driven models can be used in place of data-driven models. From examination of the various techniques associated with assessing predictive validity, we arrive at the following set of questions:

• Is the model behaviour predictive of the behaviour of the system being modelled?

• Once simulations have been run, are the output states of individual nodes predictive of aspects in the comparison models?

• Is the model sensitive to any particular findings or parameters to which the system would also be sensitive?

• Are there qualitative features of the model behaviour that can be observed in the system being modelled?

• Does the model, including its component relationships, predict extreme model behaviour under extreme conditions?

4. Conclusions and Recommendations

In this paper we have outlined a broad range of conceptual tests that can be applied to validate BNs. These validity tests incorporate standard model-data fit comparisons, but expand the construct of validity to the broader definition of whether or not a model describes the system it is intended to describe, and produces the output it is intended to produce. Many of these validity tests can be used where no objective data exist.

By combining existing research on BN validation with validation tests from psychometrics as well as alternative complex systems disciplines, this paper introduces a starting point for discussing a framework for building confidence in the validity of BNs. The presented framework is not intended to be comprehensive; instead, the aim is to establish that the validity of a BN can be tested, and should be tested, independent of the model's fit to available data or expert confirmation. Disciplines such as psychometrics, with a history of measuring latent constructs, can provide a useful perspective on the problem. The framework presents a sequence of steps that can be followed to establish confidence in model validity, beginning with creating a nomological map of the literature surrounding the domain, then gradually building confidence in six types of model validity, using both general and specific tests.

The application of this framework to the BN developed in conjunction with ACBPS will, to our knowledge, be a novel practical demonstration of such an approach to BN validation. The framework presented in this paper is intended to be domain-general, and there would be great value in establishing the versatility of the tests by applying them to complex models in other domains. Future work will extend to formalising and quantifying many of the tests in the context of BN modelling, and to obtaining perspectives on model validity from other disciplines that deal with unobserved variables and complex systems.

5. References

S.K. Andersen, K.G. Olesen, F.V. Jensen, and F. Jensen. Hugin - a shell
for building Bayesian belief universes for expert systems. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 1080–1085, United States of America, 1989. MIT Press.

Y. Barlas. Formal aspects of model validity and validation in system dynamics. System Dynamics Review, 12(3):183–210, 1996.

S. Barrett, P. Whittle, K. Mengersen, and R. Stoklosa. Biosecurity threats: the design of surveillance systems, based on power and risk. Environmental and Ecological Statistics, 17:503–519, 2010.

E.R. Carson and R.L. Flood. Model validation: philosophy, methodology and examples. Transactions of the Institute of Measurement and Control, 12:178–185, 1990.

J. Cheng, D.A. Bell, and W. Liu. An algorithm for Bayesian belief network construction from data. In Proceedings of the Conference on Artificial Intelligence and Statistics, pages 83–90, United States, 1997.

L.J. Cronbach and P.E. Meehl. Construct validity in psychological tests. Psychological Bulletin, 52(4):281–302, 1955.

J. Darkes, P.E. Greenbaum, and M.S. Goldman. Sensation seeking-disinhibition and alcohol use: Exploring issues of criterion contamination. Psychological Assessment, 10:71–76, 1998.

DSL. GeNIe and SMILE, 2007. Bayesian Network modelling software package and decision platform.

J.W. Forrester and P.M. Senge. Tests for building confidence in system dynamics models. TIMS Studies in the Management Sciences, 14:209–228, 1980.

D. Geiger, T. Verma, and J. Pearl. d-separation: From theorems to algorithms. In Proceedings of the Fifth Annual Conference on Uncertainty in Artificial Intelligence, UAI '89, pages 139–148, The Netherlands, 1990. North-Holland Publishing Co.

D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243, 1995.

A. James, S. Low Choy, and K. Mengersen. Elicitator: An expert elicitation tool for regression in ecology. Environmental Modelling and Software, 25:129–145, 2010.

S. Johnson. Integrated Bayesian network frameworks for modelling complex ecological issues. PhD thesis, Queensland University of Technology, Australia, 2009.

S. Johnson, F. Harding, G. Hamilton, and K. Mengersen. An integrated Bayesian network approach to Lyngbya majuscula bloom initiation. Marine Environmental Research, 69:27–37, 2010.

D. Koller and A. Pfeffer. Object-oriented Bayesian networks. In Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 302–313, United States, 1997.

K.B. Korb and A.E. Nicholson. Bayesian Artificial Intelligence. CRC Press, United Kingdom, 2010.

S. Low Choy, R. O'Leary, and K. Mengersen. Elicitation by design in ecology: using expert opinion to inform priors for Bayesian statistical models. Ecology, 90:265–277, 2009.

V. Luu, S. Kim, N. Tuan, and S. Ogunlana. Quantifying schedule risk in construction projects using Bayesian belief networks. International Journal of Project Management, 27:39–50, 2009.

M. Masoli, D. Fabian, S. Holt, and R. Beasley. The global burden of asthma: executive summary of the GINA Dissemination Committee report. Allergy, 59(5):469–478, 2004.

P. Myllymaki, T. Silander, H. Tirri, and P. Uronen. B-Course: A web-based tool for Bayesian and causal data analysis.
International Journal on Artificial Intelligence Tools, 11(3):369–387, 2002.

Norsys. Netica, 2007. Proprietary Bayesian Network modelling software package.

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

C.A. Pollino, O. Woodberry, A. Nicholson, K. Korb, and B.T. Hart. Parameterisation and evaluation of a Bayesian network for use in an ecological risk assessment. Environmental Modelling and Software, 22:1140–1152, 2007.

S. Renooij. Probability elicitation for belief networks: issues to consider. The Knowledge Engineering Review, 16(3):255–269, 2001.

L.W. Schruben. Establishing the credibility of simulations. Simulation, 34:101–105, 1980.

O. Shakoor, R.B. Taylor, and R.H. Behrens. Assessment of the incidence of substandard drugs in developing countries. Tropical Medicine and International Health, 2(9):839–845, 1997.

T. Silander, T. Roos, and P. Myllymaki. Locally minimax optimal predictive modeling with Bayesian networks. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, JMLR Workshop and Conference Proceedings, pages 504–511, United States, 2009.

R.S. Sobel and B.J. Osoba. Youth gangs as pseudo-governments: implications for violent crime. Southern Economic Journal, 75(4):996–1018, 2009.

W.M. Trochim. Research Methods Knowledge Base, 2001. URL http://www.socialresearchmethods.net/kb/index.htm.

L. Uusitalo. Advantages and challenges of Bayesian networks in environmental modelling. Ecological Modelling, 203(3), 2007.

O. Woodberry, A. Nicholson, K. Korb, and C. Pollino. Parameterising Bayesian networks. In Geoffrey Webb and Xinghuo Yu, editors, AI 2004: Advances in Artificial Intelligence, volume 3339 of Lecture Notes in Computer Science, pages 711–745. Springer Berlin / Heidelberg, 2005.