Auto claim fraud detection using Bayesian learning neural networks

S. Viaene a,b,*, G. Dedene b,c, R.A. Derrig d

a Applied Economic Sciences, K.U. Leuven, Naamsestraat 69, B-3000 Leuven, Belgium
b Vlerick Leuven Gent Management School, Reep 1, B-9000 Gent, Belgium
c Economics and Econometrics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands
d Automobile Insurers Bureau of Massachusetts & Insurance Fraud Bureau of Massachusetts, 101 Arch Street, Boston, MA 02110, USA

* Corresponding author. Tel.: +32 16 32 68 91; fax: +32 16 32 67 32. E-mail address: stijn.viaene@econ.kuleuven.ac.be (S. Viaene).

Abstract

This article explores the explicative capabilities of neural network classifiers with automatic relevance determination weight regularization, and reports the findings from applying these networks to personal injury protection automobile insurance claim fraud detection. The automatic relevance determination objective function scheme provides us with a way to determine which inputs are most informative to the trained neural network model. An implementation of MacKay's (1992a,b) evidence framework approach to Bayesian learning is proposed as a practical way of training such networks. The empirical evaluation is based on a data set of closed claims from accidents that occurred in Massachusetts, USA during 1993.
© 2005 Elsevier Ltd. All rights reserved.

JEL classification: C45

Keywords: Automobile insurance; Claim fraud; Neural network; Bayesian learning; Evidence framework

SIBC: IB40

1. Introduction

In recent years, the detection of fraudulent claims has blossomed into a high-priority and technology-laden problem for insurers (Viaene, 2002). Several sources speak of the increasing prevalence of insurance fraud and the sizeable proportions it has taken on (see, for example, Canadian Coalition Against Insurance Fraud, 2002; Coalition Against Insurance Fraud, 2002; Comité Européen des Assurances, 1996; 1997). In September 2002, a special issue of the Journal of Risk and Insurance (Derrig, 2002) was devoted to insurance fraud topics. It scopes a significant part of previous and current technical research directions regarding insurance (claim) fraud prevention, detection and diagnosis.

More systematic electronic collection and organization of, and company-wide access to, coherent insurance data have stimulated data-driven initiatives aimed at analyzing and modeling the formal relations between fraud indicator combinations and claim suspiciousness, in order to upgrade fraud detection with (semi-)automatic, intelligible, accountable tools. Machine learning and artificial intelligence solutions are increasingly being explored for the purpose of fraud prediction and diagnosis in the insurance domain. Still, all in all, little work has been published on the latter. Most of the state-of-the-art practice and methodology on fraud detection remains well-protected behind the thick walls of insurance companies. The reasons are legion.

Viaene et al. (2002) reported on the results of a predictive performance benchmarking study. The study involved the task of learning to predict expert suspicion of personal injury protection (PIP) (no-fault) automobile insurance claim fraud. The data that was used consisted of closed real-life PIP claims from accidents that occurred in Massachusetts, USA during 1993, and that were previously investigated for suspicion of fraud by domain experts.
The study contrasted several instantiations of a spectrum of state-of-the-art supervised classification techniques, that is, techniques aimed at algorithmically learning to allocate data objects (input or feature vectors) to a priori defined object classes, based on a training set of data objects with known class or target labels. Among the considered techniques were neural network classifiers trained according to MacKay's (1992a) evidence framework approach to Bayesian learning. These neural networks were shown to consistently score among the best for all evaluated scenarios.

Statistical modeling techniques such as logistic regression and linear and quadratic discriminant analysis are widely used for modeling and prediction purposes. However, their predetermined functional form and restrictive (often unfounded) model assumptions limit their usefulness. Neural networks, in contrast, provide general and efficiently scalable parameterized nonlinear mappings between a set of input variables and a set of output variables (Bishop, 1995). Neural networks have been shown to be very promising alternatives for modeling complex nonlinear relationships (see, for example, Desai et al., 1996; Lacher et al., 1995; Lee et al., 1996; Mobley et al., 2000; Piramuthu, 1999; Salchenberger et al., 1997; Sharda & Wilson, 1996). This is especially true in situations where a lack of domain knowledge prevents any valid argument for an appropriate model selection bias.

Even though the modeling flexibility of neural networks makes them a very attractive and interesting alternative for pattern learning purposes, many practical problems remain when implementing them: What is the impact of the initial weight choice? How should the weight decay parameter be set? How can the neural network be kept from fitting the noise in the training data? These and other issues are often dealt with in ad hoc ways. Nevertheless, they are crucial to the success of any neural network implementation. Another major objection to the use of neural networks for practical purposes remains their widely proclaimed lack of explanatory power: neural networks, it is said, are black boxes.

In this article Bayesian learning (Bishop, 1995; Neal, 1996) is suggested as a way to deal with these issues during neural network training in a principled, rather than an ad hoc, fashion. We set out to explore and demonstrate the explicative capabilities of neural network classifiers trained using an implementation of MacKay's (1992a) evidence framework approach to Bayesian learning for optimizing an automatic relevance determination (ARD) regularized objective function (MacKay, 1992; 1994; Neal, 1998). The ARD objective function scheme allows us to determine the relative importance of inputs to the trained model; the sketch below illustrates the idea. The empirical evaluation in this article is based on the modeling work performed in the context of the baseline benchmarking study of Viaene et al. (2002).
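To make the intuition behind an ARD-regularized objective function concrete, consider the following minimal sketch. It reflects the standard ARD formulation (MacKay, 1994; Neal, 1998) rather than the authors' own code: the weights fanning out of each input form a group, and each group receives its own regularization hyperparameter. The function names, shapes and hyperparameter values below are illustrative assumptions only.

```python
# Illustrative sketch of an ARD-style penalized objective (assumption:
# the data-error term comes from the network's forward pass, e.g. a
# cross-entropy misfit). Not the authors' implementation; it only shows
# how per-input hyperparameters enter the objective being optimized.
import numpy as np

def ard_penalty(W1, alphas):
    """Sum over inputs i of (alphas[i] / 2) * ||W1[i, :]||^2.

    W1 has one row per input (input-to-hidden weights); alphas holds one
    ARD hyperparameter per input. A large alphas[i] shrinks the weights
    leaving input i towards zero, which is read as that input being
    irrelevant to the trained model.
    """
    return 0.5 * np.sum(alphas * np.sum(W1 ** 2, axis=1))

def ard_objective(data_error, W1, alphas):
    """Regularized objective: data misfit plus the ARD weight penalty."""
    return data_error + ard_penalty(W1, alphas)

# Toy usage: 5 inputs, 3 hidden units, hand-set hyperparameters.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(5, 3))          # input-to-hidden weights
alphas = np.array([0.1, 0.1, 0.1, 10.0, 10.0])   # inputs 4-5 heavily penalized
print(ard_objective(data_error=12.3, W1=W1, alphas=alphas))
```

In the evidence framework the hyperparameters are not fixed by hand as in this toy example, but are re-estimated from the training data during learning; it is this re-estimation that makes the scheme usable for input relevance determination.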
The importance of input relevance assessment needs no underlining. It is not uncommon for domain experts to ask which inputs are relatively more important. Specifically: Which inputs contribute most to the detection of insurance claim fraud? This is a very reasonable question. Methods for input selection are thus not only capable of improving the human understanding of the problem domain, in casu the diagnosis of insurance claim fraud, but also allow for more efficient and lower-cost solutions. In addition, penalization or elimination of (partially) redundant or irrelevant inputs may effectively counter the curse of dimensionality (Bellman, 1961). In practice, adding inputs (even relevant ones) beyond a certain point can actually lead to a reduction in the performance of a predictive model. This is because, faced with the limited data availability typical of practice, increasing the dimensionality of the input space will eventually lead to a situation where this space is so sparsely populated that it very poorly represents the true model in the data. This phenomenon has been termed the curse of dimensionality. The ultimate objective of input selection is, therefore, to select the minimum number of inputs required to capture the structure in the data.

This article is organized as follows. Section 2 revisits some basic theory on multilayer neural networks for classification. Section 3 elaborates on input relevance determination. The evidence framework approach to Bayesian learning for neural network classifiers is discussed in Section 4. The theoretical exposition in the first three sections is followed by an empirical evaluation. Section 5 describes the characteristics of the 1993 Massachusetts, USA PIP closed claims data that were used. Section 6 describes the setup of the empirical evaluation and reports its results. Section 7 concludes this article.

2. Neural networks for classification

Fig. 1 shows a simple three-layer neural network. It is made up of an input layer, a hidden layer and an output layer, each consisting of a number of processing units. The layers are interconnected by modifiable weights, represented by the links between the layers. A bias unit is connected to each unit other than the input units. The function of a processing unit is to accept signals along its incoming connections and (nonlinearly) transform a weighted sum of these signals, termed its activation, into a single output signal. In analogy with neurobiology, the units are sometimes called neurons.

The discussion will be restricted to the use of neural networks for binary classification, where the input units represent individual components of an input vector, and a single output unit is responsible for emitting the values of the discriminant function used for classification. One then commonly opts for a multilayer neural network with one hidden layer. In principle, such a three-layer neural network can implement any continuous function from input to output, given a sufficient number of hidden units, proper nonlinearities and weights (Bishop, 1995). We start with a description of the feedforward operation of such a neural network.
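As a rough illustration of this feedforward operation, the following minimal sketch assumes tanh hidden units and a logistic (sigmoid) output unit emitting a class-1 probability; it is not necessarily the exact network configuration used later in the article.

```python
# Minimal three-layer feedforward pass for binary classification.
# Assumptions (illustration only): tanh hidden units, logistic output.
import numpy as np

def forward(x, W1, b1, w2, b2):
    """Map one input vector x to a network output in (0, 1).

    W1: (n_inputs, n_hidden) input-to-hidden weights
    b1: (n_hidden,) hidden biases (the connections from the bias unit)
    w2: (n_hidden,) hidden-to-output weights
    b2: scalar output bias
    """
    hidden = np.tanh(x @ W1 + b1)             # hidden-unit outputs
    activation = hidden @ w2 + b2             # output-unit activation
    return 1.0 / (1.0 + np.exp(-activation))  # logistic output signal

# Toy usage: 4 inputs, 3 hidden units, random weights.
rng = np.random.default_rng(1)
x = rng.normal(size=4)
W1 = rng.normal(scale=0.5, size=(4, 3))
b1 = np.zeros(3)
w2 = rng.normal(scale=0.5, size=3)
b2 = 0.0
print(forward(x, W1, b1, w2, b2))  # classify by thresholding, e.g. at 0.5
```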