Expert Systems with Applications 30 (2006) 519–526
doi:10.1016/j.eswa.2005.10.007

Artificial neural networks with evolutionary instance selection for financial forecasting

Kyoung-jae Kim *

Department of Information Systems, Dongguk University, 3-26, Pil-dong, Chung-gu, Seoul 100-715, South Korea
* Tel.: +82 2 2260 3324; fax: +82 2 2260 3684. E-mail address: kjkim@dongguk.edu

Abstract

In this paper, I propose a genetic algorithm (GA) approach to instance selection in artificial neural networks (ANNs) for financial data mining. ANNs have preeminent learning ability, but they often exhibit inconsistent and unpredictable performance on noisy data. In addition, when the amount of data is very large, it may not be possible to train an ANN, or the training task may not be carried out effectively, without data reduction. In this paper, the GA simultaneously optimizes the connection weights between layers and the selection of relevant instances. The globally evolved weights mitigate the well-known limitations of the gradient descent algorithm, and the genetically selected instances shorten the learning time and enhance prediction performance. This study applies the proposed model to stock market analysis. Experimental results show that the GA approach is a promising method for instance selection in ANNs. © 2005 Elsevier Ltd. All rights reserved.

Keywords: Instance selection; Genetic algorithms; Artificial neural networks; Financial forecasting

1. Introduction

In general, artificial neural networks (ANNs) can produce robust performance when a large amount of data is available. However, ANNs often exhibit inconsistent and unpredictable performance on noisy data. In addition, when a data set is too large, it may not be possible to train an ANN, or the training task may not be carried out effectively, without data reduction. Data reduction can be achieved in many ways, such as feature selection or feature discretization (Blum & Langley, 1997; Kim & Han, 2000; Liu & Motoda, 1998). For this reason, one facet of data mining concerns the selection of relevant instances. Instances are the training examples in supervised learning, and instance selection chooses a part of the data that is representative of, and relevant to, the characteristics of all the data. Instance selection is one of the most popular methods for dimensionality reduction and is directly related to data reduction. Although instance selection is the most complex form of data reduction, because computationally expensive prediction methods must be invoked repeatedly to determine the effectiveness of a selection, it can usually remove irrelevant instances as well as noise and redundant data (Liu & Motoda, 2001; Weiss & Indurkhya, 1998). Many researchers have suggested instance selection methods such as squashed data, critical points, and prototype construction, in addition to many forms of sampling (Liu & Motoda, 2001). Efforts to select relevant instances from an initial data set have stemmed from the need to reduce immense storage requirements and computational loads (Kuncheva, 1995). The other perspective on this subject, as pointed out by Dasarathy (1990), is to achieve enhanced performance of the learning algorithm through instance selection. In addition, training time may be shortened by use of a proper instance selection algorithm. This paper proposes a new hybrid model of ANN and genetic algorithms (GAs) for instance selection.
An evolutionary instance selection algorithm reduces the dimensionality of the data and may eliminate noisy and irrelevant instances. In addition, this study simultaneously searches the connection weights between layers in the ANN through an evolutionary search. The genetically evolved connection weights mitigate the well-known limitations of the gradient descent algorithm.

The rest of this paper is organized as follows: Section 2 presents the research background. Section 3 proposes the evolutionary instance selection algorithm and describes its benefits. Section 4 describes an application of the proposed algorithm. Conclusions and the limitations of this study are presented in Section 5.

2. Research background

For some applications, the quality of data mining is improved with additional instances. However, a larger number of instances tends to increase the complexity of the induced solution. Increased complexity is not desirable, but may be the price to pay for better performance; it also decreases the interpretability of the result (Weiss & Indurkhya, 1998). For this reason, many researchers have suggested instance selection methods. The following sections present some instance selection methods as described by prior research.

2.1. Instance selection methods

Instance-based learning algorithms often face the problem of deciding which instances to store for use during generalization, in order to avoid excessive storage and time complexity and to improve generalizability by avoiding noise and overfitting (Wilson & Martinez, 2000). Many researchers have addressed the problem of training data reduction and have presented algorithms for maintaining an instance base or case base in instance-based learning algorithms. Kuncheva (1993) classified instance selection techniques (or editing techniques) into the following three categories: the Condensed Nearest Neighbor rule, Generated or Modified Prototypes, and Two-Level Classifiers. The following subsections present the basic concepts of each category as described by prior research; detailed explanations may be found in the references of this paper.

2.1.1. Condensed nearest neighbor rule

Hart (1968) made one of the first attempts to develop an instance selection rule. Hart's algorithm, the Condensed Nearest Neighbor rule, finds a subset S of the training set T such that every member of T is closer to a member of S of the same class than to a member of S of a different class. Subsequent work extended Hart's algorithm, notably the Selective Nearest Neighbor rule (Ritter, Woodruff, Lowry, & Isenhour, 1975) and the Reduced Nearest Neighbor rule (Gates, 1972). In addition, Wilson (1972) introduced the Edited Nearest Neighbor algorithm and Tomek (1976) proposed the All k-NN method of editing.
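For readers unfamiliar with Hart's rule, the following is a minimal sketch of the condensing idea in Python. It is an illustration under stated assumptions (numeric feature vectors, Euclidean 1-NN), not the exact routine from the cited work; the function name is mine.

```python
import numpy as np

def condensed_nearest_neighbor(X, y):
    # Grow a subset S (indices into X) until every training instance
    # is classified correctly by its nearest neighbor in S (Hart, 1968).
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # 1-NN classification of instance i using the current subset S.
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(dists))]
            if y[nearest] != y[i]:
                keep.append(i)  # absorb the misclassified instance into S
                changed = True
    return np.array(keep)
```

Running this on a training set returns the indices of a condensed subset that can stand in for the full set under 1-NN classification, which is the storage-reduction goal described above.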
2.1.2. Generated or modified prototypes

This category is composed of techniques that establish new prototypes or adjust a limited number of instances. A large group of studies within this category are implemented with ANNs, including feature-map classifiers and learning vector quantizers (Kuncheva, 1995).

2.1.3. Two-level classifiers

This category employs two or more classifiers and allocates a part of all instances to the classifier that appears most appropriate. Tetko and Villa (1997) proposed the Efficient Partition algorithm, which obtains an efficient partition of noisy instances whose distribution is proportional to the complexity of the analyzed function. The aim is to focus the training of the ANN on the most complex and informative domains of the data set and to accelerate the learning phase. They concluded that efficiently partitioned instances enhance the predictability of the ANN in comparison with a random selection of instances. Oh and Han (2000) proposed an integrated neural network model using change-point detection. They partitioned the instances according to each detected change-point and then applied each partition to one ANN among multiple ANNs.

Instance selection in instance-based learning algorithms may be considered a method of knowledge refinement that maintains the instance base. In this sense, researchers have proposed many instance selection algorithms for maintaining the case base in case-based reasoning (CBR) systems. Smyth (1998) presented an approach to maintenance based on the deletion of harmful and redundant cases from the case base. In addition, McSherry (2000) suggested an instance selection method for the construction of a case library in which the coverage contributions of candidate instances are evaluated by an algorithm called disCover. This algorithm reverses the direction of CBR to discover all cases that can be solved with a given case base.

Although many different approaches have been used to address the problem of case authoring and data explosion for instance-based algorithms, there is little research on instance selection in ANNs. Reeves and Taylor (1998) suggested that a GA is a promising approach to finding a 'better' training data set for classification problems in radial basis function (RBF) nets. Reeves and Bush (2001) reported that the GA can also be used effectively to find a smaller subset of a 'good' training set in RBF nets for both classification and regression problems. Although the GA has been shown to be a promising instance selection method for RBF nets, its performance on other neural network models is untested.

2.2. Genetic algorithms

The GA has been investigated extensively and shown to be effective in exploring a complex space in an adaptive way, guided by the biological evolution mechanisms of selection, crossover, and mutation (Adeli & Hung, 1995). The GA simulates the mechanics of population genetics by maintaining a population of knowledge structures that is made to evolve (Odetayo, 1995). Problems must be represented in a suitable form to be handled by the GA, which often works with a form of binary coding. Once the problems are coded as chromosomes and the population size is chosen, the initial population is randomly generated (Bauer, 1994). After the initialization step, each chromosome is evaluated by the fitness function and is gradually evolved by biological operations. According to the value of the fitness function, the chromosomes associated with the fittest individuals will be reproduced more often than those associated with unfit individuals (Davis, 1994).

The GA works with three operators that are applied iteratively. The selection operator determines which individuals may survive (Hertz & Kobler, 2000). The crossover operator allows the search to fan out in diverse directions looking for attractive solutions and permits chromosomal material from different parents to be combined in a single child. The mutation operator arbitrarily alters one or more components of a selected chromosome, providing the means for introducing new information into the population. Through these operators, the GA tends to converge on optimal or near-optimal solutions (Wong & Tan, 1994).
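As an illustration of the loop just described, here is a minimal binary GA sketch in Python. The operator choices (truncation selection, single-point crossover, bit-flip mutation) are common defaults assumed for this sketch, not a prescription from the literature cited above.

```python
import random

def evolve(fitness, n_genes, pop_size=100, generations=100,
           crossover_rate=0.7, mutation_rate=0.01):
    # Generic binary GA loop: selection, crossover, mutation.
    # `fitness` maps a bit-string (list of 0/1) to a score to maximize.
    pop = [[random.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            if random.random() < crossover_rate:  # single-point crossover
                cut = random.randrange(1, n_genes)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            for j in range(n_genes):              # bit-flip mutation
                if random.random() < mutation_rate:
                    child[j] = 1 - child[j]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

With a fitness function such as classification accuracy on the instances a chromosome selects, this loop is the skeleton on which GAIS-style approaches build.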
The GA is often employed to improve the performance of artificial intelligence techniques. For ANNs, the GA has been applied to the selection of the network topology, including optimizing a relevant feature subset and determining the optimal numbers of hidden layers and processing elements. In addition, some researchers have searched the connection weights of an ANN using the GA instead of local search algorithms such as gradient descent. They suggested that global search techniques such as the GA may prevent the ANN from falling into a local optimum (Gupta & Sexton, 1999; Kim & Han, 2000; Sexton, Dorsey, & Johnson, 1998).

2.3. Prior research on stock market prediction using ANN

Many studies on stock market prediction using artificial intelligence (AI) techniques have been performed during the past decade. The early studies focused on estimating the level of the return on a stock price index. In one of the earliest studies, Kimoto, Asakawa, Yoda, and Takeoka (1990) used several learning algorithms and prediction methods to develop a prediction system for the Tokyo Stock Exchange Prices Index. They used a modular neural network to learn the relationships among various market factors and concluded that the correlation coefficient produced by their model was much higher than that produced by multiple regression. However, the correlation coefficient may not be a proper measure of prediction performance. Kamijo and Tanigawa (1990) used a recurrent neural network to analyze candlestick charts; a candlestick chart is a Japanese-style chart used to visualize stock price patterns. Neither of these studies performed a statistical test of the significance of the empirical results.

Some researchers investigated the issue of predicting the stock index futures market. Choi, Lee, and Lee (1995) and Trippi and DeSieno (1992) predicted the daily direction of change in the S&P 500 index futures using ANNs. Trippi and DeSieno (1992) combined the outputs of individual networks using logical (Boolean) operators to produce a set of composite rules. They suggested that their best composite synthesized rule set achieved a higher gain than previous research. Choi et al. (1995) compared their approach with a previous study and reported a higher annualized gain. However, the annualized gain may not be an appropriate measure of prediction performance because it varies with trading fees and the trading strategy. Duke and Long (1993) predicted German government daily bond futures using backpropagation (BP) neural networks and reported that 53.94% of the patterns were accurately predicted through a moving simulation method.

Most of the above studies simply applied ANNs to stock market prediction. Recent research tends to hybridize several AI techniques. Nikolopoulos and Fellrath (1994) developed a hybrid expert system for investment advising. In their study, genetic algorithms were used to train and configure the architecture of the investor's neural network component.
Hiemstra (1995) proposed fuzzy expert systems to predict stock market returns. He suggested that ANNs and fuzzy logic can capture the complexities of the functional mapping because they do not require a specification of the function to approximate. Some researchers have included novel factors in the learning process. Kohara, Ishikawa, Fukuhara, and Nakamura (1997) incorporated prior knowledge to improve the performance of stock market prediction. Prior knowledge in their study included non-numerical factors such as political and international events; they made use of prior knowledge of stock price predictions and newspaper information on domestic and foreign events. A more recent study by Lee and Jo (1999) developed an expert system that uses knowledge from candlestick chart analysis. The expert system contained patterns and rules that could predict future stock price movements, and the experimental results revealed that the developed knowledge base could provide excellent indicators. In addition, Tsaih, Hsu, and Lai (1998) integrated a rule-based technique and ANNs to predict the daily direction of change of the S&P 500 stock index futures.

Stock market data, however, includes tremendous noise and has non-stationary characteristics; thus, the training process for an ANN tends to be difficult. In addition, the possibility of local convergence of gradient search techniques is another difficulty for learning patterns.

3. A GA approach to instance selection for ANN

As mentioned earlier, there are many studies on instance selection for instance-based learning algorithms, but few on instance selection for ANNs; accordingly, there are few relevant theories concerning instance selection for ANNs. This paper proposes a GA approach to instance selection for ANN (GAIS). The overall framework of GAIS is shown in Fig. 1. In this study, the GA supports the simultaneous optimization of the connection weights and the selection of relevant instances. The GAIS algorithm consists of three phases: the GA search phase, the feed-forward computation phase, and the validation phase.

[Fig. 1. Overall framework of GAIS. The GA selects relevant instances from the training data (e.g., I3, I6, I7, and I10 out of I1 to I10), assigns the input and hidden connection weights of the ANN, and evaluates each candidate with the fitness function.]

3.1. GA search phase

In the GA search phase, the GA explores the search space to find optimal or near-optimal connection weights and relevant instances for the ANN. The populations (the connection weights and the codes for instance selection) are initialized to random values before the search process. The parameters to be searched must be encoded on chromosomes. This study needs three sets of parameters. The first set is the connection weights between the input layer and the hidden layer of the network. The second set is the connection weights between the hidden layer and the output layer. As mentioned earlier, evolving these two sets may mitigate the limitations of the gradient descent algorithm. The third set represents the codes for instance selection. The strings have the following encoding: each processing element in the hidden layer receives signals from the input layer, and the first set of bits represents the connection weights between the input layer and the hidden layer. Each processing element in the output layer receives signals from the hidden layer, and the next set of bits indicates the connection weights between the hidden layer and the output layer. The following bits are instance selection codes for the training data; the parameters are evaluated using only the information in the selected instances within the training data. In this phase, the GA applies crossover and mutation to the initial chromosomes and iterates until the stopping conditions are satisfied.
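To make the encoding concrete, the following sketch splits one chromosome into the three parameter sets just described. The layout and the helper name decode are illustrative assumptions (a single output unit is assumed, matching the direction-of-change task), not code from the paper.

```python
import numpy as np

def decode(chromosome, n_inputs, n_hidden, n_train):
    # Split one GAIS-style chromosome into its three parameter sets.
    # Weight genes are real-valued; the last n_train genes are 0/1
    # instance-selection flags.
    w_ih_len = n_inputs * n_hidden
    w_ho_len = n_hidden  # single output unit assumed
    w_ih = np.asarray(chromosome[:w_ih_len]).reshape(n_inputs, n_hidden)
    w_ho = np.asarray(chromosome[w_ih_len:w_ih_len + w_ho_len])
    mask = np.asarray(chromosome[w_ih_len + w_ho_len:], dtype=bool)
    assert mask.size == n_train
    return w_ih, w_ho, mask
```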
3.2. Feed-forward computation phase

This phase is the process of feed-forward computation in the ANN. A proper activation function is required to facilitate the learning process, but there are no clear criteria regarding which activation function to use. Some researchers recommend the sigmoid function for classification problems and the hyperbolic tangent function for forecasting problems, because the two functions differ in the value range of the delta weights under the SSE error function (Coakley & Brown, 2000). In addition, the majority of back-propagation applications have used the sigmoid activation function (Hansen, McDonald, & Nelson, 1999), and there are few comparative studies between the sigmoid function and other activation functions in ANNs. This study uses the sigmoid function as the activation function because the task is to classify the direction of change in the daily stock price index. The linear function is used as the combination function for the feed-forward computation with the connection weights derived in the first phase.

3.3. Validation phase

The derived connection weights are applied to the holdout data. This phase is indispensable for validating generalizability, because an ANN has an eminent ability to learn the known data. Table 1 summarizes the GAIS algorithm.

Table 1
The GAIS algorithm

Step 0  Initialize the populations (the connection weights between layers and the codes for instance selection); set to small random values between 0.0 and 1.0
Step 1  If the stopping condition is false, do Step 2; otherwise, stop the process
Step 2  Do Steps 3–9
Step 3  Each processing element in the input layer receives an input signal and forwards it to all processing elements in the hidden layer
Step 4  Each processing element in the hidden layer sums its weighted input signals, applies the sigmoid activation function to compute its output signal, and forwards it to all processing elements in the output layer
Step 5  Each processing element in the output layer sums its weighted signals from the hidden layer, applies the sigmoid activation function to compute its output signal, and computes the difference between the output signal and the target value
Step 6  Calculate fitness (fitness function: average predictive accuracy on the selected instances within the training data)
Step 7  Select individuals to become parents of the next generation
Step 8  Create a second generation from the parent pool (perform crossover and mutation)
Step 9  Test the stopping condition and go back to Step 1
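A hedged sketch of how Steps 3–6 of Table 1 might be evaluated for one chromosome, reusing the hypothetical decode() sketched above; the 0.5 threshold on the sigmoid output for the direction-of-change class and the default layer sizes are assumptions of this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gais_fitness(chromosome, X, y, n_inputs=12, n_hidden=12):
    # Steps 3-6 of Table 1 for one chromosome: feed-forward computation
    # with the decoded weights, then average predictive accuracy on the
    # selected training instances.
    w_ih, w_ho, mask = decode(chromosome, n_inputs, n_hidden, len(X))
    if not mask.any():
        return 0.0                     # degenerate: no instance selected
    Xs, ys = X[mask], y[mask]
    hidden = sigmoid(Xs @ w_ih)        # Step 4: hidden-layer outputs
    out = sigmoid(hidden @ w_ho)       # Step 5: output signals
    pred = (out >= 0.5).astype(int)    # direction of change: 0 or 1
    return float(np.mean(pred == ys))  # Step 6: hit ratio as fitness
```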
4. Application: analysis of the stock market data

This section applies GAIS to stock market prediction. The efficiency and effectiveness of GAIS may be properly tested on this task because stock market data is very noisy and complex. Many studies on stock market prediction using artificial intelligence techniques were performed in the past decade. Some of them, however, did not produce outstanding prediction accuracy, partly because of the tremendous noise and non-stationary characteristics of stock market data. If these factors are not appropriately controlled, the prediction system does not produce significant performance. When the prediction is executed on long-term data, managing the consistency of prediction is even more important.

4.1. Application data

The application data used in this study consists of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2348 trading days, from January 1991 to December 1998. This study divides the samples into eight data sets according to the trading year, and the experiments are repeated eight times, once per data set, to reflect specific knowledge as time passes. The direction of daily change in the stock price index is categorized as '0' or '1': '0' means that the next day's index is lower than today's index, and '1' means that the next day's index is higher than today's index. I select 12 technical indicators as the feature subset, based on a review by domain experts and prior research. Table 2 gives the selected features and their formulas.

Table 2
Selected features and their formulas

Name of feature: Formula
Stochastic %K: (C_t - LL_{t-5}) / (HH_{t-5} - LL_{t-5}) × 100
Stochastic %D: Σ_{i=0}^{n-1} %K_{t-i} / n
Stochastic slow %D: Σ_{i=0}^{n-1} %D_{t-i} / n
Momentum: C_t - C_{t-4}
ROC (rate of change): (C_t / C_{t-n}) × 100
LW %R (Larry William's %R): (H_n - C_t) / (H_n - L_n) × 100
A/D Oscillator (accumulation/distribution oscillator): (H_t - C_{t-1}) / (H_t - L_t)
Disparity 5 days: (C_t / MA_5) × 100
Disparity 10 days: (C_t / MA_10) × 100
OSCP (price oscillator): (MA_5 - MA_10) / MA_5
CCI (commodity channel index): (M_t - SM_t) / (0.015 × D_t)
RSI (relative strength index): 100 - 100 / (1 + (Σ_{i=0}^{n-1} Up_{t-i} / n) / (Σ_{i=0}^{n-1} Dw_{t-i} / n))

Note: C, closing price; L, low price; H, high price; LL_n, lowest low price in the last n days; HH_n, highest high price in the last n days; M, moving average of price; M_t = (H_t + L_t + C_t) / 3; SM_t = Σ_{i=1}^{n} M_{t-i+1} / n; D_t = Σ_{i=1}^{n} |M_{t-i+1} - SM_t| / n; Up, upward price change; Dw, downward price change.
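As an illustration, two of the Table 2 indicators written with pandas rolling windows. The window conventions (e.g., whether the current day is included) and the n=14 default for RSI are assumptions of this sketch and may differ in detail from the paper's definitions, which leave n generic.

```python
import pandas as pd

def stochastic_k(close, low, high, n=5):
    # Stochastic %K (Table 2): position of today's close within the
    # lowest-low / highest-high range of the last n days.
    ll = low.rolling(n).min()
    hh = high.rolling(n).max()
    return (close - ll) / (hh - ll) * 100

def rsi(close, n=14):
    # Relative strength index (Table 2): ratio of average upward to
    # average downward price changes over the last n days.
    diff = close.diff()
    up = diff.clip(lower=0).rolling(n).mean()
    dw = (-diff).clip(lower=0).rolling(n).mean()
    return 100 - 100 / (1 + up / dw)
```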
4.2. Experiments

The following experiments are carried out.

4.2.1. Whole training data

The whole training sample is used as the training data. This is the conventional method of data analysis.

4.2.2. Selected instances with GAIS

Experiments on the stock market data are implemented using GAIS. The procedure of the experiment is as follows. The GA searches for optimal or near-optimal connection weights and relevant instances for the ANN. As mentioned earlier, this study needs three sets of parameters: the connection weights between the input and the hidden layer, the connection weights between the hidden and the output layer, and the codes for instance selection. This study uses the following encoding for the strings: 12 input features are used and 12 processing elements in the hidden layer are employed. Each processing element in the hidden layer receives 12 signals from the input layer, so the first 144 bits represent the connection weights between the input layer and the hidden layer; these bits are searched over the range -5 to 5. Each processing element in the output layer receives signals from the hidden layer, and the next 12 bits indicate the connection weights between the hidden layer and the output layer; these bits also vary between -5 and 5. The following bits are instance selection codes for the training data. This part of the chromosome consists of n genes (where n is the number of initial training instances), each with two possible states: '1' means the associated instance is selected for the analysis and '0' means it is not chosen.

The encoded chromosomes are searched to maximize the fitness function. The fitness function is specific to the application. In this study, the objectives of the model are to approximate the connection weights and to select relevant instances for correct solutions. These objectives can be represented by the average prediction accuracy on the selected instances within the training data, which this study therefore uses as the fitness function. Mathematically, the fitness function is represented as Eq. (1):

Fitness = (1/n) Σ_{i=1}^{n} CR_i, where CR_i = 1 if PO_i = AO_i and CR_i = 0 otherwise  (1)

where CR_i is the prediction result for the ith trading day (0 or 1), PO_i is the predicted output from the model for the ith trading day, and AO_i is the actual output for the ith trading day.

For the controlling parameters of the GA search, the population size is set to 100 organisms, and the crossover and mutation rates are chosen to prevent the ANN from falling into a local minimum: the crossover rate is set to 0.7 and the mutation rate to 0.1. For the crossover method, uniform crossover is considered better at preserving the schema, and it can generate any schema from the two parents, while single-point and two-point crossover methods may bias the search with the irrelevant positions of the variables. Thus, this study performs crossover using the uniform crossover routine. For the mutation method, this study generates a random number between 0 and 1 for each of the variables in the organism; if a variable gets a number that is less than or equal to the mutation rate, that variable is mutated. As the stopping condition, only 100 generations are permitted.
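A minimal sketch of the crossover and mutation routines just described. Treating the first 156 genes (144 + 12 weights) as real values redrawn uniformly in [-5, 5] and the remaining genes as 0/1 selection bits to flip is my reading of the mixed encoding, so the mutation handling is an assumption.

```python
import random

def uniform_crossover(a, b):
    # Uniform crossover: each gene is drawn from either parent with
    # equal probability, so any schema of the two parents is reachable.
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(child, rate=0.1, weight_genes=156):
    # Per-variable mutation as described in Section 4.2.2: a variable is
    # mutated when its random draw is at or below the mutation rate.
    for j in range(len(child)):
        if random.random() <= rate:
            if j < weight_genes:
                child[j] = random.uniform(-5.0, 5.0)  # weight gene
            else:
                child[j] = 1 - child[j]               # selection bit
    return child
```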
4.3. Experimental results and discussion

This study compares GAIS with the conventional ANN trained by the GA. The conventional model, named GANN, is an ANN whose connection weights are determined by the GA: it does not use the gradient descent algorithm, but it learns from all available training data. GAIS, on the other hand, also uses the GA to determine the connection weights but learns the patterns of the stock market data from the instances selected through the evolutionary search. For the GANN model, about 20% of the data is used for holdout and 80% for training. The training data is used to search for the optimal or near-optimal parameters and to evaluate the fitness function; the holdout data is used to test the results on data not utilized to develop the model. The number of training instances for GANN and the number of selected instances within the training instances for GAIS are presented in Table 3 for each year.

Table 3
Number of instances

Set                                   1991  1992  1993  1994  1995  1996  1997  1998  Total
Training instances for GANN            234   236   237   237   235   235   234   234   1882
Selected instances for GAIS             74    71    87    66    93    86    93    85    655
Holdout instances for GANN and GAIS     58    58    59    59    58    58    58    58    466

Table 4 describes the average prediction accuracy of each model.

Table 4
Average predictive performance (hit ratio: %)

Year   GANN training  GANN holdout  GAIS training  GAIS holdout
1991   63.68          53.45         74.32          72.41
1992   64.83          56.90         77.46          58.62
1993   61.18          59.32         70.11          59.32
1994   62.87          57.63         74.24          61.02
1995   69.36          65.52         81.72          67.24
1996   65.11          65.52         76.74          77.59
1997   64.96          58.62         65.59          58.62
1998   61.11          56.90         78.82          68.97
Total  64.13          59.23         74.87          65.45

In Table 4, GAIS outperforms GANN by 6.22 percentage points on the holdout data and by 10.74 percentage points on the training data. This result may be attributed to the benefits of instance selection through the evolutionary search. McNemar tests are used to examine whether GAIS significantly outperforms GANN; this test may be used with nominal data and is particularly useful with before-after measurements of the same subjects (Cooper & Emory, 1995). The McNemar value and its p-value for the holdout data are 5.262 and 0.022, respectively, which means that GAIS performs better than GANN at the 5% statistical significance level.

5. Concluding remarks

Prior studies have tried to optimize the controlling parameters of ANNs using global search algorithms. Some of them focused only on the optimization of the connection weights; others placed little emphasis on the optimization of the learning algorithm itself; and few studies considered instance selection for ANNs. In this paper, I use the GA for the ANN in two ways. First, I use the GA to determine the connection weights between layers, which may mitigate the well-known limitations of the gradient descent algorithm. Second, I adopt an evolutionary instance selection algorithm for the ANN, which directly removes irrelevant and redundant instances from the training data. I conclude that GA-based learning with the instance selection algorithm (GAIS) significantly outperforms the conventional GA-based learning algorithm (GANN).

The prediction performance might be further enhanced if the GA were employed not only for instance selection but also for relevant feature selection, and this remains a very interesting topic for further study. Although instance selection is a direct method of noise and dimensionality reduction, feature selection effectively reduces the dimensions of the feature space. In addition, while the ANN performed well with GA-based learning and instance selection, other instance-based learning algorithms, including CBR, may also prove effective in place of the ANN. Of course, there are still many tasks to be done for GAIS; in particular, its generalizability should be further tested by applying it to other problem domains.

References

Adeli, H., & Hung, S. (1995). Machine learning: Neural networks, genetic algorithms, and fuzzy systems. New York: Wiley.
Bauer, R. J. (1994). Genetic algorithms and investment strategies. New York: Wiley.
Blum, A., & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1–2), 245–271.
Choi, J. H., Lee, M. K., & Lee, M. W. (1995). Trading S&P 500 stock index futures using a neural network. Proceedings of the third annual international conference on artificial intelligence applications on Wall Street (pp. 63–72). New York.
Coakley, J. R., & Brown, C. E. (2000). Artificial neural networks in accounting and finance: Modeling issues. International Journal of Intelligent Systems in Accounting, Finance and Management, 9, 119–144.
Cooper, D. R., & Emory, C. W. (1995). Business research methods. Chicago: Irwin.
Dasarathy, B. V. (1990). Nearest neighbor (NN) norms: NN pattern classification techniques. California: IEEE Computer Society Press.
Davis, L. (1994). Genetic algorithms and financial applications. In G. J. Deboeck (Ed.), Trading on the edge (pp. 133–147). New York: Wiley.
Duke, L. S., & Long, J. A. (1993). Neural network futures trading: A feasibility study. In Society for worldwide interbank financial telecommunications, Adaptive intelligent systems (pp. 121–132). Amsterdam: Elsevier.
Gates, G. W. (1972). The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18(3), 431–433.
Gupta, J. N. D., & Sexton, R. S. (1999). Comparing backpropagation with a genetic algorithm for neural network training. Omega, 27(6), 679–684.
Hansen, J. V., McDonald, J. B., & Nelson, R. D. (1999). Time series prediction with genetic-algorithm designed neural networks: An empirical comparison with modern statistical models. Computational Intelligence, 15(3), 171–184.
Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14, 515–516.
Hertz, A., & Kobler, D. (2000). A framework for the description of evolutionary algorithms. European Journal of Operational Research, 126(1), 1–12.
Hiemstra, Y. (1995). Modeling structured nonlinear knowledge to predict stock market returns. In R. R. Trippi (Ed.), Chaos & nonlinear dynamics in the financial markets: Theory, evidence and applications (pp. 163–175). Chicago, IL: Irwin.
Kamijo, K., & Tanigawa, T. (1990). Stock price pattern recognition: A recurrent neural network approach. Proceedings of the international joint conference on neural networks (pp. 215–221). San Diego, CA.
Kim, K., & Han, I. (2000). Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications, 19(2), 125–132.
Kimoto, T., Asakawa, K., Yoda, M., & Takeoka, M. (1990). Stock market prediction system with modular neural network. Proceedings of the international joint conference on neural networks (pp. 1–6). San Diego, CA.
Kohara, K., Ishikawa, T., Fukuhara, Y., & Nakamura, Y. (1997). Stock price prediction using prior knowledge and neural networks. International Journal of Intelligent Systems in Accounting, Finance and Management, 6(1), 11–22.
Kuncheva, L. I. (1993). 'Change-glasses' approach in pattern recognition. Pattern Recognition Letters, 14, 619–623.
Kuncheva, L. I. (1995). Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognition Letters, 16(8), 809–814.
Lee, K. H., & Jo, G. S. (1999). Expert systems for predicting stock market timing using a candlestick chart. Expert Systems with Applications, 16(4), 357–364.
Liu, H., & Motoda, H. (1998). Feature transformation and subset selection. IEEE Intelligent Systems and Their Applications, 13(2), 26–28.
Liu, H., & Motoda, H. (2001). Data reduction via instance selection. In H. Liu & H. Motoda (Eds.), Instance selection and construction for data mining (pp. 3–20). Massachusetts: Kluwer Academic Publishers.
McSherry, D. (2000). Automating case selection in the construction of a case library. Knowledge-Based Systems, 13(2–3), 133–140.
Nikolopoulos, C., & Fellrath, P. (1994). A hybrid expert system for investment advising. Expert Systems, 11(4), 245–250.
Odetayo, M. O. (1995). Knowledge acquisition and adaptation: A genetic approach. Expert Systems, 12(1), 3–13.
Oh, K. J., & Han, I. (2000). Using change-point detection to support artificial neural networks for interest rates forecasting. Expert Systems with Applications, 19(2), 105–115.
Reeves, C. R., & Bush, D. R. (2001). Using genetic algorithms for training data selection in RBF networks. In H. Liu & H. Motoda (Eds.), Instance selection and construction for data mining (pp. 339–356). Massachusetts: Kluwer Academic Publishers.
Reeves, C. R., & Taylor, S. J. (1998). Selection of training sets for neural networks by a genetic algorithm. In A. E. Eiben, T. Bäck, M. Schoenauer, & H.-P. Schwefel (Eds.), Parallel problem solving from nature - PPSN V. Berlin: Springer.
Ritter, G. L., Woodruff, H. B., Lowry, S. R., & Isenhour, T. L. (1975). An algorithm for a selective nearest neighbor decision rule. IEEE Transactions on Information Theory, 21(6), 665–669.
Sexton, R. S., Dorsey, R. E., & Johnson, J. D. (1998). Toward global optimization of neural networks: A comparison of the genetic algorithm and backpropagation. Decision Support Systems, 22(2), 171–185.
Smyth, B. (1998). Case-base maintenance. Proceedings of the 11th international conference on industrial & engineering applications of artificial intelligence & expert systems (pp. 507–516).
Tetko, I. V., & Villa, A. E. P. (1997). Efficient partition of learning data sets for neural network training. Neural Networks, 10(8), 1361–1374.
Tomek, I. (1976). An experiment with the edited nearest neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, 6(6), 448–452.
Trippi, R. R., & DeSieno, D. (1992). Trading equity index futures with a neural network. Journal of Portfolio Management, 19, 27–33.
Tsaih, R., Hsu, Y., & Lai, C. C. (1998). Forecasting S&P 500 stock index futures with a hybrid AI system. Decision Support Systems, 23(2), 161–174.
Weiss, S. M., & Indurkhya, N. (1998). Predictive data mining: A practical guide. California: Morgan Kaufmann Publishers.
Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2(3), 408–421.
Wilson, D. R., & Martinez, T. R. (2000). Reduction techniques for instance-based learning algorithms. Machine Learning, 38, 257–286.
Wong, F., & Tan, C. (1994). Hybrid neural, genetic, and fuzzy systems. In G. J. Deboeck (Ed.), Trading on the edge (pp. 243–261). New York: Wiley.