The collaborative filtering recommendation based on SOM cluster-indexing CBR

Tae Hyup Roh a,*, Kyong Joo Oh b, Ingoo Han a

a Graduate School of Management, Korea Advanced Institute of Science and Technology, 207-43 Cheongryangri-Dong, Dongdaemun-Gu, Seoul 130012, South Korea
b Department of Business Administration, Hansung University, Seoul, South Korea
* Corresponding author. Tel./fax: +82-29583685. E-mail address: rohth@kgsm.kaist.ac.kr (T.H. Roh).

Abstract

Collaborative filtering (CF) recommendation is a knowledge-sharing technology for distributing opinions and facilitating contacts in a networked society among people with similar interests. The main concerns of CF algorithms are prediction accuracy, response time, data sparsity, and scalability. In general, efforts to improve prediction algorithms and efforts to shorten response time are decoupled. We propose a three-step CF recommendation model, composed of profiling, inferring, and predicting steps, that considers prediction accuracy and computing speed simultaneously. The model combines a CF algorithm with two machine learning processes, the Self-Organizing Map (SOM) and Case Based Reasoning (CBR), by changing an unsupervised clustering problem into a supervised user-preference reasoning problem, which is a novel approach in the CF recommendation field. This paper demonstrates the utility of CF recommendation based on SOM cluster-indexing CBR, validated against control algorithms on an open user-preference dataset. © 2003 Elsevier Ltd. All rights reserved.

Keywords: Collaborative filtering; Recommendation system; Self-organizing map; Case-based reasoning

1. Introduction

The advent of the networked world, induced by the rapid development of the Internet and the accompanying adoption of the Web, has created greater business opportunities and made customers easier to reach. This 24 × 7 on-line accessibility has enlarged the range of choices, but it also confronts customers with information overload: considerable effort is required to retrieve the information that matches their preferences. What is needed is an automated, sophisticated decision support system that suggests personalized information in a brief form without forcing the user through a tedious search process.

Collaborative filtering (CF) recommendation is a knowledge-sharing technology for distributing opinions and facilitating contacts in a networked society among people with similar interests. CF recommendation is the process by which multiple users share information on the preferences and actions of an affinity group tracked by a system, which then tries to make useful recommendations to individual users based on the patterns it predicts (Herlocker, Konstan, Borchers, & Riedl, 1999; Kumar, Raghavan, Rajagopalan, & Tomkins, 1998). CF recommendation also provides a complementary tool for information retrieval systems, facilitating users' navigation in a meaningful and personalized way. Most content retrieval methodologies use some type of similarity score to match a query describing the content with key words, individual titles, or items, and then present the user with a ranked list of suggestions. Conventional CF, however, does not use any actual content (e.g. words, descriptions, URLs) of the items. It relies instead on preference-rating information to match users with similar interests and to predict a user's rating for an unseen item by examining his or her community's ratings for that item.
CF recommendation systems are built on the assumption that a good way to find interesting content is to find other people who have similar interests and then recommend the items that those similar users like (Breese, Heckerman, & Kadie, 1998).

Most research on recommendation systems falls into three categories: technical system development, user behaviour and reaction, and privacy issues. Our focus is on technical system development, especially the design and analysis of an algorithm for CF recommendation. As the number of users and items increases and each user's preferences for the items change, typical CF recommendation needs exponentially growing computation time to find an affinity group and predict each user's unknown preferences (Cho, Kim, & Kim, 2002; Claypool et al., 1999). We see the potential for improving prediction accuracy and efficiency simultaneously by separating the on-line and off-line steps using recent clustering and reasoning machine learning techniques.

This study presents a three-step CF model composed of a SOM profiling step, a CBR inferring step, and a CF predicting step. The SOM network is one of the most popular unsupervised neural network models for clustering and visualization in a number of real-world problems (Kohonen, Hynninen, Kangas, & Laaksonen, 1996). CBR is well known for exploiting case-specific knowledge of past problems to find solutions to new problems (Kim & Han, 2001). These two well-performing machine learning methods can be combined for CF to increase the accuracy and efficiency of the recommendation process.

The rest of this paper is organized as follows. Section 2 provides a brief overview of CF models for several recommendation techniques and issues, with an emphasis on the algorithmic features reported in previous research. Details of the proposed CF model are given in Section 3. Section 4 describes the dataset, evaluation metrics, and experimental design: experiments are run on the open MovieLens preference-rating dataset with six experimental protocols and two evaluation metrics for the various algorithms. Results are shown in Section 5, and conclusions are presented in Section 6.

2. Background

2.1. Collaborative filtering recommendation

The fundamental function of CF is to predict the preferences of one user, referred to as the 'active user'. The problem space can be formulated as a matrix of users versus items, with each cell representing a user's rating of a specific item. Let I be the whole set of items, I_h (⊂ I) the subset that has been rated by the active user U_a, and I_r = I ∩ I_h^C the subset that U_a has not yet rated. CF systems estimate U_a's preferences for items in I_r based on the overlap between his or her preference ratings for items in I_h and those of the other users. The key advantage of CF is that it does not consider the content of the items being recommended; instead, humans determine the relevance, quality, and interest of the items in the information stream.
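To make the matrix formulation above concrete, the short sketch below (illustrative Python/NumPy added for this edition, not taken from the paper; the matrix values and variable names are hypothetical) encodes a small user-item rating matrix and recovers the rated set I_h and the unrated set I_r for an active user, i.e. exactly the cells a CF system has to fill in.

    import numpy as np

    # Toy user-item rating matrix on a 1-5 scale; np.nan marks unrated cells.
    # Rows are users, columns are items (all values purely illustrative).
    R = np.array([
        [5.0, np.nan, 2.0,    2.0,    4.0],
        [4.0, 3.0,    np.nan, 1.0,    5.0],
        [np.nan, 4.0, 3.0,    np.nan, 4.0],
    ])

    active_user = 0
    rated_mask = ~np.isnan(R[active_user])       # items in I_h
    unrated_mask = np.isnan(R[active_user])      # items in I_r = I \ I_h

    I_h = np.where(rated_mask)[0]
    I_r = np.where(unrated_mask)[0]
    print("I_h (rated by the active user):", I_h)
    print("I_r (to be predicted by CF):   ", I_r)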
Because CF relies on human ratings rather than machine analysis of content, filtering can be performed on items that are hard to analyze with computers, such as multimedia content, ideas, feelings, and people. Rather than mapping users to items through 'content attributes' or 'demographics', CF treats each item and user individually. Accordingly, it becomes possible to discover new items of interest simply because other people like them. At the same time, CF's dependence on human ratings can be a drawback. For a CF system to work well, several users must evaluate each item; even then, new items cannot be recommended until some users have taken the time to evaluate them. These limitations, often referred to as the 'data sparsity' and 'cold start' problems, cause trouble for users seeking obscure items (since nobody may have rated them) or advice on new items (since nobody has had a chance to evaluate them) (Good et al., 1999).

CF-related research started with the Tapestry system from Xerox (Goldberg, Nichols, Oki, & Terry, 1992), which coined the term 'collaborative filtering' in the context of a system for filtering email using binary category flags. Tapestry was a full-featured filtering system for electronic documents, primarily electronic mail and Usenet postings. GroupLens is a pioneering and ongoing effort in CF (Good et al., 1999; Herlocker et al., 1999; Konstan et al., 1997; Resnick, Iacovou, Sushak, Bergstrom, & Riedl, 1994; Schafer, Konstan, & Riedl, 2001). The GroupLens team initially implemented a neighbourhood-based CF system for rating Usenet articles. Several similar systems were developed around the same time as the GroupLens Usenet system, including the Ringo music recommender, which used a number of measures of distance between users, including Pearson correlation, constrained Pearson correlation, and vector cosine (Shardanand & Maes, 1995), and the Bellcore Video Recommender (Hill, Stead, Rosenstein, & Furnas, 1995). These research systems used what have come to be called neighbourhood-based prediction algorithms. Due to their speed, flexibility, and understandability, neighbourhood-based prediction algorithms are currently among the most effective ways to compute predictions in CF.

Breese et al. (1998) identify two major classes of CF prediction algorithms: memory-based CF and model-based CF. Memory-based algorithms operate over the entire user database to make predictions; the most common memory-based models are based on the notion of nearest neighbours, using a variety of distance measures. Model-based systems are based on a compact model inferred from the data. Breese et al. compare a number of algorithms, including Bayesian clustering and decision-tree modelling, and show that neighbourhood-based CF performs better than Bayesian belief networks for non-binary domains; Bayesian network and correlation models perform best when computational complexity is not taken into account. In this framework, our SOM cluster-indexing CBR CF predictor model would be considered a model-based CF method.

More recently, a number of machine learning techniques and hybrid filtering techniques have been explored. Hybrid filtering models combine recommendations from multiple sources, including the content of the item or page, the ratings of users, content-based filtering, and demographic information. Balabanović and Shoham (1997) apply a 'selection agent', which chooses between content-based filtering and CF as the recommendation algorithm.
Pazzani (1999) shows a hybrid approach to recommendation that uses more of the available information and consequently makes more precise recommendations; the strengths of the different approaches can be complementary. Basu, Hirsh, and Cohen (1998) present an inductive learning approach to recommendation that is able to use both ratings information and other forms of information about each item when predicting user preferences. Delgado and Ishii (1999) suggest a weighted-majority rating approach, and Pennock, Horvitz, Lawrence, and Giles (2000) suggest personality diagnosis, a hybrid memory- and model-based approach that computes the preference probability within the same personality grouping. However, these efforts to improve prediction algorithms are decoupled from computational complexity and response-time issues.

In this paper, we introduce a computational machine learning CF model comprising an off-line learning part and an on-line preference-predicting part. The model addresses accuracy and efficiency simultaneously by lessening the on-line computational complexity with a SOM cluster-indexing CBR process. Since we adopt a dense user-item matrix using the reference dataset induced by the SOM clusters' centroid value for each item, the correlation matrix is computed directly and the active user's preference for an item is then predicted.

2.2. Self-organizing map

The SOM network, based on competitive learning or self-organization, is one of the most popular unsupervised competitive neural network learning models for clustering and visualization in a number of real-world problems (Kohonen et al., 1996). It is capable of mapping similar high-dimensional input data into clusters that lie close to each other. It is a two-layer, fully connected network with a weight matrix. The SOM, sometimes called a 'topology-preserving map', assumes a topological structure among the cluster units: a topological map is simply a mapping that preserves neighbourhood relations and performs a topology-preserving projection from the data space onto a regular two-dimensional grid. The resulting maps give users an intuitive and familiar way of correlating and illustrating input data sets. Furthermore, SOM can be used for clustering, classification, and modelling; these versatile properties make it a valuable tool in data mining. Regarding the clustering capability of SOM, Mangiameli, Chen, and West (1996) demonstrate that it is a better clustering algorithm than hierarchical clustering under overlapped dispersion, irrelevant variables, outliers, or populations of different sizes. For that reason, SOM has been adopted as an analytical tool in various marketing domains, including database marketing (Ha & Park, 1998), segmentation of on-line markets (Vellido, Lisboa, & Meehan, 1999), and automatic labelling of customer clusters (Yuan & Chang, 2001).

In the suggested CF recommendation model, we focus on the clustering capability of SOM. Given a set X of users' preference patterns over the items, the algorithm returns a prototype (a set of cluster centroid values) y_i for each cluster i. The prototypes are sometimes called neurons. The number of clusters, M, is a parameter that must be provided a priori. In the algorithm, each prototype is first randomly initialized (line 4). In the main loop (lines 5-10), one randomly selects an element x ∈ X and determines the neuron y_p that is nearest to x. In the inner loop (lines 8 and 9), one considers all neurons y within a neighbourhood N(y_p) of y_p, including y_p itself, and updates them according to the formula in line 8. The effect of the update is to move neuron y closer to the pattern x. The degree by which y is moved towards x is controlled by the parameter γ, called the learning rate. Note that γ depends on the distance between y and y_p: if neuron y ∈ N(y_p) lies closer to y_p than neuron y′ ∈ N(y_p), then y is moved towards x by a larger amount than y′. After each iteration through the repeat-loop, the learning rate γ is reduced by a small amount, which facilitates convergence of the algorithm. It can be expected that after a sufficient number of iterations the y_i's have moved into areas where many x_j's are concentrated, so each y_i can be regarded as a cluster centroid value, which is used in the subsequent CBR process as the 'cluster-indexed reference set'. The pseudocode description is shown in Fig. 1.

Fig. 1. Pseudo-code description of the self-organizing map.
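A minimal sketch of the training loop just described is given below (written for this edition in Python/NumPy under stated assumptions; it is not the authors' SOM_PAK implementation, and the grid size, learning-rate schedule, and Gaussian neighbourhood are illustrative choices).

    import numpy as np

    def train_som(X, grid_w=2, grid_h=2, n_iter=2000, lr0=0.5, radius0=1.0, seed=0):
        """Train a tiny SOM on the rows of X (assumed standardized to [0, 1])."""
        rng = np.random.default_rng(seed)
        n_items = X.shape[1]
        # Line 4 of Fig. 1: random initialization of each prototype (neuron).
        protos = rng.random((grid_w * grid_h, n_items))
        # Fixed 2-D grid coordinates used for the neighbourhood N(y_p).
        grid = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)

        for t in range(n_iter):
            # Main loop (lines 5-10): pick a random pattern x, find the nearest neuron y_p.
            x = X[rng.integers(len(X))]
            p = np.argmin(np.linalg.norm(protos - x, axis=1))
            # Learning rate and neighbourhood radius shrink as training proceeds.
            lr = lr0 * (1.0 - t / n_iter)
            radius = max(radius0 * (1.0 - t / n_iter), 1e-3)
            # Inner update (lines 8-9): move every neuron in N(y_p) towards x,
            # more strongly the closer it lies to y_p on the grid.
            grid_dist = np.linalg.norm(grid - grid[p], axis=1)
            influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            protos += lr * influence[:, None] * (x - protos)

        return protos  # each row acts as one cluster's centroid vector

Each returned row plays the role of one cluster's standardized centroid vector, which Section 3 uses as the cluster-indexed reference set.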
In spite of several excellent applications, SOM has some limitations that hinder its performance. The typical limitations, and the remedies for them, concern the vulnerability of convergence to the number of clusters, the weight initialization, the network size, and the stopping-rule conditions (Kim & Han, 2001). To determine the number of clusters, we adopt visualization techniques based on principal component analysis (PCA). PCA was first introduced by Pearson in 1901, and Hotelling generalized it to random variables in 1933. The idea is to keep only the 'principal' eigenvectors (components). The number of eigenvectors to retain depends on their variances (eigenvalues) but is typically small; if v eigenvectors are retained, the data are projected along the first v principal eigenvectors. In this study, PCA provides dimensionality reduction for off-line clustering of users and rapid on-line cluster assignment, and users are also projected onto the 'eigen-plane' in a two- or three-dimensional scatter plot for visualization.

2.3. Case based reasoning

CBR is a methodology for building an analogy process, one mode of human reasoning, in which a certain resemblance is taken to imply further similarity. It makes direct use of past experiences or cases to solve a new problem by recognizing its similarity with a specific known problem and applying that case's solution to the current situation (Chiu, 2002; Choy, Lee, & Lo, 2002). CBR applications address two main problem types: classification tasks and synthesis tasks. A classification task matches a case against those in the case base to determine what type, or class, of case it is; the solution from the best matching case is then reused. A synthesis task attempts to create a new solution by combining parts of previous solutions; CBR systems that perform synthesis tasks must make use of adaptation and are usually hybrid systems combined with other techniques. The main advantages of CBR over other techniques are as follows. First, most knowledge is acquired in the case base, which reduces the knowledge acquisition effort: CBR makes use of an existing case database, so it requires less of the general knowledge that is very difficult to obtain. Second, it requires less maintenance effort.
Since rule bases or models must capture many dependencies between rules, and the effects of changes to a rule base are hard to predict, they are difficult to maintain. Case bases are easier to maintain because cases are independent of each other, domain experts and novices understand cases quite easily, and maintenance of a CBR system can be done simply by adding or deleting cases.

CBR algorithms have been used in marketing decision-making processes. Hui, Fong, and Jha (2001) present a hybrid CBR-ANN approach that integrates an artificial neural network with the CBR cycle to extract knowledge from service records for web customer service. Choy et al. (2002) apply CBR to integrate customer relationship management (CRM) and supplier relationship management (SRM) to facilitate supplier selection in supply chain management. Chiu (2002) suggests a case-based customer classification approach for direct marketing that combines a genetic algorithm with the CBR process.

The traditional CBR process can be represented by a schematic cycle, as shown in Fig. 2. Aamodt and Plaza (1994) and Bradley (1994) describe CBR as a cyclical process: representation, retrieval, reuse, revision, and retainment. Case retrieval searches the case base to select existing cases that share significant features with the new case. Through the retrieval step, similar cases that are potentially useful for the current problem are retrieved from the case base. The degree of similarity between the input and the target case can be calculated with various similarity functions, among which nearest-neighbour matching is one of the most frequently used. Nearest-neighbour matching is a direct method that uses a numerical function to compute the degree of similarity, and cases with higher degrees of similarity are usually retrieved. A typical numerical function is the following (Kolodner, 1993):

\frac{\sum_{i=1}^{n} W_i \times \mathrm{sim}(f_i^I, f_i^R)}{\sum_{i=1}^{n} W_i}

where W_i is the weight of the ith feature, f_i^I is the value of the ith feature for the input case, f_i^R is the value of the ith feature for the retrieved case, and sim() is the similarity function for f_i^I and f_i^R.

Fig. 2. CBR process as a schematic cycle comprising the five 'Re's.

In our suggested CF model, the CBR process provides classification and synthesis with additional generalized knowledge derived from the users' explicit preference patterns. Generalized knowledge can be acquired from the centroid values of clusters obtained with clustering techniques; these are added to the case base as representative cases and then used as a case-indexing scheme to retrieve more relevant cases. The cluster-indexing approach assumes that there are distinct subgroups (clusters) within each rated group. The centroid values of the clusters are new artificial cases that extract information from the whole case base and represent each clustered case.
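Read as code, the weighted nearest-neighbour formula above is simply a weighted average of per-feature similarities. The following sketch (illustrative Python added for this edition; the feature values, weights, and the linear similarity function are assumptions, not the authors' choices) shows one way such a retrieval score might be computed.

    import numpy as np

    def case_similarity(f_input, f_retrieved, weights, sim=lambda a, b: 1.0 - abs(a - b)):
        """Weighted nearest-neighbour matching score in the style of Kolodner (1993)."""
        sims = np.array([sim(a, b) for a, b in zip(f_input, f_retrieved)])
        return float(np.dot(weights, sims) / np.sum(weights))

    # Hypothetical normalized feature vectors for an input case and a stored case.
    print(case_similarity([0.8, 0.2, 0.5], [0.7, 0.4, 0.5], weights=np.array([2.0, 1.0, 1.0])))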
3. SOM cluster-indexing CBR CF recommendation

This study utilizes SOM, with its strong clustering performance, as a clustering tool and the strength of CBR as an aid to indexing and retrieving like-minded users. The point of the study is to use SOM to find clusters, each described by centroid values over the items, and to use CBR for indexing and retrieval in the CF recommendation process. The centroid values of the clusters are the weight vectors that are the interim result of the SOM learning process, and they are standardized to account for differences in rating scale. The standardized centroid values have the same representation scheme as raw user rating values, despite being learned, artificial cases. They represent the clustered users of the entire user base and are used as an indexing tool for each user. The standardized centroid values are consistent when learning is repeated with the same parameters, and the addition of new users modifies them only slightly.

The cluster-indexing method is composed of three steps: a profiling step, an inferring step, and a predicting step. In the profiling step, operated in the back office, PCA and preliminary SOM testing are performed to establish a stable cluster configuration. Clusters are derived from the dense subset of the user-item rating DB, and all training users are indexed by the SOM process according to their similarity to the centroid values of each cluster. In the inferring step, CBR compares an active user with the centroid values; the most similar cluster is inferred, and the reference users indexed within the selected cluster are retrieved. After the inferring step, preference prediction is performed on-line with correlation-based CF between the active user and the reference users of the selected cluster. Fig. 3 depicts the proposed model architecture.

Fig. 3. Model architecture of SOM cluster-indexing CBR CF recommendation.

3.1. Profiling step

In the profiling step, PCA is used for visualizing users' patterns and reducing the dimension of the input items before SOM clustering. Clusters are then derived by the SOM process from the reference user DB with dense preference-rating data. The reference users are indexed by cluster and constitute the cluster-indexing reference DB.

Step 1. User clustering with the centroid values of clusters by the SOM.
1.1. Explore the users' distribution by PCA.
1.2. Determine the number of clusters using the PCA factors.
1.3. Initialize the weight vectors of the SOM.
1.4. Find the clusters and the standardized centroid values of the clusters.

3.2. Inferring step

When an item preference prediction is requested for an active user, the active user's rating information is compared with the standardized centroid values of each cluster. Through the indexing and retrieval part of the CBR process, the most similar cluster is determined and retrieved.

Step 2. Active user indexing and retrieval with CBR.
2.1. Index the active user to the cluster whose centroid values have the minimum distance, calculated by the k-nearest-neighbour method:

\mathrm{Min\_D} = \sqrt{\sum_{m=1}^{n} \left| \frac{v_{a,m}}{S} - C_{ref,m} \right|^{2}}

where m indexes the given items, v_{a,m} is the active user's rating of item m, S is the standardizing factor for the rating scale, and C_{ref,m} represents the centroid value for item m of a fixed cluster.
2.2. Retrieve the neighbours that were indexed in the same cluster.

3.3. Predicting step

The active user's predicted preference for the target item is calculated by a Pearson-correlation-based filtering formula in the on-line prediction part.

Step 3. Prediction of the active user's preference value.
3.1. Calculate the Pearson correlation between the active user and the most similar neighbours that were indexed in the same cluster:

w(a,i) = \sum_{j} \frac{(v_{a,j} - m_a)}{s_a} \cdot \frac{(v_{i,j} - m_i)}{s_i}

where v_{a,j} and v_{i,j} are the ratings of the active user and of user i on the co-rated item j, m and s denote a user's mean rating and rating standard deviation, and w(a,i) is the Pearson correlation coefficient used to weight the contribution of each user i indexed in the same cluster.
3.2. Compute the prediction P_{a,t} of the active user U_a on the target item I_t:

P_{a,t} = m_a + k \sum_{i \neq a} w(a,i)\,(v_{i,t} - m_i)

where the sum runs over the users indexed in the same cluster in the reference DB, v_{i,t} is the rating cast by user i on the target item t, and m_a is U_a's mean rating. The constant k in front of the sum is an appropriate normalization factor.
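The inferring and predicting steps amount to a nearest-centroid lookup followed by a Pearson-weighted prediction inside the selected cluster. The sketch below is a minimal NumPy rendering written for this edition, not the authors' implementation; the choice of k as the reciprocal of the sum of absolute weights, the handling of missing ratings, and the variable names are assumptions.

    import numpy as np

    def infer_cluster(active_ratings, centroids, scale=5.0):
        """Step 2: index the active user to the nearest standardized centroid (Min_D)."""
        mask = ~np.isnan(active_ratings)
        diffs = active_ratings[mask] / scale - centroids[:, mask]   # divide by the scale factor S
        dists = np.sqrt(np.sum(np.abs(diffs) ** 2, axis=1))         # Min_D for each cluster
        return int(np.argmin(dists))

    def predict_rating(active_ratings, neighbours, target_item):
        """Step 3: Pearson-correlation CF prediction within the selected cluster."""
        mask = ~np.isnan(active_ratings)
        a = active_ratings[mask]
        mu_a, sd_a = a.mean(), a.std() + 1e-9                       # m_a and s_a

        num, norm = 0.0, 0.0
        for r in neighbours:                                        # rows of the cluster-indexed reference DB
            common = mask & ~np.isnan(r)
            if not common.any() or np.isnan(r[target_item]):
                continue
            r_rated = r[~np.isnan(r)]
            mu_i, sd_i = r_rated.mean(), r_rated.std() + 1e-9       # m_i and s_i
            w = np.sum((active_ratings[common] - mu_a) / sd_a
                       * (r[common] - mu_i) / sd_i)                 # w(a, i)
            num += w * (r[target_item] - mu_i)
            norm += abs(w)
        k = 1.0 / norm if norm > 0 else 0.0                         # assumed normalization constant
        return mu_a + k * num                                       # P_{a,t}

Here neighbours would be the rows of the reference DB indexed to the cluster returned by infer_cluster.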
The process of the cluster-indexing method is exemplified in Fig. 4. If an active user's preference ratings {5, _, 2, 2, …, ?, …, 4} are given, the user is first indexed to the most similar cluster, C2, using the cluster-indexing DB built by SOM in the off-line learning process. Next, the method retrieves the nearest-neighbour users from that cluster-indexed user group (C2). Finally, the predictive value of the active user's target item is calculated with the prediction formula.

Fig. 4. Computation example of the SOM cluster-indexing CBR CF model.

4. Experiments

Experiments are run on an open dataset, six different experimental predictors, and two evaluation metrics for the various algorithms. Results are compared with baseline models and other comparative models in terms of the level of data sparsity and the machine learning techniques used.

4.1. Data: MovieLens dataset

The MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota (http://www.cs.umn.edu/Research/GroupLeans/data). The historical dataset consists of 100,000 ratings from 943 users on 1682 movies, with every user having at least 20 ratings; simple demographic information for the users (age, gender, occupation, zip code) is included. The ratings are on a numeric five-point scale, with 1 and 2 representing negative ratings, 4 and 5 representing positive ratings, and 3 indicating ambivalence. We sample a reference user set that has enough rating information to discover similar user patterns. The number of users extracted as the reference sample is 251; among the 100,000 rating records, 6093 were generated from the 251 users' ratings of 33 items across 10 movie genres.

4.2. Evaluation metrics

Recommender systems researchers use several different measures for the quality of the recommendations produced: statistical accuracy metrics, decision-support metrics, and coverage measures. We apply two metrics in our evaluation: the normalized mean absolute error (NMAE) and the receiver operating characteristic (ROC) curve, including the area under the ROC curve, as used by Goldberg, Roeder, Gupta, and Perkins (2001) and Good et al. (1999).

NMAE. We look at the average absolute deviation of the predicted rating from the actual rating on items the users in the test set have actually voted on. If the number of predicted ratings in the test set for the active case is m_a, then the average absolute deviation for an active user is

\mathrm{MAE} = \frac{1}{m_a} \sum_{j=1}^{m_a} \left| P_{a,j} - v_{a,j} \right|

Since our numerical rating scale gives ratings over the range [1, 5], we normalize to express errors as percentages of full scale:

\mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}}

ROC curve and area under the ROC curve. This metric evaluates the performance of a classification scheme in which subjects are classified on one variable with two categories. ROC sensitivity is a signal-processing measure of the decision-making power of a filtering system. Operationally, it is the area under the ROC curve, which plots sensitivity versus 1 − specificity of the test (Swets, 1988). Sensitivity is the probability that a randomly selected good item is accepted by the filter; specificity is the probability that a randomly selected bad item is rejected by the filter. Points on the ROC curve represent trade-offs supported by the filter. The ROC sensitivity ranges from 0 to 1, where 1 is perfect and 0.5 is random.
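Both metrics can be computed directly from a vector of predictions and the corresponding true ratings. The sketch below is illustrative code added for this edition (the rank-sum estimator of the ROC area and the 4-star cut-off for a 'good' item are assumptions consistent with the five-point scale described above, not part of the paper).

    import numpy as np

    def nmae(pred, actual, r_min=1.0, r_max=5.0):
        """Normalized mean absolute error on a [r_min, r_max] rating scale."""
        mae = np.mean(np.abs(np.asarray(pred, float) - np.asarray(actual, float)))
        return mae / (r_max - r_min)

    def roc_area(pred, actual, good_threshold=4.0):
        """Area under the ROC curve via the rank-sum (Mann-Whitney) estimator."""
        pred, actual = np.asarray(pred, float), np.asarray(actual, float)
        good = actual >= good_threshold             # items the filter should accept
        pos, neg = pred[good], pred[~good]
        if len(pos) == 0 or len(neg) == 0:
            return float("nan")
        # Probability that a random good item is scored above a random bad item.
        wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
        return wins / (len(pos) * len(neg))

    print(nmae([4.2, 2.8, 3.5], [5, 2, 4]))         # toy example
    print(roc_area([4.2, 2.8, 3.5], [5, 2, 4]))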
To test the differences in the six protocols' performance, we use one-way ANOVA with a post hoc test, the Bonferroni procedure under the equal-variance assumption, for multiple comparisons of MAE (Breese et al., 1998). In this study, the post hoc procedure is used to investigate the differences between specific experimental protocols, in conjunction with ANOVA comparing each protocol's MAE.

4.3. Experimental setup

First, we build the dataset into three experimental sets to test the effect of the available information level (the number of items an active user has rated). In the first set, named Allbut1, we withhold one selected item for each user in the test set and try to predict its value given all the other ratings the user has cast. In the second and third sets, we select five and ten ratings from each test user as the observed ratings and then attempt to predict the withheld preference levels; we call these Given5 and Given10. The Allbut1 experiments measure the algorithms' performance when given as much data as possible from each test user; the Given experiments look at users with less available data and examine performance when relatively little is known about an active user.

Second, we present metrics derived from an empirical analysis of the proposed SOM cluster-indexing CBR CF model, hereafter referred to as SCP, compared with baseline models and comparative three-step models. The list of experimental protocols is shown in Table 1.

Table 1. List of experiment protocols

Protocol  Description
Proposed model
  SCP     SOM cluster-indexing CBR CF predictor
Comparative models - baseline models
  UAP     By-user-average CF predictor
  IAP     By-item-average CF predictor
  SPP     Simple Pearson CF predictor
Comparative models - 3-step models
  SIP     SOM cluster induction CF predictor
  SNP     SOM cluster neural network CF predictor

4.3.1. SOM cluster-indexing CBR CF model

On the belief that an affinity group can be clustered according to the distribution of its rating values, several clustering and visualization methods are applied to find the number of clusters. In this study, the clustering techniques involve two distinct tasks: (1) determining the number of clusters present in the reference base, and (2) assigning reference users to a cluster. The number of clusters, which is the number of nodes in the output layer, depends on the expected number of clusters, but there is currently no apparent practical or theoretical way of determining the optimal size of the output layer (Nour & Madey, 1996). There is possible instability due to the randomness of cluster initialization, so a policy for initial cluster selection is required. To reduce this possibility, the SCP model contains the PCA process and preliminary SOM clustering. First, each reference user's distribution is summarized by PCA using the 6093 rating records on the 33 movies across 10 genres before SOM clustering.
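A minimal version of this PCA summarization step might look as follows (an illustrative NumPy sketch added for this edition; the column centring, the placeholder data, and the choice of three components are assumptions rather than the authors' procedure).

    import numpy as np

    def pca_project(R, n_components=3):
        """Project mean-centred user rating vectors onto the leading principal components."""
        X = R - R.mean(axis=0)                       # centre each item column
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        explained = (s ** 2) / np.sum(s ** 2)        # variance explained per component (scree curve)
        scores = X @ Vt[:n_components].T             # user coordinates on the 'eigen-plane'
        return scores, explained

    # R would be the dense 251 x 33 reference rating matrix described in Section 4.1;
    # random placeholder data is used here only to make the sketch runnable.
    R = np.random.default_rng(0).integers(1, 6, size=(251, 33)).astype(float)
    scores, explained = pca_project(R)
    print(explained[:5])                              # inspect how many components to keep

The explained-variance ratios correspond to the scree curve of Fig. 5, and the leading columns of scores give the eigen-plane coordinates plotted in Figs. 5 and 6.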
In this experiment, we chose three components by PCA, and the data are projected onto the eigen-plane for pre-visualization, as shown in Figs. 5 and 6.

Fig. 5. Scree curve of the eigenvalues explained by the components. The largest eigenvalue is explained by the first component, and the first three components can serve as representative factors of the population dataset.

Fig. 6. Scatter plot of reference users projected onto the 3-D 'eigen-plane' for visualization.

In preliminary SOM clustering, the number of clusters is varied from 2 to 10. When deciding the optimal number of clusters, the lowest cluster count is selected such that each cluster still contains as many indexed reference users as possible; if a cluster has no or only a few users, it does not provide sufficient user cases from which to find a more similar user. The total number of neurons in the output layer is therefore set to four clusters according to the results of the preliminary SOM runs performed to find the number of clusters. Based on the SOM learning, each user in the reference DB is indexed into a cluster. In addition to indexing, the centroid values for each item are deduced, and their average values per genre suggest each cluster's inferable genre-preference characteristics, shown in Table 2(a) and (b). At first glance each cluster appears to be grouped simply by rating level, with the average rating rising from C1 (0.5400) to the relatively more liberal C4 (0.7810). A closer look, however, shows that each cluster has different genre and movie preferences. For example, C1 is a comparatively negative group, but it prefers tough genres such as horror, crime, and war films to soft ones such as drama and romance films. C3, on the other hand, rates higher than C1; this affinity group likes science fiction and adventure rather than horror, which is the most preferred genre for C1.

4.3.2. Comparative models

Previous research on CF algorithms has tended to compare the performance of algorithms only within a single study, making comparisons of algorithm performance from paper to paper difficult. We use three baseline predictors, the by-user-average CF predictor (UAP), the by-item-average CF predictor (IAP), and the simple Pearson CF predictor (SPP), to provide benchmarks against which any predictor can be compared. The baseline algorithms are simple, efficient, and return reasonable results. The UAP returns the average of the ratings the given user has already entered. The IAP returns the average rating for the given movie over all users who have voted for that movie. The SPP returns the Pearson-correlation-based neighbourhood prediction. To demonstrate the utility of the SCP model, we also replace the CBR process with a neural network and with an induction technique as the classification method, while retaining the first SOM profiling step and the third Pearson-correlation-based prediction step. The SOM neural network CF predictor (SNP) uses the well-known back-propagation neural network algorithm in the classification step, with all decision coefficients tuned to achieve the best prediction accuracy. The SOM induction CF predictor (SIP) uses a decision tree technique, specifically the See5 algorithm, an upgraded version of Quinlan's (1993) C4.5 decision tree classifier.
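For reference, the three baseline predictors can be sketched as follows (illustrative code written for this edition over a NumPy rating matrix with NaN marking unrated cells; the fallback to the user mean and the weight normalization are assumptions, not the authors' implementation).

    import numpy as np

    def uap(R, user, item):
        """By-user-average predictor: the user's mean over the items they rated."""
        return np.nanmean(R[user])

    def iap(R, user, item):
        """By-item-average predictor: the item's mean over the users who rated it."""
        return np.nanmean(R[:, item])

    def spp(R, user, item):
        """Simple Pearson neighbourhood predictor over all users who rated the item."""
        mask_a = ~np.isnan(R[user])
        mu_a = np.nanmean(R[user])
        num, norm = 0.0, 0.0
        for i in range(R.shape[0]):
            if i == user or np.isnan(R[i, item]):
                continue
            common = mask_a & ~np.isnan(R[i])
            if common.sum() < 2:
                continue
            w = np.corrcoef(R[user, common], R[i, common])[0, 1]   # Pearson correlation
            if np.isnan(w):
                continue
            num += w * (R[i, item] - np.nanmean(R[i]))
            norm += abs(w)
        return mu_a if norm == 0 else mu_a + num / norm

SPP as sketched here searches all co-rated users; the SCP model of Section 3 restricts the same Pearson computation to the users indexed in the active user's cluster.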
Table 2a. Genre-average centroid values of the SOM clusters

Cluster  Children's  Drama   Adventure  Science-fiction  Crime   Thriller  War     Romance  Comedy  Horror  Average
C1       0.4845      0.4775  0.5432     0.5435           0.6473  0.5477    0.6090  0.4980   0.5540  0.6870  0.5400
C2       0.5595      0.6050  0.6756     0.6525           0.8367  0.6693    0.6570  0.5590   0.5540  0.5790  0.6490
C3       0.6910      0.5820  0.7218     0.7488           0.6340  0.6857    0.7090  0.5970   0.9313  0.5450  0.6860
C4       0.7180      0.6965  0.8336     0.8124           0.8210  0.7910    0.8820  0.6785   0.6897  0.7230  0.7810

Note: in the original table, bold numbers mark the most preferred genre in each cluster and italicized numbers the least preferred.

5. Results

To validate the effectiveness of SCP, its prediction accuracy is compared with the comparative experimental algorithms in terms of NMAE and the area under the ROC curve. Table 3(a) and (b) tabulate the accuracy results of each protocol on the movie dataset. Among the baseline models, SPP reflects users' preferences better than the simple average predictors (UAP, IAP) on all experimental datasets, Allbut1, Given5, and Given10, at the 5% significance level, and the ROC area metric gives the same result. These results support the view that a CF algorithm is more accurate than simple average predictors, which do not consider the like-minded affinity group when predicting an active user's preference. Comparing UAP and IAP, when not enough rating data are available to apply a CF algorithm, the user average has more predictive power than the item average.

SCP and SNP, which use a clustering-classification method, yield superior results to SPP, which is a type of memory-based model. The SIP model, however, performs worse than all the other experimental protocols. Thus model-based CF methods, in particular machine learning-based CF models, have promising potential for improvement, but their success depends on the suitability of the methodology. In particular, the SCP model shows an outstanding result (an NMAE of 0.1583 on Given10 and an ROC area of 0.8461) compared with the other comparative models on both the NMAE and ROC area metrics. It dominates the UAP, IAP, SPP, and SIP models at the 5% and 1% significance levels and yields better performance than SNP. According to our experiment, the SCP model reduces prediction error by about 4%.

From the viewpoint of data sparsity, the average NMAE of Given10 (0.1790), a protocol with less preference information, is higher than that of Allbut1 (0.1564) and Given5 (0.1747). This implies that as explicit preference-rating information becomes sparse, prediction accuracy decreases. In this study, we build a pre-filtered DB, the reference DB, composed of dense user-item rating data. Building a high-density preference DB can be one approach to alleviating the sparsity problem and achieving higher recommendation accuracy.

Table 3a. Performance results: prediction accuracy (NMAE) and area under the ROC curve for each protocol

Protocol  NMAE (Allbut1)  NMAE (Given5)  NMAE (Given10)  ROC area
UAP       0.1600          0.1863         0.1901          0.7127
IAP       0.1714          0.1930         0.1951          0.6071
SPP       0.1497          0.1719         0.1734          0.8006
SNP       0.1413          0.1557         0.1638          0.8166
SIP       0.1687          0.1892         0.1933          0.7430
SCP       0.1475          0.1524         0.1583          0.8461
Average   0.1564          0.1747         0.1790          -

Note: lower NMAE values indicate better performance; for the ROC area, higher is better.

Overall, among the ROC curves illustrated in Fig. 7, the SCP model stably dominates the other clustering-classification CF models and the memory-based models.
This implies that when the item recommendation criterion is changed, the SCP model can be applied flexibly. For example, even if the recommendable-item criterion is changed from 5 stars to 4 stars in the movie recommendation, the SCP model still works.

Table 2b. Centroid values of the SOM clusters by item

Genre            Item  C1     C2     C3     C4
Children's       1     0.407  0.520  0.739  0.751
                 2     0.562  0.599  0.643  0.685
Drama            3     0.488  0.611  0.581  0.766
                 4     0.467  0.599  0.583  0.627
Adventure        5     0.431  0.666  0.675  0.759
                 6     0.690  0.790  0.841  0.966
                 7     0.543  0.637  0.741  0.828
                 8     0.690  0.752  0.754  0.913
                 9     0.362  0.533  0.598  0.702
Science-fiction  10    0.705  0.822  0.942  0.974
                 11    0.676  0.676  0.857  0.941
                 12    0.597  0.689  0.884  0.891
                 13    0.516  0.622  0.705  0.705
                 14    0.364  0.489  0.606  0.663
                 15    0.542  0.607  0.766  0.839
                 16    0.570  0.682  0.647  0.790
                 17    0.523  0.685  0.599  0.783
                 18    0.503  0.673  0.681  0.797
                 19    0.556  0.699  0.758  0.850
                 20    0.427  0.534  0.741  0.681
Crime            21    0.593  0.875  0.592  0.902
                 22    0.670  0.783  0.666  0.784
                 23    0.679  0.852  0.644  0.777
Thriller         24    0.466  0.628  0.733  0.835
                 25    0.467  0.560  0.588  0.615
                 26    0.710  0.820  0.736  0.923
War              27    0.609  0.657  0.709  0.882
Romance          28    0.569  0.642  0.645  0.779
                 29    0.427  0.476  0.549  0.578
Comedy           30    0.655  0.631  0.800  0.815
                 31    0.601  0.585  0.519  0.630
                 32    0.406  0.446  0.575  0.624
Horror           33    0.687  0.579  0.545  0.723

Table 3b. Performance results: statistical significance test (one-way ANOVA with post hoc test)

         IAP      SPP       SNP        SIP         SCP
UAP      -0.0202  0.0665**  0.1052***  -0.0131     0.1271***
IAP      -        0.0867**  0.1253***  0.0070      0.1472***
SPP      -        -         0.0387*    -0.0797**   0.0605**
SNP      -        -         -          -0.1183***  0.0219
SIP      -        -         -          -           0.1402**

Bonferroni procedure based on MAE: mean differences and significance at the ***1%, **5%, and *10% levels for pair-wise comparisons of performance between protocols.

6. Conclusion

In this paper, we propose the SCP model, which applies two combined machine learning techniques, SOM and CBR, to the consecutive CF prediction process as a new approach in the CF recommendation field. This study shows that cluster-indexing CBR is an effective user indexing and retrieval method for CF recommendation. Instead of using all users' ratings to retrieve the nearest neighbours, the SOM cluster-indexing CBR approach alleviates the on-line computational complexity by using the significant cluster-centroid values induced by the SOM process. The SOM facilitates affinity user grouping and the extraction of representative centroid values of each cluster's items, which assist the case indexing and retrieval of CBR. In the SOM clustering of our study, most of the computational cost lies in the training process, which is done off-line in the profiling step. After that, the CBR process and the CF prediction within the selected cluster operate on a compact representation of the raw ratings information, so the time and space complexity of making recommendations is quite low.

The performance of our model is superior to that of memory-based CF techniques and other previous hybrid CF models. The NMAE values obtained by our model indicate that predicted rating values will be within roughly 15% of the true rating values, so items with predicted ratings well above the mean for a new user will in many cases correspond to desirable items for that user.
These accuracies are comparable with those reported for a completely different dataset (jokes): the algorithms in Goldberg et al. (2001) show NMAE from 0.187 to 0.237 on the 20-unit rating scale [-10, +10]. Herlocker et al. (1999) report MAE from 0.768 to 0.828; when these are normalized to the 4-unit rating scale [1, 5], they correspond to NMAE from 0.192 to 0.207 on the same MovieLens dataset. According to Goldberg et al. (2001), if user ratings are distributed uniformly or normally, random predictions yield NMAE of 33% and 28%, respectively. Our model also yields superior performance compared with traditional memory-based CF algorithms and with the neural network and induction-based CF prediction algorithms, which suggests that there is room for improved accuracy in all current CF algorithms. We are experimenting with a number of variations, such as k-means clustering and hybrid approaches with adaptive on-line weighting, to further improve accuracy without increasing on-line computation time. This study compares several computational approaches to CF recommendation, considering prediction accuracy and response speed simultaneously; in particular, data mining techniques such as SOM, neural networks, and CBR have shown potential for improvement. The promise of CF systems can be further investigated by integrating product- and customer-specific information profiling, implicit information analysis such as web-page navigation history, and retrieval technology. In future research, we will suggest hybrid recommendation algorithms and try to apply our model to a real-world personalized recommendation site.

Fig. 7. ROC curve comparison of the CF prediction protocols; a higher curve indicates greater prediction accuracy.

Acknowledgements

This research was financially supported by Hansung University in 2003.

References

Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. Artificial Intelligence Communications, 7(1), 39-59.
Balabanović, M., & Shoham, Y. (1997). Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3), 66-72.
Basu, C., Hirsh, H., & Cohen, W. (1998). Recommendation as classification: Using social and content-based information in recommendation. Proceedings of the 1998 Workshop on Recommender Systems (pp. 11-15).
Bradley, P. S. (1994). Case-based reasoning: Business applications. Communications of the ACM, 37(3), 40-43.
Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98) (pp. 43-52).
Chiu, C. (2002). A case-based customer classification approach for direct marketing. Expert Systems with Applications, 22(2), 163-168.
Cho, Y. H., Kim, J. K., & Kim, S. H. (2002). A personalized recommender system based on web usage mining and decision tree induction. Expert Systems with Applications, 23(3), 329-342.
Choy, K. L., Lee, W. B., & Lo, V. (2002). Development of a case based intelligent customer-supplier relationship management system. Expert Systems with Applications, 23(3), 281-297.
Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., & Sartin, M. (1999). Combining content-based and collaborative filters in an online newspaper. ACM SIGIR'99 Workshop on Recommender Systems, Berkeley, CA.
Delgado, P. J., & Ishii, N. (1999). Memory-based weighted majority prediction for recommender systems. SIGIR Workshop on Recommender Systems.
Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61-70.
Goldberg, K., Roeder, R., Gupta, D., & Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval Journal, 4(2), 133-151.
Good, N., Schafer, J. B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., & Riedl, J. (1999). Combining collaborative filtering with personal agents for better recommendations. Proceedings of the 1999 Conference of the American Association of Artificial Intelligence (AAAI-99).
Ha, S. H., & Park, S. C. (1998). Application of a data mining tool to a hotel data mart on the Internet for database marketing. Expert Systems with Applications, 15(1), 1-31.
Herlocker, J., Konstan, J., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. Proceedings of the 1999 Conference on Research and Development in Information Retrieval.
Hill, W., Stead, L., Rosenstein, M., & Furnas, G. (1995). Recommending and evaluating choices in a virtual community of use. CHI '95 (pp. 194-201). Denver, CO: ACM Press.
Hui, S. C., Fong, A. C. M., & Jha, G. (2001). A web-based intelligent fault diagnosis system for customer service support. Engineering Applications of Artificial Intelligence, 14(4), 537-548.
Kim, K. S., & Han, I. G. (2001). The cluster-indexing method for case-based reasoning using self-organizing maps and learning vector quantization for bond rating cases. Expert Systems with Applications, 21(3), 147-156.
Kohonen, T., Hynninen, J., Kangas, J., & Laaksonen, J. (1996). SOM_PAK: The self-organizing map program package. Technical Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science, FIN-02150.
Kolodner, J. L. (1993). Case-based reasoning. Los Altos, CA: Morgan Kaufmann.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., & Riedl, J. (1997). GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3), 77-87.
Kumar, R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (1998). Recommendation systems: A probabilistic analysis. Proceedings of the 39th Annual Symposium on Foundations of Computer Science.
Mangiameli, P., Chen, S. K., & West, D. (1996). A comparison of SOM neural network and hierarchical clustering methods. European Journal of Operational Research, 93, 402-417.
Nour, A. N., & Madey, G. R. (1996). Heuristic and optimization approaches to extending the Kohonen self organizing algorithm. European Journal of Operational Research, 93, 428-448.
Pazzani, M. J. (1999). A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 13(5/6), 393-408.
Pennock, D. M., Horvitz, E., Lawrence, S., & Giles, C. L. (2000). Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach. Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI-2000) (pp. 473-480). Stanford, CA.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. Proceedings of the 1994 Computer Supported Collaborative Work Conference.
Schafer, J. B., Konstan, J. A., & Riedl, J. (2001). E-commerce recommendation applications. Data Mining and Knowledge Discovery, 5(1-2), 115-153.
Shardanand, U., & Maes, P. (1995). Social information filtering: Algorithms for automating 'word of mouth'. Proceedings of ACM CHI '95 (pp. 210-217). Denver, CO.
Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285-1289.
Vellido, A., Lisboa, P. J. G., & Meehan, K. (1999). Segmentation of the on-line market using neural networks. Expert Systems with Applications, 17(4), 303-314.
Yuan, S., & Chang, W. (2001). Mixed-initiative synthesized learning approach for web-based CRM. Expert Systems with Applications, 20(2), 187-200.