The collaborative filtering recommendation based on SOM cluster-indexing CBR

Tae Hyup Roh a,*, Kyong Joo Oh b, Ingoo Han a

a Graduate School of Management, Korea Advanced Institute of Science and Technology, 207-43 Cheongryangri-Dong, Dongdaemun-Gu, Seoul 130012, South Korea
b Department of Business Administration, Hansung University, Seoul, South Korea
* Corresponding author. Tel./fax: +82-29583685. E-mail address: rohth@kgsm.kaist.ac.kr (T.H. Roh).

Abstract

Collaborative filtering (CF) recommendation is a knowledge-sharing technology for distributing opinions and facilitating contacts in a networked society among people with similar interests. The main concerns of CF algorithms are prediction accuracy, response time, data sparsity, and scalability. In general, efforts to improve prediction algorithms and efforts to shorten response time are decoupled. We propose a three-step CF recommendation model, composed of profiling, inferring, and predicting steps, that considers prediction accuracy and computing speed simultaneously. The model combines a CF algorithm with two machine learning processes, the Self-Organizing Map (SOM) and Case Based Reasoning (CBR), by changing an unsupervised clustering problem into a supervised user-preference reasoning problem, which is a novel approach in the CF recommendation field. This paper demonstrates the utility of CF recommendation based on SOM cluster-indexing CBR, validated against control algorithms on an open user-preference dataset. © 2003 Elsevier Ltd. All rights reserved.

Keywords: Collaborative filtering; Recommendation system; Self-organizing map; Case-based reasoning

1. Introduction

The advent of the networked world, induced by the rapid development of the Internet and the accompanying adoption of the Web, has created greater business opportunities and made customers easier to reach. This 24 × 7 on-line accessibility has enlarged the range of choices, but it also confronts customers with information overload: considerable effort is required to retrieve the information that matches their preferences. What is needed is an automated, sophisticated decision support system that suggests personalized information in a brief form without forcing the user through a tedious search process.

Collaborative filtering (CF) recommendation is a knowledge-sharing technology for distributing opinions and facilitating contacts in a networked society among people with similar interests. CF recommendation is the process by which multiple users share information on the preferences and actions of an affinity group tracked by a system, which then tries to make useful recommendations to individual users based on the patterns it predicts (Herlocker, Konstan, Borchers, & Riedl, 1999; Kumar, Raghavan, Rajagopalan, & Tomkins, 1998). CF recommendation also provides a complementary tool for information retrieval systems, facilitating users' navigation in a meaningful and personalized way. Most content retrieval methodologies use some type of similarity score to match a query describing the content with key words, individual titles, or items, and then present the user with a ranked list of suggestions. Conventional CF, however, does not use any actual content (e.g. words, descriptions, URLs) of the items. It relies instead on preference-rating information to match users with similar interests and to predict a user's rating for an unseen item by examining his or her community's ratings for that item.
CF recommendation systems are built on the assumption that a good way to find interesting content is to find other people who have similar interests and then recommend the items that those similar users like (Breese, Heckerman, & Kadie, 1998).

Most research on recommendation systems falls into three categories: technical system development, user behaviour and reaction, and privacy issues. Our focus is on technical system development, especially the design and analysis of an algorithm for CF recommendation. As the number of users and items increases and each user's preferences for the items change, typical CF recommendation needs exponentially growing computation time to find an affinity group and predict each user's unknown preferences (Cho, Kim, & Kim, 2002; Claypool et al., 1999). We see the potential for improving prediction accuracy and efficiency simultaneously by separating the on-line and off-line steps using recent clustering and reasoning machine learning techniques.

This study presents a three-step CF model composed of a SOM profiling step, a CBR inferring step, and a CF predicting step. The SOM network is one of the most popular unsupervised neural network models for clustering and visualization in a number of real-world problems (Kohonen, Hynninen, Kangas, & Laaksonen, 1996). CBR is well known for exploiting case-specific knowledge of past problems to find solutions to new problems (Kim & Han, 2001). These two well-performing machine learning methods can be combined for CF to increase the accuracy and efficiency of the recommendation process.

The rest of this paper is organized as follows. Section 2 provides a brief overview of CF models for several recommendation techniques and issues, with an emphasis on the algorithmic features reported in previous research. Details of the proposed CF model are given in Section 3. Section 4 describes the dataset, evaluation metrics, and experimental design: experiments are run on the open MovieLens preference-rating dataset with six experimental protocols and two evaluation metrics for the various algorithms. Results are shown in Section 5, and conclusions are presented in Section 6.

2. Background

2.1. Collaborative filtering recommendation

The fundamental function of CF is to predict the preferences of one user, referred to as the 'active user'. The problem space can be formulated as a matrix of users versus items, with each cell representing a user's rating of a specific item. Let I be the whole set of items, I_h (⊂ I) the subset that has been rated by the active user U_a, and I_r = I ∩ I_h^C the subset that U_a has not yet rated. CF systems estimate U_a's preferences for items in I_r based on the overlap between his or her preference ratings for items in I_h and those of the other users. The key advantage of CF is that it does not consider the content of the items being recommended; instead, humans determine the relevance, quality, and interest of the items in the information stream.
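To make the matrix formulation above concrete, the short sketch below (illustrative Python/NumPy added for this edition, not taken from the paper; the matrix values and variable names are hypothetical) encodes a small user-item rating matrix and recovers the rated set I_h and the unrated set I_r for an active user, i.e. exactly the cells a CF system has to fill in.

    import numpy as np

    # Toy user-item rating matrix on a 1-5 scale; np.nan marks unrated cells.
    # Rows are users, columns are items (all values purely illustrative).
    R = np.array([
        [5.0, np.nan, 2.0,    2.0,    4.0],
        [4.0, 3.0,    np.nan, 1.0,    5.0],
        [np.nan, 4.0, 3.0,    np.nan, 4.0],
    ])

    active_user = 0
    rated_mask = ~np.isnan(R[active_user])       # items in I_h
    unrated_mask = np.isnan(R[active_user])      # items in I_r = I \ I_h

    I_h = np.where(rated_mask)[0]
    I_r = np.where(unrated_mask)[0]
    print("I_h (rated by the active user):", I_h)
    print("I_r (to be predicted by CF):   ", I_r)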
Because CF relies on human ratings rather than machine analysis of content, filtering can be performed on items that are hard to analyze with computers, such as multimedia content, ideas, feelings, and people. Rather than mapping users to items through 'content attributes' or 'demographics', CF treats each item and user individually. Accordingly, it becomes possible to discover new items of interest simply because other people like them. At the same time, CF's dependence on human ratings can be a drawback. For a CF system to work well, several users must evaluate each item; even then, new items cannot be recommended until some users have taken the time to evaluate them. These limitations, often referred to as the 'data sparsity' and 'cold start' problems, cause trouble for users seeking obscure items (since nobody may have rated them) or advice on new items (since nobody has had a chance to evaluate them) (Good et al., 1999).

CF-related research started with the Tapestry system from Xerox (Goldberg, Nichols, Oki, & Terry, 1992), which coined the term 'collaborative filtering' in the context of a system for filtering email using binary category flags. Tapestry was a full-featured filtering system for electronic documents, primarily electronic mail and Usenet postings. GroupLens is a pioneering and ongoing effort in CF (Good et al., 1999; Herlocker et al., 1999; Konstan et al., 1997; Resnick, Iacovou, Sushak, Bergstrom, & Riedl, 1994; Schafer, Konstan, & Riedl, 2001). The GroupLens team initially implemented a neighbourhood-based CF system for rating Usenet articles. Several similar systems were developed around the same time as the GroupLens Usenet system, including the Ringo music recommender, which used a number of measures of distance between users, including Pearson correlation, constrained Pearson correlation, and vector cosine (Shardanand & Maes, 1995), and the Bellcore Video Recommender (Hill, Stead, Rosenstein, & Furnas, 1995). These research systems used what have come to be called neighbourhood-based prediction algorithms. Due to their speed, flexibility, and understandability, neighbourhood-based prediction algorithms are currently among the most effective ways to compute predictions in CF.

Breese et al. (1998) identify two major classes of CF prediction algorithms: memory-based CF and model-based CF. Memory-based algorithms operate over the entire user database to make predictions; the most common memory-based models are based on the notion of nearest neighbours, using a variety of distance measures. Model-based systems are based on a compact model inferred from the data. Breese et al. compare a number of algorithms, including Bayesian clustering and decision-tree modelling, and show that neighbourhood-based CF performs better than Bayesian belief networks for non-binary domains; Bayesian network and correlation models perform best when computational complexity is not taken into account. In this framework, our SOM cluster-indexing CBR CF predictor model would be considered a model-based CF method.

More recently, a number of machine learning techniques and hybrid filtering techniques have been explored. Hybrid filtering models combine recommendations from multiple sources, including the content of the item or page, the ratings of users, content-based filtering, and demographic information. Balabanović and Shoham (1997) apply a 'selection agent', which chooses between content-based filtering and CF as the recommendation algorithm.
Pazzani (1999) shows a hybrid approach to recommendation that uses more of the available information and consequently makes more precise recommendations; the strengths of the different approaches can be complementary. Basu, Hirsh, and Cohen (1998) present an inductive learning approach to recommendation that is able to use both ratings information and other forms of information about each item when predicting user preferences. Delgado and Ishii (1999) suggest a weighted-majority rating approach, and Pennock, Horvitz, Lawrence, and Giles (2000) suggest personality diagnosis, a hybrid memory- and model-based approach that computes the preference probability within the same personality grouping. However, these efforts to improve prediction algorithms are decoupled from computational complexity and response-time issues.

In this paper, we introduce a computational machine learning CF model comprising an off-line learning part and an on-line preference-predicting part. The model addresses accuracy and efficiency simultaneously by lessening the on-line computational complexity with a SOM cluster-indexing CBR process. Since we adopt a dense user-item matrix using the reference dataset induced by the SOM clusters' centroid value for each item, the correlation matrix is computed directly and the active user's preference for an item is then predicted.

2.2. Self-organizing map

The SOM network, based on competitive learning or self-organization, is one of the most popular unsupervised competitive neural network learning models for clustering and visualization in a number of real-world problems (Kohonen et al., 1996). It is capable of mapping similar high-dimensional input data into clusters that lie close to each other. It is a two-layer, fully connected network with a weight matrix. The SOM, sometimes called a 'topology-preserving map', assumes a topological structure among the cluster units: a topological map is simply a mapping that preserves neighbourhood relations and performs a topology-preserving projection from the data space onto a regular two-dimensional grid. The resulting maps give users an intuitive and familiar way of correlating and illustrating input data sets. Furthermore, SOM can be used for clustering, classification, and modelling; these versatile properties make it a valuable tool in data mining. Regarding the clustering capability of SOM, Mangiameli, Chen, and West (1996) demonstrate that it is a better clustering algorithm than hierarchical clustering under overlapped dispersion, irrelevant variables, outliers, or populations of different sizes. For that reason, SOM has been adopted as an analytical tool in various marketing domains, including database marketing (Ha & Park, 1998), segmentation of on-line markets (Vellido, Lisboa, & Meehan, 1999), and automatic labelling of customer clusters (Yuan & Chang, 2001).

In the suggested CF recommendation model, we focus on the clustering capability of SOM. Given a set X of users' preference patterns over the items, the algorithm returns a prototype (a set of cluster centroid values) y_i for each cluster i. The prototypes are sometimes called neurons. The number of clusters, M, is a parameter that must be provided a priori. In the algorithm, each prototype is first randomly initialized (line 4). In the main loop (lines 5-10), one randomly selects an element x ∈ X and determines the neuron y_p that is nearest to x. In the inner loop (lines 8 and 9), one considers all neurons y within a neighbourhood N(y_p) of y_p, including y_p itself, and updates them according to the formula in line 8. The effect of the update is to move neuron y closer to the pattern x. The degree by which y is moved towards x is controlled by the parameter γ, called the learning rate. Note that γ depends on the distance between y and y_p: if neuron y ∈ N(y_p) lies closer to y_p than neuron y′ ∈ N(y_p), then y is moved towards x by a larger amount than y′. After each iteration through the repeat-loop, the learning rate γ is reduced by a small amount, which facilitates convergence of the algorithm. It can be expected that after a sufficient number of iterations the y_i's have moved into areas where many x_j's are concentrated, so each y_i can be regarded as a cluster centroid value, which is used in the subsequent CBR process as the 'cluster-indexed reference set'. The pseudocode description is shown in Fig. 1.

Fig. 1. Pseudo-code description of the self-organizing map.
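A minimal sketch of the training loop just described is given below (written for this edition in Python/NumPy under stated assumptions; it is not the authors' SOM_PAK implementation, and the grid size, learning-rate schedule, and Gaussian neighbourhood are illustrative choices).

    import numpy as np

    def train_som(X, grid_w=2, grid_h=2, n_iter=2000, lr0=0.5, radius0=1.0, seed=0):
        """Train a tiny SOM on the rows of X (assumed standardized to [0, 1])."""
        rng = np.random.default_rng(seed)
        n_items = X.shape[1]
        # Line 4 of Fig. 1: random initialization of each prototype (neuron).
        protos = rng.random((grid_w * grid_h, n_items))
        # Fixed 2-D grid coordinates used for the neighbourhood N(y_p).
        grid = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)

        for t in range(n_iter):
            # Main loop (lines 5-10): pick a random pattern x, find the nearest neuron y_p.
            x = X[rng.integers(len(X))]
            p = np.argmin(np.linalg.norm(protos - x, axis=1))
            # Learning rate and neighbourhood radius shrink as training proceeds.
            lr = lr0 * (1.0 - t / n_iter)
            radius = max(radius0 * (1.0 - t / n_iter), 1e-3)
            # Inner update (lines 8-9): move every neuron in N(y_p) towards x,
            # more strongly the closer it lies to y_p on the grid.
            grid_dist = np.linalg.norm(grid - grid[p], axis=1)
            influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            protos += lr * influence[:, None] * (x - protos)

        return protos  # each row acts as one cluster's centroid vector

Each returned row plays the role of one cluster's standardized centroid vector, which Section 3 uses as the cluster-indexed reference set.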
In spite of several excellent applications, SOM has some limitations that hinder its performance. The typical limitations, and the remedies for them, concern the vulnerability of convergence to the number of clusters, the weight initialization, the network size, and the stopping-rule conditions (Kim & Han, 2001). To determine the number of clusters, we adopt visualization techniques based on principal component analysis (PCA). PCA was first introduced by Pearson in 1901, and Hotelling generalized it to random variables in 1933. The idea is to keep only the 'principal' eigenvectors (components). The number of eigenvectors to retain depends on their variances (eigenvalues) but is typically small; if v eigenvectors are retained, the data are projected along the first v principal eigenvectors. In this study, PCA provides dimensionality reduction for off-line clustering of users and rapid on-line cluster assignment, and users are also projected onto the 'eigen-plane' in a two- or three-dimensional scatter plot for visualization.

2.3. Case based reasoning

CBR is a methodology for building an analogy process, one mode of human reasoning, in which a certain resemblance is taken to imply further similarity. It makes direct use of past experiences or cases to solve a new problem by recognizing its similarity with a specific known problem and applying that case's solution to the current situation (Chiu, 2002; Choy, Lee, & Lo, 2002). CBR applications address two main problem types: classification tasks and synthesis tasks. A classification task matches a case against those in the case base to determine what type, or class, of case it is; the solution from the best matching case is then reused. A synthesis task attempts to create a new solution by combining parts of previous solutions; CBR systems that perform synthesis tasks must make use of adaptation and are usually hybrid systems combined with other techniques. The main advantages of CBR over other techniques are as follows. First, most knowledge is acquired in the case base, which reduces the knowledge acquisition effort: CBR makes use of an existing case database, so it requires less of the general knowledge that is very difficult to obtain. Second, it requires less maintenance effort.
Since rule bases or models must capture many dependencies between rules, and the effects of changes to a rule base are hard to predict, they are difficult to maintain. Case bases are easier to maintain because cases are independent of each other, domain experts and novices understand cases quite easily, and maintenance of a CBR system can be done simply by adding or deleting cases.

CBR algorithms have been used in marketing decision-making processes. Hui, Fong, and Jha (2001) present a hybrid CBR-ANN approach that integrates an artificial neural network with the CBR cycle to extract knowledge from service records for web customer service. Choy et al. (2002) apply CBR to integrate customer relationship management (CRM) and supplier relationship management (SRM) to facilitate supplier selection in supply chain management. Chiu (2002) suggests a case-based customer classification approach for direct marketing that combines a genetic algorithm with the CBR process.

The traditional CBR process can be represented by a schematic cycle, as shown in Fig. 2. Aamodt and Plaza (1994) and Bradley (1994) describe CBR as a cyclical process: representation, retrieval, reuse, revision, and retainment. Case retrieval searches the case base to select existing cases that share significant features with the new case. Through the retrieval step, similar cases that are potentially useful for the current problem are retrieved from the case base. The degree of similarity between the input and the target case can be calculated with various similarity functions, among which nearest-neighbour matching is one of the most frequently used. Nearest-neighbour matching is a direct method that uses a numerical function to compute the degree of similarity, and cases with higher degrees of similarity are usually retrieved. A typical numerical function is the following (Kolodner, 1993):

\frac{\sum_{i=1}^{n} W_i \times \mathrm{sim}(f_i^I, f_i^R)}{\sum_{i=1}^{n} W_i}

where W_i is the weight of the ith feature, f_i^I is the value of the ith feature for the input case, f_i^R is the value of the ith feature for the retrieved case, and sim() is the similarity function for f_i^I and f_i^R.

Fig. 2. CBR process as a schematic cycle comprising the five 'Re's.

In our suggested CF model, the CBR process provides classification and synthesis with additional generalized knowledge derived from the users' explicit preference patterns. Generalized knowledge can be acquired from the centroid values of clusters obtained with clustering techniques; these are added to the case base as representative cases and then used as a case-indexing scheme to retrieve more relevant cases. The cluster-indexing approach assumes that there are distinct subgroups (clusters) within each rated group. The centroid values of the clusters are new artificial cases that extract information from the whole case base and represent each clustered case.
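Read as code, the weighted nearest-neighbour formula above is simply a weighted average of per-feature similarities. The following sketch (illustrative Python added for this edition; the feature values, weights, and the linear similarity function are assumptions, not the authors' choices) shows one way such a retrieval score might be computed.

    import numpy as np

    def case_similarity(f_input, f_retrieved, weights, sim=lambda a, b: 1.0 - abs(a - b)):
        """Weighted nearest-neighbour matching score in the style of Kolodner (1993)."""
        sims = np.array([sim(a, b) for a, b in zip(f_input, f_retrieved)])
        return float(np.dot(weights, sims) / np.sum(weights))

    # Hypothetical normalized feature vectors for an input case and a stored case.
    print(case_similarity([0.8, 0.2, 0.5], [0.7, 0.4, 0.5], weights=np.array([2.0, 1.0, 1.0])))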
3. SOM cluster-indexing CBR CF recommendation

This study utilizes SOM, with its strong clustering performance, as a clustering tool and the strength of CBR as an aid to indexing and retrieving like-minded users. The point of the study is to use SOM to find clusters, each described by centroid values over the items, and to use CBR for indexing and retrieval in the CF recommendation process. The centroid values of the clusters are the weight vectors that are the interim result of the SOM learning process, and they are standardized to account for differences in rating scale. The standardized centroid values have the same representation scheme as raw user rating values, despite being learned, artificial cases. They represent the clustered users of the entire user base and are used as an indexing tool for each user. The standardized centroid values are consistent when learning is repeated with the same parameters, and the addition of new users modifies them only slightly.

The cluster-indexing method is composed of three steps: a profiling step, an inferring step, and a predicting step. In the profiling step, operated in the back office, PCA and preliminary SOM testing are performed to establish a stable cluster configuration. Clusters are derived from the dense subset of the user-item rating DB, and all training users are indexed by the SOM process according to their similarity to the centroid values of each cluster. In the inferring step, CBR compares an active user with the centroid values; the most similar cluster is inferred, and the reference users indexed within the selected cluster are retrieved. After the inferring step, preference prediction is performed on-line with correlation-based CF between the active user and the reference users of the selected cluster. Fig. 3 depicts the proposed model architecture.

Fig. 3. Model architecture of SOM cluster-indexing CBR CF recommendation.

3.1. Profiling step

In the profiling step, PCA is used for visualizing users' patterns and reducing the dimension of the input items before SOM clustering. Clusters are then derived by the SOM process from the reference user DB with dense preference-rating data. The reference users are indexed by cluster and constitute the cluster-indexing reference DB.

Step 1. User clustering with the centroid values of clusters by the SOM.
1.1. Explore the users' distribution by PCA.
1.2. Determine the number of clusters using the PCA factors.
1.3. Initialize the weight vectors of the SOM.
1.4. Find the clusters and the standardized centroid values of the clusters.

3.2. Inferring step

When an item preference prediction is requested for an active user, the active user's rating information is compared with the standardized centroid values of each cluster. Through the indexing and retrieval part of the CBR process, the most similar cluster is determined and retrieved.

Step 2. Active user indexing and retrieval with CBR.
2.1. Index the active user to the cluster whose centroid values have the minimum distance, calculated by the k-nearest-neighbour method:

\mathrm{Min\_D} = \sqrt{\sum_{m=1}^{n} \left| \frac{v_{a,m}}{S} - C_{ref,m} \right|^{2}}

where m indexes the given items, v_{a,m} is the active user's rating of item m, S is the standardizing factor for the rating scale, and C_{ref,m} represents the centroid value for item m of a fixed cluster.
2.2. Retrieve the neighbours that were indexed in the same cluster.

3.3. Predicting step

The active user's predicted preference for the target item is calculated by a Pearson-correlation-based filtering formula in the on-line prediction part.

Step 3. Prediction of the active user's preference value.
3.1. Calculate the Pearson correlation between the active user and the most similar neighbours that were indexed in the same cluster:

w(a,i) = \sum_{j} \frac{(v_{a,j} - m_a)}{s_a} \cdot \frac{(v_{i,j} - m_i)}{s_i}

where v_{a,j} and v_{i,j} are the ratings of the active user and of user i on the co-rated item j, m and s denote a user's mean rating and rating standard deviation, and w(a,i) is the Pearson correlation coefficient used to weight the contribution of each user i indexed in the same cluster.
3.2. Compute the prediction P_{a,t} of the active user U_a on the target item I_t:

P_{a,t} = m_a + k \sum_{i \neq a} w(a,i)\,(v_{i,t} - m_i)

where the sum runs over the users indexed in the same cluster in the reference DB, v_{i,t} is the rating cast by user i on the target item t, and m_a is U_a's mean rating. The constant k in front of the sum is an appropriate normalization factor.
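The inferring and predicting steps amount to a nearest-centroid lookup followed by a Pearson-weighted prediction inside the selected cluster. The sketch below is a minimal NumPy rendering written for this edition, not the authors' implementation; the choice of k as the reciprocal of the sum of absolute weights, the handling of missing ratings, and the variable names are assumptions.

    import numpy as np

    def infer_cluster(active_ratings, centroids, scale=5.0):
        """Step 2: index the active user to the nearest standardized centroid (Min_D)."""
        mask = ~np.isnan(active_ratings)
        diffs = active_ratings[mask] / scale - centroids[:, mask]   # divide by the scale factor S
        dists = np.sqrt(np.sum(np.abs(diffs) ** 2, axis=1))         # Min_D for each cluster
        return int(np.argmin(dists))

    def predict_rating(active_ratings, neighbours, target_item):
        """Step 3: Pearson-correlation CF prediction within the selected cluster."""
        mask = ~np.isnan(active_ratings)
        a = active_ratings[mask]
        mu_a, sd_a = a.mean(), a.std() + 1e-9                       # m_a and s_a

        num, norm = 0.0, 0.0
        for r in neighbours:                                        # rows of the cluster-indexed reference DB
            common = mask & ~np.isnan(r)
            if not common.any() or np.isnan(r[target_item]):
                continue
            r_rated = r[~np.isnan(r)]
            mu_i, sd_i = r_rated.mean(), r_rated.std() + 1e-9       # m_i and s_i
            w = np.sum((active_ratings[common] - mu_a) / sd_a
                       * (r[common] - mu_i) / sd_i)                 # w(a, i)
            num += w * (r[target_item] - mu_i)
            norm += abs(w)
        k = 1.0 / norm if norm > 0 else 0.0                         # assumed normalization constant
        return mu_a + k * num                                       # P_{a,t}

Here neighbours would be the rows of the reference DB indexed to the cluster returned by infer_cluster.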
The process of the cluster-indexing method is exemplified in Fig. 4. If an active user's preference ratings {5, _, 2, 2, …, ?, …, 4} are given, the user is first indexed to the most similar cluster, C2, using the cluster-indexing DB built by SOM in the off-line learning process. Next, the method retrieves the nearest-neighbour users from that cluster-indexed user group (C2). Finally, the predictive value of the active user's target item is calculated with the prediction formula.

Fig. 4. Computation example of the SOM cluster-indexing CBR CF model.

4. Experiments

Experiments are run on an open dataset, six different experimental predictors, and two evaluation metrics for the various algorithms. Results are compared with baseline models and other comparative models in terms of the level of data sparsity and the machine learning techniques used.

4.1. Data: MovieLens dataset

The MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota (http://www.cs.umn.edu/Research/GroupLeans/data). The historical dataset consists of 100,000 ratings from 943 users on 1682 movies, with every user having at least 20 ratings; simple demographic information for the users (age, gender, occupation, zip code) is included. The ratings are on a numeric five-point scale, with 1 and 2 representing negative ratings, 4 and 5 representing positive ratings, and 3 indicating ambivalence. We sample a reference user set that has enough rating information to discover similar user patterns. The number of users extracted as the reference sample is 251; among the 100,000 rating records, 6093 were generated from the 251 users' ratings of 33 items across 10 movie genres.

4.2. Evaluation metrics

Recommender systems researchers use several different measures for the quality of the recommendations produced: statistical accuracy metrics, decision-support metrics, and coverage measures. We apply two metrics in our evaluation: the normalized mean absolute error (NMAE) and the receiver operating characteristic (ROC) curve, including the area under the ROC curve, as used by Goldberg, Roeder, Gupta, and Perkins (2001) and Good et al. (1999).

NMAE. We look at the average absolute deviation of the predicted rating from the actual rating on items the users in the test set have actually voted on. If the number of predicted ratings in the test set for the active case is m_a, then the average absolute deviation for an active user is

\mathrm{MAE} = \frac{1}{m_a} \sum_{j=1}^{m_a} \left| P_{a,j} - v_{a,j} \right|

Since our numerical rating scale gives ratings over the range [1, 5], we normalize to express errors as percentages of full scale:

\mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}}

ROC curve and area under the ROC curve. This metric evaluates the performance of a classification scheme in which subjects are classified on one variable with two categories. ROC sensitivity is a signal-processing measure of the decision-making power of a filtering system. Operationally, it is the area under the ROC curve, which plots sensitivity versus 1 − specificity of the test (Swets, 1988). Sensitivity is the probability that a randomly selected good item is accepted by the filter; specificity is the probability that a randomly selected bad item is rejected by the filter. Points on the ROC curve represent trade-offs supported by the filter. The ROC sensitivity ranges from 0 to 1, where 1 is perfect and 0.5 is random.
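Both metrics can be computed directly from a vector of predictions and the corresponding true ratings. The sketch below is illustrative code added for this edition (the rank-sum estimator of the ROC area and the 4-star cut-off for a 'good' item are assumptions consistent with the five-point scale described above, not part of the paper).

    import numpy as np

    def nmae(pred, actual, r_min=1.0, r_max=5.0):
        """Normalized mean absolute error on a [r_min, r_max] rating scale."""
        mae = np.mean(np.abs(np.asarray(pred, float) - np.asarray(actual, float)))
        return mae / (r_max - r_min)

    def roc_area(pred, actual, good_threshold=4.0):
        """Area under the ROC curve via the rank-sum (Mann-Whitney) estimator."""
        pred, actual = np.asarray(pred, float), np.asarray(actual, float)
        good = actual >= good_threshold             # items the filter should accept
        pos, neg = pred[good], pred[~good]
        if len(pos) == 0 or len(neg) == 0:
            return float("nan")
        # Probability that a random good item is scored above a random bad item.
        wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
        return wins / (len(pos) * len(neg))

    print(nmae([4.2, 2.8, 3.5], [5, 2, 4]))         # toy example
    print(roc_area([4.2, 2.8, 3.5], [5, 2, 4]))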
To test the differences in the six protocols' performance, we use one-way ANOVA with a post hoc test, the Bonferroni procedure under the equal-variance assumption, for multiple comparisons of MAE (Breese et al., 1998). In this study, the post hoc procedure is used to investigate the differences between specific experimental protocols, in conjunction with ANOVA comparing each protocol's MAE.

4.3. Experimental setup

First, we build the dataset into three experimental sets to test the effect of the available information level (the number of items an active user has rated). In the first set, named Allbut1, we withhold one selected item for each user in the test set and try to predict its value given all the other ratings the user has cast. In the second and third sets, we select five and ten ratings from each test user as the observed ratings and then attempt to predict the withheld preference levels; we call these Given5 and Given10. The Allbut1 experiments measure the algorithms' performance when given as much data as possible from each test user; the Given experiments look at users with less available data and examine performance when relatively little is known about an active user.

Second, we present metrics derived from an empirical analysis of the proposed SOM cluster-indexing CBR CF model, hereafter referred to as SCP, compared with baseline models and comparative three-step models. The list of experimental protocols is shown in Table 1.

Table 1. List of experiment protocols

Protocol  Description
Proposed model
  SCP     SOM cluster-indexing CBR CF predictor
Comparative models - baseline models
  UAP     By-user-average CF predictor
  IAP     By-item-average CF predictor
  SPP     Simple Pearson CF predictor
Comparative models - 3-step models
  SIP     SOM cluster induction CF predictor
  SNP     SOM cluster neural network CF predictor

4.3.1. SOM cluster-indexing CBR CF model

On the belief that an affinity group can be clustered according to the distribution of its rating values, several clustering and visualization methods are applied to find the number of clusters. In this study, the clustering techniques involve two distinct tasks: (1) determining the number of clusters present in the reference base, and (2) assigning reference users to a cluster. The number of clusters, which is the number of nodes in the output layer, depends on the expected number of clusters, but there is currently no apparent practical or theoretical way of determining the optimal size of the output layer (Nour & Madey, 1996). There is possible instability due to the randomness of cluster initialization, so a policy for initial cluster selection is required. To reduce this possibility, the SCP model contains the PCA process and preliminary SOM clustering. First, each reference user's distribution is summarized by PCA using the 6093 rating records on the 33 movies across 10 genres before SOM clustering.
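A minimal version of this PCA summarization step might look as follows (an illustrative NumPy sketch added for this edition; the column centring, the placeholder data, and the choice of three components are assumptions rather than the authors' procedure).

    import numpy as np

    def pca_project(R, n_components=3):
        """Project mean-centred user rating vectors onto the leading principal components."""
        X = R - R.mean(axis=0)                       # centre each item column
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        explained = (s ** 2) / np.sum(s ** 2)        # variance explained per component (scree curve)
        scores = X @ Vt[:n_components].T             # user coordinates on the 'eigen-plane'
        return scores, explained

    # R would be the dense 251 x 33 reference rating matrix described in Section 4.1;
    # random placeholder data is used here only to make the sketch runnable.
    R = np.random.default_rng(0).integers(1, 6, size=(251, 33)).astype(float)
    scores, explained = pca_project(R)
    print(explained[:5])                              # inspect how many components to keep

The explained-variance ratios correspond to the scree curve of Fig. 5, and the leading columns of scores give the eigen-plane coordinates plotted in Figs. 5 and 6.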
In this experiment, we chose three components by PCA, and the data are projected onto the eigen-plane for pre-visualization, as shown in Figs. 5 and 6.

Fig. 5. Scree curve of the eigenvalues explained by the components. The largest eigenvalue is explained by the first component, and the first three components can serve as representative factors of the population dataset.

Fig. 6. Scatter plot of reference users projected onto the 3-D 'eigen-plane' for visualization.

In preliminary SOM clustering, the number of clusters is varied from 2 to 10. When deciding the optimal number of clusters, the lowest cluster count is selected such that each cluster still contains as many indexed reference users as possible; if a cluster has no or only a few users, it does not provide sufficient user cases from which to find a more similar user. The total number of neurons in the output layer is therefore set to four clusters according to the results of the preliminary SOM runs performed to find the number of clusters. Based on the SOM learning, each user in the reference DB is indexed into a cluster. In addition to indexing, the centroid values for each item are deduced, and their average values per genre suggest each cluster's inferable genre-preference characteristics, shown in Table 2(a) and (b). At first glance each cluster appears to be grouped simply by rating level, with the average rating rising from C1 (0.5400) to the relatively more liberal C4 (0.7810). A closer look, however, shows that each cluster has different genre and movie preferences. For example, C1 is a comparatively negative group, but it prefers tough genres such as horror, crime, and war films to soft ones such as drama and romance films. C3, on the other hand, rates higher than C1; this affinity group likes science fiction and adventure rather than horror, which is the most preferred genre for C1.

4.3.2. Comparative models

Previous research on CF algorithms has tended to compare the performance of algorithms only within a single study, making comparisons of algorithm performance from paper to paper difficult. We use three baseline predictors, the by-user-average CF predictor (UAP), the by-item-average CF predictor (IAP), and the simple Pearson CF predictor (SPP), to provide benchmarks against which any predictor can be compared. The baseline algorithms are simple, efficient, and return reasonable results. The UAP returns the average of the ratings the given user has already entered. The IAP returns the average rating for the given movie over all users who have voted for that movie. The SPP returns the Pearson-correlation-based neighbourhood prediction. To demonstrate the utility of the SCP model, we also replace the CBR process with a neural network and with an induction technique as the classification method, while retaining the first SOM profiling step and the third Pearson-correlation-based prediction step. The SOM neural network CF predictor (SNP) uses the well-known back-propagation neural network algorithm in the classification step, with all decision coefficients tuned to achieve the best prediction accuracy. The SOM induction CF predictor (SIP) uses a decision tree technique, specifically the See5 algorithm, an upgraded version of Quinlan's (1993) C4.5 decision tree classifier.
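For reference, the three baseline predictors can be sketched as follows (illustrative code written for this edition over a NumPy rating matrix with NaN marking unrated cells; the fallback to the user mean and the weight normalization are assumptions, not the authors' implementation).

    import numpy as np

    def uap(R, user, item):
        """By-user-average predictor: the user's mean over the items they rated."""
        return np.nanmean(R[user])

    def iap(R, user, item):
        """By-item-average predictor: the item's mean over the users who rated it."""
        return np.nanmean(R[:, item])

    def spp(R, user, item):
        """Simple Pearson neighbourhood predictor over all users who rated the item."""
        mask_a = ~np.isnan(R[user])
        mu_a = np.nanmean(R[user])
        num, norm = 0.0, 0.0
        for i in range(R.shape[0]):
            if i == user or np.isnan(R[i, item]):
                continue
            common = mask_a & ~np.isnan(R[i])
            if common.sum() < 2:
                continue
            w = np.corrcoef(R[user, common], R[i, common])[0, 1]   # Pearson correlation
            if np.isnan(w):
                continue
            num += w * (R[i, item] - np.nanmean(R[i]))
            norm += abs(w)
        return mu_a if norm == 0 else mu_a + num / norm

SPP as sketched here searches all co-rated users; the SCP model of Section 3 restricts the same Pearson computation to the users indexed in the active user's cluster.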
Table 2a. Genre-average centroid values of the SOM clusters

Cluster  Children's  Drama   Adventure  Science-fiction  Crime   Thriller  War     Romance  Comedy  Horror  Average
C1       0.4845      0.4775  0.5432     0.5435           0.6473  0.5477    0.6090  0.4980   0.5540  0.6870  0.5400
C2       0.5595      0.6050  0.6756     0.6525           0.8367  0.6693    0.6570  0.5590   0.5540  0.5790  0.6490
C3       0.6910      0.5820  0.7218     0.7488           0.6340  0.6857    0.7090  0.5970   0.9313  0.5450  0.6860
C4       0.7180      0.6965  0.8336     0.8124           0.8210  0.7910    0.8820  0.6785   0.6897  0.7230  0.7810

Note: in the original table, bold numbers mark the most preferred genre in each cluster and italicized numbers the least preferred.

5. Results

To validate the effectiveness of SCP, its prediction accuracy is compared with the comparative experimental algorithms in terms of NMAE and the area under the ROC curve. Table 3(a) and (b) tabulate the accuracy results of each protocol on the movie dataset. Among the baseline models, SPP reflects users' preferences better than the simple average predictors (UAP, IAP) on all experimental datasets, Allbut1, Given5, and Given10, at the 5% significance level, and the ROC area metric gives the same result. These results support the view that a CF algorithm is more accurate than simple average predictors, which do not consider the like-minded affinity group when predicting an active user's preference. Comparing UAP and IAP, when not enough rating data are available to apply a CF algorithm, the user average has more predictive power than the item average.

SCP and SNP, which use a clustering-classification method, yield superior results to SPP, which is a type of memory-based model. The SIP model, however, performs worse than all the other experimental protocols. Thus model-based CF methods, in particular machine learning-based CF models, have promising potential for improvement, but their success depends on the suitability of the methodology. In particular, the SCP model shows an outstanding result (an NMAE of 0.1583 on Given10 and an ROC area of 0.8461) compared with the other comparative models on both the NMAE and ROC area metrics. It dominates the UAP, IAP, SPP, and SIP models at the 5% and 1% significance levels and yields better performance than SNP. According to our experiment, the SCP model reduces prediction error by about 4%.

From the viewpoint of data sparsity, the average NMAE of Given10 (0.1790), a protocol with less preference information, is higher than that of Allbut1 (0.1564) and Given5 (0.1747). This implies that as explicit preference-rating information becomes sparse, prediction accuracy decreases. In this study, we build a pre-filtered DB, the reference DB, composed of dense user-item rating data. Building a high-density preference DB can be one approach to alleviating the sparsity problem and achieving higher recommendation accuracy.

Table 3a. Performance results: prediction accuracy (NMAE) and area under the ROC curve for each protocol

Protocol  NMAE (Allbut1)  NMAE (Given5)  NMAE (Given10)  ROC area
UAP       0.1600          0.1863         0.1901          0.7127
IAP       0.1714          0.1930         0.1951          0.6071
SPP       0.1497          0.1719         0.1734          0.8006
SNP       0.1413          0.1557         0.1638          0.8166
SIP       0.1687          0.1892         0.1933          0.7430
SCP       0.1475          0.1524         0.1583          0.8461
Average   0.1564          0.1747         0.1790          -

Note: lower NMAE values indicate better performance; for the ROC area, higher is better.

Overall, among the ROC curves illustrated in Fig. 7, the SCP model stably dominates the other clustering-classification CF models and the memory-based models.
This implies that when the item recommendation criterion is changed, the SCP model can be applied flexibly. For example, even if the recommendable-item criterion is changed from 5 stars to 4 stars in the movie recommendation, the SCP model still works.

Table 2b. Centroid values of the SOM clusters by item

Genre            Item  C1     C2     C3     C4
Children's       1     0.407  0.520  0.739  0.751
                 2     0.562  0.599  0.643  0.685
Drama            3     0.488  0.611  0.581  0.766
                 4     0.467  0.599  0.583  0.627
Adventure        5     0.431  0.666  0.675  0.759
                 6     0.690  0.790  0.841  0.966
                 7     0.543  0.637  0.741  0.828
                 8     0.690  0.752  0.754  0.913
                 9     0.362  0.533  0.598  0.702
Science-fiction  10    0.705  0.822  0.942  0.974
                 11    0.676  0.676  0.857  0.941
                 12    0.597  0.689  0.884  0.891
                 13    0.516  0.622  0.705  0.705
                 14    0.364  0.489  0.606  0.663
                 15    0.542  0.607  0.766  0.839
                 16    0.570  0.682  0.647  0.790
                 17    0.523  0.685  0.599  0.783
                 18    0.503  0.673  0.681  0.797
                 19    0.556  0.699  0.758  0.850
                 20    0.427  0.534  0.741  0.681
Crime            21    0.593  0.875  0.592  0.902
                 22    0.670  0.783  0.666  0.784
                 23    0.679  0.852  0.644  0.777
Thriller         24    0.466  0.628  0.733  0.835
                 25    0.467  0.560  0.588  0.615
                 26    0.710  0.820  0.736  0.923
War              27    0.609  0.657  0.709  0.882
Romance          28    0.569  0.642  0.645  0.779
                 29    0.427  0.476  0.549  0.578
Comedy           30    0.655  0.631  0.800  0.815
                 31    0.601  0.585  0.519  0.630
                 32    0.406  0.446  0.575  0.624
Horror           33    0.687  0.579  0.545  0.723

Table 3b. Performance results: statistical significance test (one-way ANOVA with post hoc test)

         IAP      SPP       SNP        SIP         SCP
UAP      -0.0202  0.0665**  0.1052***  -0.0131     0.1271***
IAP      -        0.0867**  0.1253***  0.0070      0.1472***
SPP      -        -         0.0387*    -0.0797**   0.0605**
SNP      -        -         -          -0.1183***  0.0219
SIP      -        -         -          -           0.1402**

Bonferroni procedure based on MAE: mean differences and significance at the ***1%, **5%, and *10% levels for pair-wise comparisons of performance between protocols.

6. Conclusion

In this paper, we propose the SCP model, which applies two combined machine learning techniques, SOM and CBR, to the consecutive CF prediction process as a new approach in the CF recommendation field. This study shows that cluster-indexing CBR is an effective user indexing and retrieval method for CF recommendation. Instead of using all users' ratings to retrieve the nearest neighbours, the SOM cluster-indexing CBR approach alleviates the on-line computational complexity by using the significant cluster-centroid values induced by the SOM process. The SOM facilitates affinity user grouping and the extraction of representative centroid values of each cluster's items, which assist the case indexing and retrieval of CBR. In the SOM clustering of our study, most of the computational cost lies in the training process, which is done off-line in the profiling step. After that, the CBR process and the CF prediction within the selected cluster operate on a compact representation of the raw ratings information, so the time and space complexity of making recommendations is quite low.

The performance of our model is superior to that of memory-based CF techniques and other previous hybrid CF models. The NMAE values obtained by our model indicate that predicted rating values will be within roughly 15% of the true rating values, so items with predicted ratings well above the mean for a new user will in many cases correspond to desirable items for that user.
These accuracies are comparable with those reported for a completely different dataset (jokes): the algorithms in Goldberg et al. (2001) show NMAE from 0.187 to 0.237 on the 20-unit rating scale [-10, +10]. Herlocker et al. (1999) report MAE from 0.768 to 0.828; when these are normalized to the 4-unit rating scale [1, 5], they correspond to NMAE from 0.192 to 0.207 on the same MovieLens dataset. According to Goldberg et al. (2001), if user ratings are distributed uniformly or normally, random predictions yield NMAE of 33% and 28%, respectively. Our model also yields superior performance compared with traditional memory-based CF algorithms and with the neural network and induction-based CF prediction algorithms, which suggests that there is room for improved accuracy in all current CF algorithms. We are experimenting with a number of variations, such as k-means clustering and hybrid approaches with adaptive on-line weighting, to further improve accuracy without increasing on-line computation time. This study compares several computational approaches to CF recommendation, considering prediction accuracy and response speed simultaneously; in particular, data mining techniques such as SOM, neural networks, and CBR have shown potential for improvement. The promise of CF systems can be further investigated by integrating product- and customer-specific information profiling, implicit information analysis such as web-page navigation history, and retrieval technology. In future research, we will suggest hybrid recommendation algorithms and try to apply our model to a real-world personalized recommendation site.

Fig. 7. ROC curve comparison of the CF prediction protocols; a higher curve indicates greater prediction accuracy.

Acknowledgements

This research was financially supported by Hansung University in 2003.

References

Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. Artificial Intelligence Communications, 7(1), 39-59.
Balabanović, M., & Shoham, Y. (1997). Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3), 66-72.
Basu, C., Hirsh, H., & Cohen, W. (1998). Recommendation as classification: Using social and content-based information in recommendation. Proceedings of the 1998 Workshop on Recommender Systems (pp. 11-15).
Bradley, P. S. (1994). Case-based reasoning: Business applications. Communications of the ACM, 37(3), 40-43.
Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98) (pp. 43-52).
Chiu, C. (2002). A case-based customer classification approach for direct marketing. Expert Systems with Applications, 22(2), 163-168.
Cho, Y. H., Kim, J. K., & Kim, S. H. (2002). A personalized recommender system based on web usage mining and decision tree induction. Expert Systems with Applications, 23(3), 329-342.
Choy, K. L., Lee, W. B., & Lo, V. (2002). Development of a case based intelligent customer-supplier relationship management system. Expert Systems with Applications, 23(3), 281-297.
Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., & Sartin, M. (1999). Combining content-based and collaborative filters in an online newspaper. ACM SIGIR'99 Workshop on Recommender Systems, Berkeley, CA.
Delgado, P. J., & Ishii, N. (1999). Memory-based weighted majority prediction for recommender systems. SIGIR Workshop on Recommender Systems.
Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61-70.
Goldberg, K., Roeder, R., Gupta, D., & Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval Journal, 4(2), 133-151.
Good, N., Schafer, J. B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., & Riedl, J. (1999). Combining collaborative filtering with personal agents for better recommendations. Proceedings of the 1999 Conference of the American Association of Artificial Intelligence (AAAI-99).
Ha, S. H., & Park, S. C. (1998). Application of a data mining tool to a hotel data mart on the Internet for database marketing. Expert Systems with Applications, 15(1), 1-31.
Herlocker, J., Konstan, J., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. Proceedings of the 1999 Conference on Research and Development in Information Retrieval.
Hill, W., Stead, L., Rosenstein, M., & Furnas, G. (1995). Recommending and evaluating choices in a virtual community of use. CHI '95 (pp. 194-201). Denver, CO: ACM Press.
Hui, S. C., Fong, A. C. M., & Jha, G. (2001). A web-based intelligent fault diagnosis system for customer service support. Engineering Applications of Artificial Intelligence, 14(4), 537-548.
Kim, K. S., & Han, I. G. (2001). The cluster-indexing method for case-based reasoning using self-organizing maps and learning vector quantization for bond rating cases. Expert Systems with Applications, 21(3), 147-156.
Kohonen, T., Hynninen, J., Kangas, J., & Laaksonen, J. (1996). SOM_PAK: The self-organizing map program package. Technical Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science, FIN-02150.
Kolodner, J. L. (1993). Case-based reasoning. Los Altos, CA: Morgan Kaufmann.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., & Riedl, J. (1997). GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3), 77-87.
Kumar, R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (1998). Recommendation systems: A probabilistic analysis. Proceedings of the 39th Annual Symposium on Foundations of Computer Science.
Mangiameli, P., Chen, S. K., & West, D. (1996). A comparison of SOM neural network and hierarchical clustering methods. European Journal of Operational Research, 93, 402-417.
Nour, A. N., & Madey, G. R. (1996). Heuristic and optimization approaches to extending the Kohonen self organizing algorithm. European Journal of Operational Research, 93, 428-448.
Pazzani, M. J. (1999). A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 13(5/6), 393-408.
Pennock, D. M., Horvitz, E., Lawrence, S., & Giles, C. L. (2000). Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach. Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI-2000) (pp. 473-480). Stanford, CA.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. Proceedings of the 1994 Computer Supported Collaborative Work Conference.
Schafer, J. B., Konstan, J. A., & Riedl, J. (2001). E-commerce recommendation applications. Data Mining and Knowledge Discovery, 5(1-2), 115-153.
Shardanand, U., & Maes, P. (1995). Social information filtering: Algorithms for automating 'word of mouth'. Proceedings of ACM CHI '95 (pp. 210-217). Denver, CO.
Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285-1289.
Vellido, A., Lisboa, P. J. G., & Meehan, K. (1999). Segmentation of the on-line market using neural networks. Expert Systems with Applications, 17(4), 303-314.
Yuan, S., & Chang, W. (2001). Mixed-initiative synthesized learning approach for web-based CRM. Expert Systems with Applications, 20(2), 187-200.