Multi-instance genetic programming for web index recommendation

Expert Systems with Applications 36 (2009) 11470–11479

A. Zafra a, C. Romero a, S. Ventura a,*, E. Herrera-Viedma b

a Dept. of Computer Sciences and Numerical Analysis, University of Córdoba, Campus de Rabanales, edificio Albert Einstein, 14071 Córdoba, Spain
b Dept. of Computer Sciences and Artificial Intelligence, University of Granada, Periodista Daniel Saucedo Aranda s/n, 18071 Granada, Spain
* Corresponding author. Tel.: +34 957212218; fax: +34 957218630. E-mail addresses: azafra@uco.es (A. Zafra), cromero@uco.es (C. Romero), sventura@uco.es (S. Ventura), viedma@decsai.ugr.es (E. Herrera-Viedma).

Keywords: Grammar-guided genetic programming; Multiple instance learning; User modelling; Web mining

Abstract: This article introduces the use of a multi-instance genetic programming algorithm for modelling user preferences in web index recommendation systems. The developed algorithm learns user interest by means of rules which add comprehensibility and clarity to the discovered models and increase the quality of the recommendations. This new model, called the G3P-MI algorithm, is evaluated and compared with other available algorithms. Computational experiments show that our methodology achieves competitive results and provides high-quality user models which improve the accuracy of recommendations.

1. Introduction

In the last few years, the quantity of information available on the Internet has been growing so rapidly that it now exceeds human processing capabilities. Users feel overwhelmed by the amount of information available and are usually unable to locate really relevant information that suits their individual needs in a limited amount of time. In this situation, there is a pressing need for tools that anticipate the preferences of users and provide recommendations about whether or not a particular item will be of interest to the user. Such systems, referred to in the literature as recommendation systems (Felfernig, Friedrich, & Schmidt-Thieme, 2007), have features similar to traditional information retrieval approaches but differ from them, especially in the use of models that contain information about user tastes, preferences and needs. This information differs according to the type of processing performed by the system. So, in collaborative filtering recommender systems (Schafer, Herlocker, & Sen, 2007) this model reflects similar users' preferences or needs, while in content-based recommender systems (Pazzani & Billsus, 2007) this information maps the relationship between the items to be recommended and the preferences of a given user.

In modelling user preferences, an interesting problem is the classification of web index pages into two categories (according to whether or not they are pertinent for a user), because this allows us to build a user model for a content-based recommendation system. The main difficulty in this problem lies in training set representation: web index pages are those which contain references or brief summaries of other pages, and there is a different number of references on each page.
Moreover, the information available about the user is imprecise: we know whether or not the user is interested in an index page, but not exactly which concrete links the user really considers to be of interest. Recently, Zhou, Jiang, and Li (2005) have solved the problem from a multi-instance learning perspective, adapting the well-known k-Nearest Neighbor (k-NN) algorithm to this new learning framework. Experimental results show that this approach greatly improves on supervised learning approaches.

In spite of the interesting results reported by Zhou et al. (2005), their proposal presents two major limitations. The first one is related to sparsity and to scalability, as the k-NN algorithm requires computations that grow linearly with the number of items, which makes it hard to scale when the number of items is high while maintaining reasonable prediction performance and accuracy. The second one is related to the interpretability of new-found knowledge. The k-NN algorithm is a black-box algorithm, that is, it simply classifies web index pages as being "of interest" or "not of interest", without providing additional information about user preferences. This is not a desirable property in recommendation systems, where any information that allows us to learn more about the interests of the user is of utmost value for facilitating new recommendations.

To overcome the aforementioned drawbacks, we propose the use of G3P-MI, a grammar-guided genetic programming algorithm for multiple instance learning. This algorithm learns prediction rules which provide information on whether any of the links contained on a given web index page are of interest to a given user. Experimental results concerning several benchmarks show that this approach obtains competitive results in terms of accuracy, recall and precision. Moreover, it adds comprehensibility and clarity to the knowledge discovery process, an important characteristic in addition to high predictive accuracy, since the system's results can be interpreted easily (understandable user models) and this information can be used to learn more about the user, thus generating even more appropriate recommendations.

The rest of this paper is organized as follows. Section 2 is devoted to introducing the multi-instance learning paradigm, and Section 3 describes the proposed G3P-MI algorithm. Section 4 presents web index recommendation as a multi-instance learning problem. Sections 5 and 6 present and analyse the experimental results of our system. Finally, Section 7 presents conclusions and future work.

2. Multiple instance learning

The term Multiple Instance Learning was coined by Dietterich, Lathrop, and Lozano-Perez (1997) when investigating a qualitative structure–activity relationship problem. In this problem, the task consisted of determining whether or not a given substance presents pharmacological activity, from information about its molecular structure. The difficulty of this task is due to the fact that a substance can present more than one spatial configuration, each of which shows different structural properties.
Because conformations can most naturally be represented as fixed-length vectors of attribute values (or instances), the most convenient form for representing these learning examples seems to be as collections of such instances, with one associated class label representing the concept to be learned. This is called "multiple instance" representation and contrasts with the manner of representing examples in supervised learning, where each example contains only one labeled instance.

To solve this problem, Dietterich et al. (1997) proposed as the learning hypothesis that an example should be considered positive (that is, it represents the concept to be learned) if it contains at least one instance that represents this concept. On the other hand, an example should be considered negative if it does not contain any instance representing the concept. Using this learning hypothesis, they developed three Axis-Parallel Rectangle (abbreviated as APR) algorithms, which attempt to search for appropriate axis-parallel rectangles constructed by the conjunction of the features. Their best performing algorithm (iterated-discrim) starts with a point in the feature space and "grows" a box with the goal of finding the smallest box that covers at least one instance from each positive bag and no instances from any negative bag. The resulting box was then expanded (via a statistical technique) to get optimum results.

Following Dietterich et al.'s study, Auer (1997) tried to avoid some potentially difficult computational problems required by the heuristics used in the iterated-discrim algorithm, presenting MULTINST, a theoretical algorithm that does not require a product distribution. With a new approach, Maron and Lozano-Pérez (1997) proposed one of the most famous multi-instance learning algorithms, Diverse Density (DD). The diverse density of a point p in the feature space is defined as a probabilistic measure that takes into consideration how many different positive bags have an instance near p, and how far the negative instances are from p. This algorithm was combined with the Expectation Maximization (EM) algorithm, resulting in EM-DD (Zhang & Goldman, 2001), a general-purpose MI algorithm whose basic premise is that the instance which determines the bag label can be viewed as a missing attribute that can be estimated using the EM approach. Recently, Pao, Chuang, Xu, and Fu (2008) have proposed an EM-based learning algorithm that provides a comprehensive procedure for maximizing the DD measurement on given multiple instances.

In 1998, Long and Tan (1998) described a polynomial-time theoretical algorithm showing that, if the instances in the bags are drawn independently from a product distribution, then the APR is PAC-learnable. Continuing with PAC-learnability research, Kalai and Blum (1998) described a reduction from PAC-learning in the MIL framework to PAC-learning with one-sided random classification noise, and presented a theoretical algorithm with lower complexity than the algorithm described in Auer (1997).

The first approaches using lazy learning, decision trees and rule learning were studied in 2000. In the lazy learning context, Wang and Zucker (2000) proposed two variants of the k-nearest neighbor (kNN) algorithm, referred to as Citation-kNN and Bayesian-kNN; these algorithms extend the k-nearest neighbor algorithm to MIL by adopting the Hausdorff distance to compare bags.
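To make this bag-level distance concrete, the following sketch (our own illustration, not code from the cited papers) computes the classical maximal Hausdorff distance and the minimal Hausdorff distance between two bags of feature vectors; lazy MIL learners such as Citation-kNN rank neighbouring bags with distances of this kind.

```python
# Illustrative sketch (not from the cited papers): Hausdorff-style distances
# between two bags, each bag being a list of instance feature vectors.
from math import dist  # Euclidean distance between two points

def max_hausdorff(bag_a, bag_b):
    """Classical (maximal) Hausdorff distance between two bags."""
    d_ab = max(min(dist(a, b) for b in bag_b) for a in bag_a)
    d_ba = max(min(dist(a, b) for a in bag_a) for b in bag_b)
    return max(d_ab, d_ba)

def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: distance between the two closest instances."""
    return min(dist(a, b) for a in bag_a for b in bag_b)

# Toy usage: two "web index pages", each instance a 2-term feature vector.
page1 = [(0.0, 1.0), (2.0, 3.0)]
page2 = [(1.0, 1.0), (5.0, 5.0)]
print(max_hausdorff(page1, page2), min_hausdorff(page1, page2))
```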
With respect to decision trees and rule learning, Zucker and Chevaleyre (2000) implemented ID3-MI and RIPPER-MI, which are multi-instance versions of the decision tree algorithm ID3 and the rule learning algorithm RIPPER, respectively. Around the same time, Ruffo (2000) presented a multi-instance version of the C4.5 decision tree, called RELIC.

There are also many other supervised learning algorithms which have been adapted to MIL. Thus, we can find the contribution of Ramon and De Raedt (2000), which extends standard neural networks to MIL. After this work, further studies appeared improving or extending it (Chai & Yang, 2007; Zhang, Jack, & Nandi, 2005; Zhang & Zhou, 2004, 2006). Another approach that has been adapted to the MIL framework is Support Vector Machines (SVM). There are numerous proposals in this area: Gärtner, Flach, Kowalczyk, and Smola (2002) adapted kernel methods to work with MIL data by modifying the kernel distance measures to handle sets. Using a related approach, Chen and Wang (2004) and Chen et al. (2006) adapted SVMs by modifying the form of the data rather than changing the underlying SVM algorithms, while Andrews, Tsochantaridis, and Hofmann (2002) adapted the SVM formulation directly to produce one of the best MIL classification systems currently available. Recently, we can also find the proposals by Mangasarian and Wild (2008) and Gu et al. (2008). Finally, there are works such as those by Zhang et al. (2005) and Zhou and Zhang (2007) that show the use of ensembles to enhance multi-instance learners.

3. Grammar-guided genetic programming for multiple instance learning

In this section we introduce G3P-MI, a grammar-guided genetic programming algorithm for multi-instance learning. In the next sections, we will introduce the following design aspects: individual representation, genetic operators, fitness function and evolutionary process.

3.1. Individual representation

In G3P-MI, an individual represents rules that determine if a given pattern should be considered positive (that is, an example of the concept we want to represent) or negative (if it is not):

if (cond_B(b)) then
    pattern b is an instance of the concept.
else
    pattern b is not an instance of the concept.
end if                                                          (1)

where cond_B is a condition that is applied to the pattern. Considering the Dietterich hypothesis, which states that a pattern is positive if any of its instances represents the concept that we want to learn and negative otherwise, we could represent cond_B as:

cond_B(b) = true,  if ∃ i ∈ [1, size(b)] such that cond_I(instance(i, b)) = true
            false, otherwise                                    (2)

where size(b) returns the number of instances in pattern b, instance(i, b) is a function that returns the ith instance of bag b, and cond_I is a condition that is applied to an instance contained in a given bag. Considering the properties of the disjunction operator, Eq. (2) can be rewritten as

cond_B(b) = ∨_{i=1..size(b)} cond_I(instance(i, b))             (3)

where ∨ is the disjunction operator. As can be seen, cond_I is the only variable part in Eqs. (1)–(3) that can undergo an evolutionary process. So, in G3P-MI, an individual's genotype is a syntax tree that contains the code of the function cond_I, while the individual's phenotype is the whole rule that is applied to the bags (Eq. (1)). Fig. 1 shows the context-free grammar that represents these individuals in a general form. As can be seen, the code of the condition can contain one or several valid clauses that check conditions related to the instances contained in the patterns. Where there is more than one clause, the clauses can be combined by conjunctions or disjunctions in any order. The format of the clauses depends on the type of data contained in the instances analyzed (in Section 5 we will describe the format of the clauses used in the web index recommendation problem).

Fig. 1. Grammar used for representing individuals' genotypes in G3P-MI.
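As an illustration of Eqs. (1)–(3), the following sketch (our own, not code from the paper; the attribute names and thresholds are hypothetical) applies a bag-level rule as the disjunction of an instance-level condition cond_I over all the instances of a bag:

```python
# Illustrative sketch of Eqs. (1)-(3): a bag (pattern) is labelled positive
# when at least one of its instances satisfies the instance-level condition.
def cond_B(bag, cond_I):
    # Eq. (3): disjunction of the instance-level condition over the whole bag.
    return any(cond_I(instance) for instance in bag)

def classify(bag, cond_I):
    # Eq. (1): the evolved rule applied at bag level.
    if cond_B(bag, cond_I):
        return "pattern is an instance of the concept"
    return "pattern is not an instance of the concept"

# Hypothetical instance-level condition of the kind encoded by an individual's
# genotype (a combination of attribute comparisons allowed by the grammar).
example_cond_I = lambda inst: inst["attr1"] > 0.5 and inst["attr2"] <= 3

bag = [{"attr1": 0.2, "attr2": 7}, {"attr1": 0.9, "attr2": 1}]
print(classify(bag, example_cond_I))  # the second instance fires -> positive
```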
3.2. Initialization

To initialize the population in the algorithm, the procedure used is inspired by that defined by Geyer-Schulz (1995). This procedure builds a new syntax tree at random given the maximum number of derivations. To guarantee that the syntax tree generated is valid and uses at most the maximum number of derivations, the system calculates a selection probability for each symbol in the grammar for a specified number of available derivations. This table of probabilities, although it implies some computational cost, only has to be calculated once, since it is saved with the rest of the structural information about individuals.

To guarantee greater diversity among the individuals generated, the initialization procedure generates individuals of different sizes, with two parameters: the minimum number of derivations and the maximum number of individuals in the population.

3.3. Genetic operators

G3P-MI uses two genetic operators, called respectively selective crossover and selective mutation, to generate new individuals in a given generation of the evolutionary algorithm. Both operators were proposed by Whigham (1996) in the definition of grammar-based genetic programming. In this section, we will briefly describe their basic principles and functioning.

3.3.1. Selective crossover

This operator creates new programs by mixing the contents of two parent programs. To do so, a non-terminal symbol is chosen at random and two sub-trees (one from each parent) are selected whose roots coincide with the symbol adopted. Fig. 2 shows how this operation is performed.

Fig. 2. Selective crossover.

Selective crossover presents several configuration parameters. On one hand, a list of eligible symbols can be defined in order to increase the probability of crossover for certain symbols and lessen that probability for certain others. On the other hand, in order to reduce bloating (Banzhaf, Francone, Keller, & Nordin, 1998), there is a parameter that defines the largest size possible for the offspring generated in a crossover. If this size is surpassed, the operator reproduces the original parents, which also occurs if one of the parents does not possess the symbol in question.

3.3.2. Selective mutation

Mutation is associated with a small change in the structure of the representation it is applied to. As can be seen in Fig. 3, this is achieved by randomly selecting a sub-tree within the individual to be mutated, and replacing this sub-tree with a new randomly generated one. The procedure used to generate this sub-tree is the same as that used to create new individuals.

Fig. 3. Selective mutation.

As in the case of selective crossover, this operator presents two configuration parameters: on one hand, the list of eligible non-terminal symbols as the root of the sub-tree to be mutated, and on the other, a maximum size for the offspring generated.
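The following sketch (our own simplified illustration, assuming a minimal Node structure; it is not the actual G3P-MI implementation) shows the behaviour described above: both operators act on sub-trees rooted at an eligible non-terminal symbol and fall back to the original parents when the symbol is missing or the size limit is exceeded.

```python
# Simplified sketch of selective crossover and selective mutation on syntax
# trees whose nodes are labelled with grammar symbols (illustrative only).
import copy
import random

class Node:
    def __init__(self, symbol, children=None):
        self.symbol = symbol              # grammar symbol labelling this node
        self.children = children or []

    def size(self):
        return 1 + sum(child.size() for child in self.children)

def subtrees_rooted_at(root, symbol):
    """All sub-tree roots labelled with the given non-terminal symbol."""
    found = [root] if root.symbol == symbol else []
    for child in root.children:
        found.extend(subtrees_rooted_at(child, symbol))
    return found

def selective_crossover(parent1, parent2, symbol, max_size):
    """Swap two random sub-trees whose roots coincide with `symbol`;
    reproduce the parents if the symbol is absent or an offspring is too big."""
    off1, off2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    sites1 = subtrees_rooted_at(off1, symbol)
    sites2 = subtrees_rooted_at(off2, symbol)
    if not sites1 or not sites2:
        return copy.deepcopy(parent1), copy.deepcopy(parent2)
    s1, s2 = random.choice(sites1), random.choice(sites2)
    s1.children, s2.children = s2.children, s1.children  # roots share `symbol`
    if off1.size() > max_size or off2.size() > max_size:
        return copy.deepcopy(parent1), copy.deepcopy(parent2)
    return off1, off2

def selective_mutation(parent, symbol, max_size, generate_subtree):
    """Replace a random sub-tree rooted at `symbol` with a freshly generated
    one (same procedure as initialization), respecting the size limit."""
    offspring = copy.deepcopy(parent)
    sites = subtrees_rooted_at(offspring, symbol)
    if not sites:
        return copy.deepcopy(parent)
    site = random.choice(sites)
    new_subtree = generate_subtree(symbol)   # user-supplied random generator
    site.children = new_subtree.children     # new root also has `symbol`
    return offspring if offspring.size() <= max_size else copy.deepcopy(parent)
```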
3.4. Evolutionary algorithm

G3P-MI follows the structure of a classical generational and elitist evolutionary algorithm. First of all, there is the creation of an initial population, following the procedure described in Section 3.2. Once the individuals are evaluated, we enter the main loop of the algorithm, composed of the following operations:

(1) Parent selection. The procedure followed to choose the individuals to be reproduced by crossover and/or mutation is roulette selection.
(2) Parent reproduction. Once the parents are obtained, the crossover operator is applied with a certain probability, and afterwards the mutation operator, also with a given probability. The offspring obtained through this procedure are then evaluated.
(3) Population update. The population is updated by direct replacement, that is, the resulting offspring replace the present population. To guarantee that the best individual in the population is not lost during the updating process, the algorithm employs elitism.

Finally, there are two conditions for exiting the main loop. On one hand, the algorithm ends if the maximum number of generations defined by the user is surpassed and, on the other, it also ends if the best individual in the population achieves the quality objectives indicated by the user.

4. Web index recommendation: a multiple instance problem

Web index pages are pages that provide titles or brief summaries of other pages. These pages contain a lot of information through references, leaving detailed presentations to their linked pages. An example of a web index page is http://health.yahoo.com, as shown in Fig. 4.

The web index recommendation problem consists of building a model to establish exactly which web index pages interest a given user, from among the contents of a myriad of web index pages that have already been labeled as being "of interest" or "not of interest" for this particular user. This problem is more difficult than other analogous ones related to the construction of user models for content-based recommendation systems (Mooney & Roy, 2000; Pazzani & Billsus, 2007) since, in this case, the available label refers to the whole index page, whereas the user's real interest concerns the individual pages referenced on it.

IF ((… > 16) ∨ (house > 11) ∨ (science > 2) ∨ (aol ≤ 20 ∧ on-line > 4))
THEN Recommend page to user V1.
ELSE Do not recommend page to user V1.

We can see that this rule is more complex because the words are limited by their frequency and it is more difficult to identify user preferences. For this reason, although both representations obtain similar results, after this study we can conclude that the numerical representation is less effective because it obtains less comprehensible rules.
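To show how a discovered rule of this kind operates on a bag, the following sketch (our own; the first clause of the rule above is truncated in the source, so only the recoverable clauses are used, and a term-frequency representation of the referenced pages is assumed) recommends a web index page to user V1 whenever at least one of its referenced pages satisfies the rule:

```python
# Illustrative sketch: applying the discovered frequency-based rule to a web
# index page represented as a bag of term-frequency vectors, one instance per
# referenced page. Only the clauses recoverable from the text are included.
def cond_I(freq):
    return (freq.get("house", 0) > 11
            or freq.get("science", 0) > 2
            or (freq.get("aol", 0) <= 20 and freq.get("on-line", 0) > 4))

def recommend_to_v1(index_page):
    # Bag-level disjunction (Eq. (3)): recommend if any referenced page fires.
    return any(cond_I(referenced_page) for referenced_page in index_page)

# Toy web index page with two referenced pages.
index_page = [{"house": 3, "science": 1, "aol": 30, "on-line": 0},
              {"house": 2, "science": 5}]
print(recommend_to_v1(index_page))  # True: the second referenced page fires
```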
7. Conclusions and future work

This study describes the use of the G3P-MI algorithm for recommending web index pages. This algorithm applies grammar-guided genetic programming to learn rules about whether or not a page referred to on a web index page is of interest to a given user. To represent a web index page, the algorithm applies the multi-instance concept, representing each web page as a set of instances, where each instance represents a different referenced page and stores information related to that referenced page. Two versions of the algorithm have been developed: one applied when pages are represented in Boolean form, and the other when the representation format is based on the frequency of appearance of certain terms on a page. The possibility of carrying out a prior filtering of information to eliminate less discriminating terminology has also been analyzed. The experiments carried out show that, although a Friedman test reveals no significant differences, the techniques that use prior information filtering produce better results than those that do not, while those that use the numerical representation produce less understandable knowledge.

Moreover, our proposal has been compared with the results of various versions of the kNN algorithm published by Zhou's team. The statistical tests carried out do not show significant differences in accuracy, although they do for precision and recall. In fact, our proposal is the one that produces the best results with respect to precision, while the Fretcit-kNN algorithm obtains the best results for recall. Finally, some examples of the rules discovered with the G3P-MI algorithm are presented. These rules show the high degree of comprehensibility of the knowledge acquired with the G3P-MI algorithm as compared with black-box methods like those based on kNN.

Although the results obtained are of great interest, we feel that the performance of the G3P-MI algorithm in the task of web index page recommendation could be improved in several ways. On one hand, the algorithm has not always been able to establish an equilibrium among conflicting objectives. In this sense, the problem could be considered from a multi-objective perspective to see if the balance between different objectives could be achieved more appropriately. On the other hand, it has been confirmed that a reduction in the feature space can improve the performance of the algorithm. For this reason, we consider it to be of special interest to study the application of feature selection techniques and to compare the effect these would produce on the performance of our system.

Acknowledgments

This work has been subsidised in part by the research project SAINFOWEB (P05-TIC-00602) and the TIN2005-08386-C05-02, TIN2007-61079 and TIN2008-06681-C06-03 projects of the Spanish Inter-Ministerial Commission of Science and Technology (CICYT) and FEDER funds.

References

Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. In NIPS'02: Proceedings of neural information processing systems (pp. 561–568).
Auer, P. (1997). On learning from multi-instance examples: Empirical evaluation of a theoretical approach. In ICML'97: Proceedings of the 14th international conference on machine learning (pp. 21–29). San Francisco, CA, USA: Morgan Kaufmann Publishers.
Banzhaf, W., Francone, F. D., Keller, R. E., & Nordin, P. (1998). Genetic programming: An introduction. On the automatic evolution of computer programs and its applications. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Chai, Y.-M., & Yang, Z.-W. (2007). A multi-instance learning algorithm based on normalized radial basis function network. LNCS (Vol. 4491).
Chen, Y., Bi, J., & Wang, J. (2006). MILES: Multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 1931–1947.
Chen, Y., & Wang, J. Z. (2004). Image categorization by learning and reasoning with regions. Journal of Machine Learning Research, 5, 913–939.
Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dietterich, T. G., Lathrop, R. H., & Lozano-Perez, T. (1997). Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1–2), 31–71.
Felfernig, A., Friedrich, G., & Schmidt-Thieme, L. (2007). Guest editors' introduction: Recommender systems. IEEE Intelligent Systems, 22(3), 18–21.
Gärtner, T., Flach, P. A., Kowalczyk, A., & Smola, A. J. (2002). Multi-instance kernels. In ICML'02: Proceedings of the 19th international conference on machine learning (pp. 179–186). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Geyer-Schulz, A. (1995). Fuzzy rule-based expert systems and genetic machine learning (1st ed.). Heidelberg, Germany: Physica Verlag.
Gu, Z., Mei, T., Tang, J., Wu, X., & Hua, X. (2008). MILC2: A multi-layer multi-instance learning approach to video concept detection. LNCS (Vol. 4903).
Herlocker, J., Konstan, J., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In SIGIR'99: Proceedings of the 22nd annual international ACM conference on research and development in information retrieval, Berkeley, California, United States (pp. 230–237).
Herlocker, J., Konstan, J., Terveen, L., & Riedl, J. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5–53.
Kalai, A., & Blum, A. (1998). A note on learning from multiple-instance examples. Machine Learning, 30(1), 23–30.
Long, P. M., & Tan, L. (1998). PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Machine Learning, 30(1), 7–21.
Mangasarian, O. L., & Wild, E. W. (2008). Multiple instance classification via successive linear programming. Journal of Optimization Theory and Applications, 137(3), 555–568.
Maron, O., & Lozano-Pérez, T. (1997). A framework for multiple-instance learning. In NIPS'97: Proceedings of neural information processing systems (Vol. 10, pp. 570–576). Cambridge, MA, USA: MIT Press.
Mooney, R. J., & Roy, L. (2000). Content-based book recommending using learning for text categorization. In Proceedings of the ACM international conference on digital libraries (pp. 195–204).
Pao, H. T., Chuang, S. C., Xu, Y. Y., & Fu, H. (2008). An EM based multiple instance learning method for image classification. Expert Systems with Applications, 35(3), 1468–1472.
Pazzani, M. J., & Billsus, D. (2007). Content-based recommendation systems. In P. Brusilovsky, A. Kobsa, & W. Nejdl (Eds.), The adaptive web: Methods and strategies of web personalization. Lecture notes in computer science (Vol. 4321, pp. 325–341). Springer.
Ramon, J., & De Raedt, L. (2000). Multi-instance neural networks. In Proceedings of the ICML workshop on attribute-value and relational learning.
Ruffo, G. (2000). Learning single and multiple instance decision trees for computer security applications. Ph.D. thesis, Department of Computer Science, University of Turin, Torino, Italy.
Schafer, J. B., Frankowski, D., Herlocker, J., & Sen, S. (2007). Collaborative filtering recommender systems. In P. Brusilovsky, A. Kobsa, & W. Nejdl (Eds.), The adaptive web: Methods and strategies of web personalization. Lecture notes in computer science (Vol. 4321, pp. 291–324). Springer.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1–47.
Ventura, S., Romero, C., Zafra, A., Delgado, J. A., & Hervás, C. (2008). JCLEC: A Java framework for evolutionary computation. Soft Computing, 12(4), 381–392.
Wang, J., & Zucker, J.-D. (2000). Solving the multiple-instance problem: A lazy learning approach. In ICML'00: Proceedings of the 17th international conference on machine learning (pp. 1119–1126). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Whigham, P. A. (1996). Grammatical bias for evolutionary learning. Ph.D. thesis, School of Computer Science, University College, University of New South Wales, Australian Defence Force Academy, Canberra, Australia.
Yang, Y., & Padmanabhan, B. (2005). Evaluation of online personalization systems: A survey of evaluation schemes and a knowledge-based approach. Journal of Electronic Commerce Research, 6(2), 112–122.
Zhang, L., Jack, L. B., & Nandi, A. K. (2005). Fault detection using genetic programming. Mechanical Systems and Signal Processing, 19(2), 271–289.
Zhang, M.-L., & Zhou, Z.-H. (2004). Improve multi-instance neural networks through feature selection. Neural Processing Letters, 19(1), 1–10.
Zhang, M.-L., & Zhou, Z.-H. (2006). Adapting RBF neural networks to multi-instance learning. Neural Processing Letters, 23(1), 1–26.
Zhang, Q., & Goldman, S. (2001). EM-DD: An improved multiple-instance learning technique. In NIPS'01: Proceedings of neural information processing systems (Vol. 14).
Zhou, Z.-H., Jiang, K., & Li, M. (2005). Multi-instance learning based web mining. Applied Intelligence, 22(2), 135–147.
Zhou, Z.-H., & Zhang, M.-L. (2007). Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowledge and Information Systems, 11(2), 155–170.
Zucker, J.-D., & Chevaleyre, Y. (2000). Solving multiple-instance and multiple-part learning problems with decision trees and decision rules. Application to the mutagenesis problem. In Proceedings of the 14th Canadian conference on artificial intelligence. Lecture notes in artificial intelligence, Ottawa, Canada (pp. 204–214).