Editorial: Third special issue on Knowledge Discovery and Business Intelligence Paulo Cortez1 Manuel Filipe Santos1 1 ALGORITMI Research Centre, Department of Information Systems, University of Minho, 4800-058 Guimarães, Portugal Email: pcortez@dsi.uminho.pt, mfs@dsi.uminho.pt 1 Introduction Expert Systems (ES) were proposed in the mid 1970s [Arnott and Pervan, 2014] with the goal of building computerized systems that mimic human behavior to solve real-world tasks. Such systems were based on Artificial Intelligence (AI) techniques, typically by adopting explicit (human understandable) knowledge, extracted from domain experts (e.g. by using interviews), and that was stored in a knowledge base [Buchanan, 1986]. In the last decades, the world as changed due to advances in information and communication technology (e.g. massive usage of computers and personal mobile devices, Internet and social media, usage of digital cameras and other sensors). In effect, we are now in the age of data, where a large portion of organizational, societal or personal activities is captured digitally [Mojsilovic, 2014]. Following this change, ES have evolved to include data-driven models, either solely or complemented by expert-driven knowledge. Such change is reflected in the ES journal, which currently publishes several articles related with data analysis fields (e.g. analytics and business intelligence, data mining and knowledge discovery, big data, data science). In this special issue, we highlight two of data related terms: Knowledge Discovery (KD) and Business Intelligence (BI). KD is often used as a synonym of data mining and it consists in an AI subfield that uses machine learning algorithms to extract high-level interesting knowledge from raw data [Fayyad et al., 1996]. BI is a popular management term [Arnott and Pervan, 2014] and it represents several technologies (e.g., data warehouses, KD, dashboards) that store and process organizational data in order to support managerial decision-making [Delen et al., 2014]. The ‘Knowledge Discovery and Business Intelligence’ (KDBI) thematic track was proposed for the EPIA conference on AI in 2009 with the goal of promoting the interaction between the KD and BI areas. Since then, the track has been in- cluded in all EPIA biennial conferences. After 2011, the KDBI track has been associated with special ES journal issues, which included extended versions of the best KDBI papers. The first special issue was published in 2013 and it included the best KDBI 2011 track papers [Cortez and Santos, 2013], while the second spe- cial issue appeared in 2015 and it encompassed the best KDBI 2013 track papers [Cortez and Santos, 2015]. 1 This issue, entitled ‘Third special issue on Knowledge Discovery and Business Intelligence’, contains recent KD and BI contributions that can be used in ES to produce a valuable impact in real-world applications. It includes extended versions of papers from the 4th KDBI thematic track, of the 17th EPIA conference on AI (EPIA 2015), held in Coimbra, Portugal. The track received 18 paper submissions and the authors of the best papers were invited to extend their works for this special issue. After two rounds of reviews, which included reviewers from the KDBI track and also ES journal, the best six papers were accepted, corresponding to an overall acceptance rate of 33%. Due to the interest in data-driven models, in the last years there has been sev- eral interesting developments in the KDBI area. Despite this progress, there are still many challenges and opportunities. For instance, most KD algorithms were targeted for single label classification tasks and often these algorithms cannot deal adequately with label ranking, which is useful in several real-world applications (e.g. modeling user preferences). Also, the financial domain still rises many challenges: it is not clear what is the best approach (regression or classification) to forecast trading actions; and easy to interpret tools are needed to better disclose the rela- tionships among price variations from distinct financial products. Moreover, most of the real-world data has a temporal dimension and there is still room for proposing specialized algorithms that use this dimension in the KD or BI process (e.g. time series retrieval, visual representation of temporal changes). Furthermore, the ‘Ex- tract, Transform, and Load’ (ETL) is a vital component of BI systems but it often requires a substantial manual effort in terms of its design and implementation, even when there are several common ETL processes that are repeated through distinct BI projects. All these challenges and opportunities are addressed in the six papers accepted in this special issue, which we will briefly detail in the next section. 2 Contents of the special issue In the first paper ‘Label Ranking Forests’, de Sá et al (2016) propose a novel KD algorithm for Label Ranking (LR), which is a classification variant task. The goal is to learn the implicit function that performs a mapping between a set of inputs, which characterize an item, and a ranking of labels (instead of a single label, as in standard classification). LR is used in several real-world applications, such as mi- croarray analysis, image categorization or modeling user preferences. The proposed Label Ranking Forests (LRF) algorithm uses an ensemble of decision tree methods for LR and it is considered a natural adaptation of the popular random forest algo- rithm for LR. Several experiments were held using 16 datasets from the KEBI Data Repository. Overall, competitive LR results were achieved by the LRF algorithm when compared with LR decision tree methods. In ‘A comparative study of approaches to forecast the correct trading actions’, Báıa and Torgo (2016) perform an extensive comparison of two main approaches to forecast trading actions of financial markets. The first approach uses standard regression models to predict the daily variation on prices and then uses pre-defined decision rules in order to transform the numeric predictions into trading actions. The second approach uses classification models that directly forecast the trading decision 2 (hold, buy, sell). The data analyzed included assets prices of 12 companies, ranging from 7 to 30 years of closing prices daily data. Several machine learning models were tested (e.g. neural networks, support vector machines, random forests). The main conclusion of the paper is that there is no significant difference between the two main approaches: numeric price variation prediction and and direct classification of trading actions. The study contains two additional recommendations: re-sampling strategies (for regression or classification) are not recommended in this financial domain, even if the data is imbalanced; also, the usage of cost-benefit matrices is promising to enhance the classification models. Also approaching the financial context, in ‘Ramex-Forum: a tool for displaying and analysing complex sequential patterns of financial products’, Tiple et al (2016) present an improved version of the Ramex-Forum algorithm, leading to a KD method that is capable of extracting knowledge from multivariate time series in a visual way and that is easy to interpret. The algorithm was applied in two real-world applications: petroleum production prices and risk analysis of European financial institutions. The obtained results attest the algorithm capability to show relevant price variations in financial markets. In ‘Aggressive pruning strategy for time series retrieval using a multi-resolution representation based on vector quantization coupled with discrete wavelet trans- form’, Muhammad Fuad (2016) proposes a novel KD algorithm for time series in- dexing and retrieval. The algorithm adopts a multi-resolution representation of the series based on Haar wavelets and vector quantization. The experiments were con- ducted using 10 time series with distinct sizes and from different repositories. The proposed algorithm obtained competitive results when compared with a two other representation methods (single-resolution and other multi-resolution algorithm). Temporal data was also studied in the paper of Gérk (2016), entitled ‘Visual analytics of educational time-dependent data using interactive dynamic visualiza- tion’. The paper presents a new web-based visualization framework for BI and that is targeted for an interactive exploration by the decision maker. In particular, the framework is based on motion charts and clustering techniques, allowing to reveal changes over time in terms of two-dimensional space animations. The framework was demonstrated using real-world educational data. A total of 16 human partici- pants evaluated the quality of the framework, confirming the proposed animations as an interesting contribution to the analytic process. In the last paper, Oliveira and Belo (2016) approach ETL processes, which play a key role in BI by extracting data from several sources into a data warehousing system. The paper, entitled ‘On the specification of ETL patterns behavior, a domain-specific language approach’, proposes the use of patterns to represent com- mon ETL tasks (e.g. surrogate key pipelining or intensive data loading). Using the business process modeling notation (BPMN), it is shown how a typical and crucial ETL process (data quality enhancement) can be improved in terms of its design and implementation. 3 Acknowledgments We would like to thank the other KDBI 2015 track (of EPIA) co-organizers, Lúıs Cavique, João Gama and Nuno Marques. Also, we thank the authors, who con- tributed with their papers, and the reviewers (from the KDBI 2015 program com- mittee and the ES journal). This work has been supported by COMPETE: POCI- 01-0145-FEDER-007043 and FCT - Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013. References [Arnott and Pervan, 2014] Arnott, D. and Pervan, G. (2014). A critical analysis of decision support systems research revisited: the rise of design science. Journal of Information Technology, 29(4):269–293. [Báıa and Torgo, 2016] Báıa, L. and Torgo, L. (2016). A comparative study of approaches to forecast the correct trading actions. Expert Systems, pages –. [Buchanan, 1986] Buchanan, B. G. (1986). Expert systems: working systems and the research literature. Expert Systems, 3(1):32–50. [Cortez and Santos, 2013] Cortez, P. and Santos, M. F. (2013). Knowledge Discov- ery and Business Intelligence. Expert Systems, 30(4):283–284. [Cortez and Santos, 2015] Cortez, P. and Santos, M. F. (2015). Recent advances on knowledge discovery and business intelligence. Expert Systems, 32(3):433–434. [Delen et al., 2014] Delen, D., Turban, E. and Sharda, D. R. (2014). Business In- telligence: A Managerial Perspective on Analytics. Pearson, 3rd edition. [de Sá et al, 2016] de Sá, C. R., Soares, C., Knobbe, A. and Cortez, P. (2016). Label Ranking Forests. Expert Systems, pages –. [Fayyad et al., 1996] Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). Ad- vances in Knowledge Discovery and Data Mining. MIT Press. [Muhammad Fuad, 2016] Muhammad Fuad, M. M. (2016). Aggressive pruning strategy for time series retrieval using a multi-resolution representation based on vector quantization coupled with discrete wavelet transform. Expert Systems, pages –. [Géryk, 2016] Géryk, J. (2016). Visual analytics of educational time-dependent data using interactive dynamic visualization. Expert Systems, pages –. [Mojsilovic, 2014] Mojsilovic, A. (2014). The age of data and opportuni- ties. IBM Big Data & Analytics Hub, http://www.ibmbigdatahub.com/blog/ age-data-and-opportunities. 4 [Oliveira and Belo, 2016] Oliveira, B. and Belo, O. (2016). On the specification of ETL patterns behavior, a domain-specific language approach. Expert Systems, pages –. [Tiple et al, 2016] Tiple, P., Cavique, L. and Cavalheiro Marques, N. (2016). Ramex-Forum: a tool for displaying and analysing complex sequential patterns of financial products. Expert Systems, pages –. The authors Paulo Cortez Paulo Cortez is Associate Professor with Habilitation at the Department of Informa- tion Systems, University of Minho, Portugal. He is also coordinator of Information Systems and Technologies (IST) research group of ALGORITMI Centre with 48 PhD researchers. His research interests include: Business Intelligence (Decision Support, Data Mining and Forecasting); and Artificial Intelligence (Computational Intelli- gence, Neural Networks, Evolutionary Computation and Applications). Currently, he is associate editor of the Expert Systems journal and participated in 13 R&D projects (principal investigator in 3). He is co-author of more than one hundred indexed (ISI, Scopus) publications in international journals (e.g., Decision Support Systems, Applied Soft Computing) and conferences (e.g., IEEE IJCNN). Web-page: http://www3.dsi.uminho.pt/pcortez Manuel Filipe Santos Manuel Filipe Santos is Associate Professor at the Department of Information Sys- tems, University of Minho, Portugal, and leader of the Intelligent Data Systems group of Centro Algoritmi, with interests in the fields of: Business Intelligence, Data Mining and Learning Classifier Systems. He participated in several R&D projects (principal investigator in 3). He is also co-author of several publications in international journals (e.g., Artificial Intelligence in Medicine) and conferences (e.g., GECCO). 5