06 April 2021 AperTO - Archivio Istituzionale Open Access dell'Università di Torino Original Citation: Trace retrieval for business process operational support Published version: DOI:10.1016/j.eswa.2015.12.002 Terms of use: Open Access (Article begins on next page) Anyone can freely access the full text of works made available as "Open Access". Works made available under a Creative Commons license can be used according to the terms and conditions of said license. Use of all other works requires consent of the right holder (author or publisher) if not exempted from copyright protection by the applicable law. Availability: This is the author's manuscript This version is available http://hdl.handle.net/2318/1557793 since 2016-03-11T10:58:22Z Trace Retrieval for Business Process Operational Support Alessio Bottrighia, Luca Canensib, Giorgio Leonardia,∗, Stefania Montania, Paolo Terenziania aDISIT, Computer Science Institute, Università del Piemonte Orientale, Alessandria, Italy bDepartment of Computer Science, Università di Torino, Italy Abstract Operational support assists users while process instances are being exe- cuted, by making predictions about the instance completion, or recommend- ing suitable actions, resources or routing decisions, on the basis of the already completed instances, stored as execution traces in the event log. In this paper, we propose a case-based retrieval approach to business pro- cess management operational support, where log traces are exploited as cases. Once past traces have been retrieved, classical statistical techniques can be applied to them, to support prediction and recommendation. The framework enables the user to submit queries able to express complex patterns exhibited by the current process instance. Such queries can be composed by several simple patterns (i.e., single actions, or direct sequences of actions), separated by delays (i.e., actions we do not care about). Delays can also be imprecise (i.e., the number of actions can be given as a range). The tool also relies on a tree structure, adopted as an index for a quick retrieval from the available event log. Our approach is highly innovative with respect to the existing litera- ture panorama, since it is the first work that exploits case-based retrieval techniques in the operational support context; moreover, the possibility of ∗Corresponding author. Tel. +39 0131 360340 Email addresses: alessio.bottrighi@uniupo.it (Alessio Bottrighi), canensi@di.unito.it (Luca Canensi), giorgio.leonardi@mfn.unipmn.it (Giorgio Leonardi), stefania.montani@uniupo.it (Stefania Montani), terenz@di.unito.it (Paolo Terenziani) Preprint submitted to Expert Systems with Applications October 1, 2015 retrieving traces by querying complex patterns and the indexing strategy are major departures also with respect to other existing trace retrieval tools proposed in the Case Based Reasoning area. Thanks to its characteristics and methodological solutions, the tool im- plements operational support tasks in a flexible and efficient way, as demon- strated by our experimental results. Keywords: Trace retrieval, Case Based Reasoning, Operational Support 1. Introduction Operational support is a process management activity meant to assist users while process instances are being executed (see der Aalst (2011), chap- ter 9). If an instance is still running, it is possible that information about already completed instances can be exploited to ensure the correct or effi- cient handling of the current instance itself. Completed instances are stored in the so-called event log, which records the sequences (traces henceforth) of actions executed at a given organization, typically together with action execution parameters (e.g., times, costs, resources). Specifically, operational support can be articulated into three main tasks: • detection: compares an existing process model to the current instance data (i.e., the information about the already executed actions in the current trace). If the model control flow (or other predefined rule) is violated, an alert is generated; • prediction: exploits the event log and the current instance to make predictions about the instance completion (e.g., time to completion, costs, use of resources, possible problems); • recommendation: uses the event log and the current instance to recom- mend suitable actions, resources or routing decisions (in order, e.g., to minimize costs or time to completion). While detection is more concerned with constraint satisfaction and with the exploitation of an existing normative model, prediction and recommen- dation heavily rely on experiential knowledge, stored in the event log in the form of past process traces. 2 Traditional operational support tools, like those in the ProM (van Dongen et al., 2005) framework, operate by building a transition system and by replaying log traces on it (van Dongen, 2007). However, also a direct use of the experiential knowledge in the log can provide crucial contributions to support prediction and recommendation. In particular, analogical reasoning (as enabled, e.g., by the Case Based Reason- ing (CBR) (Aamodt & Plaza, 1994) methodology) can be adopted to this end. The CBR methodology is articulated into four steps: (1) retrieve past experiential knowledge, in the form of past “cases”, similar to the current input situation; (2) reuse the retrieved case solution(s) (possibly after an adaptation procedure), to solve the input situation; (3) revise the newly solved case for correctness; (4) retain the new case, to enrich the system knowledge base. In this paper, we propose a case-based retrieval framework for opera- tional support. We therefore focus on the first step of the CBR methodology. In detail, we represent log traces as cases, and, given an ongoing process instance as an input, we retrieve past traces similar to it. Once past traces have been retrieved, classical statistical techniques can be applied to them, to support prediction and recommendation. For instance, the percentage of retrieved traces that, e.g., were completed on time, can be used to calculate the probability that the current instance will complete on time too. A similar approach can be adopted to estimate costs, or predict problems. Moreover, the best actions to execute next can also be extracted from the retrieved traces. In the following, we will however concentrate on trace retrieval, being the discussion on statistical techniques out of the scope of this paper. The approach we propose is, to the best of our knowledge, the first one to provide such a direct exploitation of experiential knowledge in the context of operational support. Notably, trace retrieval is, in general, a quite chal- lenging and complex task. The input process instance is a partial execution trace, and the easiest form of retrieval can be achieved by looking for all the traces in the log that contain the input trace as a prefix. However, in many applications, more flexibility is required. Indeed, the user may observe that the input instance contains specific patterns (i.e., consecutive or non- consecutive sequences of actions), able to strongly influence the next actions to be executed, or the overall performances (e.g., completion time); patterns can be rather complex, and involve temporal indeterminacy. Therefore, s/he may want to abstract from the specificities of the input trace, by identifying 3 these patterns, and looking for all the traces that match them, in order to get a recommendation on how to deal with these problems on the basis of past experience. The medical domain, among others, includes many situations where this non-trivial trace retrieval can provide a valuable help to tackle complicated clinical cases. For example, in the treatment of stroke, particular atten- tion must be paid to suspected Subarachnoid Hemorrhage (SAH). The latest clinical guidelines1 suggest performing a Lumbar Puncture (LP) to patients where the SAH is suspected but not confirmed by the proper laboratory ex- aminations, in order to investigate the presence of blood in the cerebrospinal fluids. However, LP is an invasive procedure and not all the patients can tolerate it without paying some consequences. For this reason, the chance of executing it must be carefully evaluated, and the use of information collected from the past similar situations can help in deciding whether or not the exam should be performed. Let us consider a particular case, where a suspicion of SAH arises for a non-epileptic patient who experiences a sudden epileptic attack. Once the patient has been transported to the hospital, specific tests are performed, such as Transcranial Doppler (TD) and a subsequent Cerebral Angiography (CA). If the results of these tests do not clearly indicate the presence of SAH (i.e., there is a positive TD report, then a negative CA report), the clinical guidelines suggest the execution of a Computerized Tomography (CT) scan as a further exam. If also the TC scan report is negative, the differential diagnosis could be conducted through the observation of the spinal fluid, to check whether the blood has penetrated into the spinal liquid. However, as mentioned before, this procedure is invasive and stressful for patients affected by serious conditions, but it could be important in order to gain a definitive and certain answer for assessing the actual presence of SAH. This assessment is also important because it can radically alter the treatment plan to be administered to the patient. In order to estimate more accurately the cost/benefit ratio of performing the SP, it is very useful to look at what has been done in the past, by retrieving all the execution traces which follow a query pattern which starts with an epileptic attack, followed by a positive TD report, then by a negative CA report and finally by a negative TC report. No matter if and when other 1http://www.iso-spread.it/, last accessed on September, the 30th, 2015 4 medical procedures appear among the diagnostic exams mentioned before: stroke patients typically undergo many other routine tests (such as blood test or chest RX) that are not specifically relevant for SAH, but may be interleaved between the key actions. Trace retrieval by querying complex patterns as the one illustrated in this example is supported in our framework. It is then possible to analyze the retrieved traces in order to calculate: the percentage of patients who received the SP; the percentage of patients who tolerated this procedure; how many times the execution of the SP allowed for a correct differential diagnosis, and what therapeutic plans have been set up depending on the different cases. On the basis on this information, an appropriate cost/benefit analysis can be conducted, in order to decide whether or not the SP should be executed, and to obtain some suggestions about the most common treatment plans to be administered to the patient. Interestingly, our framework also relies on a tree structure, called trace tree, allowing for a quick retrieval, by avoiding a flat search for all the traces in the log that satisfy the input pattern. The trace tree is a sort of “model” of the traces, that we learn using a process mining technique we recently implemented (Canensi et al., 2014). As we will see in this paper, the trace tree can be used as an index of the traces in the log, supporting also the search of traces corresponding to complex query patterns. It is worth noting that the work in Canensi et al. (2014) only dealt with the construction of the process model, i.e., the tree structure, but did not propose any exploitation of it as an index for trace retrieval. As such, the research objective of the present paper, fully focused on trace retrieval for operational support, and the content of all the technical sections (except Section 2.1, which summarizes the approach in Canensi et al. (2014)), are completely new with respect to our previous work. In synthesis, in this paper we describe a flexible and efficient case-based retrieval approach, which also allows to query complex pattern to be searched for in the traces. Our approach is highly innovative with respect to the existing literature panorama, in that: • we propose, to the best of our knowledge, the first work that exploits case-based retrieval techniques on the event log in the operational sup- port context; • we not only support exact prefix retrieval, but also non-trivial trace retrieval, in which complex query patterns are looked for; 5 • we take advantage of the trace tree structure to speed up the retrieval process. The latter two points represent major departures also with respect to other existing trace retrieval tools proposed in the CBR area (see Section 4). The paper is organized as follows. In Section 2 we technically describe our approach. For the sake of completeness, we first briefly summarize our algorithm to build the log tree (see Section 2.1), which acts like an index of the traces. We then move to the novel technical contribution of this paper, describing our query language and retrieval algorithms. In Section 3 we present our experimental results. In Section 4 we discuss related work. In Section 5 we present our concluding remarks and future work directions. 2. Methods This Section presents the details of our approach. In particular, in Section 2.1 we describe the trace tree structure, and sketch the algorithm to build it, which was extensively illustrated in Canensi et al. (2014). In Section 2.2, which represents the core novel technical contribution of this paper, we illustrate our retrieval approach. 2.1. Mining the trace tree Our framework relies on a tree structure, called trace tree, allowing for an efficient retrieval of the traces that satisfy the input pattern. The trace tree is a sort of “model” of the traces, that we learn using a process mining technique we recently implemented (Canensi et al., 2014), and built in such way that it can be used as an index. Several approaches to process mining exist in the literature (e.g., those in ProM (van Dongen et al., 2005)), but, despite some differences, many of them show important similarities, and have common limitations (see also (Cnudde et al., 2014)): • they learn “context-free” patterns of processes; • they can mine paths that do not correspond to any input trace in the log (i.e., they can have a limited precision (Buijs et al., 2012)); • they do not explicitly relate the mined patterns to the log (in the sense that there is no explicit correspondence between mined patterns, and the traces in the log “supporting” them). 6 Such limitations are quite relevant in general, and very relevant in some specific domains, such as the medical one. Concerning the first limitation, it is well known that, e.g., the same (set of) actions may produce different effects, depending on the context (e.g., on the medical actions previously performed on the patient). The impact of the second limitation is obvious and dramatic: if the miner precision is limited, in the sense that it may also learn a path that never appears in any input trace, this can be very harmful in all those applications where it is vital that mining results are reliable as much as possible. However, surprisingly, limited precision is a common limitation of many current miners. The third limitation is less critical, but still significant. Indeed, maintain- ing an explicit link between mined patterns and the input traces matching such patterns, can be important not only to characterize contexts, but also to provide users with an evidence of the learned output, and also to provide support for retrieving traces corresponding to a given pattern - which is our current objective. In Canensi et al. (2014), we therefore proposed an innovative approach, able to support support “context-aware” process mining, and overcome all the above limitations. The technical details of the approach are summarized below. Our mining algorithm takes in input an event log. The event log is stored as a matrix with n rows and m columns, where n is the number of traces in the log and m is the maximum length of these traces. Each cell Matrix[i, j] contains the j-th action of the trace i. Actions in the different traces are aligned on the basis of their order of execution (i.e., the j index). All traces start with a dummy common action #. The algorithm outputs the mined process as a trace tree, where nodes represent actions, and arcs represent a control flow (i.e., precedence, XOR choice) relation between them. Indeed, we exploit the temporal ordering of actions in the log traces to mine the process model, maintaining all the local and global ordering relationships between the actions. More precisely, in the trace tree, each node is represented as a pair < P, T >. P denotes a (possibly unary) set of actions; actions in the same node are in AND relation, or, more properly, may occur in any order with respect to each other. Note that, in such a way, each path from the starting node of the tree to a given node N denotes a set of possible process patterns (called support patterns of N henceforth), obtained by following the order represented by 7 the arcs in the path to visit the trace tree, and ordering in each possible way the actions in each node (for instance, the path {A, B} → {C} represents the support patterns “ABC” and “BAC”). T represents a set of pointers to all and only those traces in the log whose prefixes exactly match the path from the root to one of the patterns in P (called support traces henceforth). Specifically, prefixes correspond to the entire traces if the node at hand is a leaf. In the case of a node representing a set of actions to be executed in any order, T is more precisely composed of several sets of support traces, each one corresponding to a possible action permutation. This choice enhances retrieval performances, as we will discuss in section 2.2.2. Algorithm 1 below builds the trace tree. ALGORITHM 1: Mining pseudocode 1 Build-Tree (index,< P, T >) ; 2 nextP ← getNext(index+1, T) ; 3 if nextP not empty then 4 nextActions ← XORvsAND (nextP, T) ; 5 foreach node < P ′, T ′ > ∈ nextActions do 6 AppendSon(< P ′, T ′ >,< P, T >) ; 7 Build−Tree(index+ |P ′|,< P ′, T ′ >) ; 8 end 9 end The function Build-Tree in Algorithm 1 takes in input a variable index, representing a given position in the traces (i.e., a column in the input matrix), and a node. Initially, it is called on the first position, and on the root of the tree (which is a dummy node, corresponding to the # action; thus, initially, index=0, P=# and T is the set of all the traces). The function getNext simply inspects the traces in T to find all possible next actions. On these actions, the function XORvsAND applies the formula below in order to identify which actions are in AND and which are in XOR relation: we calculate the dependency frequency A → B between every action pair < A, B > in nextP ×nextP: A → B = 1 2 ( |A > B|∑ X∈ActT |A > X| + |A > B|∑ Y ∈ActT |Y > B| ) (1) 8 where, always considering the traces in T , |A > B| is the number of traces in which A is immediately followed by B, |A > X| is the number of traces in which A is immediately followed by some action X (with X ∈ ActT , being ActT the set of all the actions appearing in the traces in T), and |Y > B| is the number of traces in which B is immediately preceded by some action Y (with Y ∈ ActT ). After evaluating the dependency frequency value A → B and B → A, we can have 3 possible situations: • if both the values are below a given threshold, this means that A and B rarely appear in the same trace, therefore they are in XOR relation; • if A → B is above the threshold and B → A is below, then A precedes B, and vice versa; • if both the values are above the threshold, then A and B are in AND (any-order) relation. The output nextActions of the function XORvsAND is a set of nodes < P ′, T ′ >, one for each maximal set of actions to be AND-ed. P’ is therefore a set of action in AND (and a subset of P). Note that, for each one of such sets P’, the corresponding set T’ of support traces where these actions in AND take place is also computed, as a subset of T. Finally, each new node is appended in the output tree (function Append- Son), and Build-Tree is recursively applied to each node (with the parameter index properly set, and passing < P ′, T ′ > as the last parameter). 2.2. Trace retrieval In this Section, we first describe our query language. Then, we present the retrieval algorithm, specifying how traces satisfying a query can be retrieved, taking advantage of the trace tree structure. 2.2.1. Query language In our framework, the user can issue a query, composed of one or more simple patterns to be searched for. In turn, simple patterns are defined as one or more actions in direct sequence. Multiple simple patterns can be combined in a complex pattern, by separating them by delays. A delay is a sequence of actions between two simple patterns; the semantics is that we do not care about these actions, so they will not be specified by the query. Instead, only their number will be provided, possibly in an imprecise way 9 (i.e., we allow the user to express the number of actions as a range). Formally, a query is represented in the following format: ⟨(min1, max1)SP1...(mink, maxk)SPk(mink+1, maxk+1)⟩ (2) where: • SPj is a simple pattern (i.e. a sequence of symbols, representing the actions we are looking for; these actions have to be in direct sequence); • (minj, maxj) is the delay between two items (i.e., two simple patterns, or a simple pattern and the trace starting/ending point), expressed as a range in the number of actions. As an example, the query ⟨(0, 0)B(0, 1)E(2, 2)Z(0, 1)⟩ (3) looks for action B, which has to start at the very beginning of the trace. This first simple pattern B must be followed (with zero or a single action in between) by action E. E must be followed by two actions, which we do not care about; after them, Z is required. Z can be the final action, or can be followed by one additional action we do not care about. It is worth noting that a query written as above corresponds to a whole set of queries, each one obtained by choosing a specific delay value and specific actions in each of the (minj, maxj) intervals. Every query in this set can be made partially explicit as a string, contain- ing as many dummy symbols ∗ as needed, to cover the corresponding delay length (where the dummy symbol is chosen because we are not interested in the specific actions). For example, the query above would correspond to the following four par- tially explicit queries, whose length ranges from 5 to 7 actions (not counting the intial dummy action #), where the dummy symbol ∗ has been properly inserted, according to the delay values information: BE ∗∗Z; BE ∗∗Z∗; B ∗E ∗∗Z; B ∗E ∗∗Z∗ Since each∗ could be substituted by any of the N types of actions recorded in the log and/or existing in the application domain, the example query corresponds to N2 + 2∗N3 + N4 totally explicit queries. The problem is obviously combinatorial, with respect to the possible delay ranges and action types. We thus believe that extensional approaches (in 10 which only explicit queries can be issued) would not be feasible in many domains. Our query language, allowing for compact “intensional” queries, is therefore a significant move in the direction of implementing an efficient and user-friendly operational support tool. Notably, exact prefix retrieval can be seen as a special case of the more general retrieval possibilities we offer, where only a single pattern is provided, and no delays are needed. 2.2.2. Retrieval In order to retrieve the log traces that satisfy a query, we have imple- mented a multi-step procedure, articulated as follows: • automaton generation; • tree search; • filtering. In the following, we will provide the details of the various steps, along with a running example based on the example query introduced in section 2.2.1, exploiting the trace tree in Figure 1. Automaton generation In our approach to trace retrieval, we first generate a deterministic automaton, that represents the query at hand. To build the automaton, we implement the following procedure: • (A) transform the query into a regular expression; • (B) apply the Berry & Sethi (1986) algorithm, to build a non-deterministic automaton that recognizes the regular expression above; • (C) unfold the non-deterministic automaton; • (D) transform the unfolded non-deterministic automaton into a deter- ministic automaton (Lam et al., 2006). Steps (A) and (D) are trivial. As regards Step (A) note that our query language is just a variation of regular expressions, useful to express delays and “do not care” (i.e., dummy) symbols in a compact way. The cost of Step (A) is linear in the number of delays used in the query. Steps (B) and (C) use classical algorithms in the area of formal languages. The cost of Step (B) is linear in the number of symbols in the query expressed as a regular 11 Figure 1: Trace tree in the example. 12 10 9 0 1 # 2 B 3 ε 4* E 5 E 6 * 7 * 8 Z * ε Figure 2: Non deterministic automaton in the example. On the edges: symbol consumed in the corresponding transition; ∗ means that any symbol can be consumed; ε means that no symbol is consumed. expression (Berry & Sethi, 1986) (i.e., the output of Step (A)), and the cost of Step (C) is the product between the number of dummy symbols in the query and the cardinality of the action symbols available in the log. Step (D) substitutes each arc labeled by the dummy symbol in the automaton with a set of arcs, one for each action in the event log. Although in the worst case Step (D) is exponential with respect to the number of states in the automaton (i.e., the output of Step (B)), note that the worst case is rare in practice (van Leeuwen, 1994). Example Referring to our example query, the generated non-deterministic automaton (see Fig. 2) has 9 states and 10 transitions; the state 0 is the start state and states 8 and 9 (double circled in Fig. 2) are the accepting states. Tree search Once the deterministic automaton has been obtained, it would be possible to exploit it in a classical way, by providing all event log traces in input to it, to verify which of them satisfy the query. However, some of these traces may be identical, or share common prefixes of various length, so that the straightforward approach would lead to repeated analyses of the common parts. In order to optimize efficiency, we have therefore proposed a novel ap- proach, that provides the trace tree as an input to the automaton. Each path in the trace tree may index several identical support traces, that will be considered only once, thus speeding up retrieval with respect to a flat search into the event log. Moreover, in the tree, common prefixes of different traces are represented just once, as common branches close to the root (different postfixes can then stem from the common branches, to reach the various leaves). These common parts will be executed on the automaton only once, 13 without requiring for repeated, identical checks. It is worth noting that providing a tree as an input to the automaton represents a significantly novel contribution, since in the formal languages literature the input to be executed on the automaton is typically a string. The work in Baeza-Yates & Gonnet (1996) represents an exception, but the tree it exploits (a Patricia tree) has very different semantics (and usage) with respect to ours. In detail, our approach operates as follows: the algorithm Search Process (see Algorithm 2) takes in input the trace tree T and the deterministic au- tomaton A, and provides as an output a set of pairs, composed of a trace tree leaf node and a corresponding string. Each of these strings is an explicit instantiation of the query represented by the automaton, verified by (some of) the support traces in the leaf node. The output support traces can then be retrieved from the event log and presented to the user, or provided as an input to the filtering step. Basically, Search Process executes a breadth first visit of the trace tree; it exploits the variable searching, defined as a set of triples, composed by a trace tree node, an automaton state, and the string that has been recognized on the automaton so far. Initially (lines 5-7), searching contains the sons of the root (since the root is a dummy action #, see section 2.1), all paired to the initial state of the query automaton and to the empty string. The visit procedure (lines 8-35) extracts one triple at a time from the set searching. If the node in the triple contains a set of actions to be executed in any order (line 11), we simulate all the permutations on the automaton, and save the states we reach and the corresponding recognized strings into the new states set (line 12). If the node contains one single action, we simply simulate it on the automaton, and save the state we reach and the corresponding string into the new states set (line 15). In both cases, the string saved in new states is the one in the input triple properly updated with the newly recognized symbols. After the simulation, if the node at hand is a leaf (line 18), then for each item in new states we check whether the state component is a final state (lines 19-20); if this is the case, node and the string paired to the final state are saved in the output variable result (line 21). Otherwise, if node is not a leaf, we pair its sons to all the items in new states, and save these objects into searching (lines 26-34). The visit terminates when searching is empty, i.e., all tree levels have been visited. The visit procedure is linear in the number of the trace tree nodes. 14 Example Referring to our example query, providing the trace tree in Figure 1 as an input to the Algorithm Search Process, searching initially contains the sons of the root A, B, and C, paired with the initial state of the deter- ministic automaton obtained from the non-deterministic one in Figure 2, and with the empty string. We simulate the actions A, B, and C on the automa- ton. Only B is recognized, generating a state saved in new states with the corresponding string B (line 15). We then pair the sons of node B (E, D, DE) to the item in new states and save these triples into searching (lines 26-34). Continuing the visit, particularly interesting is the case of node DE, which requires to consider all the possible permutations of actions D and E. Both the permutations DE and ED are initially recognized. However, as the visit proceeds and node PZ is reached, it turns out that DE must be followed by the permutation PZ to satisfy the query; on the other hand, if the choice ED is made, it must be followed by ZP . Indeed, the query imposes some constraints that cannot be checked only locally, i.e., referring to a single node along the branch. After this step of the visit (depth 4 in the tree), the recognized partial strings paired to node PZ are BDEOPZ and BEDOZP . When reaching the leaf node W , however, only BDEOPZW is recognized, and reported in output together with the corresponding leaf node W . Considering all the branches, the overall output of the Search Process algorithm is: ⟨ Z in branch 1, BEOPZ ⟩, ⟨ W in branch 4, BDEOPZW ⟩, ⟨ Z in branch 5, BEDWZ ⟩. Filtering As observed above, the tree search step outputs a set pairs, each com- posed of a leaf node and a string, the latter corresponding to an explicit instantiation of the input query issued by the user. If an output leaf node ends a branch which includes one or more nodes with actions to be executed in any order, it is possible that only some of the permutations of these actions are acceptable to answer the corresponding explicit instantiation of the input query; therefore, the support traces must be filtered accordingly. The filtering algorithm (see Algorithm 3) takes in input the trace tree T , and the output set of the tree search step (in search result), and operates as follows. For each item in search result, it extracts the node component and saves it in node (lines 5-6). If node is a leaf (line 9), function support is applied 15 ALGORITHM 2: Pseudo-code of the procedure Search Process. 1 Search Process(T, A) 2 Output: set of < node, string > 3 result ←{} 4 searching ←{} 5 foreach node ∈ sons(root(T)) do 6 searching ← searching ∪ < node, 0, empty > 7 end 8 repeat 9 tmp ←{} 10 foreach < node, state, string > ∈ searching do 11 if node is an any-order-actions node then 12 new states ← anyorder simulate(A, < node, state, string >) 13 end 14 else 15 new states ← simulate(A, < node, state, string >) 16 end 17 if new states ̸= {} then 18 if node is a leaf then 19 foreach < state, string > ∈ new states do 20 if final(state) then 21 result ← result ∪ < node, string > 22 end 23 end 24 end 25 else 26 foreach n ∈ sons(node) do 27 foreach < state, string > ∈ new states do 28 tmp ← tmp ∪ < n, state, string > 29 end 30 end 31 end 32 end 33 end 34 searching ← tmp 35 until searching ̸= {} 36 return result 16 ALGORITHM 3: Pseudo-code of the procedure Filtering. 1 Filtering(T, search result) 2 Output: references to traces 3 result ←{} 4 tmp ←{} 5 foreach < n, string > ∈ search result do 6 node ← n 7 tmp ←{} 8 foreach element ∈ string do 9 if node is a leaf then 10 tmp ← support(node, element) 11 end 12 else 13 if node is an any-order-actions node then 14 tmp ← tmp ∩ support(node, element) 15 end 16 end 17 node ← father(node) 18 element ← update(element) 19 end 20 result ← result ∪ tmp 21 end 22 return result 17 to the node itself, and to the tail portion (element) of the corresponding string: in particular, if node contains a single action, element is a just the last symbol in the string; otherwise, it is a sequence of as many symbols as the number of actions to be executed in any order in node, always computed from the last position. The function support outputs in tmp the traces verifying element: they are a subset of the support traces of node, if node contains actions in any order, since element identifies the single acceptable permutation, in this case. Otherwise, all the support traces of node are returned (line 10). The variable node is then updated by considering the father of the leaf, along its branch (line 17); element is updated accordingly, moving towards the start of the string by as many symbols as the number of actions in node (line 18). At the next iteration, if node contains actions to be executed in any order (line 13), the algorithm calculates the intersection between tmp and the output of the function support (calculated as above), in order to always keep all and only the traces supporting the portion of the string analysed so far (line 14). The algorithm terminates when the string has been fully examined (i.e., we have reached the sons of the root in T), and outputs the references to the filtered traces in result (line 22). The complexity is therefore the number of elements in search result (i.e., the number ⟨leaf node, string ⟩ pairs identified as an output of the previous step), by the length of the longest string in search result. Example We exemplify the algorithm Filtering on the item ⟨ W in branch 4, BDEOPZW ⟩. We set node = W (line 6) and element=W; node W is a leaf and does not contain actions to be executed in any order. Therefore, function support outputs in tmp all the support traces referenced by this node. We then update node=PZ. At the following iteration, element=PZ, and, since node contains two actions to be executed in any order, we calculate the intersection between tmp and the support traces corresponding to the permutation PZ (line 14). Variable node is then set to O. At the following iteration, element=O, and tmp does not change. node is set to DE. At the next iteration, element=DE, and we calculate the intersection between tmp and the support traces corresponding to the permutation DE (line 14). The last iteration is trivial, and does not change the content of tmp, which, in the end, contains the references to all and only the traces supporting the string 18 BDEOPZW , added to the output result (line 22). Obviously, if the leaf node ends a branch that contains no nodes with actions to be executed in any order, the leaf support traces can be directly presented to the user, and the filtering step is not required. 3. Results We have compared our method to a very classical approach, i.e., an ex- isting regular expression processor provided by the Java Regex APIs, not coupled with any indexing strategy2. The experimental database was taken from the Datacentrum website3. Datacentrum archives research data in a standardized, secure and well doc- umented manner, and provides permanent access to them. Specifically, we used a synthetic dataset referring to a loan assessment process. The dataset is composed of 10000 traces, expressed as sequences of 8 to 27 actions. 90% of the traces are 8 to 12 action long. Overall, the dataset contains 14 different action types. For the experiments, we used a machine equipped with an Intel i7-4810MQ CPU @ 2.80GhZ, 8GB RAM, SSHD Hybrid 64MB Cache. In detail, we performed a scalability test. To this end, we made random samplings of the overall available traces, defining 5 subsets of the original database, of dimensions 1000, 3000, 5000, 7000 and 10000 traces respectively (where the last sample is obviously the whole database). We then defined two different query types, and executed 1000 queries for each type. We then calculated average query answering times. The first query type was characterized by the presence of a short delays: the sum of the maximum ranges of the delays is at most three. Overall, the query could contain up to three delays, as in Examples (A), (B) and (C) below. Example (A): (0,0) Loan application received Check application form completeness 2The interested reader can find information about Regex at the link http://www.regular-expressions.info/engine.html - last accessed on July 16, 2015 3http://datacentrum.3tu.nl/en/home/, last accesses on July 16, 2015 19 (3,5) Reject application Loan application rejected (0,0) Example (B): (0,2) Appraise property (1,2) Assess eligibility Reject application Loan application rejected (0,0) Example (C): (0,1) Check application form completeness (2,3) Assess loan risk Assess eligibility (1,2) Loan application rejected (0,0) The second query type was characterized by the presence of longer delays, whose overall length could be up to 5 actions, as in Example (D) below. Example (D): (0,0) Loan application received Check application form completeness (0,5) Reject application Loan application rejected (0,0) The same queries were executed both using our approach and using the Regex Java processor on the 5 archives of different dimensions. The average retrieval times for the two query types are shown in Tables 1 and 2. In the Tables, referring to our method, the times of the three steps (automaton generation, tree search and filtering) are also detailed. 20 Table 1: Comparison of retrieval times (in msec) using our method (trace tree - TT) and using the Regex Java regular expression processor in a scalability test on type 1 queries (short delays) DB dimension TT TT TT TT regex JAVA autom. search filter. tot. 1000 1.71 0.05 0.08 1.84 8.48 3000 1.74 0.06 0.23 2.03 18.82 5000 1.75 0.06 0.40 2.21 29.13 7000 1.73 0.06 0.54 2.33 41.57 10000 1.76 0.06 0.77 2.59 58.72 Table 2: Comparison of retrieval times (in msec) using our method and using the Regex Java regular expression processor in a scalability test on type 2 queries (long delays) DB dimension TT TT TT TT regex JAVA autom. search filter. tot. 1000 3.57 0.08 0.11 3.76 11.90 3000 3.66 0.08 0.31 4.05 18.80 5000 3.63 0.09 0.54 4.26 31.13 7000 3.67 0.09 0.74 4.50 46.23 10000 3.59 0.09 1.02 4.70 60.13 As it can be observed, our method always outperformed Regex. The time of both methods grew as the database dimension increased, but the growth of our method was very limited, referring to both query types (as shown in Figure 3, which plots the values of the two Tables). In both methods, type 2 queries took more time than type 1 queries. This was expected. In particular, as regards our method, the non-deterministic automaton corresponding to type 2 queries has a larger number of states than the one corresponding to type 1 queries (built using Berry and Sethi approach (Berry & Sethi, 1986)). Thus, the transformation cost from the non-deterministic automaton to the deterministic automaton increases too, since this cost depends on the number of states of the non-deterministic automaton. Looking at our method in more detail, referring to both query types, as expected, the cost of building the automaton for queries with a given length of delays was almost constant. The same also holds for searching time, since tree search navigates the index, whose dimension increased only slightly with respect to the size of the dataset. On the other hand, not surprisingly, filtering time (which depends on the number of the retrieved 21 Figure 3: Scalability results. traces) grew linearly with respect to the size of the dataset. However, the whole computation was dominated by the cost of building the automaton. Specifically, automaton generation took from 70 to 95% of the total query answering time (see Tables 1 and 2). On the other hand, the Regex approach does not exploit any index, so that each trace in the dataset is directly checked. As a consequence, the time complexity of Regex grew linearly with respect to the dimension of the dataset. In conclusion, the approach described in this paper can answer the queries in significantly shorter times with respect to classical approaches, and the advantage increases as the database grows. 4. Related work Operational support is typically not provided by commercial Business Intelligence and Business Process Management tools. However, operational support techniques are implemented in the open source framework ProM 22 (van Dongen et al., 2005), developed at the Eindhoven University of Tech- nology, which represents the state of the art in process mining research. In ProM, prediction and recommendation are typically supported by replaying log traces on the transition system (van Dongen, 2007), a state-based model that explicitly shows the states a process can be in, and all possible tran- sitions between these states. The replay activity allows to calculate, e.g., the mean time to completion from a given state, or the most probable next action to be executed. In ProM’s approach, statistics on event log traces are thus used for operational support, but the overall technique is very different from the one we propose in this paper, and no trace retrieval on the basis of complex patterns search is supported. On the other hand, traces have been recently considered in the CBR literature, as sources for retrieving and reusing user’s experience. As an ex- ample, at the International Conference on CBR in 2012, a specific workshop was devoted to this topic (see Floyd et al. (2012)). In 2013, Cordier et al. (2013) proposed trace-based reasoning, a CBR approach where cases are not explicitly stored in a library, but are implic- itly recorded as “episodes” within traces. The elaboration step, in which a case is extracted from a trace, is thus one of the most challenging parts of the reasoning process. Zarka et al. (2013) extended that work, and defined a similarity measure to compare episodes extracted from traces. In these works, traces are typically intended as observations captured from users’ in- teraction with a computer system. Trace-based reasoning was exploited in recommender systems (Adomavicius et al., 2011; Zarka et al., 2012), and to support the annotation of digitalized cultural heritage documents in Doumat et al. (2010). Huang et al. (2013) and Montani & Leonardi (2014) propose two trace retrieval approaches, where different metrics, based on extensions of the edit distance, are exploited. No indexing strategy is however provided to make retrieval faster. Leake (2010) used execution traces recording provenance information to improve reasoning and explanation in CBR. In the Phala system (Leake & Kendall-Morwick, 2008), the authors supported the generation and composi- tion of scientific workflows by mining execution traces for recommendations to aid workflow authors. Finally, Lanz et al. (2010) used annotated traces recorded when a human user played video games in order to feed a case-based planner. All these approaches implement forms of reasoning on traces, but do not aim at providing operational support. 23 There is a growing interest on business process management applications in the CBR community (see, e.g., (Minor et al., 2014b)). Most works, how- ever, focus on process models rather than on traces, and aim at retrieving models (typically represented as graphs) similar to an input one. These works have required the introduction of different metrics for graph compari- son. Most of them are based on extensions of the graph edit distance (Minor et al., 2008; Bergmann & Gil, 2014; Kunze & Weske, 2011; Li et al., 2008; Montani et al., 2015; Dijkman et al., 2009; LaRosa et al., 2013). The work in Minor et al. (2008) makes use of a normalized version of the graph edit distance. The approach is used to support workflow modification in an agile workflow system, and takes into account control flow information as well as activity information. However, the work is limited to considering (small) changes with respect to a running process instance. The work in Dijk- man et al. (2009) also provides a normalized version of the graph edit distance for comparing business process models, and defines syntactical edit opera- tion costs for activity node substitution, activity node insertion/deletion, and edge insertion/deletion. The work in LaRosa et al. (2013) extends the work in Dijkman et al. (2009) by explicitly representing gateway nodes, in order to describe, e.g., parallelism and mutual exclusion, and exploiting them in distance calculation. The work in Kunze & Weske (2011) relies on graph edit distance, and exploits string edit distance on node names to determine the cost of node substitutions. The work in Li et al. (2008) encapsulates a set of edit operations into the so-called “high-level change operations”, and measures distance on the basis of the number of high-level change operations needed to transform one graph into another. All the above contributions typically make use of syntactical information in the definition of the edit operation costs. The works in Bergmann & Gil (2014); Montani et al. (2015), on the other hand, exploit semantic informa- tion in activity comparison. In Bergmann & Gil (2014), a system working on workflows represented as semantically labeled graphs is presented. The paper proposes to use a metric in which the similarity between two mapped nodes or arcs makes explicit use of their semantic description. The work is particularly focused on the data flow. The work in Montani et al. (2015) ex- ploits domain knowledge and temporal information to parametrize the cost of node substitutions and edge substitution. As in LaRosa et al. (2013), gateway nodes are also considered. The approach in Goderis et al. (2006), on the other hand, affords the problem of graph comparison by relying on graph isomorphism. It focuses 24 on scientific workflows, which have a strong focus on the data flow, typically restricting the control flow to a partial ordering of the tasks. The work in Ma et al. (2014) focuses on data oriented workflows as well. It defines a formal structure called Time Dependency Graph (TDG), and exploits it as a rep- resentation model of data oriented workflows with variable time constraints. A distance measure is proposed for computing workflow similarity by their normalization matrices, established on the basis on their TDGs. The work in Kapetanakis et al. (2010) exploits a maximum common subgraph approach for similarity-based process retrieval, in a retrieval sys- tem for supporting business process monitoring. Interestingly, the metric in Kapetanakis et al. (2010) takes into account temporal information, since it combines a contribution related to activity similarity, and a contribution related to delays between activities. Finally, in Madhusudan et al. (2004) a retrieval system for supporting in- cremental workflow modeling is presented. The system proposes a similarity- based retrieval of workflow templates using a planner that employs an inexact graph matching algorithm based on similarity flooding. For computing simi- larities, the algorithm relies on the idea that elements of two distinct graphs are similar, when their adjacent elements are similar. The algorithm prop- agates the similarity from a node to its respective neighbors based on the topology in the two graphs. It is worth noting that all of these works, including our own previous contribution (Montani et al., 2015), are however only loosely related to the present paper, since they do not focus on traces, but on process models (i.e., graphs), and do not aim at enabling operational support. Very interestingly, some recent papers in the area of CBR for business process management (Minor et al., 2011, 2014a; Müller & Bergmann, 2014) also consider the reuse/adaptation step of the CBR cycle (Aamodt & Plaza, 1994). An automation of the adaptation step in process model retrieval is easier in very specific domains, such as the one of cooking recipes (Müller & Bergmann, 2014). In this setting, the graph structure is usually quite simple, and adaptation is often limited to ingredient substitution. However, the issue of implementing a reuse/adaptation strategy in trace retrieval may be investigated in our future research as well. 25 5. Conclusions In this paper, we have introduced a novel framework for trace retrieval, studied to implement operational support tasks in a flexible and efficient way. With respect to existing operational support facilities, our tool is signif- icantly innovative, since, to the best of our knowledge, it is the first one to provide a direct exploitation of experiential knowledge, by means of case- based retrieval techniques. Once past traces have been retrieved, classical statistical techniques can then be applied to them, to support prediction and recommendation. With respect to existing trace retrieval tools in the CBR area, our ap- proach is more efficient and flexible, since: • by allowing for the use of (imprecise) delays in the query language, it enables to express a very large number of explicit queries in a compact way; • by providing the trace tree as an input to the automaton: – it executes common prefixes of different traces only once on the automaton, avoiding repeated, identical checks; – it speeds up retrieval with respect to a flat search into the event log (as testified by our experiments). In the future, we would like to enable also the retrieval of traces that include a prefix similar (but not necessarily identical) to the input trace. To this end, we plan to rely on classical techniques, involving a visit of the trace tree, and a comparison between the trace tree branch at hand and the input trace by means of edit distance calculation. This approach, while quite straightforward from the methodological viewpoint, will further increase the flexibility and the usability of our tool. Finally, we plan to extensively test the overall framework on real world traces, which log the actions executed during stroke patient management in a set of Northern Italy hospitals. der Aalst, W. V. (2011). Process Mining. Discovery, Conformance and En- hancement of Business Processes. Springer. 26 Aamodt, A., & Plaza, E. (1994). Case-based reasoning: foundational issues, methodological variations and systems approaches. AI Communications, 7 , 39–59. Adomavicius, G., Mobasher, B., Ricci, F., & Tuzhilin, A. (2011). Context- aware recommender systems. AI Magazine, 32 , 67–80. Baeza-Yates, R. A., & Gonnet, G. H. (1996). Fast text searching for regular expressions or automaton searching on tries. J. ACM , 43, 915–936. Bergmann, R., & Gil, Y. (2014). Similarity assessment and efficient retrieval of semantic workflows. Information Systems, 40, 115–127. Berry, G., & Sethi, R. (1986). From regular expressions to deterministic automata. Theor. Comput. Sci., 48, 117–126. Buijs, J., van Dongen, B., & van der Aalst, W. (2012). On the role of fitness, precision, generalization and simplicity in process discovery. In On the Move to Meaningful Internet Systems: OTM 2012 (pp. 305–322). Springer. Canensi, L., Montani, S., Leonardi, G., & Terenziani, P. (2014). Chap- man: a context aware process miner. In Proc. Workshop on Synergies be- tween Case-Based Reasoning and Data Mining, International Conference on Case Based Reasoning (ICCBR). Cnudde, S. D., Claes, J., & Poels, G. (2014). Improving the quality of the heuristics miner in prom 6.2. Expert Syst. Appl., 41 , 7678–7690. Cordier, A., Lefevre, M., Champin, P., Georgeon, O., & Mille, A. (2013). Trace-based reasoning - modeling interaction traces for reasoning on expe- riences. In C. Boonthum-Denecke, & G. M. Youngblood (Eds.), Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2013, St. Pete Beach, Florida. May 22-24, 2013.. AAAI Press. Dijkman, R., Dumas, M., & Garca-Banuelos, R. (2009). Graph matching al- gorithms for business process model similarity search. In U. Dayal, J. Eder, J. Koehler, & H. Reijers (Eds.), Proc. International Conference on Busi- ness Process Management (pp. 48–63). volume 5701 of Lecture Notes in Computer Science. 27 van Dongen, B. (2007). An Iterative Algorithm for Applying the Theory of Regions in Process Mining. BETA publicaties: Preprints. Beta, Research School for Operations Management and Logistics. van Dongen, B., De Medeiros, A. A., Verbeek, H., Weijters, A., & der Aalst, W. V. (2005). The proM framework: a new era in process mining tool support. In G. Ciardo, & P. Darondeau (Eds.), Knowledge Mangement and its Integrative Elements (pp. 444–454). Springer, Berlin. Doumat, R., Egyed-Zsigmond, E., & Pinon, J. (2010). User trace-based rec- ommendation system for a digital archive. In I. Bichindaritz, & S. Montani (Eds.), Case-Based Reasoning. Research and Development, 18th Interna- tional Conference on Case-Based Reasoning, ICCBR 2010, Alessandria, Italy, July 19-22, 2010. Proceedings (pp. 360–374). Springer volume 6176 of Lecture Notes in Computer Science. Floyd, M. W., Fuchs, B., Gonzalez-Calero, P., Leake, D., Ontanon, S., Plaza, E., & Rubin, J. (2012). TRUE: Traces for Reusing Users Experiences Cases, Episodes, and Stories, International Conference on Case Based Reasoning (ICCBR). Lyon. Goderis, A., Li, P., & Goble, C. A. (2006). Workflow discovery: the problem, a case study from e-science and a graph-based solution. In F. Leymann, & L. Zhang (Eds.), Proc. IEEE International Conference on Web Services (pp. 312–319). IEEE, USA. Huang, Z., Juarez, J. M., Duan, H., & Li, H. (2013). Length of stay prediction for clinical treatment process using temporal similarity. Expert Syst. Appl., 40, 6330–6339. Kapetanakis, S., Petridis, M., Knight, B., Ma, J., & Bacon, L. (2010). A case based reasoning approach for the monitoring of business workflows. In I. Bichindaritz, & S. Montani (Eds.), Proc. International Conference on Case Based Reasoning (ICCBR) (pp. 390–405). Springer, Berlin volume 6176 of Lecture Notes in Computer Science. Kunze, M., & Weske, M. (2011). Metric trees for efficient similarity search in large process model repositories. In M. Muehlen, & J. Su (Eds.), Proc. Business Process Management Workshops (pp. 535–546). Springer, Berlin volume 66 of Lecture Notes in Business Information Processing. 28 Lam, M. S., Aho, A. V., Sethi, R., & Ullman, J. D. (2006). Compilers: Principles, Techniques, and Tools. Addison-Wesley. Lanz, A., Weber, B., & Reichert, M. (2010). Workflow time patterns for process-aware information systems. In Proc. BMMDS/EMMSAD (pp. 94– 107). LaRosa, M., Dumas, M., Uba, R., & Dijkman, R. (2013). Business process model merging: An approach to business process consolidation. ACM Trans. Softw. Eng. Methodol., 22, 11. Leake, D. B. (2010). Case-based reasoning tomorrow: Provenance, the web, and cases in the future of intelligent information processing. In Z. Shi, S. Vadera, A. Aamodt, & D. B. Leake (Eds.), Intelligent Information Pro- cessing V - 6th IFIP TC 12 International Conference, IIP 2010, Manch- ester, UK, October 13-16, 2010. Proceedings (p. 1). Springer volume 340 of IFIP Advances in Information and Communication Technology. Leake, D. B., & Kendall-Morwick, J. (2008). Towards case-based support for e-science workflow generation by mining provenance. In K. Althoff, R. Bergmann, M. Minor, & A. Hanft (Eds.), Proc. ECCBR 2008, Advances in Case-Based Reasoning (pp. 269–283). Springer volume 5239 of Lecture Notes in Computer Science. van Leeuwen, J. (1994). Handbook of Theoretical Computer Science. Mit Press. Li, C., Reichert, M., & Wombacher, A. (2008). On measuring process model similarity based on high-level change operations. In Q. Li, S. Spaccapietra, E. S. K. Yu, & A. Olivé (Eds.), Proc. International Conference on Con- ceptual Modeling (pp. 248–264). Springer, Berlin volume 5231 of Lecture Notes in Computer Science. Ma, Y., Zhang, X., & Lu, K. (2014). A graph distance based metric for data oriented workflow retrieval with variable time constraints. Expert Syst. Appl., 41 , 1377–1388. Madhusudan, T., Zhao, J., & Marshall, B. (2004). A case-based reasoning framework for workflow model management. Data and Knowledge Engi- neering, 50 , 87–115. 29 Minor, M., Bergmann, R., & Görg, S. (2014a). Case-based adaptation of workflows. Inf. Syst., 40, 142–152. Minor, M., Bergmann, R., Görg, S., & Walter, K. (2011). Reasoning on busi- ness processes to support change reuse. In B. Hofreiter, E. Dubois, K. Lin, T. Setzer, C. Godart, E. Proper, & L. Bodenstaff (Eds.), 13th IEEE Con- ference on Commerce and Enterprise Computing, CEC 2011, Luxembourg- Kirchberg, Luxembourg, September 5-7, 2011 (pp. 18–25). IEEE Computer Society. Minor, M., Montani, S., & Recio-Garćıa, J. (2014b). Process-oriented case- based reasoning. Inf. Syst., 40 , 103–105. Minor, M., Tartakovski, A., Schmalen, D., & Bergmann, R. (2008). Agile workflow technology and case-based change reuse for long-term processes. International Journal of Intelligent Information Technologies, 4 , 80–98. Montani, S., & Leonardi, G. (2014). Retrieval and clustering for supporting business process adjustment and analysis. Information Systems, 40 , 128– 141. Montani, S., Leonardi, G., Quaglini, S., Cavallini, A., & Micieli, G. (2015). A knowledge-intensive approach to process similarity calculation. Expert Syst. Appl., 42, 4207–4215. Müller, G., & Bergmann, R. (2014). Workflow streams: A means for composi- tional adaptation in process-oriented CBR. In L. Lamontagne, & E. Plaza (Eds.), Case-Based Reasoning Research and Development - 22nd Inter- national Conference, ICCBR 2014, Cork, Ireland, September 29, 2014 - October 1, 2014. Proceedings (pp. 315–329). Springer volume 8765 of Lec- ture Notes in Computer Science. Zarka, R., Cordier, A., Egyed-Zsigmond, E., Lamontagne, L., & Mille, A. (2013). Similarity measures to compare episodes in modeled traces. In S. J. Delany, & S. Ontañón (Eds.), Case-Based Reasoning Research and Devel- opment - 21st International Conference, ICCBR 2013, Saratoga Springs, NY, USA, July 8-11, 2013. Proceedings (pp. 358–372). Springer volume 7969 of Lecture Notes in Computer Science. Zarka, R., Cordier, A., Egyed-Zsigmond, E., & Mille, A. (2012). Contextual trace-based video recommendations. In A. Mille, F. L. Gandon, J. Misselis, 30 M. Rabinovich, & S. Staab (Eds.), Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume) (pp. 751–754). ACM. 31