key: cord-0508071-0cdym4ji authors: Christino, Leonardo; Paulovich, Fernando V. title: From Data to Knowledge Graphs: A Multi-Layered Method to Model User's Visual Analytics Workflow for Analytical Purposes date: 2022-04-01 journal: nan DOI: nan sha: bdce5687710320597f2760f17c25b6bc9ecd20a0 doc_id: 508071 cord_uid: 0cdym4ji The importance of knowledge generation drives much of Visual Analytics (VA). User-tracking and behavior graphs have shown the value of understanding users' knowledge generation while performing VA workflows. Works in theoretical models, ontologies, and provenance analysis have greatly described means to structure and understand the connection between knowledge generation and VA workflows. Yet, two concepts are typically intermixed: the temporal aspect, which indicates sequences of events, and the atemporal aspect, which indicates the workflow state space. In works where these concepts are separated, they do not discuss how to analyze the recorded user's knowledge gathering process when compared to the VA workflow itself. This paper presents Visual Analytic Knowledge Graph (VAKG), a conceptual framework that generalizes existing knowledge models and ontologies by focusing on how humans relate to computer processes temporally and how it relates to the workflow's state space. Our proposal structures this relationship as a 4-way temporal knowledge graph with specific emphasis on modeling the human and computer aspect of VA as separate but interconnected graphs for, among others, analytical purposes. We compare VAKG with relevant literature to show that VAKG's contribution allows VA applications to use it as a provenance model and a state space graph, allowing for analytics of domain-specific processes, usage patterns, and users' knowledge gain performance. We also interviewed two domain experts to check, in the wild, whether real practice and our contributions are aligned. 
Visual Analytics (VA) systems provide ways for users to harness insights and knowledge from datasets [39]. This ubiquity has generated enormous interest in tracking and analyzing user behavior to understand how users generate knowledge [51]. To model user knowledge and its relationship to VA systems, researchers have, from the very beginning, used the knowledge generation model of Sacha et al. [39] as a guide. From there, the development within the VA literature can be described as two-fold. Some research focuses on the conceptual knowledge modeling aspect of VA, describing the knowledge gain workflow as an iterative process between users and computers [20]. Others describe VA through taxonomy models [12], such as relationships between its different aspects [2] and interactivity [47], or through VA ontological works [12, 40]. Research within VA has also shown that by tracking users' behavior and interactivity over time, many insights regarding the VA workflow, the VA system, and the users themselves can be obtained through behavioral and provenance analysis [5, 26]. However, these behavioral analyses do not yet fully use the previously listed knowledge models as a unified architecture to track and analyze users performing VA workflows. For instance, Sacha et al. [39] describe the conceptual relationship between Computer and Human, but how would one store the ongoing interactions within a graph structure [3]? And how would one use this graph structure for behavior analysis? Although knowledge-related taxonomies, typologies, and ontologies are constantly referenced by VA research [7, 20, 37, 39, 40], the knowledge model as depicted in Fig. 1(A) is typically used as an underlying theoretical concept by researchers, leaving the work of putting it into practice to domain experts. Furthermore, there is a conceptual difference between the literature on workflow ontologies and the literature on modeling VA as a knowledge model [12].
That is, the ontology research focuses on a taxonomy structure to describe the task sequence of VA workflows [40, 42] independent of the separation between Human and Computer processes, while the knowledge models describe the iterative knowledge generation workflow between Human and Computer [20, 39] without describing ways to perform behavior tracking or provenance analysis. While the two have certain overlaps, these are mostly treated as limitations by their authors [48, 51]. Despite the significant impact and popularity of systems created using such works, the lack of a formal ontology that single-handedly captures and compares a VA system's workflow to how, and with what reasoning, its users transform data into knowledge is a missed opportunity. Even works that pair ontologies with Knowledge Graphs (KGs) for analytical purposes [5, 8, 22] cannot analyze the VA workflow in its entirety. In other words, even though we have access to literature with well-built models and applications, no methodology provides a way to simultaneously understand, on a per-user basis, how past knowledge aided in generating new knowledge; to identify the workflow that led to such knowledge; to compare multiple users' intentions and behavior patterns; to relate them to a VA workflow's state space; and to capture the full user and computer interactivity patterns during the workflow while allowing further customization and analysis. In light of this, we propose the Visual Analytic Knowledge Graph (VAKG), a multi-layer ontology architecture based on Temporal Knowledge Graphs (TKGs) [23]. VAKG structures the knowledge gathering process within the Visual Analytics (VA) life-cycle as a set of linked knowledge graphs. For this, VAKG describes a model that systematically stores, describes, and enables analysis of the tasks performed by a user and the knowledge they generate during any given VA workflow (Fig. 1).
As our core contribution, VAKG introduces the concept of a multi-lane TKG model (Fig. 1(B)), which unfolds the VA typology states over time while keeping the separation between Human and Computer concepts and between the sequential and state-space aspects of a VA workflow. The result is a knowledge graph (Fig. 1(C)) that provides an architecture for VA systems to structure, store, and link users' knowledge generation workflows, all computer-side changes over time, and the relationship between users' reasoning and the executed workflow. Additionally, VAKG's multi-layer approach allows all parties to attach existing models to VAKG as sub-components, including graph analysis techniques. Although VAKG may seem to compete with current knowledge models and knowledge graphs within VA, our purpose is not to replace existing models but to describe an architecture where such models can overlap and communicate through our multi-layered knowledge graph architecture, since it allows for a flexible definition of sub-ontologies to be used within VAKG. The remainder of the paper is structured as follows. In Sec. 2, we discuss related work involving techniques that seek to formalize the VA knowledge flow as a concrete framework, usages of knowledge graphs within VA, including how they differ from VAKG, and other concepts that tackle the ongoing knowledge evolution during data analysis. In Sec. 3, we extend the existing works on the theoretical knowledge model of VA to formalize VAKG. In Sec. 5, we present possible applications of VAKG, comparing them with existing methods and justifying further extensions of VAKG. We also illustrate the potential impact of using VAKG by interviewing two experts: one in the food-business field who routinely performs manual financial data analysis for decision making, and another who manages a fraud detection and credit analysis area within a large bank (Sec. 6). Finally, in Sec.
7 we discuss VAKG's current limitations and the next steps within our research plan, and in Sec. 8 we draw our conclusions. This section presents a general view of knowledge models and taxonomies, including existing literature on VA ontologies, data provenance, user-interactivity tracking, behavior analysis, and intersecting works that use temporal knowledge graphs, and compares them to our objectives. After presenting VAKG, we discuss in Sec. 5 the connections between these works and VAKG itself. Knowledge and Feedback. The literature on knowledge modeling methodology defines the existence of a flow of insight and knowledge generation caused by the user's interactivity with the computer (Fig. 1(A)). Since the origin of VA, significant work has been done to demonstrate the breadth and depth of knowledge within VA [39]. Federico et al. [20], for instance, describe many such works in great detail and also define ways to interpret VA research in light of the knowledge model. These works may differ when describing VA's subcomponents, such as data mining or visualization, but common to them is VA's purpose of creating insight or knowledge at the end of some workflow [11, 20], which is achieved through interactivity between the user and computer. Our work does not try to redefine any of these concepts but borrows the ones used by most [11, 20, 37], translated, in general terms, as pieces of concrete information inferred by experience or learned from an external source. On the other hand, these theoretical works [20, 37, 39] do not discuss how such models can be used to record and analyze the user's knowledge gathering process itself. VA's knowledge model research has tackled the problem of Knowledge Gathering, that is, tracking the user's knowledge gain as a temporal sequence of insights or findings, in many different ways. However, knowledge gathering within these works and systems is treated only as a conceptual model. Federico et al.
[20] list many systems, a notable example being the work by Keim et al. [30], which creates an application-specific knowledge gathering process by combining automated analysis with human interaction. Furthermore, Federico et al. [20] also argue that since this knowledge-gathering loop is conceptual, it is "often inconsistently used", which reveals a limitation when this conceptual model is applied in practice. Federico et al. [20] do, however, describe a very well-built taxonomy showing how the subsequent interactions and feedback between the user and the computer are related. For instance, they describe how automatic processes in data mining can generate new visualizations or how machine learning can help the user understand the data itself. Such examples are also expanded into their ontological design, showing how data provenance [18], data mining [40], ML explainability [44], and visualization relate to each other. However, none of these works compares the temporal aspect of knowledge gathering with the workflow's state space. With VAKG, we attempt to fill this gap by designing an ontology that defines temporal sequences of knowledge gathering per user and explicitly links these sequences to an atemporal state space of the feedback users may receive from a VA workflow. Interactivity Provenance. To better understand the concept and applicability of knowledge gathering, significant research has been done on knowledge provenance, an approach that models the knowledge gathering process as a graph network. For instance, many works within the Semantic Web discuss ways to understand or structure knowledge provenance within the Web [19]. Knowledge provenance derives from Data Provenance, which describes the sequence of changes datasets may have undergone over time.
With provenance, the tracking of knowledge can be modeled as changes in datasets [18] or updates in visualizations [5, 51] and extracted through graph network analysis [22]. Among such works, Chang et al. [9] use visual analysis within a Knowledge Base system, storing knowledge extracted from experts in a "compressed" format using a Knowledge Database. These works show examples of applying provenance to understand the knowledge gathering process. Still, although they describe ways to link knowledge gathering to user interactions, it is rare to see a differentiation between the historical sequence of user-generated events and the atemporal state space of the VA workflow. In other words, two concepts are normally merged in these works: the temporal aspect, which indicates what VA tasks users executed and when, and the atemporal aspect, which indicates what the possible VA workflow states are and how they transition between each other. Beyond Models: Tracking and Analysis. User-tracking and behavior analysis research has also been active. The user-tracking taxonomy of von Landesberger et al. [47] models user behavior as a graph for analytical purposes. However, since these tools do not integrate directly with a knowledge model but focus on solving a specific domain problem, the integration between tools and the existing knowledge models is rarely addressed. This causes issues when discussing the usage of these models as an ontology to store user-tracking data in a knowledge graph [21], explore it [8], and perform analysis over it. For instance, the interactivity taxonomy of von Landesberger et al. [47] cannot be used within the descriptive ontology of Sacha et al. [40]. These works also do not discuss how multi-user VA workflows connect to Battle and Heer [5], nor do they allow comparing users' exploratory spaces with their motifs [51].
VAKG addresses this with a multi-layer architecture, allowing differing taxonomies and ontologies to live, and be analyzed, as separate sub-components of the graph network. On the other hand, existing VA systems use these taxonomies as a theoretical or conceptual background while using the user-tracking data only for specific domain use-cases. For instance, users' Tacit Knowledge, as defined by Federico et al. [20], is tracked in VA by many different feedback methods, such as manual feedback systems [6, 33], manual annotations over visualizations [43], and inference methods that attempt to discover the user's insights by analyzing their interactivity patterns [5]. Nevertheless, they do not provide the user-tracking data as a solution to the bigger knowledge provenance problem, only to their specific domain's goals. Among these VA systems, InsideInsights [6] is the only one that comes close to addressing this limitation due to its unique and well-built approach of recording insights through annotations during the user's analytical process. However, the proposed solution is not able to track automatically-generated insights [44] nor account for automatic computer processes [20]. VAKG includes all these aspects in its ontology. Additionally, though VA systems typically attempt to apply visualization and data mining processes to solve specific problems within a given domain, research on domain knowledge and technical knowledge concepts usually only produces conceptual models [12]. Such conceptual research is instrumental for the field. However, the knowledge needed to build a VA system or tool (e.g., domain and technical knowledge) and the user knowledge model of Sacha et al. [39] are ontologically disconnected [12]. On the other hand, research on ontologies that model VA in graph form [11, 40, 46, 48] does not solve the knowledge provenance problem. Therefore, although the current knowledge model framework described by Sacha et al.
[39] and Federico et al. [20] is instrumental to understanding how current VA systems produce knowledge, it is by itself insufficient to provide insights into the ongoing aggregation of knowledge throughout a VA task and, consequently, cannot serve as the means to store the tool's usage and the user's behavior and knowledge gain. VAKG attempts to bridge this gap by providing a flexible architecture where others' theoretical work can be incorporated into our ontology; by exposing it as a knowledge graph, VA systems can analyze their own knowledge gathering process. Knowledge Graphs. Ontological design of knowledge has been a central goal of the areas of Knowledge Bases and Knowledge Graphs (KGs) [21]. A KG [13] is a widely used technique to store knowledge in a structured format and has yielded many successful methods for interpreting, storing, and querying explicit knowledge, such as the information within Wikipedia [4]. KGs expand knowledge models' ontologies by defining a specific way of structuring knowledge as relationships within a graph-based framework, which allows the ontologies discussed so far to use KGs as a means to store information within a graph database, such as neo4j [36]. Compared to typical databases, the structure of KGs focuses less on the usual transactional or row-based structure [8] and more on the types of knowledge properties and the relationships between these properties. The KG ontology usually uses the OWL format [11, 40, 48]; one difference, however, is its focus on describing the Classes and Relationships in light of the expected use-case of the database. KGs have many other parallels and connections to overall graph research, including Graph Neural Networks (GNNs) [28], graph visualizations [10, 25], and graph operations [27], such as PageRank and the traveling salesman problem.
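To make the graph operations mentioned above concrete, the sketch below builds a tiny KG as an edge list and runs a hand-rolled PageRank over it. All node names are illustrative; a real deployment would more likely use a graph database such as neo4j and a library implementation of PageRank.

```python
# A toy knowledge graph as a directed edge list, with a minimal PageRank.
# Node names are illustrative "fact" entities, not part of any real dataset.

def pagerank(edges, nodes, damping=0.85, iters=50):
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for s in nodes:
            targets = out[s] or list(nodes)  # dangling nodes spread rank evenly
            share = damping * rank[s] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

nodes = {"Wikipedia", "Canada", "Country", "George Washington", "Human"}
edges = [("Canada", "Country"), ("George Washington", "Human"),
         ("Wikipedia", "Canada"), ("Wikipedia", "George Washington")]
ranks = pagerank(edges, nodes)

# Heavily-pointed-to class nodes accumulate more rank than source nodes
assert ranks["Country"] > ranks["Wikipedia"]
```

PageRank here is only one example; once VA workflow data lives in a graph of this shape, any graph-theoretic measure (centrality, shortest paths, motif counts) applies directly.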
For our use-case, a notable sub-type of KG is the Temporal Knowledge Graph (TKG), whose ontology captures the temporal relationships of the data, such as "order of events" or "time difference between events" [23]. This technique creates a graph-based timeline of knowledge evolution. Different from existing works, the VAKG ontology is designed to focus on the analytical use-case of sequence analysis [51], which can be done through TKGs and graph-theory abstractions of temporal graphs. To arrive at VAKG's goals and ontology, we first formalize and describe the Visual Analytics (VA) knowledge model framework [20, 39] and its role in VA ontologies [40, 51]. Then, following the example of related works, we use this foothold to unfold the Human-Computer loop into a temporal sequence of states, as shown in Fig. 1(B). We then discuss VAKG's goals in light of related work limitations and, to achieve these goals, we expand the ontology and use a Temporal Knowledge Graph (TKG) structure to allow VAKG to store and analyze the user's knowledge gain process. The theoretical background of VA's knowledge model is foundational for most, if not all, research within VA. The rather simplistic representation of this model shown in Fig. 1(A) characterizes its two main actors: Humans and Computers. From Sacha et al. [39], VA's cyclical process of human-computer interactions and feedback loops establishes that knowledge is generated over time. Federico et al. [20] expand these concepts by describing the inner taxonomies of both the Human and Computer sides and by further describing that, depending on the system and use-case, different styles of interaction loops can be described through these inner taxonomies. They also propose a mathematical interpretation called the "Conceptual Model of Knowledge-Assisted VA" to describe a base knowledge model, which is then used to construct many derived models, such as knowledge generation, conversion, internalization, externalization, and exploitation.
For our purposes, the essential information extracted from these works is that VA workflows have many well-defined inner taxonomies, which may vary depending on the application and use case; still, there always exists an overarching human-computer interactivity loop. As discussed previously, such theoretical models are also applied in practice, either by using the theory as inspiration for design guidelines or through formal ontologies. For instance, Sacha et al. [40] describe an ontology of the knowledge model as what they call a "diamond feedback loop", and Battle and Heer [5] use the knowledge model as inspiration to record user behavior as a graph network for analysis. In all cases, the interactivity loop between Human and Computer is always present. To better understand VAKG's goals, we extract the underlying ontology of the knowledge model from these existing works following the Web Ontology Language (OWL) [12]. By pairing the temporal aspect of the human-computer interactivity loop from knowledge models [20, 39] with the "diamond rule" of Sacha et al. [40], we define sequences of State classes (green circles) with two main sub-types, Human States (red outline) and Computer States (black outline), as seen in Fig. 1. Next, we incorporate the temporal aspect into the ontology. For this, we follow the current conceptual work of defining and characterizing VA workflows (see Sec. 2). Although many specific ontologies of relationships have been designed recently, we first return to the roots of VA and focus our macro design on the temporal relationship between the two aforementioned classes as the design core of VAKG. Formally speaking, we define two relationships that connect the aforementioned two state types, interact-Human-Computer and feedback-Computer-Human, similar to [40, 42], as seen by the red-dashed and black-dashed lines of Fig. 1(B). These relationships are analogous to has-IO-Entity-Successor and has-Process-Successor of Sacha et al.
[40] and, just as they argue, this allows for "directed connections explicitly define the predecessor-successor and action-actor relationships within any workflow". Therefore, just by extracting the descriptors from existing works [20, 40], the VAKG ontology should include:

• Human Class (H): All human-related information and changes, such as tacit knowledge, newly gained insights and findings, the user's perception/cognition, the user's will or wishes while performing any task, any questions or goals the user may have, and all demographic information on the user.

• Computer Class (C): All computer-related information and changes, such as datasets, metadata, visual and interface state at a given time, machine learning state, automated processes, system specifications, system configuration, and available explicit knowledge.

• Relationship interact-Human-Computer: What and how the user interacted with or changed within the interface (Computer), and why the user interacted with the interface (Computer) at a given time.

• Relationship feedback-Computer-Human: What the state of the Human was prior to the changes in the interface at a given time.

So far, we have defined what already exists in the literature [20, 40]. However, as discussed in Sec. 2, the related works leave an ambiguity around these two classes and two relationships: whether the knowledge gathering process is to be considered temporal or not. A temporal process is a provenance workflow that considers the sequences of interact-feedback loops as sequences of events over time, which can then be modeled through taxonomies [20, 37, 39, 47] or systematically used for tracking or analysis [5, 6, 22, 51]. An atemporal process, on the other hand, would instead consider the state space of the VA workflow. We consider a state space to be a set of some or all possible configurations of a system or entity, together with all the immediate adjacency relationships between such configurations.
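As an informal illustration (our own sketch, not part of the cited ontologies), the two classes and two atemporal relationships above could be encoded as plain Python dataclasses. All field names are assumptions about what a property map might contain.

```python
# Sketch of the extracted ontology: Human and Computer states linked by
# interact/feedback relationships. Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class ComputerState:
    visual_state: dict       # e.g., active charts, filters, ML model state
    dataset_version: str

@dataclass
class HumanState:
    tacit_knowledge: set     # what the user already knows
    goals: list              # questions the user wants answered

@dataclass
class Interact:              # interact-Human-Computer
    source: HumanState
    target: ComputerState
    action: str              # what was changed and why

@dataclass
class Feedback:              # feedback-Computer-Human
    source: ComputerState
    target: HumanState
    shown: str               # what the interface presented back

h0 = HumanState(tacit_knowledge={"domain basics"}, goals=["find outliers"])
c0 = ComputerState(visual_state={"chart": "scatterplot"}, dataset_version="v1")
loop = [Interact(h0, c0, "brush high-value region"),
        Feedback(c0, h0, "highlighted 12 outliers")]
assert all(type(step).__name__ in ("Interact", "Feedback") for step in loop)
```

A real implementation would map these classes onto OWL classes and object properties; the dataclasses only show how the four ontology elements partition the information.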
For example, a state space can be a graph network with a node for every possible state a VA tool can be in and a link between every two nodes that are a single event or interaction away from each other. Interpreting the ontology described above as a state space has been considered in some theoretical works [40, 48] and analysis-focused works [5, 15]. However, even when both temporal and atemporal aspects are used, it is ambiguous or uncertain how the two are differentiated and whether they are connected in any way, as discussed in Sec. 2. Furthermore, the inner elements of the Human side versus the Computer side of VA are not always separated within these works, which creates further ambiguity. For instance, while theoretical works [20, 39] clearly separate the two in their models, works describing VA ontologies [7, 40], user-tracking [6], and behavior analysis [5, 51] usually merge the two sides or ignore one side completely. With this, we have our core goal stated: VAKG is an architectural framework designed with ontologies that formally model the temporal sequences and the state space of both the Computer and Human sides of a VA workflow, and when used to record VA workflow sessions, it directly allows for provenance and state space analysis. Note that VAKG should be understood as an ontology architecture, in the sense that it is not a static ontology but is expected to be expanded and specialized for different use-cases. With that in mind, our objectives while designing VAKG are to:

G1: Create an ontology that describes and links all four VA workflow aspects: computer-side updates over time (G1.1), computer-to-human feedback (G1.2), human-to-computer interaction (G1.3), and human insights over time (G1.4).

G2: Architect this ontology so that it can be used to record users executing a VA workflow and allow its analysis.

G3: Make the ontology able to optionally incorporate lower-level ontologies and models for specific analytical use-cases.
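The state-space definition above can be sketched directly: each possible tool configuration is a node, and two nodes are linked when exactly one interaction separates them. The two-field configuration below is hypothetical.

```python
# Sketch of a state space: nodes are possible tool configurations, and an
# edge links any two configurations that are one interaction apart.
# The (chart type, filter on/off) configuration is purely illustrative.
from itertools import combinations

states = [("scatter", False), ("scatter", True),
          ("bar", False), ("bar", True)]

def one_interaction_apart(a, b):
    # exactly one configuration field differs => one user action away
    return sum(x != y for x, y in zip(a, b)) == 1

adjacency = {s: set() for s in states}
for a, b in combinations(states, 2):
    if one_interaction_apart(a, b):
        adjacency[a].add(b)
        adjacency[b].add(a)

# ("scatter", False) borders ("scatter", True) and ("bar", False),
# but not ("bar", True), which is two interactions away
assert len(adjacency[("scatter", False)]) == 2
```

Real tools have state spaces far too large to enumerate; in practice only the visited region of this graph is materialized, which is exactly what recording sessions with VAKG produces.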
So far, we have extracted an ontology from the literature [20, 40] and discussed its ambiguities regarding whether the knowledge gathering process is temporal (e.g., provenance) or not (e.g., state space). To resolve them, VAKG expands the ontology described above by explicitly separating the concepts of temporal and atemporal relationships. This is done by interpreting the two relationships, namely feedback-Computer-Human (G1.2) and interact-Human-Computer (G1.3), as atemporal relationships and adding two more relationships for temporal connections:

• Relationship update-Computer-Computer: Time-stamped indication of what changed in the interface and why this change happened (G1.1).

• Relationship insight-Human-Human: Time-stamped indication of what, how, and why the user aggregated new insights/findings from the new information (G1.4).

Our expanded version of the VA workflow ontology shown in Fig. 1(B) attempts to reduce, if not eliminate, this ambiguity. The figure shows the two temporal relationships as the light-red and gray diamond arrows, whereas the atemporal relationships are the red and black dotted arrows. A parallel mathematical formalization of the same ontology is discussed in the supplementary material. Therefore, although the basic architecture of VAKG follows existing research, it also attempts to solve the existing temporal-versus-atemporal issues. Even though analysis of the temporal aspect of VA, such as knowledge gathering or data provenance, is at the center of VAKG's goals, the ontology so far does not fully depict a timeline of events separate from a timeless ontology of possible events (G1). For this, and for VAKG to also allow analytical tasks to be performed (G2), we propose that the design of VAKG follows existing research on Knowledge Graphs (KGs). A KG is a graph structure where knowledge reasoning is modeled as connections between classes or properties, such as "George Washington is a human" and "Canada is a country".
A Temporal Knowledge Graph (TKG), however, models these connections as temporal relationships between the classes or properties. Many different types of TKGs exist, and for each, this temporal connection has a different meaning. For instance, the most used version of TKGs uses time as the connection, where two connected nodes represent two events that co-occurred. An example of such a KG would be all purchases done between different businesses within a supply chain, where the product "Mayonnaise" may have been bought by the store "Walmart" from the seller "Hellmann's" on "25/06". In this TKG, the connection between the three nodes Walmart, Hellmann's, and Mayonnaise would be "25/06". To design such structures, one must use the Web Ontology Language (OWL) or a similar ontology language [12]. By designing VAKG's ontology with KGs and TKGs in mind, we enable easier ways for both temporal and atemporal visualization and analysis to be done with VAKG through existing works [8, 13, 23, 24, 49]. VAKG is arguably more complex than the KG examples given so far. When also considering the four different aspects listed within G1, we describe VAKG as a 4-lane KG, with one lane for each of the four aspects of G1. However, our ontology of Fig. 1(B) does not match this expectation. For that, we create a new ontology by promoting each of the two temporal relationships into its own class, which represents an Update process (Fig. 1(C)). We therefore modify the relationship between two States of the same type (e.g., Human-Human) to become a new class of type Update, and from this, three new relationships are created: does-State-Update, leads-Update-State, and follows-Update-Update, which forms the 4-lane TKG architecture of Fig. 1(C). With this, the Human Update inherits its definition from insight-Human-Human, and the Computer Update inherits its definition from update-Computer-Computer. A representation of how the classes relate to each other can be seen in Fig.
2. So far, we have focused on the VAKG structure. Still, an integral part of our proposal is to record users executing a VA workflow and allow its usage for analysis (G2). While the usual way of thinking in ontology design is to focus on OWL classes and their relationships, with VAKG we also utilize OWL class properties, sometimes called data properties or, when used within the context of KGs, property-maps. This design pattern uses the idea that every class in OWL can contain inner properties with attached data. The property-map design pattern is, however, interchangeable with the design pattern of pure OWL classes and relationships [35], which removes any potential limitation on interconnecting our design with other ontologies. As per our design, each node within VAKG holds a property map of all the node's information. Fig. 3 shows a simple example of what this property map is when using VAKG to model an Exploratory Data Analysis (EDA) task. However, how much of the recorded information should be stored in the property map? Although, theoretically, one could store all information related to a given state, down to the exact bits within a Computer State or a brain scan of a Human State, we understand that it is not reasonable to expect that every VA workflow requires such an amount of information. Therefore, we define as part of VAKG that the property map of a State should, at the very least, uniquely identify that specific State within the state space of VAKG. Through this definition, we can see that a given Computer or Human State can repeat if the same conditions occur multiple times. For instance, if the user within an EDA task performs a sequence of interactions that returns all the visualizations to an already explored state, VAKG's Computer States would describe the state transitions the user took until they returned to that state.
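One possible way to operationalize this uniqueness requirement (our own sketch, not prescribed by VAKG) is to key each State node by a canonical hash of its property map, so that revisiting an already-explored configuration resolves to the existing node instead of creating a duplicate:

```python
# Sketch: keying Computer States by a canonical hash of their property map,
# so a revisited configuration maps to the same node. Property names are
# illustrative, not part of the VAKG specification.
import hashlib
import json

def state_key(property_map):
    # sort_keys makes the serialization canonical regardless of insert order
    canonical = json.dumps(property_map, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

nodes = {}  # key -> property map (the atemporal state space)

def record_state(property_map):
    key = state_key(property_map)
    nodes.setdefault(key, property_map)
    return key

k1 = record_state({"chart": "scatter", "filter": None})
k2 = record_state({"chart": "bar", "filter": "price>10"})
k3 = record_state({"filter": None, "chart": "scatter"})  # same state, revisited

assert k1 == k3 and k1 != k2  # the revisited state deduplicates
assert len(nodes) == 2
```

The temporal lanes then reference these keys, which is what lets the same State node appear at multiple points of a session's timeline.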
Though less common, this property is shared by the Human States, where the user could have had an insight previously and, after further EDA interactions, simply had the same insight once again. This, however, is not a property shared by the Computer or Human Update sequences. By applying the same definition to the two Update classes, their expected property-map would uniquely identify the changes between two Computer or two Human States, which, for VAKG, includes the timestamp of when the change occurred. Therefore, it is important to note that the Feedback and Interactivity lanes of Fig. 1(C) are not temporal because their connections are not temporally dependent but only dependent on the adjacency of their inner property-maps. Nevertheless, we still describe VAKG as a TKG as a whole due to the other two lanes, which are themselves temporal sequences of events. With this, the example of an EDA task shown in Fig. 3 can be better understood through this 4-lane interpretation of VAKG. VAKG itself does not strictly prescribe the property-map typology because each use case will have different requirements (G3). For instance, Vis4ml [40] focuses on designing an ontology that describes the workflow of VA-assisted ML. As discussed before, although Vis4ml achieves its own goals very well, its ambiguity between temporal and atemporal aspects limits its usage when compared to VAKG. However, our design also expects that Vis4ml may extend VAKG's ontology by, for instance, defining the content of all property-maps within each State. As an example, Vis4ml could specify a property-map that describes the exact state of "Prepare-Data" at a given time, including all statistical profiles, data processing, and annotation. Another possibility is, instead of defining property-maps, to upgrade the Computer States into their own Vis4ml ontology (G3).
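The State/Update structure described above can be sketched as an edge list using the three relationships does-State-Update, leads-Update-State, and follows-Update-Update. The state and update identifiers below are hypothetical; a real recording would attach timestamped property maps to each node.

```python
# Sketch of the 4-lane structure: State nodes (C* / H*) and Update nodes
# (cu* / hu*) linked by does/leads/follows relationships. All identifiers
# and payloads are hypothetical.
edges = []  # (source, relationship, target)

def add_update(prev_state, new_state, update_id, prev_update=None):
    edges.append((prev_state, "does-State-Update", update_id))
    edges.append((update_id, "leads-Update-State", new_state))
    if prev_update:  # chains updates of the same lane temporally
        edges.append((prev_update, "follows-Update-Update", update_id))
    return update_id

# Computer lane: two interface changes over time
u1 = add_update("C0", "C1", "cu1")                  # e.g., filter applied
u2 = add_update("C1", "C2", "cu2", prev_update=u1)  # e.g., chart switched
# Human lane: one insight summarizing both computer updates
add_update("H0", "H1", "hu1")                       # e.g., "prices cluster"

follows = [(s, t) for s, r, t in edges if r == "follows-Update-Update"]
assert follows == [("cu1", "cu2")]  # the temporal chain within one lane
```

Note that the Human lane advances independently of the Computer lane here, which anticipates the relaxed parity between the two timelines discussed next.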
Using an existing ontology or taxonomy map, such as Vis4ml, as a relationship model of the property-map within a Class is interesting in some use cases. For instance, by following the previous example, we know that Vis4ml [40] has a relationship model within all "Prepare-Data" nodes. Therefore, if a single Computer State can define the state space of an entire "Prepare-Data" process, the user of VAKG can choose to expand the property-map into a sub-KG containing a full ontology within. This same concept can be applied to Human States or Updates, where complex knowledge models from Federico et al. [20] can be used. Further examples and applications will be discussed in Sec. 5. The examples of VAKG given so far through Fig. 1 show a new Human update for every new Computer update. In other words, these examples expect users to have a new insight/finding at every single interaction, forcing these Update states to align perfectly. This, however, does not reflect the current literature, as shown in manual and automatic annotation research [31, 33, 34, 52]. In reality, users may have one intention or insight that causes multiple interactions within the VA tool, and only after all these interactions does the user finally generate a new piece of knowledge. Users may also have multiple insights or intentions after performing a single interaction. With this in mind, VAKG does not require exact parity between the Computer and the Human timeline. In its place, VAKG interprets the Update states as the summary of all changes that occurred between two consecutive States, which loosens the definition of the relationships between the Update classes. More specifically, we extend the definition of the knowledge lane to be related to any number of updates in the tool. Similarly, a single interaction can also be related to any number of new insights of the knowledge lane. This forces VAKG to handle new updates separately between the Computer and Human classes.
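The loosened parity between the two lanes can be pictured as a many-to-many link table between Computer Update and Human Update nodes. A small illustrative sketch (the node names and the `summarized_by` helper are hypothetical, not VAKG terminology):

```python
# Three recorded interactions (Computer Updates) lead to a single
# insight (Human Update); the lanes need not align one-to-one.
computer_updates = ["cu1", "cu2", "cu3"]
human_updates = ["hu1"]

# Many-to-many links between the two Update lanes:
links = [("cu1", "hu1"), ("cu2", "hu1"), ("cu3", "hu1")]

def summarized_by(human_update, links):
    """All interactions that a given insight summarizes."""
    return [c for c, h in links if h == human_update]
```

The reverse direction (one interaction linked to several insights) is the same structure with several `links` entries sharing a Computer Update.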
Still, the resulting KG may be closer to how exactly the users' knowledge gain happened. We define this aspect of VAKG as the summarization or interpolation of Update nodes. Arguably, this also reduces the amount of information within VAKG since, when multiple interactivity Update classes link to a single knowledge Update class, the ontology will not retain how each of the interactions uniquely impacted the property-map of the knowledge Update. Although this limits certain aspects of VAKG, as we have discussed before, specific use-cases can upgrade the property-map of a summary Update into an inner ontology, which enables further information to be modeled within the TKG. The highly connected ontology of VAKG allows us to easily contextualize how it can be used to model and analyze VA workflows as sequences of events (G2). VAKG's resulting graph network effectively describes a timeline of multiple users performing VA-related tasks. For instance, the example of Fig. 5 shows VAKG's ability to store any undo/redo operations or loops. In essence, VAKG allows for the storage and analysis of a cohort of users performing specific VA workflows, and graph-based analytics can be applied to the resulting graph. To better contextualize VAKG in the field, this section explains the usage of VAKG as a data structure for provenance and behavior analysis. First, we discuss how one who desires a more descriptive data structure within VAKG may extend it by using external ontologies as sub-components (G3). Next, we discuss how existing methods of graph network analysis can be used to analyze a graph network generated by recording VA workflows with VAKG's ontology (G2). It is essential to note that VAKG by itself does not attempt to solve the entire knowledge modeling problem but uses an existing VA knowledge model as an architectural foundation to build a TKG for analytical purposes.
For VAKG to be generally applicable in VA, we propose several ways for VAKG to use existing works as potential extensions of parts of our ontology. In short, VAKG incorporates these external works by modeling them as a "Sub-Component". VAKG's entire design is shown in Fig. 2, where we see the focus on the temporal aspect of VAKG and how it can connect to existing works, either through each Class's property-maps, by upgrading Classes or properties into their own sub-graphs, or by creating new relationships that link VAKG Classes with external ones. Ontologies as Sub-graphs. VAKG can successfully record and display the unfolded knowledge model of Sacha et al. [39] since we fully model the Human and Computer interactions through States, where their respective Updates describe the insight, finding, and/or knowledge aggregated at every step. VAKG can similarly be extended to implement the ontology of many different knowledge models used throughout other publications (G3). For example, just as we have upgraded a relationship to describe the Update process, we can also upgrade a property of a Class, or the entire Class itself, to a sub-graph (see Sec. 3.2.3). In the case of Federico et al. [20], these sub-graphs would be the description of all the Computer's and Human's inner components, and the links of these nodes would be new relationships across State and Update Classes. Dataspaces. Within the traditional model of Fig. 1(A) [39], we see that there already are well-established sub-divisions between the two main states of Computer and Human. First, regarding Data, we discussed the concept of data provenance in Sec. 2, which describes taxonomies and models that aid in tracking data origin and all changes done to a dataset. Other works, such as Vis4ml [40], go to great lengths to keep the data-mining workflow modeled within their ontology.
Although VAKG already gives a way to preserve workflows such as these [18] as property-maps, our model goes one step further. In light of recent research on Dataspaces [17], the importance of tracking not just changes to a dataset but all interactions and origins within a corpus of interlinked datasets, also known as a dataspace, is clear. Therefore, we expand the Computer State's property-map definition to encompass both intra-dataset operations, such as Vis4ml's "Prepare-Data" [40], and inter-dataset operations, including the full spectrum of Data Integration operations, which, for our architectural design, is generalized and included in the property-map of the Computer State. Sub-tasks. Another aspect of VAKG is the abstraction of sub-tasks. Following the examples used so far, Vis4ML [40] describes the sub-task of "Prepare-Data" as the data preparation or data-preprocessing task, and it is designed as a sub-graph within their ontology. Other works make similar abstractions [47], and so does VAKG. However, unlike all other works, VAKG does not interpret these sub-task sub-graphs as something separate from the main 4-way lane but as a section of the whole graph. This is done again through property-maps, where VAKG expects one property to indicate whether any of the existing classes are part of one (or more) of such abstractions. Although this seems to be a simple addition, the impact on the analytical capabilities of VAKG is immense, as we will discuss in Sec. 5. Graph Networks and Knowledge Graph analytics have been attracting much attention. VAKG attempts to leverage them by providing a structure where multiple kinds of graph analysis can be done. To apply analysis to VAKG, one must first execute a VA workflow and record its information following VAKG's ontology. If VA tools use VAKG to model and track users' exploration and insights within VA workflows, the generated KG would contain a large and sprawling graph network available for analysis.
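The sub-task abstraction described above amounts to a property on each class naming the abstraction(s) it belongs to, which makes slicing the graph by sub-task a simple filter. A sketch under assumed node and property names (none of these identifiers come from VAKG itself):

```python
# Hypothetical Computer State nodes tagged with the sub-tasks they
# belong to via their property-maps.
nodes = {
    "cs1": {"type": "ComputerState", "subtasks": ["prepare-data"]},
    "cs2": {"type": "ComputerState", "subtasks": ["prepare-data", "train"]},
    "cs3": {"type": "ComputerState", "subtasks": ["train"]},
}

def nodes_in_subtask(nodes, subtask):
    """Slice the graph: all nodes participating in a given sub-task."""
    return sorted(n for n, props in nodes.items() if subtask in props["subtasks"])
```

Because a node may carry several tags, sub-tasks can overlap, which is what distinguishes this scheme from treating sub-tasks as disjoint sub-graphs.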
For instance, by applying PageRank over VAKG's Computer States, we can discover the state most frequented by users. Similarly, by applying a shortest-path algorithm over the Human Update nodes, we can identify the fastest way users were able to discover a particular finding (Fig. 6). We can then iteratively remove Computer States while re-running the shortest-path algorithm over the Human Update nodes at every iteration to analyze which Computer States were least important for the user cohort to discover that specific finding. In this context, Xu et al. [51] describe very well how one goes through the process of deciding how and why to do such analyses. There are tools available [8, 24, 27, 36] that simplify the process of using these graph analysis techniques, including graph summarization processes [22] to simplify large graphs, exploration of KGs through VA [8], and KG embedding [50]. To use KG embedding, one must embed the classes and relationships of a KG into a continuous vector space, which allows for further operations over the KG, such as KG completion and relation extraction. These operations typically focus on KGs without a temporal aspect, so we expect usages of KG embedding to focus on the two State lanes, namely Computer States and Human States, since these two lanes match the expected structure for such analysis.
[Fig. 6 caption: VAKG allows graph analysis possibilities. One example is shortest-path analysis through Dijkstra. If two users attempt to reach the same goal through different interactivity paths, VAKG will display the minimum interactivity path. Similarly, suppose two users progressively gain new insights to reach a common new knowledge. In that case, VAKG will display the minimum path of knowledge gain.]
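As a concrete illustration of the two analyses just mentioned (most-frequented state via PageRank, fastest discovery path via shortest path), here is a self-contained sketch over a toy state graph; the graph, node names, and helper functions are invented for illustration and are not part of VAKG:

```python
from collections import deque

# Toy Computer State graph: edges are observed user-driven transitions.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "A"), ("C", "B")]

def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank; a higher rank suggests a state that
    users transition into more often."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for s in nodes:
            for t in out[s]:
                new[t] += damping * rank[s] / len(out[s])
        rank = new
    return rank

def shortest_path(edges, start, goal):
    """BFS shortest path, e.g. over Human Update nodes to a finding."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, []).append(t)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

ranks = pagerank(edges)
path = shortest_path(edges, "A", "C")
```

In practice one would run these over the recorded VAKG lanes with a graph library rather than hand-rolled routines; the point is only that both lanes expose standard graph-algorithm targets.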
KG embedding would therefore allow users of VAKG to, for instance, analyze whether the search space of an EDA application was fully explored by its user through KG completion [32] or to verify whether different users may have had similar insights while building a machine learning model [40]. Similarly, user behavior analysis [5, 51] would focus on the two Update lanes. Specific examples will be discussed in the next section. As we know from model design, there are many ways to represent a workflow. We understand that, for particular use-cases, researchers prefer to define their workflow either as descriptively as possible or by following certain well-tested processes within each field. VAKG differs due to its focus on separating the Human and Computer parts of the workflow and their temporal sequences versus their atemporal state space. Though VAKG does not replace an existing ontology by providing a more descriptive ontology, it attempts to allow for better temporal analysis of the generated KG when tackling the process of knowledge gain. In the following, we compare a few existing works with VAKG by modeling VA tasks and discussing the differences found. Our first discussion of potential applications of VAKG focuses on Exploratory Data Analysis (EDA) tools. The existing Knowledge Model framework [20] describes why such EDA tools perform well by providing a platform for users to utilize their tacit knowledge and visualizations to harness and gather new insights over time. As was already noted, most such tools do not record the interactions of users, and even fewer record users' new insights and knowledge. However, by using VAKG, this gap can be filled. For instance, ExPatt [14] is an EDA tool developed for the analysis of heterogeneous linked data, or more specifically, the concomitant analysis of GapMinder data and Wikipedia documents.
Following the example where a user investigates the USA's and Russia's life expectancy indicators, VAKG constructs a graph network, represented in Fig. 3, as was discussed in Sec. 3. Fig. 3 shows the four sequences described in Sec. 3.1, where, at first, the user does not have any insight and the VA tool has some visualizations. As the user's interest is in comparing Russia and the USA, the Computer Update sequence shows the interactions performed by the user. In contrast, the Computer State sequence only describes what is being displayed by the VA tool at each point in time. Similarly, the two Human sequences at the bottom of Fig. 3 show the user's knowledge gain while user interests cause interactions with the VA tool. From this point, VAKG would include many other users' interactivity and knowledge gained in the same KG. The resulting graph is difficult to understand as-is. Still, by applying graph mining, we can find which and how many of the users performed the patterns of Fig. 5. Also, by verifying the length of Human Update sequences, we could validate the complexity of the tool, the complexity of the tasks given to the user, and the users' performance distribution compared to their demographics. Some of these EDA user behavior analyses were also exemplified in related works [5, 20, 51]. Nevertheless, their analysis of VA tools and user behavior was limited compared to VAKG due to the lack of proper separation and linking among all four lanes of VAKG (G1). Similarly, Battle and Heer [5] have used a graph-based analysis to characterize their users' behaviors when analyzing datasets in Tableau. Their workflow built an analytical provenance graph divided into Behaviour Graphs and Interaction Sequences, which is similar to VAKG with Computer and Human States merged as Behaviour Graphs and Computer and Human Updates merged as Interaction Sequences.
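Verifying the length of Human Update sequences, as suggested above, becomes a direct computation once sessions are recorded. A minimal sketch with invented per-user lanes (the user names and event labels are example data, not recorded results):

```python
# Each user's Human Update lane as an ordered list of update events.
human_lanes = {
    "user_a": ["filter", "compare", "insight"],
    "user_b": ["filter", "zoom", "filter", "compare", "insight"],
}

# Lane length as a rough proxy for task/tool complexity per user.
lengths = {u: len(lane) for u, lane in human_lanes.items()}
avg_length = sum(lengths.values()) / len(lengths)
```

Cross-referencing these lengths with user demographics or task assignments would then give the performance distributions discussed in the text.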
They go to great lengths describing and citing how they harnessed many insights from their users' behavior with just these two sequences, such as user personality traits, task performance, and user interaction prediction. We argue that if they were to split the Computer and Human lanes and include property-maps with sub-task identification, as was discussed in Sec. 3, even more insights would have been possible, such as whether a user's task performance is related to an ability to gain knowledge faster through the same behavior graph or whether the users have entirely different behaviors that justify this difference. Using VAKG to store and analyze a sequence similarly applies to ML-related tasks. Vis4ML [40] describes a form of modeling such tasks in detail. Still, they claim that the entire ontology is valid for both Computer and Human components. VAKG instead focuses on the division between the two and how this division allows for the analysis of an executed task. For instance, their first example of a workflow as ontology pathways tackles ActiVis [29], which is a system for assisting in learning large-scale deep learning models through the visual exploration of neuron activation among possible classification targets. The ontological pathway presented shows how parts of Vis4ML describe the workflow. In VAKG, however, we first focus on splitting the workflow's Human and Computer interaction. Therefore, understanding activation patterns and revising the CNN based on such observations would constitute a user harnessing information from a Computer State. With the knowledge from it, the user executes a single interaction. Graph-wise, VAKG would be very small, but all this information would be stored within the class's property-map, including the CNN configuration and what caused the user to perform such a change. VAKG can replace this property-map with an inner KG following the Vis4ML ontology, allowing us to use Vis4ML's descriptive power.
Finally, we know from ActiVis [29] that this workflow sits among many others, which can be executed multiple times by the same or by different users. VAKG's power is to record all of the CNN creation and tuning provenance and tag all sub-tasks of the workflow through properties. Users can then use the resulting graph network to analyze the workflow again. For instance, if two users created the same CNN, did they do the same operations? Or, if a third user tries to execute the same ML workflow, VAKG's analysis would provide the ideal workflow path by merging the workflow paths of the previous two users. Following the same process, we see that VAKG can similarly model all examples from Vis4ML [40] to provide a provenance graph that includes the user's knowledge, all data pre-processing tasks, and the full machine learning creation and modification workflows, which can then be analyzed. Since VAKG shares large similarities with provenance and user-behavior research [5, 47, 51], many comparisons can be made. However, since VAKG follows the generic knowledge architectural model [39], it can also describe the process used by provenance designers themselves. By analyzing a VAKG graph network of how VAKG itself was built, including all researchers who discussed and gave feedback and how the related works modeled our goals, we could analyze how VAKG compares in descriptiveness to other ontologies [20, 40] and how similar it is to existing provenance works [33, 51]. VAKG's flexibility and its state space KGs differentiate our work from existing provenance works. By the same token, VAKG proposes ways to improve provenance analysis by linking provenance to state space graph networks. Furthermore, developing a provenance workflow is in itself a workflow. Tracking the origin, any changes, and links among a collection of datasets is a complex task that is being well researched by the Data Integration and VA communities.
We have compiled a survey in Table 1 analyzing most of the literature cited so far and how it compares to VAKG's goals. VAKG is also a good fit for VA systems where the user has access to a dashboard of linked visualizations to perform simple data analysis, such as GapMinder [38], Covid-19 [16], national census websites [45], or other similar websites [41]. Workflows such as these would result in a graph containing all possible states of the dashboard as Computer States, the possible user interactions as Human States, and the sequences of user sessions using the dashboard as the two Update sequences. Similar to before, graph analysis can be used for insights [5], and extra sub-ontologies can be used to model how the linked visualizations interact and influence each other [42]. Another similar example is the usage of VAKG within statistical analysis tools [1], where datasets are loaded and processed and, through statistical analysis, the user can extract whether a particular hypothesis is valid or not. In addition to the same VAKG modeling as before, any dataset changes over time due to data-processing operations are also tracked on the Computer entities through sub-ontologies such as Vis4ML [40]. Also, analytical examples from user behaviour [5] and user motifs [47] can use sub-task abstractions through property-maps as means to perform such analysis. In other words, VAKG systematically generalizes existing works, and by using property-maps and sub-components, the resulting VAKG graph can be transformed and used as any of the previous works' behavior or provenance graphs. Additionally, there are other examples outside the realm of EDA, and even VA, where VAKG can be used as a TKG for analytical purposes. For instance, we have considered using VAKG to store and analyze the concept of strategy within sports and eSports. Within a soccer match, VAKG could store all possible strategies a team can perform as the Human States and all physical configurations of the match as the Computer States. The Update lanes could then define how a specific match went from beginning to end. By analyzing multiple matches, one could harness insights about the team. Similarly, by storing and analyzing the strategies and the TKGs of real-time strategy eSport games, the players can take similar insights and even find undiscovered strategies.
[Table 1 caption: VAKG compared to the literature. Although related works use parts of the information or data similar to VAKG, some are part of a single united ontology or structure (green rectangles spanning columns), some have disconnected concepts (green squares without a link), and some are separate linked structures like VAKG. Also, only some apply the theoretical model as a graph network. And some discuss analysis as a focus of their work (blue squares) or in a limited fashion (yellow squares).]
During the development of VAKG, Janio was kind enough to give inputs related to the usefulness of tracking user interactivity for knowledge gain. Janio is an experienced entrepreneur and business owner with over 30 years of experience in the international food supply chain, working as an international food importer and wholesaler and as owner and manager of a regional restaurant chain. The hour-long interview focused on his methods to store information and extract knowledge from Enterprise Resource Planning (ERP) and Accounting systems. In summary, he said the usual process depends heavily on training employees to manually record sales and restocking data in a system. Once the data is stored and structured into a timeline of events, he generates reports and uses his own business experience to discover and explain insights from the data. We have included a small excerpt of the interview below. Q: Can you give examples of when and how you use provenance and analysis?
One example is how I analyze the product purchase behavior of select customers on a month-to-month basis and the overall monthly product-based sales compared to previous years. I do this by extracting all reports from my accounting software and using MS Excel to merge them and analyze the result, sometimes on a month-by-month basis or a year-over-year basis, depending on what I am analyzing. (...) Among the main difficulties within the process is the required expectation of correct data input by the employees and the amount of external knowledge I need to bring to the report so that analysis can be meaningful. In our restaurants, employees would input the payment type of a purchase, such as cash or credit card. However, once the customer makes the payment, sometimes the type would swap, but the system is not updated, causing many discrepancies in the monthly accounting process. Q: How was external knowledge used, and was it included in the reports? (...) managers manually typed in certain events, such as heavy rain or a large soccer match, while inputting the weekly restocking information. (...) however, most of the time, the external knowledge I use is either only manually typed during the analysis itself, after the generated report, or not included at all, just coming from my head, experience, and the internet. We also interviewed Thiago, a Data Science General Manager of Credit Recovery in a large bank. Thiago has a Ph.D. in Machine Learning applied to Natural Language Processing and has over a decade of industry experience in applied data science. In summary, we discussed in his interview the past and present use of provenance analysis within decision-making processes. Q: Can you give examples of when and how you use provenance and analysis? Bank transaction provenance, such as payments and money transfers, is extensively used for flagging fraudulent transactions. (...) My area then uses machine learning to predict the tendency of credit repayment.
The treatment of temporal relationships in the data, such as the order of transactions, is sometimes part of an embedding step or forwarded to a machine learning model that can handle temporal data. A new project in my area uses provenance to better design a data science workflow. Currently, our bank does not have a well-optimized data analysis workflow. Still, by tracking the data analysis, we plan to record the work effort of each task and, in the future, analyze the results to improve the overall workflow or help us plan the future by predicting how long data analysis tasks will take. Q: Would a temporal knowledge graph structure, such as the one from VAKG, aid in such applications? How? A co-worker and I have been investigating using a knowledge graph representation of customers that could describe behavioral patterns for better predictions during fraud analysis. (...) Another large intersection I see is using such a structure to record and analyze our Data Science pipeline. With such recording, we envision predicting a more optimized sequence of tasks required for a new data science project given a description of the goals or problems we want to solve. We use these interviews to check, in the wild, the significance of our motivation and goals and to get insights into future work for VAKG when applied to real business analytical processes. For instance, Janio confirmed that his experience with ERP and accounting software essentially had data provenance but rarely tracked any external information at the time the data was changed, such as the reasoning behind why his employee changed the data (G1). Janio said he had to manually track down these extra pieces of information through an involved and time-consuming cross-check process and use external software to merge and analyze it (G2 & G3).
Similarly, Thiago's usage of provenance requires significant work by data scientists to process the data for analysis (G2 & G3), and his wish to better understand this data science pipeline through provenance and knowledge graphs indicates both the lack and the potential of using VAKG (G1). These examples show that VAKG's focus on tracking the user's knowledge gain in tandem with user interactivity aims to aid in this problem: the analysis of data-oriented workflows that require or would benefit from the connection between data provenance, external data (e.g., supply chain or bank transactions), and user knowledge gain (e.g., intentions and behavior). Although we argued that VAKG is more capable than other existing ontologies, we know that this is achievable only because VAKG uses these existing ontologies, such as [17, 20, 40, 47, 51], as sub-components. VAKG's focus is also not on its descriptive power, as is the case with other ontologies [40], but on its division of the VA workflow into the 4 main lanes of Fig. 1(C). VAKG is a model, not a framework. Therefore, it does not solve the issue of how to perform user-tracking [33], nor does it expand the analytical arsenal of user behavior or provenance [5, 18], but it does provide an ontology that these works can use to perform analysis. VAKG also easily generates very complex TKGs, which are hard to visualize and may cause storage-space issues if used indiscriminately, but this complexity depends on which workflow VAKG is used to model. Of course, we have framed VAKG as a theoretical temporal architecture framework that does not replace existing works but provides a way to use many of them while separating the computer/human aspects and the provenance/state space aspects for analytical purposes. That said, we believe that such limitations need to be addressed separately in framework- or application-level use-cases.
Results and evaluations of these future works will be driven by their own use-cases, which do not fit within the theoretical contribution presented in this paper. Similarly, broader surveys of existing works that can be compared to VAKG or used as VAKG sub-components to solve specific use-cases would be desirable. Still, once again, this is better treated as a use-case-specific survey. Our contribution, however, sets the foundation for these future works. The most critical limitation of VAKG is that user-tracking has largely been seen negatively. User protection laws and initiatives, like Europe's GDPR and Apple's "Ask App Not to Track" feature, are just a few of many examples. VAKG depends on users' consent to be tracked and to have their data analyzed, which is undoubtedly a concern. This limitation is not new and is shared by all related works. However, in many cases, the users of VAKG are also the ones whose behavior is being tracked, which means that they may accept and welcome the tracking necessary for VAKG to work. For instance, if VAKG is used in conjunction with Vis4ML [40] to model how data scientists perform ML-related tasks in Kaggle, the results would be useful for these same data scientists to learn and perfect their practice. Further study is needed to analyze how impactful this would be if VAKG is implemented as a system. Still, until then, we believe that many use-cases would welcome our approach. We have presented VAKG, a theoretical temporal architecture framework that generalizes existing knowledge models and ontologies and structures them as a 4-way temporal knowledge graph describing user behavior and knowledge gain during the execution of a VA workflow. VAKG's structure allows for graph-based analytics of domain-specific processes (e.g., EDA), usage patterns, and user knowledge gain performance.
We propose that by collecting the user interactivity, intents, and states of a VA workflow, VAKG can define and generate a graph representing the temporal sequence of events and the relationships of states of the workflow in question, which are contained in the Update TKGs and State KGs, respectively. Additional data and its relational ontology are incorporated as sub-components of these existing entities through property maps, property tags, or sub-graphs. By collecting information from multiple users or multiple VA workflows, VAKG's resulting graph represents an overview of the VA workflows' usage and the collective knowledge generated by their users. This loosely coupled architecture allows VAKG to be easily extensible and adaptable to various situations and domains, including its extension to use other models or ontologies. Using VAKG as a provenance architecture, the data gathered can also be easily used with existing graph analytics techniques, such as visualizations, shortest-path analysis, recommendation systems, link prediction, temporal relationship analysis, and page-ranking. We acknowledge and thank Janio and Thiago for their time and great inputs and insights. The authors acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC). Our formalization starts from the knowledge generation model of Sacha et al. [39], which is shown in Fig. 1(A). This first knowledge model partitions the knowledge space into human and machine sections. The full mathematical model of this partition was proposed by Federico et al. [?]. As per their work: "The model is divided into two spaces (machine and human) and describes knowledge generation, conversion, and exploitation within the VA discourse, in terms of processes: analysis A, visualization V, externalization X, perception/cognition P, and exploration E; containers: explicit knowledge K_ε, data D, specification S, and tacit knowledge K_T; and a non-persistent artifact: image I".
By analyzing their representation of these concepts within VA, we find that the circle nodes {V, P, E, X, A} represent elements that cause changes within a VA tool. For instance, the visualization V resides in the machine space, and it causes changes over the perception/cognition of the user P, providing new insights. Similarly, the exploration task executed by the user E, which is in the human space, can update the VA tool's specification S, which may update the data or visualization being shown within the VA tool. We also see rectangles in their representations, which are nodes that define the knowledge model's information types. For example, elements such as data D, specification S, and tacit knowledge K_T represent the fact that there is information within the model. From this representation, we identify all the moving parts within the knowledge model's iterative loop. The first contribution of VAKG is to define the conversion of this iterative loop into a mathematical model. First, let us write the equations described in their work. Since circle nodes are changes over time, we map them into mathematical operations, which are applied to the rectangle nodes, yielding one equation per square node of Fig. ??. Eq. 1 uses the nodes of all arrows directed into the node K_T, Eq. 2 uses the nodes of all arrows directed into the node K_ε, Eq. 3 uses the nodes of all arrows directed into the node S, and Eq. 4 uses the nodes of all arrows directed into the node D. From them, we have the list of equations that define the knowledge model's iterative loop, where the state t+1 of every square element is defined as a function of the previous state t, with all operations (circle nodes) applied to these states at time t. For instance, if we consider the start of the analysis of user U at t = 0, the user's tacit knowledge K_T^{t=0} can change due to any new perception on the user's part P^{t=0}, causing the VA tool to reach t = 1 through Eq. 1.
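A hedged sketch of the general form these four equations take, under our reading of the model's arrows (perception P into K_T, externalization X and analysis A into K_ε, exploration E into S, and automated analysis A into D); the functions f are placeholders for the published right-hand sides, which are not reproduced here:

```latex
\begin{align}
K_T^{t+1} &= f_{K_T}\!\left(K_T^{t},\, P^{t}\right)
  && \text{(Eq. 1: arrows into } K_T\text{)}\\
K_\varepsilon^{t+1} &= f_{K_\varepsilon}\!\left(K_\varepsilon^{t},\, X^{t},\, A^{t}\right)
  && \text{(Eq. 2: arrows into } K_\varepsilon\text{)}\\
S^{t+1} &= f_{S}\!\left(S^{t},\, E^{t}\right)
  && \text{(Eq. 3: arrows into } S\text{)}\\
D^{t+1} &= f_{D}\!\left(D^{t},\, A^{t}\right)
  && \text{(Eq. 4: arrows into } D\text{)}
\end{align}
```

Each container at time t+1 is a function of its own previous state and the operations whose arrows point into it, matching the description in the text.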
From this point on, either the user executes one or more of the operations {X, P, E}, or an automatic process within the tool executes one or more of the operations {V, A}, changing the state of the VA tool C^{va}_t. So far, VAKG has only extended the mathematical formalization of Sacha et al. [39] and Federico et al. [20] into a set of equations that identify the transitions of states over time C_t and the update operations U_t performed over those states. However, we have not yet addressed how one utilizes them to solve the proposed problem: retrieving, storing, and providing analytical capabilities over the user's ongoing knowledge gain. For this, VAKG maps these equations into a TKG. As was discussed previously, a KG is a graph structure where knowledge reasoning is modeled as connections between entities or properties, such as "George Washington is a human" and "Canada is a country". A TKG, however, models these connections as temporal relationships between the entities or properties. Many different types of TKGs exist, and in each of them this temporal connection has a different meaning. For instance, the most common form of TKG uses time itself as the connection, where two connected nodes represent two events that co-occurred. An example of such a KG would be all purchases made between different businesses within a supply chain, where the product "Mayonnaise" may have been bought by the store "Walmart" from the seller "Hellmann's" on "25/06". In this TKG, the connection between the three nodes Walmart, Hellmann's, and Mayonnaise would be "25/06". However, as we have seen through the formalization of Sec. 1, the primary relationship VAKG uses between nodes is whether an event happened immediately before or immediately after another event. Therefore, the TKG used by VAKG uses the order of operations as the main connection between nodes. VAKG, therefore, is a graph that defines the sequence of states and updates, where the main attribute connecting the nodes of this graph is the sequence of events over time.
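A minimal recorder for such an order-of-operations TKG can be sketched as follows (the class and relation names are illustrative, not an API from the paper): states C_t and updates U_t become nodes with property maps, and edges encode which node comes immediately before or after which.

```python
import itertools

class TKG:
    """Toy temporal knowledge graph: nodes with property maps, labeled edges."""
    def __init__(self):
        self.nodes = {}             # node id -> property map
        self.edges = []             # (src, relation, dst) triples
        self._ids = itertools.count()

    def add_node(self, props):
        nid = next(self._ids)
        self.nodes[nid] = props
        return nid

    def link(self, src, relation, dst):
        self.edges.append((src, relation, dst))

g = TKG()
c0 = g.add_node({"kind": "state", "t": 0})
u0 = g.add_node({"kind": "update", "t": 0, "op": "E"})
c1 = g.add_node({"kind": "state", "t": 1})
g.link(c0, "next", c1)          # temporal lane: C_0 precedes C_1
g.link(u0, "applies_to", c0)    # the update reads C_0 ...
g.link(u0, "produces", c1)      # ... and produces C_1 = U_0(C_0)
```

Here the "next" relation carries the order of operations; a wall-clock timestamp, if available, would live in each node's property map rather than on the edge.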
In parallel, the user started with previous tacit knowledge C^u_1 and, through a sequence of findings and insights U^u_t, reached the end of the analysis at C^u_4. Similarly, in (B) two different users performed the exact same interactions through the same sequence of findings and insights as each other. Updates to the user's knowledge P follow Eq. 11, and the update operation node U^u_t within the KG should also contain information about this new knowledge, such as its origin or description. If thoroughly applied, such ontologies would allow connecting VAKG to other databases, such as DBpedia, to external resources with URLs, and also to VAKG itself, as one would expect from a KG. All this usage information is used by VAKG to generate a TKG with four sequential lanes, exemplified in Fig. 3(A) through a simple graph visualization generated from the equations of Sec. 1. In addition, Fig. 3(B) shows VAKG when two users have the same interactions with a VA tool, where user one is colored yellow and user two brown. In Fig. 3(B), it is possible to see that the sequences of states in green, C^{va}_t and C^u_t, are the same for both users, while the update nodes in yellow and brown are separated per user. By following the process outlined before, the resulting TKG has this property: while the update nodes track each individual user as a sequence of interactions, the state nodes are indifferent to the interactions themselves, instead tracking the changes over time of either the VA tool or the user's knowledge. In other words, if two users were using a VA tool and followed the same process of interactions and insights, the VA tool's state sequence C^{va}_t would record the visualizations being shown to the users over time, whereas the update sequence U^{va}_t would record the specific timestamp of each user interaction and any browser information, such as the user's country or which browser the user is using, among others.
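The shared-state property of Fig. 3(B) amounts to deduplicating state nodes by content while keeping one update node per interaction. A small sketch, under the assumption that a tool state can be canonicalized into a key (the representation is ours):

```python
def state_key(state: dict) -> tuple:
    """Canonical key so identical VA-tool states map to one shared node."""
    return tuple(sorted(state.items()))

state_nodes: dict = {}     # state key -> shared state-node id
update_nodes: list = []    # one entry per individual user interaction

def record(user: str, state: dict, interaction: str) -> int:
    key = state_key(state)
    node = state_nodes.setdefault(key, len(state_nodes))  # reuse if seen before
    update_nodes.append({"user": user, "op": interaction, "state": node})
    return node

a = record("user1", {"filter": "covid", "zoom": 2}, "click")
b = record("user2", {"filter": "covid", "zoom": 2}, "click")
```

Both users land on the same state node (`a == b`), while `update_nodes` keeps two per-user entries, exactly the split between green state lanes and per-user update lanes described above.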
Of course, the amount of information tracked by VAKG depends on what the VA tool provides. Similarly, since in this scenario the knowledge gained by each user was exactly the same, the state sequence C^u_t will record the insights and knowledge gained, while the update sequence U^u_t will record the interests and goals of each user, their reasoning for such interests, any demographic information that may influence these interests, and so on. This hypothetical scenario of an exact match between two users, where both execute perfectly the operations needed to gain relevant knowledge, is certainly not something we expect to see in practice. Among the many possible user interaction and knowledge gain patterns, Fig. 2 shows several examples we encountered while testing VAKG. Namely, pattern (A) of Fig. 2 was always observed when the VA tool was used by multiple users; patterns (C) and (D) were observed multiple times during a real EDA investigation, which would normally span up to 50 state nodes; and pattern (B) happened many times during user evaluations of one of our EDA tools, where a questionnaire gave users a specific goal and, though they took different paths, they reached the goal with the same final answer to the survey.
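Once such per-user paths are recorded, patterns like Fig. 2(B) become simple graph queries. A toy illustration of our own (the path data is hypothetical): detecting that users converged on the same final state despite taking different routes through the state space.

```python
def converged(paths: dict) -> bool:
    """True if all users end on the same state although their paths differ,
    i.e. the Fig. 2(B)-style pattern of different routes, same goal."""
    finals = {p[-1] for p in paths.values()}     # distinct final states
    routes = {tuple(p) for p in paths.values()}  # distinct full paths
    return len(finals) == 1 and len(routes) > 1

paths = {
    "user1": ["c0", "c1", "c3", "c7"],
    "user2": ["c0", "c2", "c7"],
}
```

Analogous predicates over the state lanes can flag the other patterns, e.g. fully shared paths (pattern A) or long divergent explorations (C and D).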
References
[1] Comparing four contemporary statistical software tools for introductory data science and statistics in the social sciences
[2] Viewing visual analytics as model building
[3] A graph is worth a thousand words: Telling event stories using timeline summarization graphs
[4] DBpedia: A nucleus for a web of open data
[5] Characterizing exploratory visual analysis: A literature review and evaluation of analytic provenance in Tableau
[6] Comparing visual-interactive labeling with active learning: An experimental study
[7] A multi-level typology of abstract visualization tasks
[8] CAVA: A visual analytics system for exploratory columnar data augmentation using knowledge graphs
[9] Defining insight for visual analytics
[10] AppGrouper: Knowledge-graph-based interactive clustering tool for mobile app search results
[11] An ontological framework for supporting the design and evaluation of visual analytics systems
[12] Pathways for theoretical advances in visualization
[13] A review: Knowledge reasoning over knowledge graph
[14] Explainable patterns: Going from findings to insights to support data analytics democratization
[15] Advanced web metrics with Google Analytics
[16] British Columbia COVID-19 dashboard
[17] Real-time linked dataspaces: Enabling data ecosystems for intelligent systems
[18] Towards a taxonomy of provenance in scientific workflow management systems
[19] Knowledge provenance infrastructure
[20] The role of explicit knowledge: A conceptual model of knowledge-assisted visual analytics
[21] Introduction: What is a knowledge graph
[22] Concise provenance of interactive network analysis
[23] EventKG+TL: Creating cross-lingual timelines from an event-centric knowledge graph
[24] Chronos: A graph engine for temporal graph analysis
[25] ALOHA: Developing an interactive graph-based visualization for dietary supplement knowledge graph through user-centered design
[26] Graphical histories for visualization: Supporting analysis, communication, and evaluation
[27] KGTK: A toolkit for large knowledge graph manipulation and analysis
[28] GNNVis: A visual analytics approach for prediction error diagnosis of graph neural networks
[29] ActiVis: Visual exploration of industry-scale deep neural network models
[30] Mastering the information age: Solving problems with visual analytics
[31] Answering questions about charts and generating visual explanations
[32] EmbeddingVis: A visual analytics approach to comparative network embedding inspection
[33] InsideInsights: Integrating data-driven reporting in collaborative visual analytics
[34] Trends in integration of vision and language research: A survey of tasks, datasets, and methods
[35] Mapping ER schemas to OWL ontologies
[36] Neo4j: The world's leading graph database
[37] VISO: A shared, formal knowledge base as a foundation for semi-automatic infovis systems
[38] Data, gapminder.org
[39] Knowledge generation model for visual analytics
[40] VIS4ML: An ontology for visual analytics assisted machine learning
[41] Engage Nova Scotia: Quality of life survey
[42] Investigating visualization ontologies
[43] VISTA: A visual analytics platform for semantic annotation of trajectories
[44] explAIner: A visual analytics framework for interactive and explainable machine learning
[45] Census of Canada
[46] Design space of origin-destination data visualization
[47] Interaction taxonomy for tracking of user actions in visual analytics applications
[48] Informed machine learning: A taxonomy and survey of integrating knowledge into learning systems
[49] RippleNet: Propagating user preferences on the knowledge graph for recommender systems
[50] Knowledge graph embedding: A survey of approaches and applications
[51] Survey on the analysis of user interactions and visualization provenance
[52] FlowSense: A natural language interface for visual data exploration within a dataflow system
Now, we define the set of all information nodes {D, S, K^ε, K^T} (squares in [20]) at any given time t as C_t = {D_t, S_t, K^ε_t, K^T_t}, and similarly the set of all changes {X, P, E, V, A} (circles in [20]) between time t = n and t = n + 1 as the update operation U_t = {X_t, P_t, E_t, V_t, A_t}. We conclude that, in a generalized manner, C_{t+1} = U_t(C_t) is the simplified interaction loop defined by Eqs. (1), (2), (3), and (4). Expanding on this same concept, we also define C^u_t, C^{va}_t, U^u_t, and U^{va}_t as the subsets of C_t and of U_t divided between machine (va) and user (u), respectively. For instance, the subset of U_t that pertains to the addition of tacit knowledge is U^u_t = {X_t, P_t, E_t}, and the subset that defines the changes within the tool's state, including data usage, filters, selections, and other similar information, is U^{va}_t = {V_t, A_t}. Therefore, when accounting for this division, the final set of equations is C^u_{t+1} = U^u_t(C_t) and C^{va}_{t+1} = U^{va}_t(C_t). Finally, with these equations VAKG defines the sequence of update operations U applied to the states C as a temporal finite state machine (t-FSM), where for a given time t there exists a container state C_t = C^u_t ∪ C^{va}_t, which stores all the related data of the VA tool C^{va}_t and the user's tacit knowledge C^u_t. VAKG also defines an update transaction U_t = U^u_t ∪ U^{va}_t, which stores the explicit relationship between C_t and C_{t+1} as C_{t+1} = U_t(C_t). Visually, VAKG's definition can be interpreted as a set of intertwined directed graphs, as shown in Fig. 1(C), where the lower half of the figure pertains to the sequences U^{va}_t (yellow) and C^{va}_t (green) of the tool's updates over time, and the upper half pertains to C^u_t and U^u_t, generated during the user's analysis and knowledge gain over time.
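The t-FSM step after the human/machine split can be sketched directly (the update functions below are toy stand-ins of our own, not the paper's): the full state C_t is the union of the user part C^u_t and the tool part C^{va}_t, and one transition applies both update subsets to it.

```python
def t_fsm_step(C_u, C_va, U_u, U_va):
    """One t-FSM transition: C_{t+1} = U_t(C_t), where U_t = U^u_t ∪ U^va_t
    acts on the combined state C_t = C^u_t ∪ C^va_t."""
    C = {**C_va, **C_u}              # C_t as the union of both halves
    C_u_next = {**C_u, **U_u(C)}     # user-side operations {X, P, E}
    C_va_next = {**C_va, **U_va(C)}  # tool-side operations {V, A}
    return C_u_next, C_va_next

# Toy updates: an exploration intent on the user side, a re-render on the
# tool side that reads whatever intent was present at time t.
U_u = lambda C: {"intent": "compare provinces"}
U_va = lambda C: {"view": ("render", C.get("intent"))}

C_u, C_va = {}, {"view": None}
C_u, C_va = t_fsm_step(C_u, C_va, U_u, U_va)  # t=0 -> t=1
C_u, C_va = t_fsm_step(C_u, C_va, U_u, U_va)  # t=1 -> t=2
```

Note how the tool only reflects the user's intent one step later: each update reads the state at time t and produces the state at t+1, which is exactly the temporal semantics the four lanes record.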
These two halves are also connected externally through the TKG formalization, which will be discussed next. As shown in Fig. 1(C), the graph nodes are connected to their respective temporally precedent nodes (e.g., C^u_t connects to C^u_{t+1} and C^u_{t-1}), which forms four temporal sequences: C^u_t, C^{va}_t, U^u_t, and U^{va}_t. The two other connections also present in Fig. 1(C) are the update-state relationship (e.g., C^u_{t+1} = U^u_t(C^u_t)) and the synced links between va and u nodes for a given time t (e.g., between C^u_t and C^{va}_t). Another way of seeing these connections is: the states C^u_t and C^{va}_t and the update operations U^u_t and U^{va}_t are connected in four separate temporal sequences and are also connected to each other following the equation C_{t+1} = U_t(C_t). Now, in order to formalize the transition from the set-theory equations of Sec. 1 to the TKG, we first consider that the graph generates links between the entities through the property of time, and we formally define VAKG as any subset of the full TKG model of Sec. 1. Why a subset? In order to construct the full TKG, VAKG would be required to know all tacit knowledge K^T on the user's part and all information used by the VA tool, including datasets D, domain knowledge K^ε, and all other concepts defined in [20]. We recognize that this is unfeasible for a real VA tool and that, as always, certain conditions or restrictions are needed to utilize a theory in practice. Therefore, we define that VAKG does not necessarily know all the information; in other words, we modify the equations from Sec. 1 to C'_{t+1} = U'_t(C'_t), with C'_t ⊆ C_t and U'_t ⊆ U_t, where the information within any update U'_t or state C'_t is only a subset of the entire corpus of information that would fully describe the transition C_{t+1} = U_t(C_t). Still, the property map of each node, which is the data stored within each node of a KG, must be tied to the definitions given so far.
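This subset restriction shows up concretely in the property maps: a node stores only the observable slice of the full state. A small sketch (keys and the observable set are illustrative assumptions, not from the paper):

```python
FULL_STATE = {
    "dataset": "covid_cases.csv",
    "filters": {"region": "BC"},
    "tacit_knowledge": "unobservable",   # K^T cannot be captured directly
}
OBSERVABLE = {"dataset", "filters"}      # what the VA tool can actually log

def property_map(full_state, observable=OBSERVABLE):
    """Build a node's property map from the recordable subset of the state,
    i.e. C'_t ⊆ C_t in the restricted equations above."""
    return {k: v for k, v in full_state.items() if k in observable}
```

Whatever cannot be logged, such as the user's tacit knowledge, simply never enters the node's property map; the graph remains well defined, just partial.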
For example, if at time t the dataset D changes due to an automatic process A, we can follow Eq. 12 to know that the node U^{va}_t will need to hold the information regarding this automatic process A_t and, with this information, the state C^{va}_{t+1} will differ from C^{va}_t due to this update process. This example is once again an application of C_{t+1} = U_t(C_t). A concrete example could be: if the data D is the current weather status of a city, it will constantly update, and any new data arriving will update D through an automatic process A at time t. The corresponding VA tool update U^{va}_t should have a link or some information regarding this dataset change. With such information, VAKG defines the new state C_{t+1} by means of C_{t+1} = U_t(C_t). Similarly, if the user is navigating the web searching for something, any updates to the user's knowledge P will likewise follow Eq. 11.
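The weather example can be sketched as an event handler (the data and field names are hypothetical): each automatic arrival of data is recorded as an explicit update entry linking the old and new states, making C_{t+1} = U_t(C_t) traversable in the log.

```python
log = []  # the U^va lane: one explicit update record per transition

def on_new_weather(states, reading):
    """A: an automatic data update; record U^va_t, then append C_{t+1}."""
    log.append({
        "t": len(states) - 1,      # the state index this update applies to
        "op": "A",                 # the automatic analysis/ingest process
        "cause": "sensor push",    # provenance of the change (hypothetical)
        "delta": reading,          # what changed in D
    })
    states.append({**states[-1], "weather": reading})  # C_{t+1}

states = [{"weather": None}]       # C_0: the C^va lane of states
on_new_weather(states, {"temp_c": 11})
```

Replaying `log` against `states[0]` reconstructs every C^{va}_t, which is precisely what makes the recorded graph usable for provenance analysis.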