key: cord-0611593-kbvmeti4 authors: Purich, Joanna; Mahmood, Hira; Chou, Diana; Udeze, Chidi; Battle, Leilani title: An Adaptive Benchmark for Modeling User Exploration of Large Datasets date: 2022-03-29 journal: nan DOI: nan sha: 2046014356d58a23b76fe957b165e884ca36be6e doc_id: 611593 cord_uid: kbvmeti4

Interactive analysis systems provide efficient and accessible means by which users of varying technical experience can comfortably manipulate and analyze data using interactive widgets. Widgets are elements of interaction within a user interface (e.g., scrollbar, button). Interactions with these widgets produce database queries whose results determine the subsequent changes made to the user's current visualization. In this paper, we present a tool that extends IDEBench to ingest visualization interfaces and a dataset, and to estimate the expected database load that would be generated by real users. Our tool analyzes the interactive capabilities of the visualization and creates the queries that support the various interactions. We began with a proof-of-concept implementation of every interaction widget, which led us to define three distinct sets of query templates that can support all interactions. We then show that these templates can be layered to imitate various interfaces and tailored to any dataset. Finally, we simulate how users would interact with the proposed interface and report on the strain that such use would place on the database management system.

Interactive dashboards provide an accessible and efficient way for users to examine data, whether that entails analyzing business performance metrics, measuring impacts of war on the economy, or tracking political registration patterns across the country. The interactive elements of these dashboards (visualizations and widgets) allow users to explore data and generate insights. As users manipulate widgets in the interface using clicks, hovers, or drag-and-drop, the display updates to show the requested visualization. Each interaction in the interface triggers a request to the backend database management system (DBMS) for the data needed to produce the updated visualization, and the design of interactive relationships between these widgets and visualizations has an impact on the performance of the DBMS.

Users interact directly with a frontend interface containing one or more visualizations and widgets. Widgets include radio buttons, sliders, and checkboxes, which are separate from the visualization but trigger updates to it, as well as interactions that are embedded in the visualization itself, such as highlighting or brushing. The frontend interface is supported by a database management system (DBMS) that houses the data used to create the visualizations. As the user interacts with a widget, their selections are passed to the backend, which translates the selections into a query. This query is executed against the DBMS and returns an updated dataset to the frontend, which renders the data as a fresh visualization. During investigation, each refinement interaction can generate a single fetch to the DBMS or a cascade of requests that refresh multiple visualizations that are linked together. Users can quickly overwhelm the DBMS by generating too many requests in a short window. Quickly moving a slider widget back and forth can create hundreds of queries per second, or thousands of queries if the slider acts upon multiple visualizations.
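To make the fan-out concrete, the following minimal Python sketch shows how a single slider interaction can translate into one query per linked visualization; every name here (Interaction, linked_visualizations, the table and columns) is a hypothetical illustration, not the paper's implementation.

```python
# Minimal sketch of the interaction-to-query fan-out described above.
# All names and queries are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Interaction:
    widget: str       # e.g. "date_slider"
    attribute: str    # e.g. "date"
    low: str          # selected range start
    high: str         # selected range end

# Widget -> visualizations it is linked to (one query per target per event).
linked_visualizations = {
    "date_slider": ["cases_line_chart", "county_heat_map"],
}

# Aggregation template for each visualization.
viz_queries = {
    "cases_line_chart": "SELECT date, SUM(positive_cases) FROM covid WHERE {where} GROUP BY date",
    "county_heat_map": "SELECT county, SUM(positive_cases) FROM covid WHERE {where} GROUP BY county",
}

def queries_for(event):
    """One slider event fans out into one query per linked visualization."""
    where = f"{event.attribute} BETWEEN '{event.low}' AND '{event.high}'"
    return [viz_queries[v].format(where=where) for v in linked_visualizations[event.widget]]

# Dragging the slider through 29 positions with two linked charts already yields 58 queries.
events = [Interaction("date_slider", "date", "2020-03-01", f"2020-03-{d:02d}") for d in range(2, 31)]
workload = [q for e in events for q in queries_for(e)]
print(len(workload))  # 58
```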
Instead of a near-instantaneous update to the visualizations, the interface lags, disrupting the user's thought process. Latency as small as 500ms has been shown to decrease user performance, although analysts might be more sensitive to certain widget interactions than to others (13). Existing analytic database benchmarks such as TPC-H provide guidance on performance expectations, but are not tailored to the specific query loads generated by visual analysis systems: bursty queries interspersed with periods of think-time (2, 14), a large percentage of simultaneous queries (from linked visualizations), and iteratively refined queries.

The aim of our research is to create a benchmark that can be used to characterize the expected load of an interactive visualization interface on a DBMS. Such a benchmark will allow developers to predict the DBMS query load before implementing an interface, identify pain points, and identify potential optimizations to the DBMS or interface that improve performance for both the system and the analyst. We envision a system that takes a visualization system, dataset, and intended DBMS as input and generates a realistic query workload reflecting how users would be expected to interact with the visualization. The workload would be executed against the supplied DBMS and statistics captured to benchmark performance. In summary, we contribute:
C1: A specification language for describing interactive visual interfaces and usage patterns.
C2: A benchmark system specialized for visual interfaces.
C3: Models for generating user behavior for static and dynamic interfaces.
C4: Prototype visual interface systems to demonstrate our benchmark's capabilities.

In this section we review relevant research in the areas of precision interface design, visualization benchmarks, and the effects of latency in interactive data exploration systems.

A. Interactive Analysis Benchmarks. User studies are expensive (6) and cannot be conducted for every unique interface; some level of modeling is required to develop realistic workloads. Eichmann et al. developed the DBMS benchmark IDEBench to evaluate interactive visualization workloads (4). In IDEBench, workloads are modeled using a configurable Markov chain model trained on user traces from previous user studies (3, 5). However, this model does not consider differences in the interaction capabilities across dashboard designs. Dashboards vary widely in the types of interactions they support (9), leading to differences in how users interact with these dashboards, and in turn differences in the resulting query workloads (2). Given the limitations of the simulated interactions, dashboard designers may not yet be able to take full advantage of benchmarks such as IDEBench. Our benchmark is configurable to the capabilities of the interactive analysis system and supports both observed and simulated workloads.

Battle and Heer characterize exploratory visual analysis (EVA) as "the subset of exploratory data analysis where visualizations are the primary output medium and input interface for exploration" (8). Battle and Heer observe that analysts iteratively set analysis goals, which are refined and answered through visualization development. Analyst search trees are depth-oriented, and when the task is held constant, they tend to converge to similar paths across users. These observations suggest that user behavior can be modeled, but a wide array of user objectives should be considered.

C. Evaluating the effects of latency in interactive visualization systems.
Liu et al. (13) observe that high system latency in interactive visualization systems can limit exploration and reduce analysis outcomes, even at latencies of 500ms. Furthermore, the impact of latency is disproportionate across interaction types. Battle et al. (1) find that while latency is widely believed to be a strong predictor of user behavior, in some cases, depending on the task type, task complexity, and other factors, latency can be statistically insignificant when it comes to predicting user behavior. This finding provides a more nuanced view of the role of latency in visualization systems, which is important for our research, given our goal of developing a benchmark that can be used to improve the design of DBMS systems and thereby reduce latency.

In order to capture interactivity usage and change over time, we develop a specification language inspired by Vega-Lite (10) to express the capabilities of a visual interface and interaction sequences. A specification standard allows us to benchmark dashboards written in any language, although we do not yet support all possible visualizations. While Vega-Lite is capable of capturing visualizations, it does not support capturing the linkages between visualizations. It also assumes the database is a flat file, limiting which dashboards can be represented. Our interface specification captures the display (visualization) and interactive (widget) aspects of an interface, as well as their relationships with each other. A complementary data specification expresses the structure of one or more datasets used in the interface. We do not assume the database is a flat file, allowing us to express more complicated architectures.

A. Motivating Example. Suppose that a dashboard allows users to explore the spread of COVID-19 across Maryland, USA. Users have access to a radio button widget which can be used to select their metric of interest: positive cases or deaths. Three visualizations are shown on the screen: a line graph which displays the selected metric over time, a heat map of Maryland, and a text box that shows the positive cases as of the current date. When 'Positive Cases' is selected with the radio button, the line graph is updated to show positive cases over time and the heat map is updated to show the geographic spread of positive cases. These two visualizations are linked; when the radio button selection is changed, it triggers an update for each visualization and two requests are sent to the DBMS.

B. Database Specification. The database specification is the JSON representation of the database that supports the visualizations. The structure, which is heavily inspired by IDEBench, follows a simplistic format containing only the data necessary to run the benchmark. The specification defines one or more data sets, and each set contains a list of key-value pairs, where a key is the name of a metric and the value identifies the corresponding metric as either "numerical" or "categorical". The metric identifications are used when constructing the SQL statements that update the UI graph upon an interaction with a widget. Due to its simple and high-level format, our database specification is easily defined and portable across a variety of databases. In the case of the COVID-19 example, the data is represented in a single table, although our specification can represent multiple tables. Within the table object, an example key could be "positive_cases", with a value of "numerical". Likewise, the key "county" would have a value of "categorical".
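To ground this, a database specification for the motivating example might look like the following sketch, written as a Python dict mirroring the JSON format; the table and field names are assumptions, and only the "numerical"/"categorical" labeling and the positive_cases/county examples come from the text.

```python
# Hypothetical database specification for the Maryland COVID-19 example,
# shown as a Python dict that mirrors the JSON format described above.
# Table and field names are assumptions made for illustration.
covid_db_spec = {
    "tables": {
        "covid_md": {
            "positive_cases": "numerical",   # aggregated in SQL (e.g., SUM)
            "deaths": "numerical",
            "county": "categorical",         # grouped/filtered in SQL
            "date": "categorical"            # assumed labeling for the time axis
        }
    }
}
```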
C. Interface Specification. The complexity of the user interface is expressed via the interface specification. Three sections, Visualizations, Widgets, and Relationships, define the elements that are present in the interface and the dependencies between those elements. Our specification supports the expression of rigid interfaces, in which only the underlying data changes, as well as flexible interfaces, in which visualizations can be permuted and new relationships between UI elements declared. We support both visualizations and widgets, the interactive elements of the interface that act on visualizations. A widget can be a standalone element such as a radio button; in the case of a brush filter, a visualization is categorized as both a widget (source of interaction) and a visualization (target of interaction).

Relationships not only create a mapping between a widget and a visualization, they explicitly state the data attributes that are shared between the two. A relationship takes the following parameters: a name by which the relationship is identified; a source, which identifies the widget or visualization the interaction originates from; an attribute, a list of metrics the change should act on; and targets, each composed of a type and the name of the connected element. The relationship component of this schema supports one-to-many mappings, where a single source maps to many targets. The interaction specification connects to the relationship component of the interface specification in that an interaction is an instance of a relationship. An example is shown below: relationship_1 extracts information from widget_1 (the source) and performs a series of spec manipulations on the attribute color; these changes are then reflected on the corresponding target viz_1, which displays the manipulations made. Similarly, the relationship identified as brushfilter1 demonstrates a visualization-to-visualization connection, where viz_2 is the source, the manipulation type is a data manipulation on the attributes longitude and latitude, and the changes are reflected on the target viz_3.

Our interaction specification language is heavily inspired by the Vega-Lite syntax, but is more limited to allow for a simpler interaction definition. The interaction specification format is a JSON format where each interaction event is expressed as a dictionary containing two keys: relationship and parameters. The relationship must be one of the relationships defined in the interface specification. The parameters defined in an interaction correspond to those listed in the corresponding VizSpec.relationships.attribute. Each of those attributes must appear among the interaction's parameters; if the parameters do not cover every attribute in the corresponding VizSpec.relationships.attribute, the interaction is deemed ill-formed. The goal of this interaction specification format is to allow our application to record interactions in a human-readable format. This way, users can use our application to create logs and pass them to the benchmark, or they can provide their logs in whatever format they were written in, and we can run them through our benchmark to produce the desired output.
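A minimal sketch of these two relationships and a matching interaction follows, written as Python dicts mirroring the JSON formats; the element names (relationship_1, widget_1, viz_1, brushfilter1, viz_2, viz_3) and attributes (color, longitude, latitude) come from the text, while the exact key spellings (source, type, attribute, targets, parameters) are assumptions drawn from the prose.

```python
# Hypothetical interface-specification relationships and one interaction,
# mirroring the JSON formats described in the text. Key spellings are
# assumptions; element names and attributes follow the prose.
relationships = {
    "relationship_1": {
        "source": "widget_1",                  # radio button widget
        "type": "spec manipulation",
        "attribute": ["color"],
        "targets": [{"type": "visualization", "name": "viz_1"}]
    },
    "brushfilter1": {
        "source": "viz_2",                     # visualization acting as a widget
        "type": "data manipulation",
        "attribute": ["longitude", "latitude"],
        "targets": [{"type": "visualization", "name": "viz_3"}]
    }
}

# An interaction is an instance of a relationship: it names the relationship
# and supplies a value for every attribute the relationship declares.
interaction = {
    "relationship": "brushfilter1",
    "parameters": {
        "longitude": {"range": [-79.5, -75.0]},   # illustrative values
        "latitude": {"range": [37.9, 39.7]}
    }
}
```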
Walking through the example in Figure 3, the interaction specification defines the relationship as the brush filter, and the parameters are longitude and latitude, the attributes being manipulated. This output matches what is produced by our current logs.

We created a logging system to capture details about interactions that occur between users and widgets. Every interaction with a widget is recorded as a log object and written to a JSON Lines (.jsonl) file. Each log object contains the following three elements:
• "relationship": the relationship between a widget and the visualization affected
• "timestamp": the timestamp of the interaction
• "parameters": the parameters of the data queried
A log consists of a hierarchical JSON structure defined by the string "relationship", and contains the details of the user's activity with a widget, including the widget or relationship name. The timestamp is sent from the web client to the server using fetch API calls, and consists of a number representing the milliseconds elapsed since the UNIX epoch. The parameters for the interaction log follow the format of Vega-Lite predicates, which are defined by a logical composition of: (1) a Vega expression string, (2) one of the field predicate properties: equal, lt, lte, gt, gte, range, oneOf, or valid, or (3) a parameter predicate, which defines the name of a selection that the data point should belong to (or a logical composition of selections). These Vega-Lite predicates define how the data was queried in order to populate the visualization.

We created a Python script that parses the Tableau interaction logs from the Tableau dashboards and outputs corresponding logs in our benchmark logging syntax. With this log parser, we are able to run our benchmark on Tableau dashboard interactions. The script takes as input the raw Tableau logs and the interface specification file for the given dashboard. We iterate through each of the log records to parse and filter them. For each log, we parse the log names and the parameters, which represent the field being interacted with for each action. With this field name, we iterate through the interface specification file to identify which relationship is triggered by the interaction recorded in the log record. The source widget for the corresponding relationship must contain the parsed field from the Tableau logs in its "attribute" value in the interface specification. Some of the Tableau log names correspond to particular types of widgets, and in these cases we use this information to narrow down which widgets could correspond to the interaction before iterating through the interface specification. For example, the log name "tabdoc:quantitative-quick-filteredit" represents a slider widget type. The parser is able to identify which fields are interacted with and which relationship from the interface specification is triggered, but it is not always able to identify the value the interacted field is set to for some interaction types. In these cases, we hard-code the values into the outputted benchmark logs before running them against the benchmark.
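For illustration, a single benchmark log record could be produced as follows; this is a minimal Python sketch in which the field names (relationship, timestamp, parameters) come from the text, while the output file name and the concrete values are assumptions.

```python
# Hypothetical example of one interaction log record in the benchmark's
# JSON Lines syntax. Field names follow the text; values are illustrative.
import json
import time

log_record = {
    "relationship": "brushfilter1",
    "timestamp": int(time.time() * 1000),   # milliseconds since the UNIX epoch
    "parameters": {
        # Vega-Lite-style field predicates describing how the data was queried
        "longitude": {"field": "longitude", "range": [-79.5, -75.0]},
        "latitude": {"field": "latitude", "range": [37.9, 39.7]}
    }
}

with open("interactions.jsonl", "a") as f:
    f.write(json.dumps(log_record) + "\n")   # one log object per line
```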
Given a set of specification documents, the benchmark system produces query workloads, executes the workloads, and generates performance reports. From the set of specification documents, the dashboard interface is rendered as a graph, with widgets expressed as nodes and interactivity relationships as edges. Interaction sequences are played across the graph to determine which nodes are activated as a result of an interaction and will therefore require refreshed data from the DBMS. For example, selecting a new item in a dropdown results in an update to the dropdown itself as well as the filtering of any target visualizations. The interface specification is initialized as a graph that is traversed or redrawn for each interaction read from the interaction specification. For every interaction, a workload is derived using the SQL compiler.

A. Interface Graph. The current state of the interface is represented in a directed graph. Each node is a widget or visualization, and each edge is a relationship that originates at a widget node and terminates at a visualization node. In the case of zoom, where the visualization acts on itself as a widget, we create a single node in the graph that inherits both visualization and widget characteristics. Properties such as fields, aggregation operators, and filters are stored at graph nodes. These properties are compiled into SQL statements that produce a snapshot of the visualization's base dataset. If the interface contains wildcards, then interface manipulations can occur that alter the UI graph, for example, encoding a new attribute to a visualization or removing a relationship. When an interface manipulation occurs, we first check that the manipulation is allowed by the wildcards in the interface specification. If allowed, the graph is updated to reflect the new state and a new visualization specification document is logged. Finally, we identify any nodes whose properties or relationships changed. These nodes are compiled to produce a SQL query, and the query is appended to the workload.

Each item in the interaction specification is either an interface manipulation or a data manipulation, such as applying a filter by selecting the value 'BWI' for the attribute airport code. Each data manipulation triggers a traversal of the interface graph to identify linked nodes. For every linked node, we update the node properties to reflect the newest data manipulation, compile the node, and append the query to the workload. We assume the worst case for each interaction: each interface interaction triggers a request to the database. Each visualization is supported by a single query that returns only the fields currently displayed in the visualization, at the level of aggregation at which they are displayed. Whenever a node is identified as the direct or indirect recipient of an interaction, the SQL compiler formats a query to request data for the modified state of the node. Visualization nodes are initialized from the interface graph with properties containing the fields used in the visualization and their aggregation. As data manipulations occur in the interface, filters are applied to or removed from nodes. If the visualization supports specification manipulation, then the fields and aggregation properties can also change. When a property change is detected on a node, the compiler concatenates the node properties into a revised SQL statement. All fields are added to the SELECT statement and numerical fields are aggregated as specified. If aggregations are present, the categorical fields are also added to a GROUP BY. Since we do not currently support joins (only single tables), we create a FROM statement using the table name from the first attribute in the visualization specification. Finally, any filters are added to the WHERE statement.
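A minimal sketch of this compilation step is shown below, assuming a hypothetical node structure; the function and field names are illustrative, but the SELECT/GROUP BY/FROM/WHERE assembly follows the rules stated above.

```python
# Minimal sketch of the node-to-SQL compilation described above.
# The node structure and helper name are hypothetical assumptions.
def compile_node(node: dict) -> str:
    """Concatenate a visualization node's properties into a SQL statement."""
    select_parts = []
    group_by = []
    for field, agg in node["fields"].items():       # e.g. {"county": None, "positive_cases": "SUM"}
        if agg:                                      # numerical fields are aggregated as specified
            select_parts.append(f"{agg}({field})")
        else:                                        # categorical fields are grouped
            select_parts.append(field)
            group_by.append(field)

    sql = f"SELECT {', '.join(select_parts)} FROM {node['table']}"
    if node.get("filters"):                          # filters accumulated from data manipulations
        sql += " WHERE " + " AND ".join(node["filters"])
    if group_by and any(node["fields"].values()):    # GROUP BY only when aggregations are present
        sql += " GROUP BY " + ", ".join(group_by)
    return sql

# Example: the heat map node after a date-range filter has been applied.
heat_map_node = {
    "table": "covid_md",
    "fields": {"county": None, "positive_cases": "SUM"},
    "filters": ["date BETWEEN '2020-03-01' AND '2020-03-31'"],
}
print(compile_node(heat_map_node))
# SELECT county, SUM(positive_cases) FROM covid_md WHERE date BETWEEN '2020-03-01' AND '2020-03-31' GROUP BY county
```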
After queries are produced for all nodes touched by the interaction, we add the timestamp of the interaction to the batch to indicate that the queries should be executed simultaneously, and append them to the workload. During visual exploration, users iteratively form questions and answer them by manipulating an interface into configurations that display data in a way that allows them to draw a conclusion. Reasoning about which end state matches a user's query is the goal of visualization recommendation systems and NLP-to-viz systems. These systems take an utterance (11), dataset (7), or partially specified visualization (12), infer the intent of the user, and return one or more visualizations that are deemed a good fit. These systems output discrete visualizations, but in our case we seek to find the state of interconnected visualizations in a dashboard. In this work, we hold the structure of the interface graph constant and assume that a user can reach an appropriate end state for their query through data manipulations such as brushing, filtering, or zooming.

E. Report on Performance. For our benchmark to be relevant, it must be able to support the wide variety of DBMS systems used as the backend for visualization interfaces. IDEBench provides an extensible driver template for users to revise to support their specific DBMS system (4). In addition to the prebuilt MonetDB and approXimateDB drivers, we provide additional drivers for three common databases: PostgreSQL, SQLite, and DuckDB. We chose these systems because they are open source, widely implemented in the visualization community, and previously used for visualization system benchmarking (2).

We developed a proof-of-concept interface leveraging Python, JavaScript, and PostgreSQL that implements solitary widgets so that their performance may be studied individually. This proof of concept can be extended to include more complex interfaces that leverage multiple widgets and visualizations in the future. Zooming widgets, for example, require data to be fetched at various levels of aggregation simultaneously. While it is sometimes possible for this data load to be collapsed into a single query, more often than not, data for each level of aggregation is sourced separately. The backend server is implemented in Flask and uses psycopg2 to fetch data from a local PostgreSQL server. The frontend uses CSS and HTML for formatting and D3.js and jQuery for scripting.
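The paper does not print the proof-of-concept code; the following is a minimal sketch, assuming a hypothetical route, table, and query, of how such a Flask backend might relay a widget interaction to PostgreSQL via psycopg2.

```python
# Minimal sketch of the proof-of-concept backend described above: a Flask
# endpoint that receives a widget interaction and fetches refreshed data
# from a local PostgreSQL server via psycopg2. Route, table, and column
# names are illustrative assumptions, not the paper's actual code.
from flask import Flask, jsonify, request
import psycopg2

app = Flask(__name__)
conn = psycopg2.connect(dbname="covid_md", host="localhost")

@app.route("/interact", methods=["POST"])
def interact():
    event = request.get_json()                      # e.g. {"low": "2020-03-01", "high": "2020-03-31"}
    with conn.cursor() as cur:
        cur.execute(
            "SELECT county, SUM(positive_cases) "
            "FROM covid WHERE date BETWEEN %s AND %s GROUP BY county",
            (event["low"], event["high"]),
        )
        rows = cur.fetchall()
    # The frontend (D3.js) re-renders the visualization from this payload.
    return jsonify(rows)

if __name__ == "__main__":
    app.run()
```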
This proof of concept allowed us to study the implementation of each widget and better understand the demands it places on the DBMS. We instrumented each widget to collect behavioral data on how it is used, how long it takes to execute on the DBMS, and the amount of data returned. Logs were collected from users interacting with the widgets at a casual pace as well as under an unrealistic load test.

The largest portion of the remaining work is understanding how users interact with visualization systems and developing workflows that mimic that behavior. We expect bursty behavior, alternating between two stages: heavy-use goal refinement and static think-time. In each goal-refinement period we expect to see repeated use of a small subset of the widgets available in the interface. However, it is unclear how small that subset is, or how quickly the user interacts with the interface to revise their visualization and arrive at their goal. Each of these parameters is important for benchmarking because they affect both the query types generated and the speed at which results must be returned. In the former case, caching could yield substantial performance increases. Thus far, we have collected usage heuristics from single-widget interfaces. For our benchmark to be relevant to most real-world scenarios, we will need to support designs with an ensemble of widgets that operate in combination on the same target or independently. This will require an interface specification flexible enough to capture multiple widgets and their targets, as well as further log collection and modeling to recreate the behavior of users alternating between multiple widgets to complete a task. We will also need to address behavioral patterns that might arise from the positioning of widgets in an interface, the total number of widgets in the interface, and the task at hand. Furthermore, we have identified widget interactions that were missing from our original survey (sort, annotate, upload) and will need to define query templates for each of these interactions.

References
(1) The Role of Latency and Task Complexity in Predicting Visual Search Behavior
(2) Database Benchmarking for Supporting Real-Time Interactive Querying of Large Data
(3) The Case for Interactive Data Exploration Accelerators (IDEAs)
(4) IDEBench: A Benchmark for Interactive Data Exploration
(5) VisTrees: Fast Indexes for Interactive Data Exploration
(6) A Benchmark for End-User Structured Data Exploration and Search User Interfaces
(7) VizML: A Machine Learning Approach to Visualization Recommendation
(8) Characterizing Exploratory Visual Analysis: A Literature Review and Evaluation of Analytic Provenance in Tableau
(9) What Do We Talk About When We Talk About Dashboards?
(10) Vega-Lite: A Grammar of Interactive Graphics
(11) Eviza: A Natural Language Interface for Visual Analysis
(12) Voyager 2: Augmenting Visual Analysis with Partial View Specifications
(13) The Effects of Interactive Latency on Exploratory Visual Analysis